
SILVA FAQ

What do the database fields in the SILVA databases mean and how are they related to EMBL?

How to select and download sequences?

Before you are able to download your personal set of sequences, you have to select sequences or groups with the browser or the search functionality of SILVA.

How to select sequences in the Browser?

  • Click on Browser
  • Navigate to the group (subnode) or sequences you are interested in by clicking on the names
  • Mouse-over shows more information, e.g. how many sequences are present in a certain group (subnode)
  • To add a complete group (subnode) to the Cart click on
  • To remove a complete group (subnode) click on  (group completely in cart) or on  (group partially in cart)
  • Groups included in the Cart are shown in bold. The percentages shown in brackets indicate to what extent the group is included in the Cart.
  • Adding a single sequence to your list works the same way. Single sequences are indicated by a
  • All selected groups and sequences are shown in the List field below the browser
  • To generate an ARB or FASTA file for downloading click on

How to search for sequences?

With the Search functionalities you can perform complex queries by adding constraints to your query or by combining results from several queries (see below).

  • Click on Search
  • Type in your keyword(s) or value(s) in the "Search for" field(s) (wildcards will be added automatically).
  • Have a look at the Search Tutorial to get an overview of additional search functionalities.

Example: You would like to get all Gammaproteobacteria with a minimal length of 1400 bases and an alignment quality better than 90.

  • Search for 
    • Gammaproteobacteria in taxonomy
    • >=1400 in sequence length
    • >90 in alignment quality
  • Press return or click on Search
  • A list with search results will be shown
  • The list can be sorted by clicking on the blue headers of the respective columns (like Organism name)
  • Select your sequences of interest by ticking the box in front of each sequence ( will be shown) or add all sequences to the Cart by clicking on Add found sequences to cart.

Tricks: Using < and > allows you to search for sequences above or below a certain value (length or quality). To get all sequences from a specific publication you can use the DOI or PubMed ID in the field "publication"; try 9572969 as an example. Remember: complex queries might take some time, so please be patient.

  • To generate an ARB or FASTA file for downloading click on Download
  • Select the output format and archive type
  • Click on Start Export
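The example query above (Gammaproteobacteria, at least 1400 bases, alignment quality better than 90) combines three constraints that must all hold. The following is only a sketch of that logic on a handful of made-up records; the field names `taxonomy`, `length` and `align_quality` and the record IDs are hypothetical and do not describe the SILVA search backend or export schema.

```python
def matches(record, taxon, min_length, min_align_quality):
    """Return True if the record satisfies all three constraints."""
    return (
        taxon in record["taxonomy"].split(";")
        and record["length"] >= min_length            # ">=1400" in the form
        and record["align_quality"] > min_align_quality  # ">90" in the form
    )

# Made-up records for illustration only.
records = [
    {"id": "seq1", "taxonomy": "Bacteria;Proteobacteria;Gammaproteobacteria",
     "length": 1502, "align_quality": 95},
    {"id": "seq2", "taxonomy": "Bacteria;Proteobacteria;Gammaproteobacteria",
     "length": 1200, "align_quality": 97},
    {"id": "seq3", "taxonomy": "Bacteria;Proteobacteria;Alphaproteobacteria",
     "length": 1450, "align_quality": 92},
    {"id": "seq4", "taxonomy": "Bacteria;Proteobacteria;Gammaproteobacteria",
     "length": 1400, "align_quality": 90},
]

hits = [r["id"] for r in records if matches(r, "Gammaproteobacteria", 1400, 90)]
print(hits)
```

Note that `seq4` is excluded because `>90` is a strict comparison, while `>=1400` also accepts a length of exactly 1400.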

How to combine search queries?

You use the cart to combine queries. 

Example: You would like to get all Alphaproteobacteria or Betaproteobacteria with a minimal length of 1400 bases.

  • Search for 
    • Alphaproteobacteria in taxonomy
    • >=1400 in sequence length
  • Click on Add found sequences to cart
  • Search for 
    • Betaproteobacteria in taxonomy
    • >=1400 in sequence length
  • Click on Add found sequences to cart
  • Click on Show to view your complete search result.
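Conceptually, combining queries through the cart behaves like a set union: each query adds its hits, and a sequence that is already in the cart is not added twice. A minimal sketch of that idea, using placeholder sequence IDs rather than real accession numbers:

```python
# Placeholder hit lists standing in for the two search results.
alpha_hits = ["alpha_1", "alpha_2"]
beta_hits = ["beta_1", "beta_2", "beta_3"]

cart = set()
cart.update(alpha_hits)   # first "Add found sequences to cart"
cart.update(beta_hits)    # second "Add found sequences to cart"
cart.update(alpha_hits)   # adding the same hits again changes nothing

print(sorted(cart))
```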

How to restrict the search to a certain dataset?

You can restrict your search results to certain datasets by selecting one or more of the checkboxes labeled "Sequences occur in all of these". Ref will restrict the search to the SILVA Reference dataset, RefNR to the non-redundant SILVA Reference dataset, LTP to the "All-Species Living Tree" Project dataset, and my Cart to the contents of your cart. By selecting multiple datasets you restrict the search to those sequences which are part of all selected datasets.
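Selecting several datasets therefore acts as an intersection, not a union. The sketch below illustrates this with made-up dataset contents; the IDs and the `restrict` helper are hypothetical.

```python
# Made-up dataset memberships for illustration only.
datasets = {
    "Ref": {"s1", "s2", "s3", "s4"},
    "RefNR": {"s1", "s2", "s3"},
    "LTP": {"s2", "s5"},
}

def restrict(hits, selected, datasets):
    """Keep only hits that occur in every selected dataset."""
    result = set(hits)
    for name in selected:
        result &= datasets[name]
    return result

hits = {"s1", "s2", "s4", "s5"}
print(sorted(restrict(hits, ["Ref"], datasets)))
print(sorted(restrict(hits, ["Ref", "LTP"], datasets)))
```

With only Ref ticked, three of the four hits remain; ticking LTP as well leaves only the one hit that is in both datasets.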

What do the green, yellow and orange quality bars tell me?

The colored bars on the search page and in the short and detailed sequence views of the browser give a fast overview of the different quality aspects assigned to every sequence. The length of the bars is a graphical representation of the respective quality value.

The colors classify the information into four categories: A green bar represents a value equal to or greater than 75. Yellow bars stand for values equal to or greater than 50 but less than 75. Values less than 50 are expressed by an orange bar. Red bars are only used for scores of 0. Since "problematic" sequences, sequences of inadequate quality, and insufficiently aligned sequences were discarded from the databases, only the Pintail scores can be 0.
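The thresholds above can be written down as a small lookup. This is only a restatement of the rules in this answer, assuming scores lie in the range 0 to 100; the function name `bar_color` is made up for the example.

```python
def bar_color(score):
    """Map a quality score (0-100) to the bar color described above."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score == 0:
        return "red"
    if score >= 75:
        return "green"
    if score >= 50:
        return "yellow"
    return "orange"

for value in (100, 75, 74, 50, 49, 0):
    print(value, bar_color(value))
```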

The sequence quality score is a combination of the percentages of ambiguities, homopolymers longer than 4 bases and possible vector contaminations. The overall score was normalized to fit into our unified scoring system ranging between 0 and 100, where 100 is the best. The alignment quality is currently represented by the identity of a certain sequence, normalized between 0 and 100, to its next relatives in the SEED. The color of the Pintail bar represents the probability that the rRNA sequence contains anomalies or is a chimera, where 100 means that the probability of being anomalous or chimeric is low. If you would like to know more about Pintail please have a look at the Pintail paper.

rRNA sequences with fewer than 300 nucleotides, or with more than 2% ambiguities, homopolymers or vector contamination, have been rejected by the initial quality check procedure.
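As a rough illustration of such a check, the sketch below tests the length, ambiguity and homopolymer criteria on a raw sequence string. It is not the SILVA pipeline: counting every base inside a run longer than 4 as "homopolymer" is an assumption made for this example, and vector contamination is left out because it needs a vector reference database.

```python
from itertools import groupby

UNAMBIGUOUS = set("ACGTU")

def percent_ambiguous(seq):
    """Percentage of bases that are not A, C, G, T or U."""
    seq = seq.upper()
    ambiguous = sum(1 for base in seq if base not in UNAMBIGUOUS)
    return 100.0 * ambiguous / len(seq)

def percent_homopolymer(seq, max_run=4):
    """Percentage of bases lying in runs longer than max_run (assumed definition)."""
    seq = seq.upper()
    run_lengths = (len(list(group)) for _, group in groupby(seq))
    in_long_runs = sum(n for n in run_lengths if n > max_run)
    return 100.0 * in_long_runs / len(seq)

def passes_initial_check(seq, min_length=300, max_percent=2.0):
    """Apply the length, ambiguity and homopolymer thresholds."""
    if len(seq) < min_length:
        return False
    return (percent_ambiguous(seq) <= max_percent
            and percent_homopolymer(seq) <= max_percent)

print(passes_initial_check("ACGU" * 100))             # long enough, clean
print(passes_initial_check("ACGU" * 50))              # too short
print(passes_initial_check("A" * 20 + "CGUA" * 100))  # long homopolymer run
```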

How can I get all type strains? How can I get all cultivated strains?

To get all type strains search for [T] in the strain field of the search page. To get all cultivated strains (including type strains) combine the searches for [T] and s[C] (see above or the search tutorial). Searching for n[G] returns all ribosomal RNAs from genome sequences. More information can be found in the corresponding release background section.

Why is there a difference between the number of sequences shown in the Popups, the List view or Download page and the number of exported sequences?

The reason is that the webpage shows the number of EMBL sequence entries (accession numbers). Since some sequence entries can have more than one "rRNA region" (just think about genome sequences with multiple rRNA operons), the real number of "rRNA regions" can be much higher. In the export you will get all the rRNA sequences from every sequence entry.
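To compare the two numbers for an exported FASTA file, you can count distinct accessions and total records yourself. The sketch below assumes that each header starts with an identifier of the form `accession.start.stop`; treat that layout, the made-up accessions and the helper name as assumptions and check them against your own export.

```python
from collections import Counter

def regions_per_entry(fasta_lines):
    """Count rRNA regions (FASTA records) per sequence entry (accession)."""
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # skip sequence lines
        seq_id = line[1:].split()[0]
        # Assumed header layout: accession.start.stop
        accession, _start, _stop = seq_id.rsplit(".", 2)
        counts[accession] += 1
    return counts

# Made-up export with one genome entry carrying two rRNA regions.
example = [
    ">XX000001.1200.2740 Bacteria;Proteobacteria",
    "ACGU",
    ">XX000001.90500.92040 Bacteria;Proteobacteria",
    "ACGU",
    ">YY000002.1.1480 Bacteria;Firmicutes",
    "ACGU",
]

counts = regions_per_entry(example)
print("sequence entries:", len(counts))
print("rRNA regions:", sum(counts.values()))
```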

ARB/SILVA FAQs

SILVA Taxonomy and Classifications

Every sequence in the SILVA databases carries the ENA-EBI (EMBL) taxonomy assignment. Where available, the greengenes, RDP and LTP taxonomies are added for comparison. The ENA (EMBL) taxonomy is retrieved simultaneously with the sequences, whereas the other taxonomies are assigned to the sequences based on accession numbers. For LSU rRNA sequences no additional up-to-date taxonomies are available.

For the SSU and LSU Ref(erence) databases guide trees are reconstructed. The trees are incrementally built using the ARB parsimony tool with filters to remove highly variable positions. Based on the guide trees, all phylogenetic assignments are manually curated.

The taxonomy for the Bacteria and Archaea divisions is organized taking into account taxonomic information provided by Bergey's Manual of Systematic Bacteriology (volumes 1 through 4), Bergey's Taxonomic Outlines (volume 5), and the List of Prokaryotic names with Standing in Nomenclature (Euzeby 1997) to supplement the Bergey's resources with the latest information on validly described bacterial and archaeal taxa. Although we are conservative in which taxa to represent, and give priority to valid taxa, we also include Candidatus and names without standing in nomenclature. Furthermore, extensive effort is spent to represent prominent uncultured, environmental clades. The majority of these clades are annotated in the guide tree for the SSU Ref dataset based on literature surveys and personal communications. Taxonomic groups consisting only of sequences from uncultured organisms are named after the clone sequence submitted earliest. The taxonomic paths are standardized to six ranks: Domain, Phylum, Class, Order, Family and Genus. The only taxa that are exceptions to this rule are Myxococcales, which has suborders, and Thermomicrobia, which has subclass and suborder ranks. More information about the SILVA and LTP taxonomic frameworks can be found in the respective paper.
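When working with exported data, the six standardized ranks make it easy to label the levels of a taxonomic path. The following sketch assumes the path is a semicolon-separated string, as in the example shown; the helper name is hypothetical, and paths with more than six levels (such as the exceptions mentioned above) are rejected rather than mislabeled.

```python
RANKS = ("Domain", "Phylum", "Class", "Order", "Family", "Genus")

def parse_path(path):
    """Label the levels of a semicolon-separated taxonomic path by rank."""
    taxa = [taxon for taxon in path.strip().split(";") if taxon]
    if len(taxa) > len(RANKS):
        raise ValueError("path has more than six levels: " + path)
    # Shorter paths simply stop at the deepest rank that is present.
    return dict(zip(RANKS, taxa))

# Illustrative path only.
path = ("Bacteria;Proteobacteria;Gammaproteobacteria;"
        "Enterobacterales;Enterobacteriaceae;Escherichia-Shigella")
parsed = parse_path(path)
print(parsed["Class"], parsed["Genus"])
```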

With SILVA release 111 the eukaryotic taxonomy has been significantly improved as part of the ETWG project. Since SILVA is now part of the UniEuk project we will adapt the eukaryotic taxonomy as soon as the UniEuk experts have released the first version.

With SILVA release 138 the Genome Taxonomy Database (GTDB) has been adopted. As a consequence, the following groups underwent significant changes: Archaea, Enterobacterales, Deltaproteobacteria, Firmicutes, Clostridia. Betaproteobacteriales (formerly known as Betaproteobacteria) is now Burkholderiales, an order of Gammaproteobacteria. Epsilonproteobacteria has been absorbed into the new phylum Campilobacterota. Tenericutes no longer exists as a separate group; its members are now all part of Bacilli, inside Firmicutes.

Due to this exhaustive manual curation we believe that the SILVA rRNA databases contain the most up-to-date and detailed bacterial and archaeal taxonomic classification.

How to quickly get an unaligned FASTA sequence for an entry?

  • Browse or Search for your sequence entry of interest
  • If you are in the Browser click on Link to EMBL and copy the FASTA sequence from the EMBL entry
  • If you are on the Search or List page click on to jump to the Browser and then click on Link to EMBL

What kind of information is not available for the LSU dataset?

  • Pintail quality - currently Pintail cannot be applied to LSU sequences
  • Greengenes and RDP taxonomy

How to "Scan for unknown fields" in ARB?

Scanning for unknown fields is necessary when you open your custom ARB database for the first time. The reason is that the SILVA database contains much more information assigned to each sequence than the original ARB databases. Please do the following steps:

  • Start ARB with your database
  • Go to Species -> Search and Query
  • When the Search and Query window pops up click on Search and select one sequence in the Hitlist
  • The Species Information window should pop up. Go to Fields and click on Scan unknown fields. An extended set of database fields should now be visible.

How to "Merge two ARB Databases"?

  • Start ARB and click on Merge Two ARB Databases in the ARB Intro window
  • Select Database I (source db) and Database II (target db) in the directories fields
  • Click on Go
  • The ARB Merge window pops up
  • To make sure that the name fields are unique and synchronized in both databases click on Check Names ...
  • When the Synchronize Names window appears, Rename Database I and II!!
  • Click on Transfer Species ...
  • Search for entries in the left database (I) you would like to transfer to the right database (II) using the Query menu
  • All db entries you would like to transfer should now be shown in the Hitlist
  • Click on Transfer Listed Species - Delete Duplicates in DB II
  • Click on Close
  • Click on Save Whole DB II ... or Save Changes of DB II as...
  • To see your new sequences in the guide tree, you have to add them first using the Parsimony 'Quick add' procedure in ARB

Show differences: The combination of Search species that don't match the query with no search string in the search field name shows all the sequences in the Hitlist which are different between DB I and DB II.

Preserve Alignment: No. Tick this box only in case the sequences in the two databases have different alignments and ARB should try to adjust the alignments according to a reference species which must be part of both databases while transferring the sequences.

Key ‘version’ exists, but has different type

If you encounter the error message "Key 'version' exists, but has different type" when merging two ARB databases, you need to change the type of a field in ARB using "convert fields" (next to where you can add/remove fields). It is disabled by default, so first "toggle expert mode". Unfortunately, ARB isn't very communicative regarding which fields have the wrong type. The "start" field is a recurring problem, as it was a "string" (or text in ARB lingo) field in older databases, but is now an "integer" (rounded numerical or number) field. This error is common when merging current SILVA databases with old (outdated) ones. To avoid further problems it is highly recommended to update to one of the latest SILVA datasets.

Known problems with the old ARB version(s)

There were many problems with the older versions of ARB, especially with the Name- and PT-server.

They have all been solved in the new ARB releases (v6 or newer). We strongly recommend upgrading your systems.

ARB 7 can be freely downloaded from: http://www.arb-home.de/downloads.html

What else?

If you have further questions related to ARB itself, have a closer look at our ARB Support section.

Here, the following issues are addressed:

  • ARB on Mac OS X
  • ARB FAQs
  • ARB Bug Tracker