Before you are able to download your personal set of sequences, you have to select sequences or groups with the browser or the search functionality of SILVA.
With the Search functionalities you are able to perform complex queries by adding constraints to your query or by combining results from several queries (see below).
Example: You would like to get all Gammaproteobacteria with a minimal length of 1400 bases and an alignment quality better than 90.
Tricks: Use < and > to search for sequences above or below a certain value (length or quality). To get all sequences from a specific publication, enter the DOI or PubMed ID in the field "publication"; try 9572969 as an example. Remember: Complex queries might take some time, so please be patient.
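The logic of the example query above can be sketched in a few lines of Python. This is only an illustration of how the constraints combine, not the SILVA search itself; the record class, its field names, and the accession numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SeqRecord:
    # Hypothetical record layout, for illustration only.
    accession: str
    taxonomy: str          # semicolon-separated taxonomic path
    length: int            # sequence length in bases
    align_quality: float   # alignment quality, 0 to 100


def matches(rec, taxon, min_length, min_align_quality):
    """True if the record satisfies all three constraints at once."""
    return (
        taxon in rec.taxonomy.split(";")
        and rec.length >= min_length                # "minimal length"
        and rec.align_quality > min_align_quality   # "better than"
    )


# Made-up example data.
records = [
    SeqRecord("HYP0001", "Bacteria;Proteobacteria;Gammaproteobacteria", 1502, 95.0),
    SeqRecord("HYP0002", "Bacteria;Proteobacteria;Gammaproteobacteria", 1200, 97.0),
    SeqRecord("HYP0003", "Bacteria;Proteobacteria;Gammaproteobacteria", 1480, 88.0),
    SeqRecord("HYP0004", "Bacteria;Proteobacteria;Alphaproteobacteria", 1510, 99.0),
]

hits = [r.accession for r in records
        if matches(r, "Gammaproteobacteria", 1400, 90)]
print(hits)
```

Every constraint added to a query narrows the result, because a sequence has to satisfy all of them.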
You use the cart to combine queries.
Example: You would like to get all Alphaproteobacteria or Betaproteobacteria with a minimal length of 1400 bases.
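Conceptually, combining queries through the cart behaves like a set union: each query adds its hits to the cart, and a sequence found by both queries is stored only once. The following sketch uses made-up records and a hypothetical `search` helper to illustrate this; it does not call any SILVA interface.

```python
# Made-up records: (accession, class, length in bases).
records = [
    ("HYP0001", "Alphaproteobacteria", 1450),
    ("HYP0002", "Alphaproteobacteria", 1100),
    ("HYP0003", "Betaproteobacteria", 1500),
    ("HYP0004", "Gammaproteobacteria", 1600),
]


def search(records, taxon, min_length):
    """Hypothetical stand-in for one search: returns matching accessions."""
    return {acc for acc, cls, length in records
            if cls == taxon and length >= min_length}


cart = set()
# First query: Alphaproteobacteria, at least 1400 bases; add hits to the cart.
cart |= search(records, "Alphaproteobacteria", 1400)
# Second query: Betaproteobacteria, at least 1400 bases; add hits to the cart.
cart |= search(records, "Betaproteobacteria", 1400)

print(sorted(cart))
```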
You can restrict your search results to certain datasets by selecting one or more of the checkboxes labeled Sequences occur in all of these. Ref will restrict the search to the SILVA Reference dataset, RefNR to the non-redundant SILVA Reference dataset, LTP to the "The All-Species Living Tree" Project dataset, and my Cart to the contents of your cart. By selecting multiple datasets you restrict the search to those sequences which are part of all selected datasets.
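Selecting several datasets therefore acts like a set intersection. The sketch below illustrates this with made-up dataset memberships; the accession numbers and the `restrict` helper are hypothetical.

```python
# Made-up dataset memberships, keyed by the checkbox labels.
membership = {
    "Ref": {"HYP0001", "HYP0002", "HYP0003"},
    "RefNR": {"HYP0001", "HYP0003"},
    "LTP": {"HYP0003"},
    "my Cart": {"HYP0001", "HYP0003", "HYP0004"},
}


def restrict(hits, selected, membership):
    """Keep only hits that are part of every selected dataset."""
    result = set(hits)
    for name in selected:
        result &= membership[name]
    return result


all_hits = {"HYP0001", "HYP0002", "HYP0003", "HYP0004"}
print(sorted(restrict(all_hits, ["Ref", "RefNR"], membership)))
print(sorted(restrict(all_hits, ["Ref", "RefNR", "LTP"], membership)))
```

Each additional checkbox can only shrink the result or leave it unchanged.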
The colored bars on the search page and in the short and detailed sequence views of the browser give a fast overview of the different quality aspects assigned to every sequence. The length of the bars is a graphical representation of the respective quality value.
The colors classify the information into four categories: A green bar represents a value equal to or greater than 75. Yellow bars stand for values equal to or greater than 50 but less than 75. Values less than 50 are expressed by an orange bar. Red bars are only used for scores of 0. Because "problematic" sequences, sequences of inadequate quality, and insufficiently aligned sequences were discarded from the databases, only the Pintail score can be 0.
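The thresholds above can be written as a small function. This is a sketch of the stated rules only; the function name `bar_color` is hypothetical and not part of any SILVA tool.

```python
def bar_color(score):
    """Map a quality score (0 to 100) to the bar color described above."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score == 0:
        return "red"      # only used for scores of 0
    if score < 50:
        return "orange"   # above 0 but below 50
    if score < 75:
        return "yellow"   # 50 up to, but not including, 75
    return "green"        # 75 and above


for value in (0, 49, 50, 74.9, 75, 100):
    print(value, bar_color(value))
```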
The sequence quality score is a combination of the percentages of ambiguities, homopolymers longer than 4 bases, and possible vector contaminations. The overall score was normalized to fit into our unified scoring system, which ranges from 0 to 100, with 100 being the best. The alignment quality is currently represented by the identity of a sequence, normalized between 0 and 100, to its next relatives in the SEED. The color of the Pintail bar represents the probability that the rRNA sequence contains anomalies or is a chimera, where 100 means that the probability of being anomalous or chimeric is low. If you would like to know more about Pintail, please have a look at the Pintail paper.
rRNA sequences with fewer than 300 nucleotides and more than 2% of ambiguities, homopolymers or vector contamination have been rejected by the initial quality check procedure.
To get all type strains, search for [T] in the strain field of the search page. To get all cultivated strains (including type strains), combine the search for [T] and s[C] (see above or the search tutorial). Searching for n[G] returns all ribosomal RNAs from genome sequences. More information can be found in the corresponding release background section.
The reason is that on the webpage the number of EMBL sequence entries (accession numbers) is shown. Since some sequence entries can have more than one "rRNA region" - just think about genome sequences with multiple rRNA operons - the real number of "rRNA regions" is much higher. In the export you will get all the rRNA sequences from any sequence entry.
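The difference between the two counts can be illustrated with a short sketch. The accession numbers and region numbers are made up; the point is only that one sequence entry can contribute several rRNA regions to an export.

```python
# Made-up export content: (accession number, rRNA region index).
# A genome entry with several rRNA operons appears once per region.
regions = [
    ("HYPGENOME1", 1),
    ("HYPGENOME1", 2),
    ("HYPGENOME1", 3),
    ("HYPCLONE01", 1),
]

# Count shown on the webpage: distinct sequence entries.
n_entries = len({accession for accession, _ in regions})
# Count in the export: every rRNA region from every entry.
n_regions = len(regions)

print(n_entries, n_regions)
```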
Every sequence in the SILVA databases carries the ENA-EBI (EMBL) taxonomy assignment. Where available, the greengenes, RDP and LTP taxonomies are added for comparison. The ENA (EMBL) taxonomy is retrieved simultaneously with the sequences, whereas the other taxonomies are assigned to the sequences based on accession numbers. For LSU rRNA sequences, no additional up-to-date taxonomies are available.
For the SSU and LSU Ref(erence) databases guide trees are reconstructed. The trees are incrementally built using the ARB parsimony tool with filters to remove highly variable positions. Based on the guide trees, all phylogenetic assignments are manually curated.
The taxonomy for the Bacteria and Archaea divisions is organized taking into account taxonomic information provided by Bergey’s Manual of Systematic Bacteriology (volumes 1 through 4), Bergey's Taxonomic Outlines (volume 5), and the List of Prokaryotic names with Standing in Nomenclature (Euzeby 1997), which supplements the Bergey’s resources with the latest information on validly described bacterial and archaeal taxa. Although we are conservative about which taxa to represent and give priority to valid taxa, we also include Candidatus and names without standing in nomenclature. Furthermore, extensive effort is spent to represent prominent uncultured, environmental clades. The majority of these clades are annotated in the guide tree for the SSU Ref dataset based on literature surveys and personal communications. Taxonomic groups consisting only of sequences from uncultured organisms are named after the clone sequence submitted earliest. The taxonomic paths are standardized to six ranks: Domain, Phylum, Class, Order, Family and Genus. The only taxa that are exceptions to this rule are Myxococcales, which has suborders, and Thermomicrobia, which has subclass and suborder ranks. More information about the SILVA and LTP taxonomic frameworks can be found in the respective paper.
With SILVA release 111 the eukaryotic taxonomy has been significantly improved as part of the ETWG project. Since SILVA is now part of the UniEuk project we will adapt the Eukaryotic taxonomy as soon as the UniEuk experts have released the first version.
With SILVA release 138 the Genome Taxonomy Database (GTDB) has been adopted. As a consequence, the following groups underwent significant changes: Archaea, Enterobacterales, Deltaproteobacteria, Firmicutes, Clostridia. Betaproteobacteriales (formerly known as Betaproteobacteria) is now Burkholderiales, an order of Gammaproteobacteria. Epsilonproteobacteria has been absorbed into a new phylum, Campilobacterota. Tenericutes no longer exists as a separate group; its members are now all part of Bacilli, inside Firmicutes.
Due to this exhaustive manual curation we believe that the SILVA rRNA databases contain the most up to date and detailed bacterial and archaeal taxonomic classification.
Scanning for unknown fields is necessary when you open your custom ARB database for the first time. The reason is that the SILVA database contains much more information assigned to each sequence than the original ARB databases. Please do the following steps:
Show differences: The combination of Search species that don't match the query with no search string in the search field name shows all the sequences in the Hitlist which are different between DB I and DB II.
Preserve Alignment: No. Tick this box only in case the sequences in the two databases have different alignments and ARB should try to adjust the alignments according to a reference species which must be part of both databases while transferring the sequences.
If you encounter the error message Key ‘version’ exists, but has different "type" when merging two ARB databases, you need to change the "type" of a field in ARB using "convert fields" (next to where you can add/remove fields). It is disabled by default, so first "toggle expert mode". Unfortunately, ARB isn't very communicative regarding which fields have the wrong type. The "start" field is a recurring problem, as it was a "string" (or text in ARB lingo) field in older databases, but is now an "integer" (rounded numerical or number) field. This error is common when merging current SILVA databases with old (outdated) ones. To avoid further problems it is highly recommended to update to one of the latest SILVA datasets.
There were many problems with the older versions of ARB, especially with the Name- and PT-server.
They have all been solved in the new ARB releases (v6 or newer). We strongly recommend upgrading your systems.
ARB 7 can be freely downloaded from: http://www.arb-home.de/downloads.html
If you have further questions related to ARB itself, have a closer look at our ARB Support section.
Here, the following issues are addressed: