Release information: SILVA 115

Release information for the SILVA SSU and LSU databases, release 115, as of August 23, 2013

Dataset	SSU 115	(change)	LSU 115	(change)
Parc	3,808,884	(+614,106)	361,874	(+73,157)
Ref	1,426,414	(+686,781)	39,412	(+10,106)
Ref NR 99	479,726	(+192,868)	n/a	

With release 115, guide trees are provided only with SSU Ref NR 99 and LSU Ref.

Former statistics:

SILVA 94, SILVA 95, SILVA 96, SILVA 98, SILVA 100

SILVA 102, SILVA 104, SILVA 106, SILVA 108, SILVA 111

Small Subunit rRNA Databases

SSU Parc (Web database only) contains all aligned sequences with an alignment identity of at least 50, an alignment quality of at least 40, and a base-pair score or sequence quality of at least 30. All sequences with a Pintail value below 50 or an alignment quality below 75 have been assigned to color group 1 in ARB (red). All Living Tree Project or StrainInfo type strains have been assigned to color group 2 in ARB (light blue). No further sequence curation has been applied.
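The SSU Parc inclusion criteria and ARB color groups can be written as two small predicates. This is a minimal sketch, not SILVA pipeline code. The `SeqRecord` class and its attribute names are hypothetical stand-ins for the ARB fields. The "base-pair score or sequence quality" condition is read as "either one suffices", which is an assumption. Giving red (group 1) precedence over light blue (group 2) is also an assumption, because the text does not say how a sequence matching both is colored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SeqRecord:
    """Hypothetical per-sequence record; attribute names are illustrative, not ARB field names."""
    acc: str
    align_ident: float        # alignment identity
    align_quality: float      # alignment quality
    bp_score: float           # base-pair score
    seq_quality: float        # sequence quality
    pintail: Optional[float]  # None when no Pintail value is available
    is_type_strain: bool      # Living Tree Project or StrainInfo type strain


def in_ssu_parc(r: SeqRecord) -> bool:
    """SSU Parc inclusion: identity >= 50, alignment quality >= 40,
    and base-pair score or sequence quality >= 30 (assumed: either suffices)."""
    return (
        r.align_ident >= 50
        and r.align_quality >= 40
        and (r.bp_score >= 30 or r.seq_quality >= 30)
    )


def ssu_color_group(r: SeqRecord) -> int:
    """Return 1 (red), 2 (light blue) or 0 (no color group)."""
    low_pintail = r.pintail is not None and r.pintail < 50
    if low_pintail or r.align_quality < 75:
        return 1  # assumption: the red flag wins over the type-strain color
    if r.is_type_strain:
        return 2
    return 0


rec = SeqRecord("X0001", 92.0, 80.0, 70.0, 85.0, 95.0, True)
print(in_ssu_parc(rec), ssu_color_group(rec))  # True 2
```

A sequence without a Pintail value is not flagged red here. That is a design choice for the sketch, not a documented SILVA rule.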

To create SSU Ref (ARB file), the following sequences have been removed from SSU Parc: Bacteria and Eukarya shorter than 1,200 bases, Archaea shorter than 900 bases, and all sequences with an alignment identity below 70 or an alignment quality below 50. All sequences with a Pintail value below 50 or an alignment quality below 75 have been assigned to color group 1 in ARB (red). All Living Tree Project or StrainInfo type strains have been assigned to color group 2 in ARB (light blue).
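The additional SSU Ref cut can be sketched as a single predicate. This sketch assumes the sequence has already passed the SSU Parc criteria. The function name and its arguments are hypothetical, and `length` is taken to be the number of unaligned bases.

```python
# Minimum unaligned length (bases) per domain for SSU Ref, as stated in the release notes.
SSU_REF_MIN_LENGTH = {"Bacteria": 1200, "Eukarya": 1200, "Archaea": 900}


def in_ssu_ref(domain: str, length: int, align_ident: float, align_quality: float) -> bool:
    """Return True if an SSU Parc sequence is kept in SSU Ref.

    Sequences below the domain-specific length, with an alignment identity
    below 70, or with an alignment quality below 50 are removed.
    """
    min_length = SSU_REF_MIN_LENGTH[domain]  # raises KeyError for an unknown domain label
    return length >= min_length and align_ident >= 70 and align_quality >= 50


print(in_ssu_ref("Archaea", 950, 88.0, 77.0))   # True: the archaeal cut-off is 900 bases
print(in_ssu_ref("Bacteria", 950, 88.0, 77.0))  # False: the bacterial cut-off is 1,200 bases
```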

To create SSU Ref NR 99 (Web database & ARB file), a 99% identity criterion was applied with the UCLUST tool to remove highly identical sequences. Sequences from cultivated species have been preserved in all cases. A guide tree was calculated by adding all sequences to the SSU Ref tree of SILVA release 111. For tree calculation, highly variable positions were removed using the respective position variability filters for Bacteria, Archaea and Eukarya. These filters have also been added to the dataset. Detailed information about the SSU Ref NR dataset is available here.
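UCLUST is an external program. The following code only illustrates greedy, identity-based removal of redundant sequences. It does not reimplement UCLUST or the SILVA pipeline. The sketch makes several assumptions:

* Sequences are aligned and of equal length.
* Identity is computed over columns where at least one sequence has a base.
* '-' and '.' are treated as gap characters.
* Sequences from cultivated species are always kept.
* Longer sequences are processed first.

The function names and this ordering are illustrative assumptions.

```python
from typing import Dict, List, Set

GAP_CHARS = set("-.")


def percent_identity(a: str, b: str) -> float:
    """Identity (%) of two equal-length aligned sequences,
    ignoring columns where both sequences have a gap."""
    compared = matches = 0
    for x, y in zip(a, b):
        if x in GAP_CHARS and y in GAP_CHARS:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def ungapped_length(seq: str) -> int:
    return sum(1 for c in seq if c not in GAP_CHARS)


def dereplicate(aligned: Dict[str, str], cultivated: Set[str],
                threshold: float = 99.0) -> List[str]:
    """Greedy centroid selection: a sequence is dropped if it is at least
    `threshold` % identical to an already kept sequence, unless it comes
    from a cultivated species (those are always kept)."""
    # Cultivated sequences first, then longest first (assumed ordering).
    order = sorted(aligned, key=lambda acc: (acc not in cultivated,
                                             -ungapped_length(aligned[acc])))
    kept: List[str] = []
    for acc in order:
        if acc in cultivated or all(
            percent_identity(aligned[acc], aligned[k]) < threshold for k in kept
        ):
            kept.append(acc)
    return kept


ref = "ACGTACGTAC" * 10
seqs = {"A": ref, "B": "G" + ref[1:], "C": "T" * 100}
print(dereplicate(seqs, cultivated=set()))  # ['A', 'C']
print(dereplicate(seqs, cultivated={"B"}))  # ['B', 'C']
```

The pairwise comparison makes this quadratic in the number of sequences, so it only suits small illustrative inputs.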

All datasets: Before using the alignment for extensive phylogenetic reconstructions, all sequences should be checked carefully.

Large Subunit rRNA Databases

LSU Parc (Web database & ARB file) contains all aligned sequences with an alignment identity of at least 40 and an alignment quality, base-pair score or sequence quality of at least 30. All sequences with an alignment quality below 75 have been assigned to color group 1 in ARB (red). All Living Tree Project or StrainInfo type strains have been assigned to color group 2 in ARB (light blue). No further curation has been applied.

For LSU Ref (Web database & ARB file), all sequences shorter than 1,900 bases or with an alignment identity below 60 have additionally been removed. A guide tree was calculated based on the LSU Ref tree of SILVA release 111, and basic filters have been added.
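The LSU criteria use different thresholds from the SSU ones, and LSU color group 1 depends on alignment quality alone. The following is a minimal sketch with hypothetical function names. As for SSU, "an alignment quality, base-pair score or sequence quality of at least 30" is read as "any one of them suffices", which is an assumption. Giving red precedence over light blue is also an assumption.

```python
def in_lsu_parc(align_ident: float, align_quality: float,
                bp_score: float, seq_quality: float) -> bool:
    """LSU Parc: identity >= 40 and (assumed) any of the three quality values >= 30."""
    return align_ident >= 40 and max(align_quality, bp_score, seq_quality) >= 30


def in_lsu_ref(length: int, align_ident: float, align_quality: float,
               bp_score: float, seq_quality: float) -> bool:
    """LSU Ref: the LSU Parc criteria plus >= 1,900 bases and identity >= 60."""
    return (
        in_lsu_parc(align_ident, align_quality, bp_score, seq_quality)
        and length >= 1900
        and align_ident >= 60
    )


def lsu_color_group(align_quality: float, is_type_strain: bool) -> int:
    """1 (red) if alignment quality < 75, else 2 (light blue) for type strains, else 0."""
    if align_quality < 75:
        return 1  # assumption: red takes precedence over the type-strain color
    return 2 if is_type_strain else 0


print(in_lsu_parc(55.0, 80.0, 60.0, 90.0), in_lsu_ref(1850, 55.0, 80.0, 60.0, 90.0))  # True False
```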

Please note that the LSU SEED consists of only about 2,800 sequences, so there is no guarantee that well-aligned close relatives were always available. We recommend additional manual curation before using the LSU alignment for extensive phylogenetic reconstructions.

Taxonomies, Names, Type Strain & Genome Information

Taxonomy

Since SILVA release 102, the default taxonomy shown on the webpage (browser/search) is the SILVA taxonomy. Briefly, the tree for Bacteria and Archaea has been organized based on Bergey's taxonomic outline, LPSN and the literature. Starting with SILVA release 111, extensive care has also been taken to improve the eukaryotic taxonomy. The SILVA taxonomy is only available for the sequences that are part of the Ref(erence) datasets (SSU Ref & LSU Ref). To show the classification of all sequences (Parc) in the SILVA databases, switch to the EMBL taxonomy.

Alternative Taxonomies

Besides the SILVA and EMBL taxonomies, alternative classifications taken from the greengenes, RDP II and LTP projects are also available in SILVA. On the webpage, users can switch between them using the taxonomy menu. In ARB, the different taxonomies are stored in the fields tax_slv, tax_embl, tax_gg, tax_rdp and tax_ltp for SILVA, EMBL, greengenes, RDP II and LTP, respectively. The corresponding *_name fields show the respective sequence name for each entry. Please note that greengenes, RDP II and LTP cover only a subset of the sequences hosted by SILVA. If no taxonomic mapping to greengenes, RDP II or LTP was available, the sequence is assigned as "unclassified" and its sequence name equals the EMBL name. For the LSU datasets, only the SILVA, LTP and EMBL taxonomies are available.
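The following sketch reads a lineage from these taxonomy fields. It assumes each record has been exported into a Python dict keyed by field name. It also assumes that ranks in a taxonomy string are separated by ';' and that an unmapped entry holds the literal value "unclassified". These are assumptions about the data format, and the function names are hypothetical. The fallback to tax_embl follows the advice to use the EMBL taxonomy for sequences outside the Ref datasets.

```python
from typing import Dict, List, Optional

# Taxonomy fields named in the release notes, keyed by source.
TAX_FIELDS = {
    "SILVA": "tax_slv",
    "EMBL": "tax_embl",
    "greengenes": "tax_gg",
    "RDP II": "tax_rdp",
    "LTP": "tax_ltp",
}


def lineage(record: Dict[str, str], source: str) -> Optional[List[str]]:
    """Split a taxonomy string into ranks; None if the field is missing,
    empty or "unclassified". Assumes ';' separates the ranks."""
    value = record.get(TAX_FIELDS[source], "").strip().strip(";")
    if not value or value.lower() == "unclassified":
        return None
    return [rank.strip() for rank in value.split(";") if rank.strip()]


def lineage_or_embl(record: Dict[str, str], source: str) -> Optional[List[str]]:
    """Prefer `source`, otherwise fall back to the EMBL taxonomy."""
    return lineage(record, source) or lineage(record, "EMBL")


rec = {"tax_slv": "Bacteria;Proteobacteria;Gammaproteobacteria;",
       "tax_gg": "unclassified",
       "tax_embl": "Bacteria;Proteobacteria;"}
print(lineage(rec, "SILVA"))               # ['Bacteria', 'Proteobacteria', 'Gammaproteobacteria']
print(lineage_or_embl(rec, "greengenes"))  # ['Bacteria', 'Proteobacteria']
```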

Alternative Names

All names of validly described species in the SSU and LSU databases have been checked for changes (basonyms, synonyms and orthographic corrections) against the DSMZ "Nomenclature up to date" catalogue released in June 2013.

Cultured and Type Strains

Whether a sequence originates from a cultured strain or a type strain has been added to the strain field and is indicated by [C] and [T], respectively. Several sources have been used to compile this information: the StrainInfo.net bioportal, the Ribosomal Database Project II (10.32) and the Living Tree Project, which provides manually curated information compliant with Euzéby's "List of Prokaryotic names with Standing in Nomenclature".

Genomes

Whether a sequence originates from a genome project has been taken from EMBL and added to the strain field. It is indicated by e[G].

Detailed information about the corresponding identifiers and source databases can be found in the Strain Identifiers table.

The identifiers can be used for data retrieval by searching the strain field (see FAQ).

Quality Values

The length and colour of the bars give a first indication of the sequence and alignment quality, as well as of the risk of sequence anomalies based on Pintail analysis. After downloading the sequences as an ARB file, sequences that need attention can be selected by searching for low quality (alignment, sequence) or Pintail values in the corresponding ARB database fields. A full description of the colour code and of all database fields available in the ARB files can be found in the FAQ section. Given the rich set of sequence-associated information that comes with every SILVA sequence, user-designed sub-databases can easily be generated.
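This sketch shows how to flag sequences that need attention and how to build a small user-defined subset. It assumes the relevant fields have been exported to a CSV table. The column names and rows below are hypothetical sample data, not real SILVA entries. The default thresholds (alignment quality 75, Pintail 50) are taken from the red color-group definition. Treating a missing Pintail value as a reason for attention is a choice made for this sketch.

```python
import csv
import io
from typing import Callable, Dict, List

# Hypothetical export with one row per sequence; column names are illustrative.
EXPORT = """acc,domain,align_quality,pintail,type_strain
X1,Bacteria,90,100,yes
X2,Bacteria,60,100,no
X3,Archaea,95,,yes
X4,Bacteria,88,20,no
"""


def attention_reasons(row: Dict[str, str], min_align_quality: float = 75.0,
                      min_pintail: float = 50.0) -> List[str]:
    """Reasons why a sequence deserves a closer look."""
    reasons = []
    if float(row["align_quality"]) < min_align_quality:
        reasons.append("low alignment quality")
    pintail = row["pintail"].strip()
    if not pintail:
        reasons.append("no Pintail value")
    elif float(pintail) < min_pintail:
        reasons.append("low Pintail value")
    return reasons


def sub_database(rows: List[Dict[str, str]],
                 keep: Callable[[Dict[str, str]], bool]) -> List[str]:
    """Accessions of all rows accepted by a user-defined predicate."""
    return [row["acc"] for row in rows if keep(row)]


rows = list(csv.DictReader(io.StringIO(EXPORT)))
flagged = {row["acc"]: attention_reasons(row) for row in rows if attention_reasons(row)}
clean_type_strains = sub_database(
    rows, lambda r: r["type_strain"] == "yes" and not attention_reasons(r))
print(flagged)
print(clean_type_strains)  # ['X1']
```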

SEEDs

All rRNA sequences have been aligned based on a completely manually re-checked SEED alignment of 59,235 rRNA sequences for SSU and 2,868 rRNA sequences for LSU. The SSU alignment is based on the official ssu_jan04 release of the ARB Project. The SSU SEED alignment has been considerably improved for Archaea by manual addition of more than 1,000 sequences. All SSU Eukaryotic sequences (18S) have been cross-checked by Wolfgang Ludwig before their addition to the SEED. Most of the bacterial sequences have also undergone a curation process carried out by the SILVA Team. We would rate our SSU SEED alignment for all Bacteria and Archaea as good and for Eukarya as reasonable.

The LSU alignment was provided by Wolfgang Ludwig and had not been released before SILVA. It was cross-checked by the SILVA Team before being used as the SEED for automatic alignment. The Bacteria and Archaea can be rated as good; the Eukaryotes definitely need further attention.

Statistics

Sequence Retrieval and Processing

Criterion	SSU 115	LSU 115
candidates (total)	5,327,100	588,497
RNAmmer	57,976	22,935
< 300 bases	1,140,317	173,653
> 2% ambiguities	20,764	6,160
> 2% homopolymers	66,625	12,802
> 2% vector contamination	2,641	538
low alignment identity	340,111	50,948
total rejected by QC	1,518,216	226,623

Sequences have been retrieved from EMBL Release 115 (March 13) using a complex keyword search procedure and a sequence-based search with RNAmmer profiles. Cross-checks with RDP II indicated no loss of primary data. Manual inspection showed that most of the sequences rejected for a low identity value after alignment with SINA are not ribosomal RNA sequences.
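This sketch shows per-sequence checks that correspond to the rejection rows of the QC table. It is not the SILVA quality-control code, and it rests on several assumptions:

* The IUPAC ambiguity codes count as ambiguities.
* A homopolymer is a run of at least five identical bases. SILVA's actual definition is not given on this page.
* Vector contamination and alignment identity require external tools (vector screening, SINA alignment), so they are passed in as precomputed numbers.
* An identity cut-off of 50 (SSU) or 40 (LSU), borrowed from the Parc criteria, is used for "low alignment identity".

The function names are hypothetical.

```python
from typing import List

IUPAC_AMBIGUOUS = set("RYSWKMBDHVN")  # IUPAC nucleotide ambiguity codes


def ambiguity_percent(seq: str) -> float:
    s = seq.upper()
    return 100.0 * sum(c in IUPAC_AMBIGUOUS for c in s) / len(s) if s else 0.0


def homopolymer_percent(seq: str, min_run: int = 5) -> float:
    """Percent of bases inside runs of at least `min_run` identical bases
    (the run-length definition is an assumption)."""
    s = seq.upper()
    in_runs, i = 0, 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        if j - i >= min_run:
            in_runs += j - i
        i = j
    return 100.0 * in_runs / len(s) if s else 0.0


def qc_reasons(seq: str, vector_percent: float, align_ident: float,
               min_align_ident: float) -> List[str]:
    """Rejection reasons, named after the rows of the QC table."""
    reasons = []
    if len(seq) < 300:
        reasons.append("< 300 bases")
    if ambiguity_percent(seq) > 2:
        reasons.append("> 2% ambiguities")
    if homopolymer_percent(seq) > 2:
        reasons.append("> 2% homopolymers")
    if vector_percent > 2:
        reasons.append("> 2% vector contamination")
    if align_ident < min_align_ident:
        reasons.append("low alignment identity")
    return reasons


print(qc_reasons("ACGT" * 100, 0.0, 80.0, 50.0))               # []
print(qc_reasons("A" * 20 + "CGT" * 130, 0.0, 80.0, 50.0))     # ['> 2% homopolymers']
```

A single sequence can collect several reasons, which is one way a rejection total can be smaller than the sum of the per-criterion counts.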

1. Growth of the ribosomal RNA databases since 1992

Blue: RDP II, orange: SILVA SSUParc based on EMBL releases

2. Length Distribution (SSU & LSU)

Red: raw data, black: the quality checked & aligned SSUParc sequences

Red: raw data, black: the quality checked & aligned LSUParc sequences

Basic statistics for the SILVA databases

	SSUParc	SSURef	SSURef NR	LSUParc	LSURef
Version	115	115	115	115	115
Total	3,808,884	1,426,414	479,726	361,874	39,412
Bacteria	3,397,368	1,303,785	418,497	44,740	27,485
Archaea	156,088	42,081	17,530	598	503
Eukaryota	255,288	80,547	43,698	316,536	11,424
Cultured	38,069	30,231	30,127	19,001	5,267
Typestrains	21,700	19,284	19,180	8,755	5,900
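The statistics can be cross-checked against the QC table. For both SSU and LSU, the number of candidates minus the number of sequences rejected by QC equals the Parc total. The per-criterion rejection counts add up to more than the total rejected, which suggests that a single sequence can fail several criteria. The following check uses only numbers copied from this page.

```python
# Counts copied from the QC table (Sequence Retrieval and Processing)
# and from the basic statistics table, release 115.
candidates = {"SSU": 5_327_100, "LSU": 588_497}
rejected = {"SSU": 1_518_216, "LSU": 226_623}
parc_total = {"SSU": 3_808_884, "LSU": 361_874}

# Per-criterion rejections: < 300 bases, ambiguities, homopolymers,
# vector contamination, low alignment identity.
per_criterion = {
    "SSU": [1_140_317, 20_764, 66_625, 2_641, 340_111],
    "LSU": [173_653, 6_160, 12_802, 538, 50_948],
}

for unit in ("SSU", "LSU"):
    passed = candidates[unit] - rejected[unit]
    print(f"{unit}: {candidates[unit]:,} - {rejected[unit]:,} = {passed:,}")
    assert passed == parc_total[unit]
    print(f"  sum of per-criterion rejections: {sum(per_criterion[unit]):,}")
```

This prints 3,808,884 for SSU and 361,874 for LSU, matching the Parc totals. The per-criterion sums are 1,570,458 (SSU) and 244,101 (LSU).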

Strain Identifiers

Source	Information	Tag	Datasets
EMBL	Typestrains	(t)	SSU, LSU
EMBL	Genomes	e[G]	SSU, LSU
StrainInfo.net	Cultured	s[C]	SSU, LSU
StrainInfo.net	Typestrains	s[T]	SSU, LSU
Living Tree Project	Typestrains (curated)	l[T]	SSU
RDP II	Typestrains	r[T]	SSU
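This minimal sketch extracts the tags listed above from a strain field value. It assumes the tags appear verbatim in that field. The example strain string and the function names are hypothetical.

```python
import re
from typing import List, Tuple

# Tag -> (source, information), as listed in the Strain Identifiers table.
STRAIN_TAGS = {
    "(t)": ("EMBL", "type strain"),
    "e[G]": ("EMBL", "genome"),
    "s[C]": ("StrainInfo.net", "cultured"),
    "s[T]": ("StrainInfo.net", "type strain"),
    "l[T]": ("Living Tree Project", "type strain (curated)"),
    "r[T]": ("RDP II", "type strain"),
}
TAG_PATTERN = re.compile("|".join(re.escape(tag) for tag in STRAIN_TAGS))


def strain_tags(strain_field: str) -> List[Tuple[str, str]]:
    """All (source, information) pairs whose tags occur in a strain field value."""
    return [STRAIN_TAGS[m.group(0)] for m in TAG_PATTERN.finditer(strain_field)]


def is_type_strain(strain_field: str) -> bool:
    """True if any source marks the sequence as a type strain."""
    return any(info.startswith("type strain") for _, info in strain_tags(strain_field))


example = "DSM 12345 s[T] s[C] l[T]"  # hypothetical strain field value
print(strain_tags(example))
print(is_type_strain("ATCC 0000 s[C]"))  # False
```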

RNAmmer

RNAmmer is a computational predictor for the major rRNA species (SSU, LSU) from all three domains of life. The program uses hidden Markov models trained on data from the European ribosomal RNA database project. SILVA runs the RNAmmer profiles on all sequence entries of the EMBL archive to complement the existing predictions. All predictions are marked with RNAmmer in the ann_src field. More information about RNAmmer can be found in the RNAmmer paper.

New in Release 115

Webpage
- All sequences are now automatically classified according to the SILVA Ref(NR) taxonomy.
- The taxonomy labeled as SILVA Ref (LSU) and SILVA Ref NR (SSU) in the browser has undergone manual curation.
- TestPrime has been updated to show specificity of primers.
- TestProbe 3.0 is now online.
- SINA alignment service has been updated with the latest version of SINA.
- Documentation of database fields updated.
- LTP updated to release 111.
ARB files
- SSU Parc ARB file is now only available in the File Archive.
- SSU Ref now contains all sequences that match the Ref quality criterion. HSM, MWM and GNHM are no longer removed.
- The guide tree is only available for SSU Ref NR 99 and LSU Ref.
Pipeline
- Several modifications to the pipeline were necessary to automatically produce Ref and Ref NR.
- Automatic classification of all sequences based on the SILVA taxonomy has been added.
Seed
- The SSU Seed was extended with the latest LTP version.
SILVA
- A new SILVA paper has been published.
Eukaryotic Taxonomy
- All groups have been significantly revised; see Eukaryotic Taxonomy.

Known Bugs

SSUParc: 21,000 sequences have no Pintail values

Citations

Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glöckner FO (2013) The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucl. Acids Res. 41 (D1): D590-D596.

If you use SINA please cite:

Pruesse E, Peplies J, Glöckner FO (2012) SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes. Bioinformatics 28: 1823-1829.