The SILVA Taxonomy

Overarching resource description

SILVA maintains a taxonomic resource for the SSU and LSU rRNA sequences of Bacteria, Archaea and Eukarya gathered from public nucleic acid sequence repositories. The SILVA taxonomy is mostly a secondary data resource that integrates information from several external resources of nomenclature and classification. The whole SILVA taxonomy is freely available for both academic and commercial use under the CC BY 4.0 license and can be accessed through three channels: the ARB databases, the formatted text files, and the functionality of the SILVA webpage.

From release 138 on, Pablo Yarza, on behalf of SILVA’s partner Ribocon GmbH, succeeded Pelin Yilmaz as curator of the SILVA taxonomy. The effort previously devoted to defining and naming clades of uncultured organisms has been reallocated to keeping up with the growing number of taxonomy changes, and to completing the taxonomy paths of thousands of sequences with missing intermediate ranks. Since release 132, we have integrated approximately two years of advances in classification and nomenclature, together with a significant alignment with the Genome Taxonomy Database (GTDB) and UniEuk (Universal taxonomic framework and integrated reference gene databases for Eukaryotic biology, ecology, and evolution) taxonomies, applied especially to the ranks of order and above.

Indeed, several discussions between Bergey’s Manual Trust, GTDB, and SILVA at the BISMiS Meeting 2016 in India and at ISME 2018 in Leipzig led to the decision that SILVA and Bergey’s would follow and integrate the relevant updates from the GTDB taxonomy. This was not only an executive decision, but also an expression of our strong confidence in the phylogenomic methods GTDB employs. Before we adopt a change introduced by the GTDB taxonomy, we verify its consistency with our 16S rRNA phylogenetic schema. In general, both phylogenies are consistent, leading to a similar taxonomy in both resources. In cases of strong disagreement, we reject the GTDB taxonomy proposal and keep the former 16S rRNA schema. We are aware of the magnitude of the taxonomy changes that GTDB has created and still creates. We have therefore reflected on, and very much welcomed, all positive and negative opinions from our user community in this regard, and we will continue to do so, as we are committed to continuous improvement and high quality. Considering that the GTDB taxonomy is progressing and that there are currently no strong indications against the overall adoption of its updates, we see no reason to stop our alignment with GTDB’s taxonomic point of view at this time.

Curation Process

The SILVA taxonomy is built with a semi-automatic data curation procedure that provides every sequence entry with a taxonomic classification down to genus level. The manual curation process starts by defining a time point after which we stop considering new changes in the external resources. The changes considered include maintenance of the global taxonomy by fixing small errors in lineages, such as typos and misassignments, as well as the adoption of larger taxonomy changes published in the International Journal of Systematic and Evolutionary Microbiology (IJSEM) up to the set time point. Sometimes wider, special efforts are needed to synchronize the SILVA taxonomy with an external resource, as with release 138, or to fill gaps down to the genus level wherever the information is missing. These taxonomy changes are applied to a local copy of the previous release’s SILVA REF NR 99 ARB databases (SSU and LSU), whose phylogenetic trees guide our taxonomic assessment. We use the rRNA phylogenetic schemas from these trees to revisit the taxonomic knowledge, and sometimes to offer new classification perspectives. For example, we have traditionally added new names for meaningful environmental clades, such as the marine SAR clades or rumen-relevant clades, and reported notable polyphylies in early-described or well-established groups. However, due to the ever-increasing number of sequences produced by new sequencing technologies, we have almost stopped adding new names to groups of sequences from uncultured prokaryotes and other microorganisms. Nevertheless, in order to provide complete taxonomic paths down to the genus level, we are increasing the number of “Incertae Sedis” high-rank taxa. This latter task is especially time-consuming in Eukaryotes because of the abundant single-sequence and/or poly-/paraphyletic taxa.

For the majority of cultured taxa, SILVA integrates taxonomy from the following external authoritative and/or reference resources: GTDB, the List of Prokaryotic names with Standing in Nomenclature (LPSN), NCBI, Bergey’s and UniEuk. LPSN is used as a proxy for IJSEM, which is recognized as the single authoritative provider of valid nomenclature for Bacteria and Archaea. In addition to these public resources, we accept a tightly controlled number of personal communications prior to publication (a category that we refer to as Users), when the data offered are of high impact, scientifically sound, and originate from authorities in the respective taxonomic area. This exception is helpful for the majority of our users, as it allows them to stay up to date with groundbreaking taxonomic contributions during the time between two SILVA releases.

In order to resolve potential discrepancies between these resources, e.g. due to different updating speeds or alternative classifications, we give Bergey’s outlines (whenever available) the highest priority, as we believe they are the leading voice for the current classification of prokaryotes. This results in the following priority rules. For cultured Prokaryotes: Bergey’s > GTDB > LPSN > Users > NCBI. For uncultured clades: Users > SILVA. For Eukaryotes, in contrast, we follow different references depending on the target group: NCBI for Fungi, and UniEuk > NCBI for other Eukaryotes. To compare taxonomies derived from different resources, the mapping can be done via accession number or via NCBI Taxonomy ID. For example, GTDB - SILVA mappings can only be resolved via accession number.
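The priority rules and the accession-based comparison described above can be illustrated with a minimal Python sketch. This is not SILVA’s curation software: the category keys, the function names `resolve_lineage` and `differing_by_accession`, and the dictionary-based data layout are assumptions made only for illustration. Only the priority orders themselves are taken from the text.

```python
from typing import Dict, Optional, Tuple

# Priority orders as stated in the text, highest priority first.
# The category keys are hypothetical labels chosen for this sketch.
PRIORITY: Dict[str, Tuple[str, ...]] = {
    "cultured_prokaryote": ("Bergey's", "GTDB", "LPSN", "Users", "NCBI"),
    "uncultured_clade": ("Users", "SILVA"),
    "fungi": ("NCBI",),
    "other_eukaryote": ("UniEuk", "NCBI"),
}


def resolve_lineage(
    category: str, candidates: Dict[str, str]
) -> Optional[Tuple[str, str]]:
    """Return (source, lineage) from the highest-priority source that
    provides a lineage for this taxon, or None if no listed source does."""
    for source in PRIORITY[category]:
        lineage = candidates.get(source)
        if lineage:
            return source, lineage
    return None


def differing_by_accession(
    first: Dict[str, str], second: Dict[str, str]
) -> Dict[str, Tuple[str, str]]:
    """Compare two accession -> lineage mappings and return the shared
    accessions whose lineages disagree, as (first, second) pairs."""
    shared = first.keys() & second.keys()
    return {
        acc: (first[acc], second[acc])
        for acc in sorted(shared)
        if first[acc] != second[acc]
    }


if __name__ == "__main__":
    # Placeholder lineages and accessions, not real taxonomy data.
    candidates = {"NCBI": "Bacteria;PhylumA;ClassA", "GTDB": "Bacteria;PhylumB;ClassB"}
    print(resolve_lineage("cultured_prokaryote", candidates))

    silva = {"ACC0001": "Bacteria;PhylumA", "ACC0002": "Bacteria;PhylumC"}
    gtdb = {"ACC0001": "Bacteria;PhylumA", "ACC0002": "Bacteria;PhylumD"}
    print(differing_by_accession(silva, gtdb))
```

In this sketch a source is skipped when it offers no lineage for a taxon, which corresponds to the “whenever available” qualification in the text. Keying the comparison on accession number rather than NCBI Taxonomy ID reflects the statement that GTDB - SILVA mappings can only be resolved that way.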


Meanwhile, the SILVA databases to be released are prepared by the technical team, which runs the data and software pipeline. Once the new ARB databases are ready, the recently curated taxonomic data are mapped onto them, and this new template is used for further revisions before public release. This process is time-consuming and explains the delay between the SILVA releases and those of primary resources such as IJSEM. The delay also results from the misalignment of the external resources’ release cycles, which may lead to differences between the SILVA taxonomy we publish and the current taxonomic state of the external resources we use. In such cases, and depending on the stage of our SILVA release cycle, integrating these differences can take up to several months.

Technology for taxonomic curation

The kind of meticulous data curation that SILVA provides is a well-known bottleneck for many secondary resources. Over the last years, we have embarked on several ventures to optimize the curation process, notably in collaboration with the UniEuk project (Berney et al. 2017), where a community approach decentralizes parts of the governance and maintenance of the biocuration work. Within this project, we developed a software platform that provides tools for consulting the taxonomy, submitting taxonomy proposals, and collaboratively accepting and curating proposals based on defined expert curator roles. The project’s GitHub repository can be consulted for further information. The tests conducted with the protistologist community show that this concept and software could be extended to other lineages, if not all, provided that the taxonomic curation efforts are supported by a global and mature community of users and creators of taxonomic information.

Last update: 07.03.2022