Methods
How to Cite CosmosID
Reference for publications:
CosmosID Metagenomics Cloud, app.cosmosid.com, CosmosID Inc., www.cosmosid.com
Methods:
The system uses a high-performance, data-mining k-mer algorithm that rapidly disambiguates millions of short sequence reads into the discrete genomes from which those sequences originate. The pipeline has two separable comparators: the first is a pre-computation phase for reference databases, and the second is a per-sample computation phase. The input to the pre-computation phase consists of databases of reference genomes, virulence markers and antimicrobial resistance markers that are continuously curated by CosmosID scientists. The output of the pre-computation phase is a phylogenetic tree of microbes, together with sets of variable-length k-mer fingerprints (biomarkers) that are uniquely associated with distinct branches and leaves of the tree. The per-sample computation phase searches the hundreds of millions of short sequence reads, or alternatively contigs from draft de novo assemblies, against the fingerprint sets. This query enables sensitive yet highly precise detection and taxonomic classification of microbial NGS reads. The resulting statistics are analyzed to return fine-grained taxonomic and relative abundance estimates for the microbial NGS datasets. To exclude false positive identifications, the results are filtered using a threshold derived from internal statistical scores, which are determined by analyzing a large number of diverse metagenomes. The same approach is applied to enable sensitive and accurate detection of genetic markers for virulence and for resistance to antibiotics.
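The two-phase idea described above can be illustrated with a toy sketch. This is not the CosmosID implementation, which is not published here. It assumes a fixed k-mer length (the text describes variable-length k-mers), a flat set of references instead of a phylogenetic tree, and a simple hit-count cutoff in place of the internal statistical scores. The function names, the `min_hits` parameter and the example sequences are hypothetical.

```python
from collections import Counter, defaultdict

K = 4  # toy fixed k-mer length (assumption; the real fingerprints are variable-length)


def kmers(seq, k=K):
    """Return the set of distinct k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_fingerprints(references, k=K):
    """Pre-computation phase: keep only k-mers found in exactly one reference."""
    owners = defaultdict(set)
    for name, genome in references.items():
        for kmer in kmers(genome, k):
            owners[kmer].add(name)
    fingerprints = defaultdict(set)
    for kmer, names in owners.items():
        if len(names) == 1:  # shared k-mers cannot discriminate, so drop them
            fingerprints[next(iter(names))].add(kmer)
    return dict(fingerprints)


def classify_reads(reads, fingerprints, k=K, min_hits=2):
    """Per-sample phase: count fingerprint hits, filter, and normalize."""
    index = {kmer: name for name, fp in fingerprints.items() for kmer in fp}
    hits = Counter()
    for read in reads:
        for kmer in kmers(read, k):  # each distinct k-mer counts once per read
            if kmer in index:
                hits[index[kmer]] += 1
    # Hypothetical filter: discard references with too few supporting hits.
    kept = {name: n for name, n in hits.items() if n >= min_hits}
    total = sum(kept.values())
    return {name: n / total for name, n in kept.items()} if total else {}


if __name__ == "__main__":
    references = {"org_A": "ACGTACGGTTCA", "org_B": "TTGACCGATACGT"}
    reads = ["ACGTACGG", "GGTTCA", "CGATAC"]
    fingerprints = build_fingerprints(references)
    print(classify_reads(reads, fingerprints))
```

In this sketch the k-mers `ACGT` and `TACG` occur in both toy references, so neither ends up in a fingerprint set. Only k-mers unique to one reference contribute to the hit counts, which are then normalized into relative abundance estimates.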