VirJenDB

Documentation

Last update: 2025-August-25 (VirJenDB v0.2)


VirJenDB Metadata

The VirJenDB metadata are grouped into four categories: collection, sample, taxonomy tree and dbsource.

Metadata Sources

NCBI Virus

BV-BRC

ICTV

ENA

Fields

Tab-separated files containing the field descriptions below can be downloaded from GitHub and opened with any spreadsheet application, such as LibreOffice Calc or Excel.
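The downloaded files can also be processed programmatically. The following is a minimal sketch, assuming that the TSV uses the same three column headers as the tables on this page ("VirJenDB Field Name", "Description", "Sources") and that multiple sources in one cell are separated by commas; the function name `fields_by_source` is hypothetical. It builds an index from each source to the fields it contributes.

```python
import csv
import io
from collections import defaultdict

# Assumption: column headers match the tables on this page, and a cell
# listing several sources separates them with commas ("BVBRC, NCBI Virus").
def fields_by_source(tsv_text):
    """Map each source name to the VirJenDB fields it contributes."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    index = defaultdict(list)
    for row in reader:
        for source in row["Sources"].split(","):
            source = source.strip()
            if source:
                index[source].append(row["VirJenDB Field Name"])
    return dict(index)

sample = (
    "VirJenDB Field Name\tDescription\tSources\n"
    "Accession\tGenBank accession of the virus sequence.\tBVBRC, NCBI Virus\n"
    "BV-BRC Accession\tAccession of the sequence in BV-BRC.\tBVBRC\n"
)
print(fields_by_source(sample))
# {'BVBRC': ['Accession', 'BV-BRC Accession'], 'NCBI Virus': ['Accession']}
```

For a downloaded file, pass its contents instead, e.g. `fields_by_source(open("virjendb_fields.tsv", encoding="utf-8").read())`, where the file name is only a placeholder.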

Organizational

| VirJenDB Field Name | Description | Sources |
|---|---|---|
| Accession | GenBank accession of the virus sequence. It is the same as the NCBI, ENA, and DDBJ accession, since all three databases are part of the INSDC. | BVBRC, NCBI Virus |
| Assembly Accession | NCBI Assembly accession of the sequence. | BVBRC, NCBI Virus |
| BV-BRC Accession | Accession of the sequence in the BV-BRC database. | BVBRC |
| Bioproject Accession | BioProject accession (INSDC) of the project the virus sequence belongs to. For prophages, this is also the BioProject accession of the host. | BVBRC, NCBI Virus |
| Biosample Accession | BioSample accession (INSDC) of the sample the virus sequence belongs to. | BVBRC, NCBI Virus |
| Release Date | Date on which the sequence was made public in a repository (e.g. GenBank, BV-BRC). | NCBI Virus |
| SRA Accession | NCBI Sequence Read Archive (SRA) accession ID. | BVBRC, NCBI Virus |
| Sequence Source | Name of the source database or tool of the virus sequence. | VJDB |
| Sequencing Center | Institution that sequenced the collected sample. | BVBRC |
| Submitter | Name(s) of the sequence submitter(s). | NCBI Virus |
| Submitter Organization | Organization with which the submitter(s) are affiliated. | NCBI Virus |
| VirJenDB Accession | VirJenDB sequence ID of the virus sequence. | VJDB |
| VirJenDB Metadata Version | Version of the VirJenDB metadata. | VJDB |
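Because the Accession is shared across the INSDC databases, the same value can be used to look up a record at NCBI or ENA. A minimal sketch, assuming accessions may carry a ".version" suffix (e.g. "MN908947.3"); the helper names are hypothetical, and the URL patterns are the public NCBI Nucleotide and ENA Browser record pages, not a VirJenDB API.

```python
# Assumption: INSDC nucleotide accessions, optionally with a ".version" suffix.
def split_version(accession):
    """Split 'MN908947.3' into ('MN908947', 3); version is None if absent."""
    base, sep, version = accession.partition(".")
    return base, (int(version) if sep and version.isdigit() else None)

def insdc_links(accession):
    """Record pages for the same accession at NCBI and ENA."""
    base, _ = split_version(accession)
    return {
        # NCBI Nucleotide accepts the versioned accession directly.
        "NCBI": f"https://www.ncbi.nlm.nih.gov/nuccore/{accession}",
        # The unversioned accession is used for the ENA Browser.
        "ENA": f"https://www.ebi.ac.uk/ena/browser/view/{base}",
    }

print(split_version("NC_045512.2"))  # ('NC_045512', 2)
```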

Sample

| VirJenDB Field Name | Description | Sources |
|---|---|---|
| Collection Date | Date on which the sample from which the sequence originates was collected or sampled. | BVBRC, NCBI Virus |
| Collection Geo Location | Country name, plus an additional identifier such as a state abbreviation, of the place where the sample from which the sequence originates was collected or sampled. | BVBRC, NCBI Virus |
| Collection Source Host Tissue | Tissue, specimen, or source of the host from which the sample was collected or sampled. | NCBI Virus |
| Country | Country or sea area where the sample from which the sequence originates was collected or sampled. | BVBRC, NCBI Virus |
| Genome Completeness | Whether the sequence is partial or complete, as reported by NCBI. | NCBI Virus |
| Genotype | The genotype or subtype of a virus sequence, as provided by the sequence submitter. This field comes from the “/serotype” field of the GenBank record and is shown as submitted; consistency and accuracy may vary. | NCBI Virus |
| Isolate | Individual isolate from which the sample was obtained. | NCBI Virus |
| Molecule Type | Molecule type of the sequence, e.g. ssRNA. There are 15 molecule types, corresponding to combinations of the Baltimore classes. | NCBI Virus |
| NCBI Realm | A ‘Realm’ is the highest taxonomic rank into which virus species can be classified. Defined by the NCBI. | NCBI Taxonomy |
| Organism Name | GenBank organism name: a taxonomic name at or below the species level. | NCBI Virus |
| Segment Name | Name of the virus segment; can be a number. | NCBI Virus |
| Sequence Length | Length of the sequence in base pairs (bp), from NCBI Virus or BV-BRC. | BVBRC, NCBI Virus |
| Strain | Name of the strain from which the sample was obtained. | BVBRC |
| Submitter Country | Country of the submitter’s organization. | NCBI Virus |
| Submitter Region | Region or location of the submitters’ organization. | NCBI Virus |
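Collection dates are often only partially specified, so comparing them as plain strings can be misleading. A minimal sketch, assuming the values follow the INSDC /collection_date conventions ("2020-03-15", "15-Mar-2020", "2020-03", "Mar-2020", "2020") and that Collection Geo Location uses the INSDC "Country: region" form; both formats are assumptions about the exported data, and the function names are hypothetical. Date ranges are not handled.

```python
from datetime import datetime

# Assumption: INSDC-style full or partial dates; ranges ("2019/2020") are
# not handled. %b matches English month abbreviations in the default C locale.
_DATE_FORMATS = [
    ("%Y-%m-%d", "day"),    # 2020-03-15
    ("%d-%b-%Y", "day"),    # 15-Mar-2020
    ("%Y-%m", "month"),     # 2020-03
    ("%b-%Y", "month"),     # Mar-2020
    ("%Y", "year"),         # 2020
]

def parse_collection_date(value):
    """Return (datetime.date, precision), or (None, None) if unparseable.

    Missing month/day parts default to 1, so always check the precision.
    """
    text = value.strip()
    for fmt, precision in _DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date(), precision
        except ValueError:
            continue
    return None, None

def split_geo_location(value):
    """Split 'USA: CA' into ('USA', 'CA'); the region is None if absent."""
    country, sep, region = value.partition(":")
    region = region.strip() if sep else ""
    return country.strip(), region or None
```

Returning the precision alongside the date lets a caller avoid treating "2020" as 1 January 2020 when filtering by month or day.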

Source

| VirJenDB Field Name | Description | Sources |
|---|---|---|
| BV-BRC Sequence | Whether this sequence is available in BV-BRC. | VJDB |
| GenBank Sequence | Whether this sequence is available in GenBank. | NCBI Virus |
| ICTV Exemplar | Whether this accession is a representative (exemplar) of its species in the ICTV taxonomy. | ICTV |
| RefSeq Sequence | Whether this sequence is a reference sequence in the RefSeq database. | BVBRC, NCBI Virus |
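The Source fields are yes/no flags, which makes them convenient for selecting subsets such as RefSeq sequences that are also ICTV exemplars. A minimal sketch: the rows and accession values below are hypothetical, and the accepted text spellings of a set flag ("True", "yes", ...) are a guess to be checked against an actual download.

```python
# Assumption: flags are exported as text; these spellings are a guess.
_TRUE_VALUES = {"true", "yes", "y", "1"}

def is_set(value):
    """Interpret a text flag such as 'True' or 'yes' as a boolean."""
    return str(value).strip().lower() in _TRUE_VALUES

def select(records, *flags):
    """VirJenDB Accessions of the records in which every given flag is set."""
    return [
        record["VirJenDB Accession"]
        for record in records
        if all(is_set(record.get(flag, "")) for flag in flags)
    ]

# Hypothetical example rows, not VirJenDB data.
records = [
    {"VirJenDB Accession": "example-1", "RefSeq Sequence": "True", "ICTV Exemplar": "True"},
    {"VirJenDB Accession": "example-2", "RefSeq Sequence": "True", "ICTV Exemplar": "False"},
    {"VirJenDB Accession": "example-3", "RefSeq Sequence": "", "ICTV Exemplar": "True"},
]
print(select(records, "RefSeq Sequence", "ICTV Exemplar"))  # ['example-1']
```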

Host

| VirJenDB Field Name | Description | Sources |
|---|---|---|
| BV-BRC Host Age | Host age and unit, from BV-BRC. | BVBRC |
| BV-BRC Host Group | The host’s broader group in a taxonomic context, from BV-BRC. | BVBRC |
| Host Accession | GenBank accession that can refer to the host sequence (not the prophage). | VJDB |
| Host Age | Numeric host age only, taken from “BV-BRC Host Age”; not yet standardized. | BVBRC |
| Host Assembly Accession | Host assembly accession of the sequence (INSDC). | IMG/VR, PhD, PhiSpy |
| Host Average Sequence Depth | Average sequencing depth across the host sequence, from IMG/VR, PhiSpy, or PhD. | IMG/VR, PhD, PhiSpy |
| Host BioProject Accession | BioProject accession (INSDC) of the project the host sequence belongs to. | IMG/VR, PhD, PhiSpy |
| Host Biosample Accession | BioSample accession (INSDC) of the sample the host sequence belongs to. | IMG/VR, PhD, PhiSpy |
| Host Collection Date | Date on which the sample from which the host sequence originates was collected or sampled. | IMG/VR, PhD, PhiSpy |
| Host Common Name | Common name of the host from which the virus sequence was collected, e.g. human. | BVBRC, NCBI Virus |
| Host Country | Country or sea area where the sample from which the host sequence originates was collected or sampled. | IMG/VR, PhD, PhiSpy |
| Host GTDB Species Name | GTDB species name to which the NCBI host taxonomy ID of the virus sequence could be mapped. | IMG/VR, PhD, PhiSpy |
| Host IMGM Taxon OID | IMG/M taxon OID of the host from which IMG/VR extracted the virus sequence. | IMG/VR, PhD, PhiSpy |
| Host NCBI Tax ID | NCBI Taxonomy ID of the host sequence. It can be based on the GTDB mapping of the GTDB species name to the NCBI Taxonomy ID. | GTDB |
| Host Sequencing Platform | Instrument platform(s) used for host sequencing; multiple platforms are listed separately (assemblies only). | IMG/VR, PhD, PhiSpy |
| Host Sex | Sex of the sampled host, from BV-BRC. | BVBRC |
| Host Species | Scientific species name of the sampled host. | BVBRC |
| ICTV Host Group | Content of the ICTV host source field, to be combined with host group data from the other sources. | ICTV |
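Host Age keeps only the number from BV-BRC Host Age, which combines age and unit. A minimal sketch of such a split, assuming the BV-BRC values look like "<number> <unit>" (e.g. "45 years"); since the field is not yet standardized, anything that does not match is returned as (None, None) instead of being guessed. The function name is hypothetical.

```python
import re

# Assumption: "<number> [unit]", e.g. "45 years", "6 months", "12".
_AGE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)?\s*$")

def split_host_age(value):
    """Return (age, unit) from a BV-BRC-style host age string."""
    match = _AGE.match(value or "")
    if not match:
        return None, None
    number = float(match.group(1))
    unit = match.group(2).lower() if match.group(2) else None
    return number, unit

print(split_host_age("45 years"))  # (45.0, 'years')
```

Keeping the unit, rather than discarding it as the Host Age field does, makes it possible to convert "6 months" and "45 years" to a common scale later.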

Analysis

| VirJenDB Field Name | Description | Sources |
|---|---|---|
| Average Depth | Average sequencing depth across the sequence, from BV-BRC. | BVBRC |
| Cluster Reference | VirJenDB Accession of the representative of the sequence’s cluster. See documentation for selection details. | VJDB |
| Cluster Representative | Whether the sequence is the representative of a cluster of sequences, as computed by VClust. | VJDB |
| GC Content | Percentage of G and C nucleotides in the sequence, from BV-BRC. | BVBRC |
| Pangolin Lineage | Lineage determined by Pangolin. | BVBRC, NCBI Virus |
| Platform | Instrument platform(s) used for sequencing; multiple platforms are listed separately (assemblies only). | BVBRC |
| Predicted | Tool or database that predicted the sequence. An empty value means that the sequence is not predicted. | IMG/VR, PhD, PhiSpy |
| Unique Representative | Whether the sequence is the representative of a group of identical VirJenDB sequences. | VJDB |
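GC Content is reported as the percentage of G and C nucleotides. A minimal sketch of that calculation; whether BV-BRC counts ambiguous bases such as N in the denominator is not stated here, so the sketch makes it a parameter and, by default, excludes them (an assumption). The function name is hypothetical.

```python
from collections import Counter

def gc_content(sequence, ignore_ambiguous=True):
    """Percentage of G and C nucleotides in a DNA or RNA sequence.

    Assumption: with ignore_ambiguous=True, only A, C, G, T and U count
    toward the total, so N and other IUPAC codes are skipped.
    """
    counts = Counter(sequence.upper())
    gc = counts["G"] + counts["C"]
    if ignore_ambiguous:
        total = gc + counts["A"] + counts["T"] + counts["U"]
    else:
        total = sum(counts.values())
    return 100.0 * gc / total if total else 0.0

print(gc_content("ATGC"))  # 50.0
```

With `ignore_ambiguous=False`, "GGNN" gives 50.0 instead of 100.0, which shows why the convention matters when comparing values from different sources.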

Taxonomy

| VirJenDB Field Name | Description | Sources |
|---|---|---|
| ICTV Class | A ‘Class’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the ICTV. | ICTV |
| ICTV Family | A ‘Family’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the ICTV. | ICTV |
| ICTV Genus | A ‘Genus’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the ICTV. | ICTV |
| ICTV Kingdom | A ‘Kingdom’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the ICTV. | ICTV |
| ICTV Lineage Names | Aggregated ICTV taxonomy names along the virus lineage. | VJDB |
| ICTV Order | An ‘Order’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the ICTV. | ICTV |
| ICTV Phylum | A ‘Phylum’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the ICTV. | ICTV |
| ICTV Realm | A ‘Realm’ is the highest taxonomic rank into which virus species can be classified. Defined by the ICTV. | ICTV |
| ICTV Species | A ‘Species’ is the lowest taxonomic rank in the hierarchy approved by the ICTV. While subspecies levels of classification may exist for some viruses (e.g. Hepatitis C virus), the ICTV does not classify viruses below the species level. Defined by the ICTV. | ICTV |
| ICTV Subfamily | A ‘Subfamily’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the ICTV. | ICTV |
| ICTV Subgenus | A ‘Subgenus’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the ICTV. | ICTV |
| ICTV Suborder | A ‘Suborder’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the ICTV. | ICTV |
| ICTV Subphylum | A ‘Subphylum’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the ICTV. | ICTV |
| NCBI Class | A ‘Class’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the NCBI. | BVBRC |
| NCBI Family | A ‘Family’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the NCBI. | BVBRC, NCBI Virus |
| NCBI Genus | A ‘Genus’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the NCBI. | BVBRC, NCBI Virus |
| NCBI Kingdom | A ‘Kingdom’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the NCBI. | BVBRC |
| NCBI Lineage IDs | NCBI Taxonomy IDs of the virus lineage, from BV-BRC. | BVBRC |
| NCBI Lineage Names | Aggregated NCBI taxonomy names along the virus lineage. | BVBRC |
| NCBI Order | An ‘Order’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the NCBI. | BVBRC |
| NCBI Phylum | A ‘Phylum’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the NCBI. | BVBRC |
| NCBI Species | A ‘Species’ is the lowest taxonomic rank in the hierarchy approved by the NCBI. While subspecies levels of classification may exist for some viruses (e.g. Hepatitis C virus), the NCBI does not classify viruses below the species level. Defined by the NCBI. | BVBRC, NCBI Virus |
| NCBI SpeciesTax ID | NCBI Taxonomy ID of the virus species. | BVBRC |
| NCBI Subfamily | A ‘Subfamily’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the NCBI. | NCBI Taxonomy |
| NCBI Subgenus | A ‘Subgenus’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the NCBI. | NCBI |
| NCBI Suborder | A ‘Suborder’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the NCBI. | NCBI Taxonomy |
| NCBI Subphylum | A ‘Subphylum’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the NCBI. | NCBI Taxonomy |
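The Lineage Names fields aggregate the per-rank names. A minimal sketch of such an aggregation, assuming the ranks are ordered from Realm down to Species as listed above and that empty ranks are skipped; the "; " separator and the function name are assumptions, not VirJenDB's documented format. The same function works for the NCBI fields by passing `prefix="NCBI"`.

```python
# Ranks present in the field list above, from highest to lowest.
RANKS = ["Realm", "Kingdom", "Phylum", "Subphylum", "Class", "Order",
         "Suborder", "Family", "Subfamily", "Genus", "Subgenus", "Species"]

def lineage_names(record, prefix="ICTV", sep="; "):
    """Join the non-empty '<prefix> <Rank>' values from highest to lowest rank.

    Assumption: the separator is illustrative only.
    """
    names = []
    for rank in RANKS:
        value = (record.get(f"{prefix} {rank}") or "").strip()
        if value:
            names.append(value)
    return sep.join(names)

record = {
    "ICTV Realm": "Riboviria",
    "ICTV Order": "Nidovirales",
    "ICTV Family": "Coronaviridae",
    "ICTV Genus": "Betacoronavirus",
    "ICTV Species": "",
}
print(lineage_names(record))
# Riboviria; Nidovirales; Coronaviridae; Betacoronavirus
```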

Internal

| VirJenDB Field Name | Description | Sources |
|---|---|---|
| Unique Reference | VirJenDB Accession of the unique representative sequence. For non-phage viruses, this is the smallest accession in the group. See documentation for selection details. | VJDB |
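For non-phage viruses, the Unique Reference is the smallest accession among identical sequences. A minimal sketch of that rule, assuming "identical" means identical nucleotide strings after uppercasing and "smallest" means lexicographically smallest; the phage selection rule is documented separately and is not reproduced here. The function name and example accessions are hypothetical.

```python
import hashlib
from collections import defaultdict

def unique_references(sequences):
    """Map each accession to the smallest accession with an identical sequence.

    Assumptions: case-insensitive exact identity and lexicographic ordering
    of accessions; covers only the non-phage rule.
    """
    groups = defaultdict(list)
    for accession, seq in sequences.items():
        # A fixed-size digest keeps dictionary keys small for long genomes.
        digest = hashlib.sha256(seq.upper().encode("ascii")).hexdigest()
        groups[digest].append(accession)
    reference = {}
    for members in groups.values():
        representative = min(members)
        for accession in members:
            reference[accession] = representative
    return reference

# Hypothetical accessions and sequences.
seqs = {"example-b": "ACGT", "example-a": "acgt", "example-c": "GGGG"}
print(unique_references(seqs))
```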

