last update
2025-August-25
VirJenDB
v0.2
VirJenDB Metadata
The VirJenDB metadata are grouped into four categories: collection, sample, taxonomy tree and dbsource.
- Collection: Contains everything related to sample collection.
- Sample: Contains everything related to the description of the specific entry.
- Taxonomy tree: Contains the ICTV and NCBI taxonomy trees.
- DBSource: Contains the database from which the entry originates.
Metadata Sources
NCBI Virus
- Virus.1.0
- NCBI help pages
- result table
BV-BRC
ICTV
ENA
- Data standards help page
- ENA Influenza virus reporting standard checklist
- GSC MIUVIGS
- ENA virus pathogen reporting standard checklist
Fields
Tab-separated files containing the following field descriptions can be downloaded from GitHub. You can open them in any spreadsheet application, such as LibreOffice or Excel.
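
The field descriptions can also be read programmatically instead of in a spreadsheet. The following is a minimal sketch using Python's standard `csv` module. The column names (`VirJenDB Field Name`, `Description`, `Sources`) are assumed to match the table headers on this page, the helper `read_field_descriptions` is hypothetical, and the in-memory sample stands in for a downloaded file.

```python
import csv
import io

# Stand-in for a downloaded TSV file. The header names are assumed to match
# the table headers on this page and may differ in the real files.
SAMPLE_TSV = (
    "VirJenDB Field Name\tDescription\tSources\n"
    "GC Content\tPercentage of G and C nucleotides in the sequence.\tBVBRC\n"
    "Release Date\tDate when the sequence was made public.\tNCBI Virus\n"
    "Accession\tGenBank Accession of the virus sequence.\tBVBRC, NCBI Virus\n"
)


def read_field_descriptions(handle):
    """Return a dict mapping each field name to its description and sources."""
    reader = csv.DictReader(handle, delimiter="\t")
    fields = {}
    for row in reader:
        # The Sources column may list several comma-separated sources.
        sources = [s.strip() for s in row["Sources"].split(",") if s.strip()]
        fields[row["VirJenDB Field Name"]] = {
            "description": row["Description"],
            "sources": sources,
        }
    return fields


fields = read_field_descriptions(io.StringIO(SAMPLE_TSV))
print(fields["Accession"]["sources"])
```

For a real file, pass `open(path, newline="", encoding="utf-8")` instead of the `io.StringIO` object.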
Organizational
VirJenDB Field Name | Description | Sources |
---|---|---|
Accession | GenBank Accession of the virus sequence. It is identical to the NCBI, ENA, and DDBJ Accession, since all three are part of the INSDC. | BVBRC, NCBI Virus |
Assembly Accession | NCBI Assembly Accession of the sequence. | BVBRC, NCBI Virus |
BV-BRC Accession | Accession of the sequence in the BV-BRC database. | BVBRC |
Bioproject Accession | References the BioProject Accession that the virus sequence is a part of (INSDC). For prophages, it also gives the BioProject Accession of the host. | BVBRC, NCBI Virus |
Biosample Accession | References the BioSample Accession that the virus sequence is a part of (INSDC). | BVBRC, NCBI Virus |
Release Date | Date when the sequence was made public in a repository (e.g. GenBank, BV-BRC). | NCBI Virus |
SRA Accession | NCBI Sequence Read Archive (SRA) accession ID. | BVBRC, NCBI Virus |
Sequence Source | The name of the source database or tool of the virus sequence. | VJDB |
Sequencing Center | Institution that sequenced the collected sample. | BVBRC |
Submitter | Submitter name(s) of the sequence. | NCBI Virus |
Submitter Organization | Organization the submitter(s) is(are) affiliated with. | NCBI Virus |
VirJenDB Accession | VirJenDB sequence ID of the virus sequence. | VJDB |
VirJenDB Metadata Version | Version of the VirJenDB metadata. | VJDB |
Sample
VirJenDB Field Name | Description | Sources |
---|---|---|
Collection Date | Date when the sample the sequence originates from was collected or sampled. | BVBRC, NCBI Virus |
Collection Geo Location | Country name and additional identifier like a state abbreviation from where the sample the sequence originates from was collected/sampled. | BVBRC, NCBI Virus |
Collection Source Host Tissue | Tissue, Specimen or Source of the host from which the sample the sequence originates from was collected or sampled. | NCBI Virus |
Country | Country or sea area where the sample the sequence originates from was collected or sampled. | BVBRC, NCBI Virus |
Genome Completeness | Whether the sequence is a partial or complete sequence, as reported by NCBI. | NCBI Virus |
Genotype | The genotype or subtype of a virus sequence, as provided by the sequence submitter. This field comes from the “/serotype” field of the GenBank record and is shown as submitted. Consistency and accuracy may vary. | NCBI Virus |
Isolate | Individual isolate from which the sample was obtained. | NCBI Virus |
Molecule Type | Molecule type of the sequence, e.g. ssRNA. Note that there are 15 molecule types corresponding to combinations of the Baltimore Classifications. | NCBI Virus |
NCBI Realm | A ‘Realm’ is the highest taxonomic rank into which virus species can be classified. Defined by the NCBI. | NCBI Taxonomy |
Organism Name | GenBank organism name, a taxonomic name at species level or below the species level. | NCBI Virus |
Segment Name | Name of the virus segment. Can be a number. | NCBI Virus |
Sequence Length | Length of the sequence in basepairs (bp) from NCBI Virus or BVBRC. | BVBRC, NCBI Virus |
Strain | Name of the strain from which the sample was obtained. | BVBRC |
Submitter Country | Country of the submitter’s organization. | NCBI Virus |
Submitter Region | Region or location of the organization of the submitters. | NCBI Virus |
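
The relationship between `Collection Geo Location` (country plus an additional identifier) and `Country` can be illustrated with a short sketch. It assumes the INSDC-style `Country: region` layout with a colon separator; the function name `split_geo_location` is hypothetical, and the actual VirJenDB processing may differ.

```python
def split_geo_location(value):
    """Split a geo-location string into (country, detail).

    Assumes the INSDC-style "Country: region" layout. Strings without a
    colon are treated as a bare country name, and detail is None.
    """
    country, _, detail = value.partition(":")
    country = country.strip()
    detail = detail.strip()
    return country, detail or None


print(split_geo_location("USA: CA"))
print(split_geo_location("Germany"))
```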
Source
VirJenDB Field Name | Description | Sources |
---|---|---|
BV-BRC Sequence | Indicates whether this sequence is available in BV-BRC. | VJDB |
GenBank Sequence | Indicates whether this sequence is available in GenBank. | NCBI Virus |
ICTV Exemplar | This accession is a representative of the species in the ICTV Taxonomy. | ICTV |
RefSeq Sequence | Indicates whether this sequence is a reference sequence in the RefSeq database. | BVBRC, NCBI Virus |
Host
VirJenDB Field Name | Description | Sources |
---|---|---|
BV-BRC Host Age | Host Age and Unit from BV-BRC. | BVBRC |
BV-BRC Host Group | The host’s broader group association in a taxonomic context from BV-BRC. | BVBRC |
Host Accession | Host Accession from GenBank that can refer to the host sequence (not the prophage). | VJDB |
Host Age | Numeric part only of the “BV-BRC Host Age” field; not yet standardized. | BVBRC |
Host Assembly Accession | Host Assembly Accession of the sequence (INSDC). | IMG/VR, PhD, PhiSpy |
Host Average Sequence Depth | Average sequencing depth across the host sequence from IMG/VR, PhiSpy or PhD. | IMG/VR, PhD, PhiSpy |
Host BioProject Accession | References the BioProject Accession that the host sequence is a part of (INSDC). | IMG/VR, PhD, PhiSpy |
Host Biosample Accession | References the BioSample Accession that the host sequence is a part of (INSDC). | IMG/VR, PhD, PhiSpy |
Host Collection Date | Date when the sample the host sequence originates from was collected or sampled. | IMG/VR, PhD, PhiSpy |
Host Common Name | Common name of the host the virus sequence was collected from, e.g. human. | BVBRC, NCBI Virus |
Host Country | Country or sea area where the sample the host sequence originates from was collected/sampled. | IMG/VR, PhD, PhiSpy |
Host GTDB Species Name | References the Host GTDB Species Name to which the NCBI host taxonomy ID of the virus sequence could be mapped. | IMG/VR, PhD, PhiSpy |
Host IMGM Taxon OID | References the Host IMGM Taxon OID from which the virus sequence was extracted by IMG/VR. | IMG/VR, PhD, PhiSpy |
Host NCBI Tax ID | The host sequence NCBI Taxonomy ID. Can be based on the mapping by GTDB of the GTDB Species Name to the NCBI Taxonomy ID. | GTDB |
Host Sequencing Platform | Instrument platform used for host sequencing; multiple platforms are separated (only for assemblies). | IMG/VR, PhD, PhiSpy |
Host Sex | The sampled host’s sex from BV-BRC. | BVBRC |
Host Species | The sampled host’s scientific species name. | BVBRC |
ICTV Host Group | The ICTV Host source field content to be combined with other host group data from the other sources. | ICTV |
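
The `Host Age` field keeps only the number from `BV-BRC Host Age`, which holds an age and a unit. The sketch below shows one way such an extraction could work. It assumes values shaped like `45 years`; the function name `host_age_number` is hypothetical and this is not necessarily how VirJenDB derives the field.

```python
import re

# Matches the first integer or decimal number in a string.
_NUMBER = re.compile(r"\d+(?:\.\d+)?")


def host_age_number(bvbrc_host_age):
    """Return the numeric part of an age-and-unit string, or None.

    The unit is discarded without conversion, so results are not comparable
    across records: "6 months" and "6 years" both yield 6.0.
    """
    if not bvbrc_host_age:
        return None
    match = _NUMBER.search(bvbrc_host_age)
    return float(match.group()) if match else None


print(host_age_number("45 years"))
print(host_age_number("6 months"))
```

Because the unit is dropped, the numbers are only meaningful together with the original `BV-BRC Host Age` value, which is why the field is flagged as not yet standardized.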
Analysis
VirJenDB Field Name | Description | Sources |
---|---|---|
Average Depth | Average sequencing depth across the sequence from BVBRC. | BVBRC |
Cluster Reference | The VirJenDB Accession of the representative of its cluster. See documentation for selection details. | VJDB |
Cluster Representative | Whether the sequence is the representative sequence of a group of sequences, as computed by VClust. | VJDB |
GC Content | Percentage of G and C nucleotides in the sequence from BVBRC. | BVBRC |
Pangolin Lineage | Lineage determined by Pangolin. | BVBRC, NCBI Virus |
Platform | Instrument platform used for sequencing; multiple platforms are separated (only for assemblies). | BVBRC |
Predicted | The tool or database that predicted the sequence. No value means that it is not a predicted sequence. | IMG/VR, PhD, PhiSpy |
Unique Representative | Whether the sequence is the representative of a group of identical VirJenDB sequences. | VJDB |
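
For reference, the quantity stored in `GC Content` can be reproduced from a sequence with a few lines of Python. This is a minimal sketch: the function name `gc_content` is hypothetical, and counting every character (including ambiguous bases such as `N`) in the denominator is an assumption that may not match how BV-BRC computes the value.

```python
def gc_content(sequence):
    """Return the percentage of G and C among all characters of the sequence."""
    if not sequence:
        return 0.0
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    # Assumption: ambiguous bases (e.g. N) count towards the total length.
    return 100.0 * gc / len(seq)


print(gc_content("ATGC"))
```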
Taxonomy
VirJenDB Field Name | Description | Sources |
---|---|---|
ICTV Class | A ‘Class’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the ICTV. | ICTV |
ICTV Family | A ‘Family’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the ICTV. | ICTV |
ICTV Genus | A ‘Genus’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the ICTV. | ICTV |
ICTV Kingdom | A ‘Kingdom’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the ICTV. | ICTV |
ICTV Lineage Names | Aggregation of the ICTV Taxonomy Names of the virus lineage. | VJDB |
ICTV Order | An ‘Order’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the ICTV. | ICTV |
ICTV Phylum | A ‘Phylum’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the ICTV. | ICTV |
ICTV Realm | A ‘Realm’ is the highest taxonomic rank into which virus species can be classified. Defined by the ICTV. | ICTV |
ICTV Species | A Species is the lowest taxonomic rank in the hierarchy approved by the ICTV. While subspecies levels of classification may exist for some viruses (e.g. Hepatitis C virus), the ICTV does not classify viruses below the species level. Defined by the ICTV. | ICTV |
ICTV Subfamily | A ‘Subfamily’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the ICTV. | ICTV |
ICTV Subgenus | A ‘Subgenus’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the ICTV. | ICTV |
ICTV Suborder | A ‘Suborder’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the ICTV. | ICTV |
ICTV Subphylum | A ‘Subphylum’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the ICTV. | ICTV |
NCBI Class | A ‘Class’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the NCBI. | BVBRC |
NCBI Family | A ‘Family’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the NCBI. | BVBRC, NCBI Virus |
NCBI Genus | A ‘Genus’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the NCBI. | BVBRC, NCBI Virus |
NCBI Kingdom | A ‘Kingdom’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the NCBI. | BVBRC |
NCBI Lineage IDs | NCBI TaxIDs of the virus’ taxonomy from BVBRC. | BVBRC |
NCBI Lineage Names | Aggregation of the NCBI Taxonomy Names of the virus lineage. | BVBRC |
NCBI Order | An ‘Order’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the NCBI. | BVBRC |
NCBI Phylum | A ‘Phylum’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the NCBI. | BVBRC |
NCBI Species | A Species is the lowest taxonomic rank in the hierarchy approved by the NCBI. While subspecies levels of classification may exist for some viruses (e.g. Hepatitis C virus), the NCBI does not classify viruses below the species level. Defined by the NCBI. | BVBRC, NCBI Virus |
NCBI SpeciesTax ID | The virus species NCBI Taxonomy ID. | BVBRC |
NCBI Subfamily | A ‘Subfamily’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the NCBI. | NCBI Taxonomy |
NCBI Subgenus | A ‘Subgenus’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the NCBI. | NCBI |
NCBI Suborder | A ‘Suborder’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the NCBI. | NCBI Taxonomy |
NCBI Subphylum | A ‘Subphylum’ is a rank in the taxonomic hierarchy into which virus species can be classified. Defined by the NCBI. | NCBI Taxonomy |
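
The `ICTV Lineage Names` and `NCBI Lineage Names` fields aggregate the names of the individual ranks. The sketch below illustrates such an aggregation over the rank fields listed in the table above. The `"; "` separator, the handling of empty ranks, and the function name `lineage_names` are assumptions for illustration, not the documented VirJenDB format.

```python
# Rank fields from the table above, ordered from highest to lowest rank.
RANKS = [
    "Realm", "Kingdom", "Phylum", "Subphylum", "Class", "Order",
    "Suborder", "Family", "Subfamily", "Genus", "Subgenus", "Species",
]


def lineage_names(record, prefix="ICTV", separator="; "):
    """Join the non-empty rank names of a record into one lineage string.

    `record` maps field names such as "ICTV Family" to taxon names.
    Ranks that are missing or empty are skipped.
    """
    names = []
    for rank in RANKS:
        value = record.get(f"{prefix} {rank}")
        if value:
            names.append(value)
    return separator.join(names)


# Deliberately partial record; only three ranks are filled in.
record = {
    "ICTV Realm": "Riboviria",
    "ICTV Family": "Coronaviridae",
    "ICTV Genus": "Betacoronavirus",
}
print(lineage_names(record))
```

Passing `prefix="NCBI"` applies the same aggregation to the NCBI rank fields.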
Internal
VirJenDB Field Name | Description | Sources |
---|---|---|
Unique Reference | The VirJenDB Accession for the unique representative sequence. For the non-phages, the smallest Accession in the group. See documentation for selection details. | VJDB |
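
The rule for non-phages (the smallest Accession in a group of identical sequences) can be sketched as follows. This is only an illustration under stated assumptions: "smallest" is taken as the lexicographic minimum, sequences are compared case-insensitively, the accessions shown are hypothetical placeholders, and the selection for phages is not covered.

```python
import hashlib
from collections import defaultdict


def unique_references(sequences):
    """Map each accession to the smallest accession among identical sequences.

    `sequences` maps accession -> nucleotide string. "Smallest" is taken as
    the lexicographic minimum, which is an assumption of this sketch.
    """
    groups = defaultdict(list)
    for accession, seq in sequences.items():
        # Hash the sequence so identical sequences share one group key.
        digest = hashlib.sha256(seq.upper().encode("ascii")).hexdigest()
        groups[digest].append(accession)

    reference = {}
    for members in groups.values():
        smallest = min(members)
        for accession in members:
            reference[accession] = smallest
    return reference


# Hypothetical accessions, for illustration only.
example = {"acc3": "ATGC", "acc1": "atgc", "acc2": "GGGG"}
refs = unique_references(example)
print(refs["acc3"])
```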
old
Collection
Field Name | Description | NCBI Virus | BV-BRC |
---|---|---|---|
Collection Country | Origin country of the isolate | Country | Isolation Country |
Collection Year | Collection time point of the isolate | Collection_Date | Collection Date |
Host Name | Name of the host organism | Host | Host Name |
Host Age | Age of the host organism at isolation | - | Host Age |
Collection Tissue | Tissue from which the isolate was collected | Isolation_Source | Isolation Source |
Organization | Organization that sequenced the given entry | - | Sequencing Center |
Submitter | Authors that submitted the given entry | Submitters | - |
Bio Sample | NCBI BioSample accession number | BioSample | BioSample Accession |
Bio Project | NCBI BioProject accession number | BioProject | BioProject Accession |
Release Date | Release date of the entry | Release_Date | Date Inserted |
Sequencing Center | Institute at which the sample was sequenced | - | Sequencing Center |
Assembly Accession | NCBI Assembly accession number | - | Assembly Accession |
Sample
Field Name | Description | NCBI Virus | BV-BRC |
---|---|---|---|
Sample ID | VirJenDB sample identifier of the given entry | - | - |
Completeness | Genome completeness of the entry (complete or partial) | Nuc_Completeness | Genome Status |
BV-BRC ID | BV-BRC Genome Identifier of the given entry | - | Genome ID |
NCBI Accession | NCBI Accession Identifier of the given entry | Accession | GenBank Accession |
Sample Name | Name of the given entry | - | - |
PM ID | PubMed ID of the given entry | Publications | Publication |
Tax ID | NCBI Taxon Identifier of the given entry | NCBI TaxID | NCBI Taxon ID |
GC | Percentage of G’s and C’s in the sequence of the given entry | - | GC Content |
Length | Genome length of the given entry | Length | Size |
Number of Contigs | Number of contigs in the sequence of the given entry | - | Contigs |
Segment Name | Segment name if the virus is segmented | Segment | Segment |
SRA Accession | NCBI Sequence Read Archive (SRA) accession number of the given entry | SRA_Accession | SRA Accession |
Molecule Type | Molecular structure of the genome of the given entry (dsDNA, ssDNA, ssDNA+-, ssRNA, ssRNA+-, ssRNA-RT, dsDNA-RT, dsRNA, ssDNA+, ssDNA-, ssRNA+, ssRNA-, unknown) | Molecule_Type | - |
Representative | Is the sequence of the given entry a reference sequence? | Sequence type | Reference |
Reference | Is the sequence of the given entry a reference sequence? | Sequence Type | - |
Genome Quality | Curation status of the sequence of the given entry (curated, not_curated) | - | Genome Quality |
Tree
Field Name | Description | NCBI Virus | BV-BRC |
---|---|---|---|
Abbreviation | Abbreviation of the Sample Name of the given entry | - | - |
Species | Virus species name of the given entry | Species | Species |
Variant | Variant information of the given entry | - | various |
DBSource
Field Name | Description | NCBI Virus | BV-BRC |
---|---|---|---|
DB Source | Source database of the given entry | - | - |