VirJenDB

Documentation

Last update: 2025-November-14

VirJenDB v1.0

This page is up-to-date!

VirJenDB Metadata Categories

The VirJenDB metadata are grouped by tag: Organizational, Sample, Source, Host, Analysis, Taxonomy, Internal, and Workflows. Note that a field can belong to more than one of these categories.

VirJenDB Metadata Fields

Files containing the following field descriptions can be downloaded from GitHub in XLSX, JSON, CSV, and TXT (tab-separated) formats.
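As a rough illustration of working with a tab-separated field description file, the sketch below groups field names by tag, so that a field carrying several tags appears under each of them. The column names (`field`, `description`, `tags`), the semicolon delimiter inside `tags`, and the sample tag assignments are assumptions made for this example; they are not taken from the actual VirJenDB files.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt of a tab-separated field description file.
# Column names and tag assignments are assumptions made for this sketch.
SAMPLE_TSV = (
    "field\tdescription\ttags\n"
    "Accession\tGenBank Accession of the virus sequence.\tOrganizational\n"
    "Host Sequencing Platform\tInstrument used for host sequencing.\tHost;Workflows\n"
)


def fields_by_tag(tsv_text):
    """Group field names by tag; a field with several tags appears under each."""
    groups = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        for tag in row["tags"].split(";"):
            groups[tag.strip()].append(row["field"])
    return dict(groups)


groups = fields_by_tag(SAMPLE_TSV)
```

Keeping one list per tag makes the overlap between categories explicit without duplicating the field descriptions themselves.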

Organizational

| VirJenDB Field Name | Description | Sources |
| --- | --- | --- |
| Accession | GenBank Accession of the virus sequence. Identical to the NCBI, ENA, and DDBJ Accessions, all of which are part of the INSDC. | BVBRC, NCBI Virus |
| Assembly Accession | NCBI Assembly Accession of the virus sequence. | BVBRC, NCBI Virus |
| BV-BRC Accession | Accession of the sequence in the BV-BRC database. | BVBRC |
| Bioproject Accession | BioProject Accession that the virus sequence is part of (INSDC). | BVBRC, NCBI Virus |
| Biosample Accession | BioSample Accession that the virus sequence is part of (INSDC). | BVBRC, NCBI Virus |
| Release Date | Date when the sequence was made public in a repository (e.g. GenBank, BV-BRC). | NCBI Virus |
| SRA Accession | NCBI Sequence Read Archive (SRA) Accession. | BVBRC, NCBI Virus |
| Sequence Source | Name of the source database or tool through which the virus sequence was added to VirJenDB; for PhiSpy-predicted prophages, the value is PhiSpy. | VJDB |
| Sequencing Center | Institution that sequenced the collected sample. | BVBRC |
| Submitter | Name(s) of the submitter(s) of the virus sequence. | NCBI Virus |
| Submitter Organization | Organization(s) the submitter(s) are affiliated with. | NCBI Virus |
| VirJenDB Accession | Unique identifier of the virus sequence, assigned by VirJenDB. | VJDB |
| VirJenDB Metadata Version | Version of the virus sequence record, including metadata, assigned by VirJenDB. | VJDB |

Sample

| VirJenDB Field Name | Description | Sources |
| --- | --- | --- |
| Collection Date | Date when the sample from which the virus sequence originates was collected. | BVBRC, NCBI Virus |
| Collection Geo Location | Country name plus an additional identifier, such as a state abbreviation, of the place where the sample from which the sequence originates was collected. | BVBRC, NCBI Virus |
| Collection Source Host Tissue | Tissue, specimen, or source of the superhost from which the sample was collected. | NCBI Virus |
| Country | Country or sea area where the sample from which the virus sequence originates was collected. | BVBRC, NCBI Virus |
| Genome Completeness | Whether the sequence is a partial or complete sequence in NCBI GenBank. | NCBI Virus |
| Genotype | The genotype or subtype of a virus sequence, as provided by the sequence submitter. This field comes from the “/serotype” field of the GenBank record and is shown as submitted. Consistency and accuracy may vary. | NCBI Virus |
| Isolate | Name of the virus isolate associated with the sample from which the virus sequence was obtained. | NCBI Virus |
| Molecule Type | Molecule type of the sequence, e.g. ssRNA. Note that NCBI GenBank provides 15 molecule types, corresponding to combinations of the Baltimore classifications. | NCBI Virus |
| NCBI Clade | Sub-species designation of the virus sequence, from NCBI GenBank field “clade”. | NCBI Taxonomy |
| NCBI Genotype | Sub-species designation of the virus sequence, from NCBI GenBank field “genotype”. | NCBI Taxonomy |
| NCBI Isolate | Sub-species designation of the virus sequence, from NCBI GenBank field “isolate”. | NCBI Taxonomy |
| NCBI Realm | A ‘Realm’ is the highest taxonomic rank into which virus species can be classified. Defined by the NCBI. | NCBI Taxonomy |
| NCBI Serotype | Sub-species designation of the virus sequence, from NCBI GenBank field “serotype”. | NCBI Taxonomy |
| Organism Name | NCBI GenBank organism name: a taxonomic name at or below the species level. | NCBI Virus |
| Segment Name | Name of the virus segment. Can be a number. | NCBI Virus |
| Sequence Length | Length of the virus sequence in base pairs (bp). | BVBRC, NCBI Virus |
| Strain | Name of the virus strain associated with the sample from which the virus sequence was obtained. | BVBRC |
| Submitter Country | Country of the submitter’s organization. | NCBI Virus |
| Submitter Region | Region or location of the submitter’s organization. | NCBI Virus |
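The Collection Geo Location field combines a country name with an additional identifier. A minimal sketch of splitting such a value is shown below; it assumes the INSDC-style `Country: Region` layout with a colon separator, which may not hold for every VirJenDB record. The function name `split_geo_location` is hypothetical.

```python
def split_geo_location(value):
    """Split 'Country: Region' into (country, region).

    Assumes a colon separates the country from the additional identifier;
    region is None when no identifier is present.
    """
    country, sep, region = value.partition(":")
    country = country.strip()
    region = region.strip()
    return country, (region if sep and region else None)


with_region = split_geo_location("USA: CA")
country_only = split_geo_location("Germany")
```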

Source

| VirJenDB Field Name | Description | Sources |
| --- | --- | --- |
| BV-BRC Sequence | Flag indicating whether this virus sequence is available in BV-BRC. | VJDB |
| GenBank Sequence | Flag indicating whether this virus sequence is available in GenBank. | NCBI Virus |
| ICTV Exemplar | Flag set to yes if the sequence is an exemplar of the species in the ICTV Taxonomy. | ICTV |
| RefSeq Sequence | Flag indicating whether this virus sequence is a reference sequence in the RefSeq database. | BVBRC, NCBI Virus |

Host

| VirJenDB Field Name | Description | Sources |
| --- | --- | --- |
| BV-BRC Host Age | Host age and unit from BV-BRC. | BVBRC |
| BV-BRC Host Group | The host’s broader taxonomic group from BV-BRC. | BVBRC |
| BVBRC Host Sex | Sex of the sampled host from BV-BRC. | BVBRC |
| Host Accession | Host Accession from NCBI GenBank that refers to the host sequence from which the prophage was predicted. | VJDB |
| Host Age | Numeric host age only, taken from “BV-BRC Host Age”. Caution: not yet standardized by unit! | BVBRC |
| Host Assembly Accession | Host Assembly Accession of the sequence (INSDC). | IMG/VR, PhD, PhiSpy |
| Host Average Sequence Depth | Average sequencing depth across the host sequence. Multiple host associations/predictions are separated by semicolons. | IMG/VR, PhD, PhiSpy |
| Host BioProject Accession | BioProject Accession associated with the host sequence. | IMG/VR, PhD, PhiSpy |
| Host Biosample Accession | BioSample Accession associated with the host sequence. | IMG/VR, PhD, PhiSpy |
| Host Collection Date | Collection date of the sample from which the host sequence originates. | IMG/VR, PhD, PhiSpy |
| Host Common Name | Common name of the host the virus sequence was collected from, e.g. human. Currently (v1.0) no ontology is applied. | BVBRC, NCBI Virus |
| Host Country | Country or sea area of origin of the sample from which the host sequence originates. | IMG/VR, PhD, PhiSpy |
| Host GTDB Species Name | GTDB Species Name to which the NCBI host taxonomy ID of the virus sequence could be mapped. See the mapping file on the Datasets page. | IMG/VR, PhD, PhiSpy |
| Host IMGM Taxon OID | Host IMGM Taxon OID from which the virus sequence was extracted by IMG/VR. | IMG/VR, PhD, PhiSpy |
| Host NCBI Tax ID | NCBI Taxonomy ID of the host sequence. May be derived from the GTDB mapping of the GTDB Species Name to the NCBI Taxonomy ID. | GTDB |
| Host Species | The scientific species name of the sampled host. | BVBRC |
| ICTV Host Group | Content of the ICTV Host source field, to be combined with host group data from the other sources. | ICTV |
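Because Host Age holds only a number and is not yet standardized by unit, comparing ages across records requires the unit from BV-BRC Host Age. The sketch below shows one way to normalize an age-plus-unit string to years. The input layout (`"<number> <unit>"`), the accepted unit names, the approximate conversion factors, and the function name `host_age_in_years` are all assumptions for illustration; actual BV-BRC values may be formatted differently.

```python
import re

# Approximate conversion factors to years (assumptions for this sketch).
UNIT_TO_YEARS = {
    "year": 1.0,
    "month": 1.0 / 12.0,
    "week": 1.0 / 52.0,
    "day": 1.0 / 365.0,
}

# Matches a number followed by a unit word, e.g. "45 years" or "6 months".
AGE_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*$")


def host_age_in_years(raw):
    """Convert an 'age unit' string to years; return None if it cannot be parsed."""
    match = AGE_PATTERN.match(raw)
    if match is None:
        return None
    number = float(match.group(1))
    unit = match.group(2).lower().rstrip("s")  # "years" -> "year"
    factor = UNIT_TO_YEARS.get(unit)
    return None if factor is None else number * factor
```

Returning `None` for unparseable values keeps unknown formats visible instead of silently treating the bare number as years.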

Analysis

| VirJenDB Field Name | Description | Sources |
| --- | --- | --- |
| Average Depth | Average sequencing depth across the sequence, from BVBRC. | BVBRC |
| Cluster Reference | The VirJenDB ID of the representative of the sequence’s cluster. See documentation for selection details. | VJDB |
| Cluster Representative | Flag set to yes if the sequence is the representative of a group of sequences, computed by VClust. See documentation for selection details. | VJDB |
| GC Content | Percentage of G and C nucleotides in the sequence, from BVBRC. | BVBRC |
| Pangolin Lineage | Lineage determined by Pangolin. | BVBRC, NCBI Virus |
| Platform | Instrument platform used for sequencing the virus; multiple values are separated by semicolons (only for assemblies). | BVBRC |
| Predicted Method | The tool used to predict the virus sequence. No value means that it is not a predicted virus sequence. | IMG/VR, PhD, PhiSpy |
| Unique Representative | Flag set to yes if the sequence is the representative of a group of identical VirJenDB sequences. | VJDB |
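For orientation, GC content as described above (the percentage of G and C nucleotides in a sequence) can be computed as in the following sketch. The values in VirJenDB come from BVBRC, whose exact handling of ambiguous bases such as N is not documented here; this example simply counts them toward the total length, which is an assumption.

```python
def gc_content_percent(sequence):
    """Return the percentage of G and C nucleotides in a sequence.

    Case-insensitive. Ambiguous bases count toward the length only,
    which is an assumption of this sketch.
    """
    seq = sequence.upper()
    if not seq:
        return 0.0
    gc_count = seq.count("G") + seq.count("C")
    return 100.0 * gc_count / len(seq)
```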

Taxonomy

| VirJenDB Field Name | Description | Sources |
| --- | --- | --- |
| ICTV Class | A ‘Class’ is a rank in the taxonomic hierarchy into which a virus sequence can be classified. Defined by the ICTV. | ICTV |
| ICTV Family | A ‘Family’ is a rank in the taxonomic hierarchy into which a virus sequence can be classified. Defined by the ICTV. | ICTV |
| ICTV Genus | A ‘Genus’ is a rank in the taxonomic hierarchy into which a virus sequence can be classified. Defined by the ICTV. | ICTV |
| ICTV Kingdom | A ‘Kingdom’ is a rank in the taxonomic hierarchy into which a virus sequence can be classified. Defined by the ICTV. | ICTV |
| ICTV Lineage Names | Aggregation of the ICTV Taxonomy Names associated with the virus sequence. | VJDB |
| ICTV Order | An ‘Order’ is a rank in the taxonomic hierarchy into which a virus sequence can be classified. Defined by the ICTV. | ICTV |
| ICTV Phylum | A ‘Phylum’ is a rank in the taxonomic hierarchy into which a virus sequence can be classified. Defined by the ICTV. | ICTV |
| ICTV Realm | A ‘Realm’ is the highest taxonomic rank into which a virus sequence can be classified. Defined by the ICTV. | ICTV |
| ICTV Species | A ‘Species’ is the lowest taxonomic rank in the hierarchy approved by the ICTV. While subspecies levels of classification may exist for some viruses (e.g. Hepatitis C virus), the ICTV does not classify viruses below the species level. Defined by the ICTV. | ICTV |
| ICTV Subfamily | A ‘Subfamily’ is a rank in the taxonomic hierarchy into which a virus sequence can be classified. Defined by the ICTV. | ICTV |
| ICTV Subgenus | A ‘Subgenus’ is a rank in the taxonomic hierarchy into which a virus sequence can be classified. Defined by the ICTV. | ICTV |
| ICTV Suborder | A ‘Suborder’ is a rank in the taxonomic hierarchy into which a virus sequence can be classified. Defined by the ICTV. | ICTV |
| ICTV Subphylum | A ‘Subphylum’ is a rank in the taxonomic hierarchy into which a virus sequence can be classified. Defined by the ICTV. | ICTV |
| NCBI Class | A ‘Class’ is a rank in the taxonomic hierarchy into which a virus sequence can be classified. Defined by the NCBI. | BVBRC |
| NCBI Family | A ‘Family’ is a rank in the taxonomic hierarchy into which a virus sequence can be classified. Defined by the NCBI. | BVBRC, NCBI Virus |
| NCBI Genus | A ‘Genus’ is a rank in the taxonomic hierarchy into which a virus sequence can be classified. Defined by the NCBI. | BVBRC, NCBI Virus |
| NCBI Kingdom | A ‘Kingdom’ is a rank in the taxonomic hierarchy into which a virus sequence can be classified. Defined by the NCBI. | BVBRC |
| NCBI Lineage IDs | NCBI TaxIDs assigned to the virus sequence, from BVBRC. | BVBRC |
| NCBI Lineage Names | Aggregation of the NCBI Taxonomy Names associated with the virus sequence. | BVBRC |
| NCBI Order | An ‘Order’ is a rank in the taxonomic hierarchy into which a virus sequence can be classified. Defined by the NCBI. | BVBRC |
| NCBI Phylum | A ‘Phylum’ is a rank in the taxonomic hierarchy into which a virus sequence can be classified. Defined by the NCBI. | BVBRC |
| NCBI Species | A ‘Species’ is the lowest taxonomic rank in the hierarchy approved by the NCBI. While subspecies levels of classification may exist for some viruses (e.g. Hepatitis C virus), the NCBI does not classify viruses below the species level. Defined by the NCBI. | BVBRC, NCBI Virus |
| NCBI SpeciesTax ID | The NCBI Taxonomy ID of the virus species. | BVBRC |
| NCBI Subfamily | A ‘Subfamily’ is a rank in the taxonomic hierarchy into which a virus sequence can be classified. Defined by the NCBI. | NCBI Taxonomy |
| NCBI Subgenus | A ‘Subgenus’ is a rank in the taxonomic hierarchy into which a virus sequence can be classified. Defined by the NCBI. | NCBI |
| NCBI Suborder | A ‘Suborder’ is a rank in the taxonomic hierarchy into which a virus sequence can be classified. Defined by the NCBI. | NCBI Taxonomy |
| NCBI Subphylum | A ‘Subphylum’ is a rank in the taxonomic hierarchy into which a virus sequence can be classified. Defined by the NCBI. | NCBI Taxonomy |

Internal

| VirJenDB Field Name | Description | Sources |
| --- | --- | --- |
| Unique Reference | The VirJenDB ID of the unique representative sequence. For non-phages, the smallest Accession in the group. See documentation for selection details. | VJDB |

Workflows

| VirJenDB Field Name | Description | Sources |
| --- | --- | --- |
| Host Association Method | Method by which the fields “Host Accession ID” from NCBI GenBank or “NCBI Taxonomy Lineage” were associated with the virus sequence. Currently (v1.0) either a host association from the data source or a PhiSpy prophage prediction. Multiple host associations/predictions are separated by semicolons. | VJDB |
| Host Sequencing Platform | Instrument or sequencing platform used for host sequencing. Multiple host associations/predictions are separated by semicolons. | IMG/VR, PhD, PhiSpy |
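Fields that hold multiple host associations/predictions separate the entries with semicolons. A minimal sketch of splitting such values follows. The record contents are hypothetical, and pairing the entries of two fields by position assumes that they are listed in the same order, which this documentation does not state.

```python
def split_multi(value):
    """Split a semicolon-separated field value into trimmed entries.

    Empty entries are kept so that positions stay aligned across fields.
    """
    if not value:
        return []
    return [part.strip() for part in value.split(";")]


# Hypothetical record; the values are made up for illustration.
record = {
    "Host Association Method": "PhiSpy; data source",
    "Host Sequencing Platform": "Illumina; PacBio",
}

methods = split_multi(record["Host Association Method"])
platforms = split_multi(record["Host Sequencing Platform"])

# Positional pairing is an assumption, not documented behaviour.
pairs = list(zip(methods, platforms))
```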

