last update
2026-May-05
VirJenDB
v1.0
VirJenDB API
The VirJenDB API provides programmatic access to search, retrieve, and download viral genome data, and is available through the Swagger UI at https://api2.virjendb.org/swagger.
Overview of Endpoints
| Endpoint | Method | Description |
|---|---|---|
| `/v2/search` | POST | Search the VirJenDB dataset |
| `/v2/download` | POST | Download search results or selected records |
| `/v2/sequence` | POST | Retrieve sequences by accession |
| `/v2/datasets_download` | GET | Download precomputed datasets |
| `/v2/metadata` | POST | Query metadata fields |
| `/v2/metadata_download` | POST | Download metadata field definitions |
1. Search Endpoint
POST /v2/search
Search the VirJenDB dataset using structured JSON queries.
Request Body
{
  "items": [
    {
      "operator": "or",
      "metadataField": "string",
      "searchTerm": "string"
    }
  ],
  "filter": [
    { "field": "", "value": "" }
  ],
  "pageSize": 50,
  "pageNumber": 1,
  "sortColumn": "string",
  "sortOrder": "asc"
}
Fields
| Field | Description |
|---|---|
| `items` | Array of search clauses. Each item contains `operator`, `metadataField`, and `searchTerm`. |
| `filter` | Additional filters in `{ field, value }` form. |
| `pageSize` | Number of results per page. The API documents a maximum of 10,000. |
| `pageNumber` | Page number to return. |
| `sortColumn` | Field used to sort the results. |
| `sortOrder` | Sort direction: `asc` or `desc`. |
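To walk through a large result set, the `pageSize` and `pageNumber` fields can be varied while the rest of the body stays fixed. The following is a minimal sketch, not part of the API itself: `page_requests` is a hypothetical helper, it assumes pages are numbered from 1 (as in the documented example), and it assumes the caller already knows the total number of matching records, since how the API reports that total is not described here.

```python
import math

MAX_PAGE_SIZE = 10_000  # maximum documented for pageSize


def page_requests(body, total_records, page_size=MAX_PAGE_SIZE):
    """Yield copies of a search body, one per page, numbered from 1."""
    if not 1 <= page_size <= MAX_PAGE_SIZE:
        raise ValueError(f"pageSize must be between 1 and {MAX_PAGE_SIZE}")
    page_count = math.ceil(total_records / page_size)
    for page_number in range(1, page_count + 1):
        # The original body is left untouched; only the paging keys change.
        yield {**body, "pageSize": page_size, "pageNumber": page_number}
```

Each yielded dictionary can be sent as the JSON body of a separate `POST /v2/search` request.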
Notes
- The Swagger UI exposes `operator` and `sortOrder` as query parameters as well as in the JSON body examples. The request body shown above matches the documented example shape.
- Keep search terms non-empty for meaningful results.
Responses
| Code | Description |
|---|---|
| 200 | Successful response |
| 422 | Validation error |
Example Request
Search for records where the metadata field Host Species matches Escherichia coli.
{
  "items": [
    {
      "operator": "",
      "metadataField": "Host Species",
      "searchTerm": "Escherichia coli"
    }
  ],
  "filter": [],
  "pageSize": 50,
  "pageNumber": 1,
  "sortColumn": "",
  "sortOrder": "asc"
}
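The example request can be sent from a script with the Python standard library. This is a sketch under stated assumptions: the base URL is assumed to be the host that serves the Swagger UI, the response is assumed to be JSON, and `build_search_body` and `post_json` are hypothetical helper names, not part of VirJenDB.

```python
import json
import urllib.request

# Assumption: the API is served from the same host as the Swagger UI.
BASE_URL = "https://api2.virjendb.org"


def build_search_body(field, term, page_size=50, page_number=1,
                      sort_column="", sort_order="asc"):
    """Return a request body in the shape of the example request."""
    return {
        "items": [
            {"operator": "", "metadataField": field, "searchTerm": term}
        ],
        "filter": [],
        "pageSize": page_size,
        "pageNumber": page_number,
        "sortColumn": sort_column,
        "sortOrder": sort_order,
    }


def post_json(path, body, timeout=60):
    """POST a JSON body to the API and decode the response as JSON."""
    request = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Non-2xx responses (for example 422) raise urllib.error.HTTPError.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `post_json("/v2/search", build_search_body("Host Species", "Escherichia coli"))` would send the example request. The shape of the returned data should be checked against the live API.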
2. Download Endpoint
POST /v2/download
Download data based on search results or selected accession IDs.
Supported Data Types
- Metadata
- VirJenDB accession IDs
- Sequences
Supported File Types
- `csv`
- `tsv`
- `json`
- `xml`
- `fasta` (for sequence downloads)
Request Body
The download request contains two top-level parts:
- `download`: controls what is exported
- `search`: defines which records are included when `download.scope` is `max`
Download Object
| Field | Description |
|---|---|
| `download.scope` | Controls which records are included in the download. |
| `download.information` | Controls which data is exported. |
| `download.fileType` | Output file type. |
| `download.metadata` | Optional metadata fields to include. |
| `download.selected_ids` | Accession IDs used when `scope` is `selection`. |
Download Scope (download.scope)
| Value | Description |
|---|---|
| `selection` | Download specific records listed in `download.selected_ids`. |
| `max` | Download records that match the `search` object. |
| `all` | Listed in Swagger, but not described in detail there. Treat this as undocumented unless the backend confirms its behavior. |
selection
- Requires `download.selected_ids`
- The API documentation says a maximum of 10,000 IDs is supported in a single request
- If more than 10,000 IDs are supplied, only the first 10,000 are processed
max
- Uses the `search` object
- Supports pagination with `search.pageSize` and `search.pageNumber`
- The API documentation indicates Elasticsearch `search_after` is used for deep pagination
- `pageSize` should not exceed 10,000
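Because only the first 10,000 IDs of a `selection` request are processed, a longer ID list has to be split across several requests. A minimal sketch of that batching follows. `selection_requests` is a hypothetical helper, and it assumes the `search` object may be omitted when the scope is `selection`, which the documentation does not state explicitly.

```python
MAX_IDS_PER_REQUEST = 10_000  # limit stated in the API documentation


def selection_requests(accessions, information="metadata", file_type="csv",
                       batch_size=MAX_IDS_PER_REQUEST):
    """Yield one download request body per batch of accession IDs."""
    ids = list(accessions)
    for start in range(0, len(ids), batch_size):
        yield {
            "download": {
                "scope": "selection",
                "information": information,
                "fileType": file_type,
                "metadata": [],  # empty: all public metadata fields
                "selected_ids": ids[start:start + batch_size],
            }
        }
```

With 25,000 IDs this yields three request bodies holding 10,000, 10,000, and 5,000 IDs, so no ID is silently dropped.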
Information Type (download.information)
| Value | Description |
|---|---|
| `metadata` | Export metadata fields for each record |
| `virjendb_accessions` | Export only VirJenDB accession IDs |
| `sequences` | Export sequence data in FASTA format |
Details
metadata
- Use `download.metadata` to choose which metadata fields are included
- If `download.metadata` is empty, all public metadata fields are included
virjendb_accessions
- Produces a lightweight list of accession IDs
sequences
- Produces FASTA output
- Metadata fields from `download.metadata` may be included in the FASTA header
- The accession appears first in the header, followed by additional fields separated by `|`
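The header layout described for sequence downloads can be split back into its parts on the client side. The sketch below is illustrative only: `parse_fasta_header` is a hypothetical helper, the sample header in the usage note is made up, and it assumes the extra fields appear in the same order as the names passed in `download.metadata`, which should be verified against real output.

```python
def parse_fasta_header(line, field_names=()):
    """Split a FASTA header into the accession and its extra fields."""
    if not line.startswith(">"):
        raise ValueError("FASTA header lines start with '>'")
    # Drop the leading '>' and split on the '|' separator.
    parts = [part.strip() for part in line[1:].split("|")]
    accession, values = parts[0], parts[1:]
    # Pair values with the requested field names when they were supplied.
    fields = dict(zip(field_names, values)) if field_names else {}
    return accession, values, fields
```

For a hypothetical header such as `>vj000000000010|Escherichia coli|DNA`, passing `["Host Species", "Molecule Type"]` as `field_names` returns the accession plus a dictionary keyed by those two names.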
File Type (download.fileType)
| Type | Description |
|---|---|
| `csv` | Comma-separated values |
| `tsv` | Tab-separated values |
| `json` | JSON output |
| `xml` | XML output |
| `fasta` | FASTA output, only for sequence downloads |
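Since the backend may reject some combinations, it can help to check a `download` object locally before sending it. The following sketch encodes only the rules stated in this section. `check_download` is a hypothetical helper, and the server may enforce further rules that are not documented here.

```python
SCOPES = {"selection", "max", "all"}
INFORMATION_TYPES = {"metadata", "virjendb_accessions", "sequences"}
FILE_TYPES = {"csv", "tsv", "json", "xml", "fasta"}


def check_download(download):
    """Return a list of problems found in a download object (empty if none)."""
    problems = []
    scope = download.get("scope")
    information = download.get("information")
    file_type = download.get("fileType")
    if scope not in SCOPES:
        problems.append("unknown scope")
    if information not in INFORMATION_TYPES:
        problems.append("unknown information type")
    if file_type not in FILE_TYPES:
        problems.append("unknown file type")
    elif file_type == "fasta" and information != "sequences":
        problems.append("fasta is only for sequence downloads")
    if scope == "selection" and not download.get("selected_ids"):
        problems.append("selection scope requires selected_ids")
    return problems
```

An empty list means the object passed these local checks. It does not guarantee that the API will accept the request.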
Search Object in Downloads
When download.scope is max, the search object follows the same structure as the /v2/search endpoint.
It supports:
- full-text search
- filtering
- sorting
- pagination
Important Notes
- VirJenDB accession IDs use the prefix `vj` followed by digits, for example `vj000000000010`
- If no results are found, the API returns `404 Not Found`
- The API documentation lists `400`, `404`, `422`, and `500` as possible error responses for download requests
- The Swagger UI notes that some file-type and parameter combinations may be rejected by the backend with `400 Bad Request`
Errors
| Code | Meaning |
|---|---|
| 400 | Bad request, invalid parameters, or unsupported combination |
| 404 | No results found |
| 422 | Validation error |
| 500 | Internal server error |
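A client can translate these status codes into readable messages when a download request fails. This is a small sketch built from the table above. `describe_status` is a hypothetical helper, and the error body returned by the API, if any, is not inspected because its shape is not documented here.

```python
DOWNLOAD_ERRORS = {
    400: "Bad request, invalid parameters, or unsupported combination",
    404: "No results found",
    422: "Validation error",
    500: "Internal server error",
}


def describe_status(code):
    """Return a short message for a download status code."""
    meaning = DOWNLOAD_ERRORS.get(code, "Undocumented status code")
    return f"{code}: {meaning}"
```

With `urllib.request`, a failed request raises `urllib.error.HTTPError`, and its `code` attribute can be passed to `describe_status`. Note that a 404 here means an empty result set rather than a missing endpoint.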
Example Request
Download sequences for records where Host Species = Escherichia coli.
{
  "download": {
    "scope": "max",
    "information": "sequences",
    "fileType": "fasta",
    "metadata": ["Host Species", "Molecule Type"]
  },
  "search": {
    "items": [
      {
        "metadataField": "Host Species",
        "searchTerm": "Escherichia coli"
      }
    ],
    "filter": [],
    "pageSize": 50,
    "pageNumber": 1,
    "sortColumn": "",
    "sortOrder": "asc"
  }
}
3. Sequence Endpoint
POST /v2/sequence
Retrieve genome sequences using VirJenDB accession IDs.
Request Body
{
  "virjendb_accessions": [
    "vj000000000010",
    "vj000005747069"
  ]
}
Notes
- The Swagger UI describes this endpoint as a sequence download endpoint
- The response schema shown in Swagger is generic, so the exact response shape should be verified against the live API if strict parsing is needed
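Checking accession IDs before building the request body catches simple typos early. The sketch below assumes only what the documentation states about the format (the prefix `vj` followed by digits). The number of digits is deliberately not enforced, and `build_sequence_body` is a hypothetical helper name.

```python
import re

# "vj" followed by one or more digits; the digit count is not fixed here
# because the documentation only gives examples, not a length rule.
ACCESSION_PATTERN = re.compile(r"vj\d+")


def build_sequence_body(accessions):
    """Return a /v2/sequence request body after validating the IDs."""
    ids = [accession.strip() for accession in accessions]
    invalid = [a for a in ids if not ACCESSION_PATTERN.fullmatch(a)]
    if invalid:
        raise ValueError(f"Not VirJenDB accessions: {invalid}")
    return {"virjendb_accessions": ids}
```

The returned dictionary matches the request body shown above and can be serialized with `json.dumps` before sending.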
4. Dataset Download
GET /v2/datasets_download
Download pre-generated VirJenDB dataset files.
Query Parameter
| Name | Description |
|---|---|
| `filename` | Dataset file to download |
Available Filenames
- `ncbi_to_gtdb_map.csv.gz`
- `virjendbv1_full_dataset_sequence.csv.gz`
- `virjendbv1_unique_seqs_sequence.csv.gz`
- `virjendbv1_votu_clusters_metadata.csv.gz`
- `ncbi_to_gtdb_map.sha256`
- `virjendbv1_full_dataset_sequence.sha256`
- `virjendbv1_unique_seqs_sequence.sha256`
- `virjendbv1_votu_clusters_metadata.sha256`
- `virjendbv1_full_dataset_metadata.csv.gz`
- `virjendbv1_unique_seqs_metadata.csv.gz`
- `virjendbv1_votu_cluster_comparison_metrics.csv.gz`
- `virjendbv1_votu_clusters_sequence.csv.gz`
- `virjendbv1_full_dataset_metadata.sha256`
- `virjendbv1_unique_seqs_metadata.sha256`
- `virjendbv1_votu_cluster_comparison_metrics.sha256`
- `virjendbv1_votu_clusters_sequence.sha256`
Example
filename=virjendbv1_full_dataset_sequence.csv.gz
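Each `.csv.gz` dataset has a matching `.sha256` file, which suggests verifying a download after fetching it. The following sketch builds the request URL and compares checksums. It assumes the base URL is the host that serves the Swagger UI, and that the `.sha256` file starts with the hex digest (as in `sha256sum` output), neither of which is confirmed by the documentation. All function names are hypothetical.

```python
import hashlib
from urllib.parse import urlencode

# Assumption: the API is served from the same host as the Swagger UI.
BASE_URL = "https://api2.virjendb.org"


def dataset_url(filename):
    """Build the GET URL for a precomputed dataset file."""
    return f"{BASE_URL}/v2/datasets_download?{urlencode({'filename': filename})}"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, checksum_text):
    """Compare a file against the first token of a checksum file's text."""
    tokens = checksum_text.split()
    return bool(tokens) and sha256_of(path) == tokens[0].lower()
```

The URL from `dataset_url` can be fetched with any HTTP client, after which `matches_checksum` takes the local file path and the text of the downloaded `.sha256` file.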
5. Metadata Endpoints
Query Metadata Fields
POST /v2/metadata
Filter and retrieve metadata field definitions.
Request Body
{
  "field_name": [],
  "privacy": [],
  "tags": [],
  "submission_requiredness": [],
  "submission_fieldtype": [],
  "submission_validation": [],
  "match_mode": "and",
  "summary": false
}
Notes
- This endpoint appears to support filtering by multiple metadata properties
- The Swagger UI does not fully explain the meaning of every field, so field-level behavior should be treated as schema-driven rather than inferred
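Because the field semantics are schema-driven, a client can at least make sure it only sends keys that appear in the documented request body. The sketch below does that and nothing more. `build_metadata_query` is a hypothetical helper, and the accepted values for each list (and for `match_mode`) are not validated because they are not documented here.

```python
METADATA_FILTER_KEYS = (
    "field_name",
    "privacy",
    "tags",
    "submission_requiredness",
    "submission_fieldtype",
    "submission_validation",
)


def build_metadata_query(match_mode="and", summary=False, **filters):
    """Return a /v2/metadata body; unspecified filters stay empty lists."""
    unknown = sorted(set(filters) - set(METADATA_FILTER_KEYS))
    if unknown:
        raise KeyError(f"Keys not in the documented request body: {unknown}")
    body = {key: list(filters.get(key, [])) for key in METADATA_FILTER_KEYS}
    body["match_mode"] = match_mode
    body["summary"] = summary
    return body
```

Calling `build_metadata_query()` with no arguments reproduces the request body shown above.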
Download Metadata Fields
POST /v2/metadata_download
Download metadata definitions as a file.
Request Body
{
  "field_name": [],
  "file_type": "csv"
}
Notes
- Use this endpoint to export metadata field definitions
- The exact output shape depends on the server-side implementation and selected file type
Additional Resources
- Swagger UI: https://swagger.io/tools/swagger-ui
- REST API basics: https://www.freecodecamp.org/news/how-to-use-rest-api