Genotype-Phenotype Datasets

Last updated on 2025-10-15

High Quality Filtered Datasets

This genotype-phenotype correlation dataset contains isolates on which in vitro susceptibility tests were performed using the PhenoSense assay (Monogram, South San Francisco, USA) (Zhang 2005). Redundant viruses obtained from the same individual that contained the same pattern of major drug resistance mutations (defined below) were excluded to minimize bias that would result from over-representing highly similar viruses. Viruses with sequences containing mixtures at these major drug resistance mutation positions were also excluded because the presence of mixtures at these positions may confound genotype-phenotype correlations.

PI Major Drug Resistance Positions: 30, 32, 47, 48, 50, 54, 76, 82, 84, 88
NRTI Major Drug Resistance Positions: 41, 65, 70, 74, 75, 151, 184, 210, 215
NNRTI Major Drug Resistance Positions: 100I, 101P, 103N, 106A/M, 181C/I/A, 188C/L/H, 190A/E/S/Q, 230L
INI Major Drug Resistance Positions: 66A/I/K, 92Q, 118R, 143, 148H/R/K, 155H, 263K
CAI Major Drug Resistance Positions: 56I, 57S, 66I, 67H/Y/N/K, 70R/S/N/H, 74D/S/K, 105T/E/S, 107N/C
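
The two exclusion rules described above (dropping redundant viruses from the same individual and dropping sequences with mixtures at major positions) can be illustrated with a minimal sketch. It is not the procedure actually used to build the datasets. The record layout (dictionaries with `SeqID`, `PtID`, and columns named `P30`, `P32`, ...), the helper names, and the choice to keep the first of several redundant viruses are all assumptions made for illustration. The PI position list is taken from the text.

```python
# Sketch only: the record layout and helper names are hypothetical.
PI_MAJOR_POSITIONS = [30, 32, 47, 48, 50, 54, 76, 82, 84, 88]


def is_mixture(cell):
    """Treat a cell holding two or more amino acid letters as a mixture."""
    return sum(ch.isalpha() for ch in cell) >= 2


def filter_isolates(records, positions):
    """Drop mixtures at major positions, then drop same-patient duplicates."""
    seen = set()
    kept = []
    for rec in records:
        pattern = tuple(rec[f"P{pos}"] for pos in positions)
        if any(is_mixture(cell) for cell in pattern):
            continue  # mixture at a major position
        key = (rec["PtID"], pattern)
        if key in seen:
            continue  # same individual, same major-mutation pattern
        seen.add(key)
        kept.append(rec)  # assumption: the first virus seen is the one kept
    return kept


def make_record(seq_id, pt_id, **mutations):
    """Build a toy record: consensus ('-') everywhere unless overridden."""
    rec = {"SeqID": seq_id, "PtID": pt_id}
    rec.update({f"P{pos}": "-" for pos in PI_MAJOR_POSITIONS})
    rec.update(mutations)
    return rec


toy = [
    make_record("A", 1, P30="N"),
    make_record("B", 1, P30="N"),   # redundant with A
    make_record("C", 1, P82="A"),   # same patient, different pattern
    make_record("D", 2, P84="IV"),  # mixture at a major position
    make_record("E", 2, P30="N"),   # same pattern as A, different patient
]
kept_ids = [rec["SeqID"] for rec in filter_isolates(toy, PI_MAJOR_POSITIONS)]
print(kept_ids)  # ['A', 'C', 'E']
```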

The dataset can also be read directly over the web by the R script provided here. The script contains a function that runs least squares regression on the dataset under cross-validation and, by default, generates two output files: (1) the coefficient and standard error of each input mutation, estimated from the cross-validation runs; and (2) the mean squared error (MSE) estimated in each cross-validation run. The function's input parameters and additional options are documented in the script.
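
For readers who want to see the general idea without R, the following is a dependency-free Python sketch of least squares regression under cross-validation. It is not a port of the R script. The function names, the interleaved fold assignment, and the way the standard error is computed (spread of each coefficient across folds) are assumptions, and the toy data are invented and noiseless.

```python
import statistics


def solve_linear_system(a, b):
    """Solve a.x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix in this fold")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_least_squares(x_rows, y):
    """Ordinary least squares via the normal equations, with an intercept."""
    design = [[1.0] + list(row) for row in x_rows]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coef, row):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))


def cross_validate(x_rows, y, n_folds=5):
    """Return mean coefficients, their across-fold standard errors, and per-fold MSE."""
    fold_coefs, fold_mse = [], []
    for fold in range(n_folds):
        test_idx = [i for i in range(len(y)) if i % n_folds == fold]
        train_idx = [i for i in range(len(y)) if i % n_folds != fold]
        coef = fit_least_squares([x_rows[i] for i in train_idx],
                                 [y[i] for i in train_idx])
        errors = [(y[i] - predict(coef, x_rows[i])) ** 2 for i in test_idx]
        fold_coefs.append(coef)
        fold_mse.append(sum(errors) / len(errors))
    per_coef = list(zip(*fold_coefs))
    means = [statistics.mean(c) for c in per_coef]
    std_errs = [statistics.stdev(c) / len(c) ** 0.5 for c in per_coef]
    return means, std_errs, fold_mse


# Toy data: two hypothetical mutation indicators, response = 0.5 + 2*m1 + 1*m2.
combos = [(0, 0), (1, 0), (0, 1), (1, 1)]
rows = [combos[i % 4] for i in range(12)]
response = [0.5 + 2.0 * m1 + 1.0 * m2 for m1, m2 in rows]
means, std_errs, fold_mse = cross_validate(rows, response, n_folds=5)
```

Here `means[0]` is the intercept and `means[1:]` are the per-mutation coefficients. On real data a fold can lack a given mutation entirely, in which case this sketch raises an error rather than silently returning an unstable estimate.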

To access high quality filtered datasets from HIVDB by drug class, click the links below:

Drug Class	Data
PI	15585 phenotype results from 2428 isolates
NRTI	12739 phenotype results from 2219 isolates
NNRTI	5731 phenotype results from 2344 isolates
INI	1892 phenotype results from 765 isolates
CAI	phenotype results from isolates

Description of fields in the datasets

Field Name	Description
SeqID	Sequence identifier
Drug Fold	Fold resistance of Drug X compared to the wild type.
P1...Pn	Amino acid at this position. '-' indicates consensus; '.' indicates no sequence; '#' indicates an insertion; '~' indicates a deletion; '*' indicates a stop codon; and a letter indicates a one-letter amino acid substitution. Two or more amino acid codes indicate a mixture. The consensus B amino acid sequences can be found here.
CompMutList	Complete list of mutations present in the sequence.
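
The position encoding described in the table can be decoded with a small lookup. The following is a minimal sketch. The function name and category labels are hypothetical, and cells that combine letters with symbols (which the description above does not cover) are reported as unrecognized rather than guessed at.

```python
# Symbols and meanings are taken from the field description above.
SYMBOLS = {
    "-": "consensus",
    ".": "no sequence",
    "#": "insertion",
    "~": "deletion",
    "*": "stop codon",
}


def classify_cell(cell):
    """Map one P1...Pn cell to a category label (labels are illustrative)."""
    if cell in SYMBOLS:
        return SYMBOLS[cell]
    if cell.isalpha():
        # One letter is a substitution; two or more letters are a mixture.
        return "substitution" if len(cell) == 1 else "mixture"
    return "unrecognized"


for example in ["-", ".", "V", "IV", "*"]:
    print(example, "->", classify_cell(example))
```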

Complete Unfiltered Datasets

To access complete unfiltered datasets from HIVDB by drug class, click the links below:

Drug Class	Data	Method	Number of Isolates
PI	27341 phenotype results from 4512 isolates	PhenoSense	2715
PI	27341 phenotype results from 4512 isolates	Other	1797
NRTI	30331 phenotype results from 5524 isolates	PhenoSense	3370
NRTI	30331 phenotype results from 5524 isolates	Other	2154
NNRTI	11776 phenotype results from 5034 isolates	PhenoSense	2815
NNRTI	11776 phenotype results from 5034 isolates	Other	2219
INI	5694 phenotype results from 2146 isolates	PhenoSense	1169
INI	5694 phenotype results from 2146 isolates	Other	977
CAI	171 phenotype results from 171 isolates	PhenoSense	82
CAI	171 phenotype results from 171 isolates	Other	89

Description of fields in the datasets

Field Name	Description
SeqID	Sequence identifier
PtID	Patient identifier
Subtype	Subtype of sequence
Method	Phenotype method
RefID	Published reference. View References Table
Type	Clinical vs. lab isolate. Lab isolates are site-directed mutants or the results of in vitro passage.
IsolateName	Isolate identifier
SeqType	Complete vs. selective mutations. Complete mutation lists have been reported except for isolates annotated with 'PartialMutationList' in this column, which indicates that the authors reported only a subset of the mutations present in the isolates. The specific criteria for reporting mutations can be found in the associated publication in the References Table.
Drug Fold	Fold resistance of Drug X compared to the wild type.
P1...Pn	Amino acid at this position. '-' indicates consensus; '.' indicates no sequence; '#' indicates an insertion; '~' indicates a deletion; '*' indicates a stop codon; and a letter indicates a one-letter amino acid substitution. Two or more amino acid codes indicate a mixture. The consensus B amino acid sequences can be found here.
CompMutList	Complete list of mutations present in the sequence, or of the mutations reported by the authors for isolates annotated with 'PartialMutationList' in the 'SeqType' column.
