Stanford University HIV Drug Resistance Database - A curated public database designed to represent, store, and analyze the divergent forms of data underlying HIV drug resistance.

Genotype-Phenotype Datasets

Last updated on 2025-10-15

High Quality Filtered Datasets
This genotype-phenotype correlation dataset contains isolates on which in vitro susceptibility tests were performed using the PhenoSense assay (Monogram, South San Francisco, USA) (Zhang 2005). Redundant viruses obtained from the same individual that contained the same pattern of major drug resistance mutations (defined below) were excluded to minimize bias that would result from over-representing highly similar viruses. Viruses with sequences containing mixtures at these major drug resistance mutation positions were also excluded because the presence of mixtures at these positions may confound genotype-phenotype correlations.

PI Major Drug Resistance Positions: 30, 32, 47, 48, 50, 54, 76, 82, 84, 88
NRTI Major Drug Resistance Positions: 41, 65, 70, 74, 75, 151, 184, 210, 215
NNRTI Major Drug Resistance Positions: 100I, 101P, 103N, 106A/M, 181C/I/A, 188C/L/H, 190A/E/S/Q, 230L
INI Major Drug Resistance Positions: 66A/I/K, 92Q, 118R, 143, 148H/R/K, 155H, 263K
CAI Major Drug Resistance Positions: 56I, 57S, 66I, 67H/Y/N/K, 70R/S/N/H, 74D/S/K, 105T/E/S, 107N/C
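
The two exclusion rules described above can be illustrated with a minimal Python sketch. It is not the procedure used to build the datasets. The record layout (a `PtID` plus an `aa` dictionary mapping position to amino acid code), the choice to keep the first isolate seen for each patient and pattern, and the sample isolates are all assumptions made for illustration. Only the PI position list is taken from the text.

```python
# PI major drug resistance positions, as listed above.
PI_MAJOR_POSITIONS = [30, 32, 47, 48, 50, 54, 76, 82, 84, 88]


def has_mixture(isolate, positions):
    """True if any listed position holds two or more amino acid codes."""
    for pos in positions:
        value = isolate["aa"].get(pos, "-")  # '-' means consensus
        if len(value) > 1 and value.isalpha():
            return True
    return False


def major_pattern(isolate, positions):
    """Tuple of the amino acid codes at the listed positions."""
    return tuple(isolate["aa"].get(pos, "-") for pos in positions)


def filter_isolates(isolates, positions):
    """Drop isolates with mixtures, then drop same-patient duplicates.

    Assumption: the first isolate seen for a (patient, pattern) pair is kept.
    """
    seen = set()
    kept = []
    for isolate in isolates:
        if has_mixture(isolate, positions):
            continue
        key = (isolate["PtID"], major_pattern(isolate, positions))
        if key in seen:
            continue
        seen.add(key)
        kept.append(isolate)
    return kept


# Hypothetical isolates, not taken from the database.
isolates = [
    {"SeqID": 1, "PtID": "A", "aa": {82: "A"}},
    {"SeqID": 2, "PtID": "A", "aa": {82: "A", 10: "I"}},  # same major pattern
    {"SeqID": 3, "PtID": "A", "aa": {82: "AV"}},          # mixture at 82
    {"SeqID": 4, "PtID": "B", "aa": {82: "A"}},           # different patient
]
kept_ids = [iso["SeqID"] for iso in filter_isolates(isolates, PI_MAJOR_POSITIONS)]
```

Position 10 is not in the PI list, so isolate 2 counts as redundant with isolate 1 even though the two sequences differ there.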

In addition, the dataset can be read directly over the web by the R script provided here. The script contains a function that runs least squares regression on the dataset under cross-validation and, by default, generates two output files: (1) the coefficient and standard error of each input mutation, estimated from the cross-validation runs; and (2) the mean squared error (MSE) estimated in each cross-validation run. The function's input parameters and additional options are documented in the script.
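
For readers who want a feel for this kind of analysis without opening the R script, the following is a minimal Python sketch of least squares regression under k-fold cross-validation. It is a stand-in, not a port of the script. The function names, the use of the normal equations, the choice to report the mean and the across-fold standard error of each coefficient, and the synthetic data are all assumptions. It does not handle singular design matrices, which can occur when a mutation is absent from a training fold.

```python
import random
import statistics


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_least_squares(X, y):
    """Ordinary least squares with an intercept, via (X'X) b = X'y."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict(beta, row):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def cross_validate(X, y, n_folds=5, seed=0):
    """Return per-coefficient mean, across-fold standard error, and fold MSEs."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    betas, mses = [], []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        beta = fit_least_squares([X[i] for i in train], [y[i] for i in train])
        errors = [(y[i] - predict(beta, X[i])) ** 2 for i in held_out]
        betas.append(beta)
        mses.append(sum(errors) / len(errors))
    coef_mean = [statistics.mean(col) for col in zip(*betas)]
    coef_se = [statistics.stdev(col) / len(col) ** 0.5 for col in zip(*betas)]
    return coef_mean, coef_se, mses


# Synthetic, noise-free data: two 0/1 mutation indicators and a made-up response.
X = [[a, b] for a in (0, 1) for b in (0, 1)] * 5
y = [0.1 + 1.0 * a + 0.5 * b for a, b in X]
coef_mean, coef_se, mses = cross_validate(X, y)
```

Here `coef_mean[0]` is the intercept and the remaining entries follow the column order of `X`. With real data, each column would be an indicator for one mutation and `y` a susceptibility measure derived from the fold values. Any transformation of the fold values is left to the reader, since the text above does not specify one.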

To access high quality filtered datasets from HIVDB by drug class, click the links below:

Drug Class  Data
PI          15585 phenotype results from 2428 isolates
NRTI        12739 phenotype results from 2219 isolates
NNRTI       5731 phenotype results from 2344 isolates
INI         1892 phenotype results from 765 isolates
CAI         phenotype results from isolates

Description of fields in the datasets

Field Name   Description
SeqID        Sequence identifier
Drug Fold    Fold resistance of Drug X compared to the wild type.
P1...Pn      Amino acid at this position. '-' indicates consensus; '.' indicates no sequence; '#' indicates an insertion; '~' indicates a deletion; '*' indicates a stop codon; and a letter indicates a one-letter amino acid substitution. Two or more amino acid codes indicate a mixture. The consensus B amino acid sequences can be found here.
CompMutList  Complete list of mutations present in the sequence.
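
The symbol conventions for the P1...Pn columns can be decoded with a small helper. The sketch below follows only the rules stated in the table. The function name and the returned labels are illustrative choices, and any value the table does not describe (for example, a symbol combined with a letter) is reported as unrecognized rather than guessed at.

```python
def classify_position(value):
    """Classify one P1...Pn cell according to the field description."""
    symbols = {
        "-": "consensus",
        ".": "no sequence",
        "#": "insertion",
        "~": "deletion",
        "*": "stop codon",
    }
    if value in symbols:
        return symbols[value]
    if value.isalpha():
        # One letter is a substitution; two or more letters are a mixture.
        return "substitution" if len(value) == 1 else "mixture"
    return "unrecognized"


# Hypothetical cell values for illustration.
examples = ["-", ".", "#", "~", "*", "V", "AV", "?"]
labels = [classify_position(v) for v in examples]
```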

Complete Unfiltered Datasets
To access complete unfiltered datasets from HIVDB by drug class, click the links below:

Drug Class  Data                                        Method      Number of Isolates
PI          27341 phenotype results from 4512 isolates  PhenoSense  2715
                                                        Other       1797
NRTI        30331 phenotype results from 5524 isolates  PhenoSense  3370
                                                        Other       2154
NNRTI       11776 phenotype results from 5034 isolates  PhenoSense  2815
                                                        Other       2219
INI         5694 phenotype results from 2146 isolates   PhenoSense  1169
                                                        Other       977
CAI         171 phenotype results from 171 isolates     PhenoSense  82
                                                        Other       89

Description of fields in the datasets

Field Name   Description
SeqID        Sequence identifier
PtID         Patient identifier
Subtype      Subtype of sequence
Method       Phenotype method
RefID        Published reference (see the References Table).
Type         Clinical vs. lab isolate. Lab isolates are site-directed mutants or the products of in vitro passage.
IsolateName  Isolate identifier
SeqType      Complete vs. selective mutations. Complete mutation lists have been reported, except for isolates annotated with 'PartialMutationList' in this column, which indicates that the authors reported only a subset of the mutations present in the isolate. The specific criteria for reporting mutations can be found in the associated publication in the References Table.
Drug Fold    Fold resistance of Drug X compared to the wild type.
P1...Pn      Amino acid at this position. '-' indicates consensus; '.' indicates no sequence; '#' indicates an insertion; '~' indicates a deletion; '*' indicates a stop codon; and a letter indicates a one-letter amino acid substitution. Two or more amino acid codes indicate a mixture. The consensus B amino acid sequences can be found here.
CompMutList  Complete list of mutations present in the sequence or, for isolates annotated with 'PartialMutationList' in SeqType, the list of mutations reported.
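
As an illustration of how the Method and SeqType fields might be used together, the sketch below selects PhenoSense rows with complete mutation lists from a small in-memory sample. The tab delimiter, the sample rows, the 'OtherAssay' and 'Complete' values, and the function name are assumptions made for illustration. Only the column names, the 'PhenoSense' label, and the 'PartialMutationList' annotation come from the descriptions above.

```python
import csv
import io

# Hypothetical sample; the real files may use other delimiters and values.
SAMPLE = (
    "SeqID\tPtID\tMethod\tSeqType\tCompMutList\n"
    "1\t10\tPhenoSense\tComplete\tM46I, I84V\n"
    "2\t11\tPhenoSense\tPartialMutationList\tV82A\n"
    "3\t12\tOtherAssay\tComplete\tL90M\n"
)


def select_rows(handle, method="PhenoSense"):
    """Keep rows for one phenotype method whose mutation lists are complete."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        row
        for row in reader
        if row["Method"] == method and row["SeqType"] != "PartialMutationList"
    ]


rows = select_rows(io.StringIO(SAMPLE))
selected_ids = [row["SeqID"] for row in rows]
```

Excluding 'PartialMutationList' rows is one possible choice for genotype-phenotype modeling, since unreported mutations in those isolates could otherwise be mistaken for consensus residues.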