Comments and suggestions to HIVDB
This database is maintained by Stanford University as a benefit to the research and education community. This website is provided on an "as is" basis only and without warranty or representation, whether express or implied, (including warranties of merchantability and fitness for a particular purpose) as to its accuracy or reliability. Stanford University and its trustees, officers and employees are neither responsible for nor accept any liability for any direct or indirect loss or damages arising from or connected to the use of this website. The information provided on this website is intended for research and educational purposes and is not intended to substitute for care by a licensed healthcare professional.
II. Common FAQs
What is the purpose of the HIV Drug Resistance Database (HIVDB)?
HIV drug resistance data are critical for HIV drug resistance surveillance, ARV drug design, and the management of persons infected with drug-resistant HIV. These data are best represented in a database that not only catalogs mutations associated with drug resistance but also links complete genetic sequences to other forms of data.
What are the main functions of HIVDB?
(1) To store, analyze and make available the diverse forms of data underlying drug resistance knowledge to the broad community of researchers and clinicians studying HIV drug resistance and using HIV drug resistance tests; (2) To provide a publicly available online resource to help those performing HIV drug resistance surveillance, interpreting HIV drug resistance tests, and developing new antiretroviral drugs; (3) To identify gaps in drug resistance knowledge that could be filled by retrospective or prospective studies.
What types of data are in HIVDB?
HIVDB collects the three fundamental types of correlations that form the basis of drug resistance knowledge: (1) Correlations between genotypic data and the treatments of persons from whom sequenced HIV-1 isolates have been obtained (genotype-treatment); (2) Correlations between genotype and in vitro drug susceptibility (genotype-phenotype); and (3) Correlations between genotype and the clinical response to a new treatment regimen (genotype-outcome). The particular advantages of each type of data are outlined in previous publications (Shafer 2006). The three HIV targets currently in HIVDB are Protease (PR), Reverse Transcriptase (RT), and Integrase (IN).
How much data is in HIVDB?
HIVDB contains more than 90,000 PR, RT, or IN sequences from more than 80,000 distinct virus isolations obtained from nearly 40,000 individuals. 98%-99% of the viruses are human HIV-1 isolates; 1%-2% of viruses are human HIV-2 isolates or other non-human primate lentiviruses (NHPL).
Where is the HIVDB data from?
The data have been obtained from more than 900 literature and GenBank references. About 15% of the data have been obtained from published research performed at Stanford University whereas the remainder is from published papers and/or GenBank. In some cases, a published paper or a GenBank submission contains sufficient data for entry into HIVDB. In other cases, data within a paper must be linked on a sequence-by-sequence basis to establish the correlations between a sequence and the ARV treatment of the person from whom the sequenced virus was obtained. In a large proportion of cases, contributors to the database have provided essential information directly to the HIVDB staff to make it possible to add their data to the database.
How is HIVDB funded?
HIVDB has been funded primarily by the NIH including the NIAID and the NIGMS. Each of the pharmaceutical companies that have manufactured ARV drugs has also provided one or more unrestricted educational grants. Additional funders have included the California University-wide AIDS Research Program (2004-2005), the Stanford University Bio-X Interdisciplinary Initiative (2000-2001), and several diagnostic and biotech companies, particularly Celera. A complete list of the funds received by the database can be found here.
HIVDB and its programs have been described in several publications (Betts and Shafer 2003; Kantor et al. 2001; Kuiken et al. 2003; Liu and Shafer 2006; Rhee et al. 2003; Rhee et al. 2006a; Shafer 2006; Shafer et al. 2000a; Shafer et al. 2000b; Shafer et al. 1999). However, we recommend citing the most recent general review, by Rhee et al. in 2003: Rhee, S. Y., M. J. Gonzales, R. Kantor, B. J. Betts, J. Ravela, and R. W. Shafer. 2003. Human immunodeficiency virus reverse transcriptase and protease sequence database. Nucleic Acids Res 31:298-303. Although the database publications have been cited about 400 times, most papers that refer to the database do not cite a publication. Instead, the authors refer to the "Stanford database" or provide its URL.
How does one contribute to HIVDB?
The criteria for publication in the database include: (1) Sequence data - nucleic acid sequences are preferred but in rare instances we have accepted amino acid sequence data and (2) Correlated data consisting of treatment histories, phenotypic test results, and/or virologic outcome data following the change to a new treatment regimen. In addition, we strongly prefer data that have been published in a peer-reviewed journal and submitted to GenBank, for two main reasons. First, data that have been published are generally of higher quality than unpublished data. Second, our database operates on the principle that once data have been published, they belong in the public domain so that others can verify and build on the results. Indeed, we have been inspired by the quotation "If I have seen further it is by standing on the shoulders of giants" (Isaac Newton 1676). The long-term success of HIVDB and other biomedical databases depends on how many medical researchers are willing to be giants.
Why is HIVDB publicly available?
Data are the most important commodity in science and their management is of critical importance to the discovery of new knowledge. An HIV drug resistance database that provides unfettered access to the types of data described in the preceding paragraphs must be publicly available to the broadest number of users to promote discovery in the most efficient manner. Proprietary databases that deny access to the majority of researchers are not only inefficient but also counterproductive because the company or small group of researchers with a stake in such a database will often act to thwart the nonproprietary dissemination of data in order to maintain the perceived commercial or research value of their monopoly. Conclusions drawn from a proprietary database may also be unduly influenced by the interests of those who submit and retrieve the data.
III. Specific FAQs about HIVDB and its Programs
What is meant by an HIV-1 mutation?
The study of viral resistance requires a standard numbering system and a standard amino acid reference sequence. Mutations are defined as any difference from the consensus B amino acid sequence. Links to the consensus B amino acid sequences for protease, RT, and integrase can be found here.
Why is the subtype B consensus sequence used rather than a particular isolate, the consensus sequence for one of the more common subtypes (e.g. subtype C), or the consensus sequence for group M?
This is a commonly adopted convention that dates back to the late 1980s and early 1990s when subtype B viruses were the most common viruses in the U.S. and Europe. Using a different subtype or a group M consensus would cause too much confusion at this time. Using a particular isolate would also be confusing as nearly every common laboratory isolate has one or more unusual mutations that would always need to be noted. For example, the common laboratory strain HXB2 has a rare I3V mutation in protease. Therefore, if HXB2 were used as the reference sequence, nearly every sequence would have a V3I mutation in protease.
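The convention of reporting mutations as differences from a reference sequence can be sketched in a few lines of Python. This is only an illustration, not HIVDB's own code: the function name `list_mutations` is hypothetical, and the five-residue fragments are placeholders chosen so that position 3 matches the I3V example in the text. They should not be treated as authoritative consensus B or HXB2 data.

```python
def list_mutations(reference, query):
    """List differences as: reference residue + 1-based position + query residue."""
    if len(reference) != len(query):
        raise ValueError("sequences must already be aligned to the same length")
    return [
        f"{ref}{pos}{obs}"
        for pos, (ref, obs) in enumerate(zip(reference, query), start=1)
        if ref != obs
    ]


# Illustrative fragments only (assumptions, not curated reference data).
toy_reference = "PQITL"  # position 3 is I, as in the text's example
toy_isolate = "PQVTL"    # an isolate carrying V at position 3

print(list_mutations(toy_reference, toy_isolate))  # ['I3V']
# Swapping the roles shows why a single isolate makes a poor reference:
print(list_mutations(toy_isolate, toy_reference))  # ['V3I']
```

With the isolate as the reference, every sequence that matches the consensus at position 3 is reported as mutant, which is the confusion described above.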
How does HIVDB indicate mutations?
Mutations are indicated by a shorthand consisting of the consensus B amino acid followed by the amino acid position followed by the amino acid detected in a sequence. For example, the RT mutation T215Y indicates that the consensus amino acid is T (Thr; threonine) but that the mutation found in the sequence of interest is Y (Tyr; tyrosine). The expression T215TY indicates that there is a mixture of viruses with the wildtype T and the mutation Y in the same sequence. Mixtures consisting of two amino acids at the same position in a sequence are common and generally represent true mixtures rather than sequencing artifacts (Shafer et al. 2001; Wang et al. 2007). Rarely, a sequence will contain a mixture of three or four amino acids at the same position. For example, the change from T to Y at position 215 requires two nucleotide changes (e.g. ACC => TAC). Rarely, sequencing will detect the ambiguous codon WMC at this position (where W indicates the IUB ambiguity code for A and T and M indicates the ambiguity code for A and C). WMC can be translated in four ways, resulting in the mutation T215TSNY. Sequences that are highly ambiguous (e.g. result in a translation consisting of more than 4 amino acids) are indicated by an "X" in HIVDB.
The above shorthand for mutations can be confusing because it is also used in the literature to indicate one or more mutations rather than the simultaneous presence of all of the indicated mutations. For example, an author may write (or one of our didactic sections may contain) a sentence such as "V82A/T/F/S/L/M (or just as commonly V82ATFSLM) are associated with decreased susceptibility to one or more protease inhibitors."
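The translation of an ambiguous codon such as WMC can be illustrated with a short Python sketch. It is not the HIVDB implementation: the function name `translate_ambiguous` and the `max_mixture` parameter are hypothetical, the codon table is the standard genetic code, and the ambiguity codes are the usual IUB/IUPAC nucleotide codes. Residues are returned in alphabetical order, which may differ from the order used in the shorthand above, and a stop codon (`*`) is counted like any other residue.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUB/IUPAC nucleotide ambiguity codes.
AMBIGUITY = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_ambiguous(codon, max_mixture=4):
    """Return every amino acid a codon can encode, or "X" if there are too many."""
    options = [AMBIGUITY[base] for base in codon.upper()]
    residues = {CODON_TABLE["".join(combo)] for combo in product(*options)}
    if len(residues) > max_mixture:
        return "X"  # highly ambiguous position
    return "".join(sorted(residues))


print(translate_ambiguous("ACC"))  # T
print(translate_ambiguous("TAC"))  # Y
print(translate_ambiguous("WMC"))  # NSTY (the four residues in T215TSNY)
print(translate_ambiguous("NNN"))  # X
```

WMC expands to the four codons AAC, ACC, TAC, and TCC, which encode N, T, Y, and S respectively.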
How does HIVDB indicate insertions and deletions (indels)?
Insertions and deletions in the enzymatic targets of HIV therapy are not common. Yet they do occur and some are associated with decreased susceptibility. In the HIVdb program the presence of one or more amino acid insertions is indicated by the lower case letters "ins" whereas amino acids are always indicated by single upper-case letters; the presence of one or more deletions is indicated by the lower case letters "del". In amino acid alignments, we have chosen to indicate insertions using the symbol "#" and deletions using the symbol "~". Because these are not standard representations, an explanation of these symbols is provided wherever they are used.
Although indels are not common, we have been asked many questions about them: How is their location determined? Does the size of the insertion or deletion influence drug susceptibility? Does the particular amino acid or amino acids inserted influence drug susceptibility? These questions will be addressed as the FAQs for HIVDB and its associated programs are expanded.
How are drug-resistance mutations (DRMs) defined?
The association between a mutation and drug resistance is based on three types of correlations around which much of the database is organized: genotype-treatment correlations, genotype-phenotype correlations, and genotype-clinical outcome correlations. Genotype-treatment correlations indicate whether a mutation is selected by ARV drug therapy in vitro and/or in vivo. Genotype-phenotype correlations indicate whether a mutation reduces or contributes to reduced drug susceptibility in vitro. Genotype-phenotype correlations may be based on the susceptibility testing of laboratory isolates with site-directed mutants. More commonly, however, they are based on the susceptibility testing of clinical isolates containing multiple mutations. Establishing the association between mutations and reduced drug susceptibility will then often require statistical analyses because most drug-resistant isolates contain multiple DRMs (Rhee et al. 2006b). Genotype-clinical outcome correlations represent statistical associations between the presence of a mutation prior to the start of a new antiretroviral treatment regimen and the clinical response to that regimen. These associations, however, must be controlled for many factors such as the past treatment of a patient, the baseline virus level, and the multiple drugs used in most salvage therapy regimens.
The literature on HIV DRMs, however, is vast and many mutations have been linked to drug resistance by one of the three criteria described in the preceding paragraph. The strength of supporting evidence for different mutations is also highly variable. Some mutations fulfill all three criteria solidly; others fulfill only one or two criteria. The association of other mutations with ARV treatment, reduced in vitro susceptibility, or decreased response to ARV therapy is supported only by the flimsiest of statistical evidence. The need for adequate data is one of the primary rationales for this database. However, when such data are lacking, it is necessary to weigh the evidence supporting the role of candidate DRMs.
Why are some DRMs called "Major" and others called "Minor"?
So many mutations are associated with decreased HIV-1 susceptibility by the criteria described above that it has become common to sub-classify the DRMs associated with each of the different drug classes. The most common sub-classifications divide DRMs into "Major" vs "Minor" or "Primary" vs "Secondary". Specific criteria for these sub-classifications have never been established and existing classification schemes have been developed on an ad hoc basis. The following characteristics of a DRM influence its classification:
(1) Effect on in vitro drug susceptibility - mutations that by themselves reduce susceptibility to one or more drugs are generally classified as "Major". In contrast, mutations with little or no effect on susceptibility are usually classified as accessory or "Minor". The term accessory is used either because these mutations usually reduce susceptibility only in combination with a major mutation or increase the replication fitness of viruses containing major drug resistance mutations.
(2) Frequency of the mutation among persons with virological failure - mutations that occur commonly during virological failure are more likely to be classified as "Major". In contrast, rare mutations or those that usually occur only after other drug-resistance mutations are more likely to be classified as "Minor".
(3) Extent of polymorphism among untreated persons - mutations that occur commonly in untreated patients as naturally occurring variants are more likely to be classified as "Minor". Several naturally occurring variants cause slight reductions in drug susceptibility. Fortunately, none cause large reductions in drug susceptibility.
(4) Location of the DRM within the 3-D protein structure - mutations at structurally important and conserved parts of a protein are more likely to be considered "Major". These mutations also often cause greater reductions in drug susceptibility.
Are there any other mutation classification schemes in HIVDB?
There are several more classification schemes used in HIVDB as outlined in the following paragraphs:
(1) The HIV Drug Resistance Interpretation program (HIVdb) categorizes RT mutations as "NRTI mutations", "NNRTI mutations", and "Other RT mutations" and PR mutations as "Major PI resistance mutations", "Minor PI resistance mutations", and "Other PR mutations". A regularly updated summary of how each mutation is categorized can be found here: Protease Inhibitors (PIs) associated DRMs, Nucleoside RT inhibitors (NRTIs) associated DRMs, Non-nucleoside RT inhibitors (NNRTIs) associated DRMs and Integrase inhibitors (INIs) associated DRMs. Every known PR and RT DRM is associated with a regularly updated comment and with a series of regularly updated drug-resistance penalty scores for the ARVs affected by these mutations. The considerations for classifying mutations into each of these categories are described in detail in the Release Notes of the HIVdb program.
(2) In the Reference Section of the database (reference page), the mutations in each sequence are divided into 6 RT, 4 PR, and 4 IN categories. The RT categories include (i) NRTI major, (ii) NRTI minor, (iii) NNRTI major, (iv) NNRTI minor, (v) Common polymorphisms, and (vi) Unusual mutations. The PR categories include (i) PI major, (ii) PI minor, (iii) Polymorphisms, and (iv) Unusual mutations. The IN categories include (i) INI major, (ii) INI minor, (iii) Common polymorphisms, and (iv) Unusual mutations.
The distinction between major and minor DRMs for each of the 4 drug classes (NRTI, NNRTI, PI, and INI) is based on the criteria outlined in the previous section and discussed in more detail in a recent publication (Shafer and Schapiro 2008). The distinction between "Polymorphisms" and "Unusual mutations" is based on the frequency of the mutation in HIVDB. Mutations occurring at a frequency below 0.1% (< 1 in 1,000) in pooled viruses from treated and untreated persons with different group M subtypes are considered "Unusual mutations" unless they are known to be DRMs that happen to be rare.
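The frequency rule above can be written out as a small Python sketch. The function name `classify_by_frequency`, its parameters, and the "DRM" label for known resistance mutations are assumptions made for illustration. The only rule taken from the text is the 0.1% cutoff and the exemption for known DRMs, and the counts in the example are invented.

```python
def classify_by_frequency(count, total, is_known_drm=False):
    """Classify a mutation by its frequency among pooled sequences.

    count: number of pooled sequences carrying the mutation
    total: number of pooled sequences examined at that position
    """
    if total <= 0 or not 0 <= count <= total:
        raise ValueError("need 0 <= count <= total and total > 0")
    if is_known_drm:
        return "DRM"  # known DRMs are exempt from the frequency rule
    # "below 0.1%" means count / total < 1 / 1000; integers avoid rounding error.
    if count * 1000 < total:
        return "Unusual mutation"
    return "Polymorphism"


# Invented counts, for illustration only.
print(classify_by_frequency(5, 10_000))                     # Unusual mutation
print(classify_by_frequency(50, 10_000))                    # Polymorphism
print(classify_by_frequency(5, 10_000, is_known_drm=True))  # DRM
```

A mutation seen in exactly 1 of every 1,000 sequences is not below the cutoff, so this sketch treats it as a polymorphism.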
Betts, B.J. and R.W. Shafer. 2003. Algorithm specification interface for human immunodeficiency virus type 1 genotypic interpretation. J Clin Microbiol 41: 2792-2794.
Kantor, R., R. Machekano, M.J. Gonzales, K.M. Dupnik, J.M. Schapiro, and R.W. Shafer. 2001. Human immunodeficiency virus reverse transcriptase and protease sequence database: An expanded model integrating natural language text and sequence analysis. Nucleic Acids Res 29: 296-299.
Kuiken, C., B. Korber, and R.W. Shafer. 2003. HIV sequence databases. AIDS Rev 5: 52-61.
Liu, T.F. and R.W. Shafer. 2006. Web resources for HIV type 1 genotypic-resistance test interpretation. Clin Infect Dis 42: 1608-1618.
Rhee, S.Y., M.J. Gonzales, R. Kantor, B.J. Betts, J. Ravela, and R.W. Shafer. 2003. Human immunodeficiency virus reverse transcriptase and protease sequence database. Nucleic Acids Res 31: 298-303.
Rhee, S.Y., R. Kantor, D.A. Katzenstein, R. Camacho, L. Morris, S. Sirivichayakul, L. Jorgensen, L.F. Brigido, J.M. Schapiro, and R.W. Shafer. 2006a. HIV-1 pol mutation frequency by subtype and treatment experience: extension of the HIVseq program to seven non-B subtypes. AIDS 20: 643-651.
Rhee, S.Y., J. Taylor, G. Wadhera, A. Ben-Hur, D.L. Brutlag, and R.W. Shafer. 2006b. Genotypic predictors of human immunodeficiency virus type 1 drug resistance. Proc Natl Acad Sci U S A 103: 17355-17360.
Shafer, R.W. 2006. Rationale and uses of a public HIV drug-resistance database. J Infect Dis 194 Suppl 1: S51-58.
Shafer, R.W., K. Hertogs, A.R. Zolopa, A. Warford, S. Bloor, B.J. Betts, T.C. Merigan, R. Harrigan, and B.A. Larder. 2001. High degree of interlaboratory reproducibility of human immunodeficiency virus type 1 protease and reverse transcriptase sequencing of plasma samples from heavily treated patients. J Clin Microbiol 39: 1522-1529.
Shafer, R.W., D.R. Jung, and B.J. Betts. 2000. Human immunodeficiency virus type 1 reverse transcriptase and protease mutation search engine for queries. Nat Med 6: 1290-1292.
Shafer, R.W., D.R. Jung, B.J. Betts, Y. Xi, and M.J. Gonzales. 2000. Human immunodeficiency virus reverse transcriptase and protease sequence database. Nucleic Acids Res 28: 346-348.
Shafer, R.W. and J.M. Schapiro. 2008. HIV-1 Drug Resistance Mutations: an Updated Framework for the Second Decade of HAART. AIDS Rev 10: 67-84.
Shafer, R.W., D. Stevenson, and B. Chan. 1999. Human immunodeficiency virus reverse transcriptase and protease sequence database. Nucleic Acids Res 27: 348-352.
Wang, C., Y. Mitsuya, B. Gharizadeh, M. Ronaghi, and R.W. Shafer. 2007. Characterization of mutation spectra with ultra-deep pyrosequencing: application to HIV-1 drug resistance. Genome Res 17: 1195-1201.