Stanford University HIV Drug Resistance Database - 
A curated public database designed to represent, store, and analyze the divergent forms 
of data underlying HIV drug resistance.


Release Notes for HIVdb, HIVseq, HIValg

Table of Contents

  1. Introduction
  2. Features Used by Two or More Programs
  3. HIVdb
  4. HIVseq: HIV-1 RT and Protease Search Engine for Queries
  5. HIValg: Algorithm Comparison
  6. References
  7. Appendices
 
1. Introduction

HIVdb is an expert system that accepts user-submitted protease and RT sequences and returns inferred levels of resistance to 17 FDA-approved anti-HIV drugs. Each drug resistance mutation is assigned a drug penalty score; the total score for a drug is derived by adding the scores of each mutation associated with resistance to that drug. Using the total drug score, the program reports one of the following levels of inferred drug resistance: susceptible, potential low-level resistance, low-level resistance, intermediate resistance, and high-level resistance. The genotypic interpretation is not designed to correlate exactly with the level of phenotypic resistance, because it also uses correlations between genotype and clinical outcome in deciding how susceptibility to a drug should be graded.

HIVseq accepts user-submitted RT and protease sequences, compares them to the consensus subtype B reference sequence, and uses the differences as query parameters for interrogating the HIV Drug Resistance Database [1]. This allows users to detect unusual sequence results immediately so that the person doing the sequencing can check the primary sequence output while it is still on the desktop. In addition, unexpected associations between sequences or isolates can be discovered by immediately retrieving data on isolates sharing one or more mutations with the sequence.

HIValg is designed for users interested in comparing the results of different algorithms or who are interested in comparing and evaluating existing and newly developed algorithms.

The three programs HIVseq, HIVdb, and HIValg all work off a common code base and a common set of technologies.

 
2. Features Used by Two or More Programs

2.1 User Interface

  • For each of the three programs, sequences can be entered by the following methods:
    • Pasting one or more (up to 100 at a time) nucleotide sequences in Fasta format: The first line contains a greater-than-sign ('>') followed by a sequence name, optionally followed by additional characteristics separated by pipes ('|'). The remaining lines contain the nucleotide sequence.
    • Uploading a file containing one or more non-interleaved sequences in Fasta or GRF (Bayer Diagnostics) format.
    • Selecting mutations using drop down boxes or by entering them in a text box, separated by spaces or commas.
  • An optional identifier and date.

  • A list of output options customizable by checkboxes.
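The Fasta input described above can be read with a few lines of code. The following is a minimal sketch, not the parser used by the programs: the function name `parse_fasta` and the returned tuple layout are assumptions made for illustration, and the 100-sequence limit is taken from the text.

```python
from typing import List, Tuple

MAX_SEQUENCES = 100  # per-submission limit stated in the text


def parse_fasta(text: str) -> List[Tuple[str, List[str], str]]:
    """Return (name, extra_fields, sequence) for each non-interleaved record."""
    records = []
    name, fields, chunks = None, [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, fields, "".join(chunks)))
            parts = [p.strip() for p in line[1:].split("|")]
            name, fields, chunks = parts[0], parts[1:], []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        records.append((name, fields, "".join(chunks)))
    if len(records) > MAX_SEQUENCES:
        raise ValueError(f"at most {MAX_SEQUENCES} sequences per submission")
    return records
```

Each header is split on pipes, so the first field is the sequence name and any remaining fields are the optional characteristics.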

2.2 Sequence Alignment and Mutation List

For all of the programs, users can submit either a list of sequences or a list of specific mutations. If users enter sequences, these are aligned to the consensus amino acid sequence using the program LAP [2]. Based on this alignment, mutations, frame shifts, insertions, and deletions are determined for each sequence.

Mutations are defined as differences from the consensus B reference sequence (PR and RT). In each of the three programs, mutations are divided into those associated with drug resistance ('Resistance Mutations') and those that have not been associated with drug resistance ('Other Mutations'). This separation, however, is not always sharp. There are some mutations that appear to be associated with drug therapy but which are not generally considered drug-resistance mutations. In our programs these mutations are not listed with the accepted drug-resistance mutations.
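As a rough illustration of how a mutation list follows from an alignment, the sketch below compares an already-aligned amino acid sequence with the reference, position by position. It assumes equal-length sequences with no insertions or deletions, which the real alignment step must handle; the function name `list_mutations` is hypothetical. The reference fragment is the first ten residues of the consensus B protease from Appendix 1.

```python
from typing import List

# First ten residues of the consensus B protease (Appendix 1).
CONSENSUS_PR_START = "PQITLWQRPL"


def list_mutations(consensus: str, aligned: str) -> List[str]:
    """Name each difference as <consensus residue><position><observed residue>."""
    mutations = []
    for pos, (ref, obs) in enumerate(zip(consensus, aligned), start=1):
        if obs != ref:
            mutations.append(f"{ref}{pos}{obs}")
    return mutations


# Example: a sequence differing from consensus at positions 3 and 10.
print(list_mutations(CONSENSUS_PR_START, "PQVTLWQRPI"))
```

Sorting the resulting names into 'Resistance Mutations' and 'Other Mutations' would then require a curated list of resistance positions, which is not reproduced here.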

Nucleotide triplets containing ambiguities are translated into each of the possible amino acids they encode. However, when the resulting list of possible amino acids is more than four, we replace this list with an 'X'. For example, WMC is translated to NTYS (N for AAC, T for ACC, Y for TAC, S for TCC), but WMS is translated to X instead of NTYSK* (N for AAC, T for ACC, Y for TAC, S for TCC, K for AAG, T for ACG, * for TAG, and S for TCG). All possible translations are explicitly defined in the triplets-table.txt file.
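The translation rule can be sketched as follows. This is an illustration only: the programs read their translations from triplets-table.txt, whereas this sketch derives them from the standard genetic code and the IUPAC ambiguity codes, and the function name `translate_ambiguous` and the output ordering are assumptions.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_ambiguous(codon: str, max_aa: int = 4) -> str:
    """Translate a triplet to every amino acid it may encode, or 'X' if too many."""
    seen = []
    for bases in product(*(IUPAC[b] for b in codon.upper())):
        aa = CODON_TABLE["".join(bases)]
        if aa not in seen:
            seen.append(aa)
    return "X" if len(seen) > max_aa else "".join(seen)


print(translate_ambiguous("WMC"))  # the four amino acids N, T, Y, S
print(translate_ambiguous("WMS"))  # six distinct outcomes, so X
```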

2.3 Quality control analysis

The quality control analysis reports three types of problem positions:

  a. Positions containing stop codons or frame shifts.
  b. Positions containing highly ambiguous nucleotides: N (A, C, G, or T), B (C, G, or T), D (A, G, or T), H (A, C, or T), and V (A, C, or G). Whereas mixtures of two nucleotides occur commonly and do not reflect sequencing artifact, mixtures of three or more nucleotides at the same position occur rarely in high-quality sequences.
  c. Positions with atypical mutations. Mutations are considered atypical if they have been observed in <0.1% of published group M HIV-1 sequences.

These three lists are accompanied by a summary figure containing blue lines for each difference from consensus B and red lines for each problem position.
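Two of these checks are simple enough to sketch directly. The example below flags highly ambiguous nucleotide codes and stop codons; frame shifts and atypical mutations are left out because they depend on the alignment and on prevalence data not shown here. The function name and the returned dictionary keys are assumptions for illustration.

```python
from typing import Dict, List

# IUPAC codes that stand for three or four possible nucleotides.
HIGHLY_AMBIGUOUS = set("BDHVN")


def problem_positions(nucleotides: str, amino_acids: str) -> Dict[str, List[int]]:
    """Return 1-based positions of highly ambiguous bases and of stop codons."""
    ambiguous = [i for i, nt in enumerate(nucleotides.upper(), start=1)
                 if nt in HIGHLY_AMBIGUOUS]
    stops = [i for i, aa in enumerate(amino_acids, start=1) if aa == "*"]
    return {"highly_ambiguous_nt": ambiguous, "stop_codons": stops}
```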

2.4 Subtyping

Each sequence is compared to a set of reference sequences from group M (Main) of HIV-1, representing subtypes A, B, C, D, F, G, H, J, K, CRF01_AE, and CRF02_AG. The subtype of the closest reference sequence is assigned to the submitted sequence. This method is generally accurate [3]; however, it will not accurately characterize uncommon inter-subtype recombinants. In addition, subtype B protease sequences are occasionally misclassified as belonging to subtype D because these subtypes are very similar and the protease contains fewer phylogenetically informative positions than RT. If subtype analysis is checked, the program will report the uncorrected nucleotide distance between a submitted sequence and the reference sequences used for subtyping. A reference guide to HIV-1 classification can be found at the Los Alamos HIV Sequence Database web site and in the 4/6/2000 issue of Science.
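The closest-reference rule can be sketched as below. This is a simplified illustration that assumes the query and the references are already aligned to the same coordinates and that positions with ambiguity codes or gaps are skipped; the function names are hypothetical, and the two short reference strings in the usage lines are toy data, not real subtype references.

```python
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Uncorrected distance: fraction of compared positions that differ."""
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":  # skip ambiguity codes and gaps
            compared += 1
            if x != y:
                diffs += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return diffs / compared


def assign_subtype(query: str, references: Dict[str, str]) -> Tuple[str, Dict[str, float]]:
    """Return the subtype of the nearest reference and all distances."""
    distances = {subtype: p_distance(query, ref) for subtype, ref in references.items()}
    best = min(distances, key=distances.get)
    return best, distances


# Toy references for illustration only.
toy_refs = {"B": "ACGTACGT", "C": "ACGTTTTT"}
print(assign_subtype("ACGTACGA", toy_refs))
```

Because only a single nearest reference is reported, a recombinant sequence would be assigned whichever parental subtype happens to dominate the region, consistent with the limitation noted above.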

2.5 Drug Resistance Interpretation

Both HIVdb and HIValg contain rules-based sequence interpretation programs. HIVdb contains an algorithm developed initially for use by the Stanford University Hospital Diagnostic Virology Laboratory, which was made available online in September 2000. A detailed description of HIVdb can be found in section 3 of this document. HIValg allows users to get the results of multiple rules-based interpretations, including HIVdb, other publicly available algorithms, and user-submitted algorithms. HIValg, however, does not provide the complete output associated with HIVdb (e.g. quality control and comments) and therefore should be used primarily for research purposes. HIValg is described in detail in section 5.

 
3. HIVdb

The drug resistance interpretation used in this program is similar to the one used by the Stanford University Hospital (SUH) Diagnostic Virology Lab. However, each of the SUH Diagnostic Virology Lab reports is manually reviewed before being reported to the ordering physician.

3.1 Levels of Resistance

The report provides five levels of inferred drug resistance:

  • Susceptible (score 0-9): virus isolates of this type have not shown reduced susceptibility to the drug.
  • Potential low-level resistance (score 10-14): virus isolates of this type have mutations which by themselves may not cause drug resistance, yet indicate the possibility of previous drug selection.
  • Low-level resistance (score 15-29): virus isolates of this type have reduced in-vitro susceptibility to the drug and/or patients with viruses of this genotype may have a suboptimal virologic response to treatment.
  • Intermediate resistance (score 30-59): the genotype suggests a degree of drug resistance greater than low-level resistance but lower than high-level resistance.
  • High-level resistance (score >=60): the genotype is similar to that of isolates with the highest levels of in-vitro drug resistance and/or patients infected with isolates having similar genotypes generally have little or no virologic response to treatment with the drug.
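The score ranges above translate into a simple threshold lookup. The sketch below is illustrative; the function name is an assumption, and treating negative totals (possible when hypersusceptibility scores are present) as susceptible is also an assumption, since the listed range starts at 0.

```python
# Lower bound of each level, highest first, taken from the list above.
LEVELS = [
    (60, "High-level resistance"),
    (30, "Intermediate resistance"),
    (15, "Low-level resistance"),
    (10, "Potential low-level resistance"),
]


def resistance_level(total_score: float) -> str:
    """Map a drug's total penalty score to one of the five reported levels."""
    for threshold, label in LEVELS:
        if total_score >= threshold:
            return label
    # Scores below 10, including negative totals (assumption), are susceptible.
    return "Susceptible"
```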

3.2 Comments

The second part of the report provides an explanation for the inferred drug resistance (selected on the input page using the "Comments" option). For each gene, bullets summarize the data linking a mutation to decreased drug susceptibility. The first set of comments that appears for a gene is generic; the second set is specific to the submitted sequence.

Files with the most recent comments are available in tab-delimited format:

3.3 Drug/Mutation Scores

The Mutation Scoring tables (selected on the input page using the "Mutation Scores" option) contain penalties for drugs based on a mutation's position and amino acid.

  • Mutation scores are derived from published literature linking mutations and antiretroviral drugs, including correlations between genotype and treatment history, genotype and phenotype, and genotype and clinical outcome.
  • The drug resistance interpretation for a drug is derived by adding together the scores of each of the mutations associated with resistance to that drug. The rationale and the range of scores for the five different levels of inferred drug resistance are described in section 3.1.
  • The current version of the drug-resistance program assigns the same score to a mutation regardless of whether the mutation is in pure form or present as a mixture, with one exception: if an atypical mutation at a drug-resistance position appears as part of a mixture, the score for that mutation is divided by two because of the possibility that the mixture reflects a sequencing artifact.
  • "Z" in the amino acid column represents amino acids for which there is no specific row in a table. For example, in the NRTI table, any mutation at position 41 other than M41L would be assigned the penalty in the row having position = 41 and amino acid = "Z".
  • Mutations that cause hypersusceptibility to a drug have a negative score in that drug's column.
  • For more information about a mutation see the appropriate PI, NRTI, or NNRTI Resistance Notes page, which contains written summaries about most mutations along with references and links to MEDLINE. Clicking on the mutation will take you to a page containing data on the prevalence of that mutation in patients receiving different drugs and the phenotype susceptibility of isolates with that mutation.
  • The scores in the tables are updated based on new published data and based on feedback from users and experts in the field. If you have comments, suggestions, or relevant data, please contact Robert W. Shafer, M.D.
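The scoring rules in this list (additive penalties, the "Z" fallback row, halving for atypical mutations in mixtures, and negative scores for hypersusceptibility) can be sketched as follows. The table contents are invented for illustration and are not the published scores; the function names and the tuple layout are likewise assumptions.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical penalties for a single drug, keyed by (position, amino acid).
# These numbers are made up for the example and are NOT the real scores.
EXAMPLE_SCORES: Dict[Tuple[int, str], float] = {
    (41, "L"): 15,
    (41, "Z"): 5,     # fallback row for any other amino acid at position 41
    (184, "V"): -10,  # a negative score models hypersusceptibility
}


def mutation_score(table: Dict[Tuple[int, str], float], position: int,
                   amino_acid: str, atypical_in_mixture: bool = False) -> float:
    """Look up one mutation, falling back to the 'Z' row, then to 0."""
    score = table.get((position, amino_acid))
    if score is None:
        score = table.get((position, "Z"), 0)
    if atypical_in_mixture:
        score = score / 2  # possible sequencing artifact, so halve the penalty
    return score


def total_score(table: Dict[Tuple[int, str], float],
                mutations: Iterable[Tuple[int, str, bool]]) -> float:
    """Sum the penalties of all (position, amino acid, atypical_in_mixture) entries."""
    return sum(mutation_score(table, pos, aa, mix) for pos, aa, mix in mutations)
```

The total returned by `total_score` is the value that would then be compared against the ranges in section 3.1.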

The most recent scores are available as tab-delimited files or tables sortable by position or drug:

Tab-Delimited Files
  • scores for PIs
  • scores for NRTIs
  • scores for NNRTIs

Sortable Tables
  • scores for PIs
  • scores for NRTIs
  • scores for NNRTIs

  • Throughout our website we refer to each drug by its abbreviation. The different names for each drug are listed here:

    Protease Inhibitors (PIs)
    Brand Name Generic Name Abbreviation
    Agenerase® amprenavir APV
    Aptivus® tipranavir TPV
    Crixivan® indinavir IDV
    Invirase® saquinavir SQV
    Kaletra® lopinavir + ritonavir LPV
    Lexiva® fosamprenavir FPV
    Norvir® ritonavir RTV
    Reyataz® atazanavir ATV
    Viracept® nelfinavir NFV
    Prezista® darunavir DRV or TMC114

    Nucleoside Reverse Transcriptase Inhibitors (NRTIs)
    Brand Name Generic Name Abbreviation
    Emtriva® emtricitabine FTC
    Epivir® lamivudine 3TC
    Hivid® zalcitabine ddC
    Retrovir® zidovudine AZT or ZDV
    Videx® EC didanosine: delayed-release capsules ddI
    Viread® tenofovir disoproxil fumarate (DF) TDF
    Zerit® stavudine d4T
    Ziagen® abacavir ABC

    Non-Nucleoside Reverse Transcriptase Inhibitors (NNRTIs)
    Brand Name Generic Name Abbreviation
    Rescriptor® delavirdine DLV
    Sustiva® efavirenz EFV
    Viramune® nevirapine NVP

    3.4 Program Updates

    The scoring tables, comments, and programs are frequently updated; these updates are tracked in the Updates page. Below is a listing of our current and previous versions linking to the specific improvements since January 2003.

    3.5 Program Download

    The code is available for download or online browsing.

     
    4. HIVseq: HIV-1 RT and Protease Search Engine for Queries

    This program provides analysis of HIV RT and protease sequences in the context of existing published sequence data on these genes.

    4.1 Output

    • List of differences from the consensus B amino acid sequence ("mutations") including mixtures.
    • Insertions and/or deletions.
    • Comparison with reference sequences of known subtype.
    • Quality control analysis as described in section 2.3
    • Data on the frequency with which each mutation occurs in individuals according to HIV-1 subtype and type of drug therapy.
    • Hyperlinks to references associated with each mutation.

    4.2 Mutation Data Frequency Analysis

    This analysis is unique to HIVseq. Each sample mutation is used as a query parameter to interrogate the HIV Drug Resistance Database. Within the database, mutation frequency tables contain the frequency with which each mutation occurs in different categories of HIV-1 isolates.

    For protease sequences, these categories include:

    • Subtype B isolates from "untreated" individuals (those who have not received a protease inhibitor (PI)).
    • Subtype B isolates from individuals who have received at least one PI ("treated").
    • Subtype B isolates from individuals who have received at least three PIs ("heavily treated").
    • Non-subtype B isolates from untreated individuals (this category is optional; to view these data, the "Subtyping Data" box must be checked).

    For RT sequences, these categories include:

    • Subtype B isolates from "untreated" individuals (those who have not received an RT inhibitor).
    • Subtype B isolates from individuals who have received at least one nucleoside RT inhibitor (NRTI) (but no non-nucleoside RT inhibitor (NNRTI)).
    • Subtype B isolates from individuals who have received at least four NRTIs (but no NNRTI).
    • Subtype B isolates from individuals who have received an NNRTI (+/- one or more NRTIs). Note: often these individuals have received multiple NRTIs, therefore these data are most relevant to those positions associated with NNRTI resistance.
    • Non-subtype B isolates from untreated individuals (this category is optional; to view these data, the "Subtyping Data" box must be checked).

    Note: To minimize reporting bias, the mutation frequency tables contain one sequence per individual. For individuals in whom sequences from multiple isolates were published, the mutation tables include the earliest sequence from untreated persons and the latest sequence (while on therapy) from persons receiving antiretroviral therapy. To exclude technical sequencing errors and cases of circulating virus containing unusual variants, the mutation tables include only mutations present as the predominant form whenever multiple clones from the same isolate were sequenced. Sequences of poor quality and those considered to be possible laboratory contaminants are excluded from the data sets.
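The one-sequence-per-individual rule can be sketched as below. This is one possible reading of the note, under the assumption that an individual may contribute one isolate to the untreated category and one to the treated category; the `Isolate` record, its field names, and the function name are all hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class Isolate(NamedTuple):
    patient_id: str
    date: str       # ISO date string, so string order is chronological order
    treated: bool   # True if the isolate was obtained while on therapy


def one_per_individual(isolates: List[Isolate]) -> List[Isolate]:
    """Keep the earliest untreated and the latest treated isolate per person."""
    chosen: Dict[Tuple[str, bool], Isolate] = {}
    for iso in isolates:
        key = (iso.patient_id, iso.treated)
        best = chosen.get(key)
        if best is None:
            chosen[key] = iso
        elif iso.treated and iso.date > best.date:
            chosen[key] = iso  # later on-therapy isolate replaces earlier one
        elif not iso.treated and iso.date < best.date:
            chosen[key] = iso  # earlier pre-therapy isolate replaces later one
    return sorted(chosen.values())
```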

    The following table provides a summary of the data sets used for the HIVseq output. Within the next several weeks, the Non-B category will be replaced by frequency data associated with specific subtypes.

    Summary of Data Sets (2005 May 23)
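    Gene | Subtype | Treatment | # Isolates | # References
    Protease | B | None | 3465 | 115
    Protease | B | >=1 PI | 5092 | 89
    Protease | B | >=3 PIs | 1670 | 43
    Protease | Non-B | None | 3156 | 147
    RT | B | None | 1979 | 95
    RT | B | >=1 NRTI | 6720 | 112
    RT | B | >=4 NRTIs | 2729 | 57
    RT | B | NNRTI | 3054 | 66
    RT | Non-B | None | 2100 | 122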
     
    5. HIValg: Algorithm Comparison

    5.1 Objectives

    The objectives of this program are to 1) identify the extent of disagreement between commonly used drug resistance interpretation algorithms; and 2) to identify sequences responsible for disagreements. The first objective indicates the extent of expert agreement on the significance of different drug resistance mutations. The second objective is useful to those developing algorithms. If there is a conflict between two interpretation systems, then the experimental and/or clinical data on the mutational patterns causing the discordance should be re-examined.

    This program is not designed to determine whether one algorithm is superior to another algorithm. By creating this program, we anticipate that algorithms will evolve and most likely converge through a process of inter-algorithm comparison. Implementing algorithms on the web is also the first step for applying them on clinical data sets.

    5.2 Algorithms Available Online

    The following algorithms are available online in XML form on the "Algorithm Specification Interface" page. They are all encoded using the ASI format, which is described on the same page.

    • ANRS: Agence Nationale de Recherches sur le SIDA [4,5].
    • HIVDB: The current version of the drug-resistance interpretation program on this site is referred to as the "HIVdb" algorithm.
    • Rega Institute: Courtesy of Professor Anne-Mieke Vandamme [7].

    Each algorithm reports its results differently. The table below shows how the results of each algorithm are normalized for comparison by the program. Users of HIValg can select whether they prefer to receive output with the original interpretation or with the normalized interpretation ('SIR' option).

    Algorithm | S | I | R
    ANRS | Susceptible | Potentially resistant | Resistant
    HIVDB | Susceptible, Potential low-level resistance | Low-level resistance, Intermediate resistance | High-level resistance
    Rega Institute | Sensitive | Advise against when other options available | Resistant
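The normalization table can be expressed as a lookup, which also makes it easy to flag disagreements between algorithms for a given drug. This is a sketch only: the dictionary and function names are assumptions, and the mapping simply transcribes the S, I, and R columns of the table.

```python
from typing import Dict

# Native result strings of each algorithm mapped to the S/I/R scale.
SIR_MAP: Dict[str, Dict[str, str]] = {
    "ANRS": {
        "Susceptible": "S",
        "Potentially resistant": "I",
        "Resistant": "R",
    },
    "HIVDB": {
        "Susceptible": "S",
        "Potential low-level resistance": "S",
        "Low-level resistance": "I",
        "Intermediate resistance": "I",
        "High-level resistance": "R",
    },
    "Rega Institute": {
        "Sensitive": "S",
        "Advise against when other options available": "I",
        "Resistant": "R",
    },
}


def normalize(algorithm: str, result: str) -> str:
    """Convert one algorithm's native result to S, I, or R."""
    return SIR_MAP[algorithm][result]


def is_discordant(results: Dict[str, str]) -> bool:
    """True if the algorithms disagree after normalization."""
    return len({normalize(alg, res) for alg, res in results.items()}) > 1
```

For example, a drug called "Potential low-level resistance" by HIVDB and "Susceptible" by ANRS is concordant on the normalized scale, even though the original wording differs.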

    5.3 User-Submitted Algorithms

    Selecting which algorithms appear in the output report can be done in two different ways. The first technique is to select from the list of algorithms made available on our servers. The second technique allows you to upload an algorithm from your machine, assuming that the algorithm is in proper ASI format as described in the Algorithm Specification Interface page. These techniques can be used in combination.

     
    6. References
    1. Shafer RW, Jung DR, Betts BJ. Human immunodeficiency virus type 1 reverse transcriptase and protease mutation search engine for queries. Nat Med 2000; 6(11): 1290-1292.
    2. Huang X. Fast comparison of a DNA sequence with a protein sequence database. Microb Comp Genomics 1996; 1(4): 281-91.
    3. Gonzales MJ, Machekano RN, Shafer RW. Human immunodeficiency virus type 1 reverse transcriptase and protease subtypes: classification, amino acid mutation patterns, and prevalence in a Northern California clinic-based population. J Infect Dis 2001; 184(8): 998-1006.
    4. Rousseau MN, Vergne L, Montes B, et al. Patterns of resistance mutations to antiretroviral drugs in extensively treated HIV-1-infected patients with failure of highly active antiretroviral therapy. J Acquir Immune Defic Syndr 2001; 26(1): 36-43.
    5. Meynard JL, Vray M, Morand-Joubert L, et al. Phenotypic or genotypic resistance testing for choosing antiretroviral therapy after treatment failure: a randomized trial. AIDS 2002; 16: 727-736.
    6. DeGruttola V, Dix L, D'Aquila R, et al. The relation between baseline HIV drug resistance and response to antiretroviral therapy: re-analysis of retrospective and prospective studies using a standardized data analysis plan. Antivir Ther 2000; 5(1): 41-48.
    7. Van Laethem K, De Luca A, Antinori A, Cingolani A, Perno CF, Vandamme AM. A genotypic drug resistance interpretation algorithm that significantly predicts therapy response in HIV-1 infected patients. Antivir Ther 2002; 7: 123-129.
     
    7. Appendices

    Appendix 1. Consensus B Sequences

    The subtype B consensus sequence is derived from an alignment of subtype B sequences maintained at the Los Alamos HIV Sequence Database (hiv-web.lanl.gov). The consensus B sequence is therefore a commonly used reference sequence to which new sequences are compared. Files containing the consensus PR and consensus RT are also available.

    Consensus B Sequences (Amino Acids)
    Protease PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGI
    GGFIKVRQYDQILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF
    RT PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKI
    GPENPYNTPVFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGL
    KKKKSVTVLDVGDAYFSVPLDKDFRKYTAFTIPSINNETPGIRYQYNVLP
    QGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDLEIGQHRT
    KIEELRQHLLRWGFTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKD
    SWTVNDIQKLVGKLNWASQIYAGIKVKQLCKLLRGTKALTEVIPLTEEAE
    LELAENREILKEPVHGVYYDPSKDLIAEIQKQGQGQWTYQIYQEPFKNLK
    TGKYARMRGAHTNDVKQLTEAVQKIATESIVIWGKTPKFKLPIQKETWEA
    WWTEYWQATWIPEWEFVNTPPLVKLWYQLEKEPIVGAETFYVDGAANRET
    KLGKAGYVTDRGRQKVVSLTDTTNQKTELQAIHLALQDSGLEVNIVTDSQ
    YALGIIQAQPDKSESELVSQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDK
    LVSAGIRKVL


    Appendix 2. Sample Data Sets

    A small data set (N=10) has been compiled to provide users with a sample input for running our programs. To view the results for these sequences, copy and paste them into the input form.

    A large data set (N=2055) is also available. We ask users to restrict the number of sequences they process at a time using our programs to 100, so this data set cannot be directly submitted to our programs.

    A very large data set (N=5838) is available. Again, we ask users to restrict the number of sequences they process at a time using our programs to 100, so this data set cannot be directly submitted to our programs.

    Other sample sequences are available using the links below:

    Sequence | File
    Protease sequences with mutations at codons 30 and 88 | PR30-88.txt
    Protease sequences with mutations at codons 82, 84 and 90 | PR82-84-90.txt
    RT sequences with AZT mutations and M184V | RT184plusAZTmut.txt
    RT sequences with mutations at codons 103 and 190 | RT103-190.txt
