Stanford University HIV Drug Resistance Database - A curated public database designed to represent, store, and analyze the divergent forms of data underlying HIV drug resistance.

User Guide

Last updated on 10/00
 

The HIV RT and Protease Sequence Database is an on-line relational database that catalogs evolutionary and drug-related sequence variation in the human immunodeficiency virus (HIV) reverse transcriptase (RT) and protease enzymes, the molecular targets of anti-HIV therapy. The database contains a compilation of most published HIV RT and protease sequences, including submissions from International Collaboration databases (e.g. GenBank), and sequences published in journal articles. Sequences are linked to data about the source of the sequence sample and the antiretroviral drug treatment history of the individual from whom the isolate was obtained. During the past year, 3,500 sequences have been added and the data model has been expanded to include drug susceptibility data on sequenced isolates. This guide provides an explanation for each of the features of the database, as well as instructions on using each feature. The following table contains a summary of the key sections in the database divided into three sections (Database Documents, Database Query Forms, and Sequence Analysis Programs) with hyperlinks to each part:

Web Page: Description

Database Documents
  Background; Primer: Database rationale for lay audience and novice users
  Data Model for Understanding HIV Drug Resistance Mutations: Description of the sources of knowledge of drug resistance mutations
  Resistance Notes: Overview of HIV drug resistance with links to relevant database entries
  Summary Statistics: Dynamically updated summary of database content
  User Guide: Description of database schema and explanation of specific web site features
Database Query Forms
  Drug therapy: Retrieve sequences of isolates from persons receiving selected drugs or drug combinations
  Mutations: Retrieve sequences containing selected mutations
  Drug susceptibility: Retrieve published drug susceptibility data for isolates with selected mutations (in progress)
  References: Retrieve sequences and data summaries from published studies
Sequence Analysis Programs
  HIV-SEQ: Compare new RT and protease sequences to previously published sequences with the same mutations
  Drug Resistance Interpretation (beta test version): Infer drug resistance to 15 available drugs using rules hyperlinked to data within the database


Database Documents
  1. Background and Rationale
  2. Primer
  3. Data Model for Understanding HIV Drug Resistance
  4. Drug Resistance Notes
  5. Summary Statistics
  6. Slides
  7. Credits
  8. Citation
1. Background and Rationale is a brief bulleted description of the database rationale targeted towards a lay audience.

2. Primer is a detailed description of the database rationale, particularly with respect to the problem of HIV drug resistance. The primer also describes how the HIV RT and Protease Sequence Database differs from other databases containing HIV sequences, including the International Collaboration databases and the Los Alamos HIV Sequence Database.

3. Data Model for Understanding HIV Drug Resistance describes an ontology of HIV drug resistance based on four types of correlations between genetic sequences and other types of data. The four types of data include:
    1. Drug treatment histories of persons from whom sequenced HIV isolates are available
    2. In vitro drug susceptibility data on laboratory HIV isolates
    3. In vitro drug susceptibility data on clinical HIV isolates
    4. Clinical outcome data on persons receiving specific drug regimens

4. Drug Resistance Notes contains a graphical and text overview of the major resistance mutations for each of the three drug classes: NRTI, NNRTI, and PI. The graphical images contain colored rectangles that are hyperlinked to data relating mutations and drugs. The example below is the graphical overview for the NRTI mutations.
  • The column headers contain drug names and the row headers contain amino acid positions. A position has more than one row when different amino acids at that position are interpreted differently.
  • Red rectangles indicate high level resistance to NRTI's and NNRTI's, and primary resistance mutations for PI's. Yellow rectangles indicate mutations that contribute to resistance when present with other mutations. Pale yellow rectangles indicate accessory resistance mutations. In the case of the protease, many of these accessory mutations also occur as polymorphisms in untreated individuals. Rectangles containing a "?" indicate that the relationship between the mutation and drug resistance has not been fully defined. Rectangles containing an * indicate that the mutation causes hypersusceptibility.
  • Each rectangle is hyperlinked to a new web page ('Mutation Data') containing the four types of data described in 'Data Model for Understanding HIV Drug Resistance' relevant to the selected drug and position. The section below summarizes the four types of data which include data on the prevalence of the mutation in patients receiving anti-HIV drugs, susceptibility on laboratory isolates containing the mutation, susceptibility data on clinical isolates containing the mutation, and annotated references linking the mutation to clinical outcome.
Mutation Data Page

  • 'Submitted Query' shows the selected position, amino acids, and drugs.
  • 'Data Sources' are hyperlinks to relevant database entries.
  • 'Annotated References' section is pending.
Mutation prevalence data:
  • Data on the frequencies of the selected mutation in untreated persons and in persons receiving various drugs and drug combinations.
  • Data in the table are filtered to include one representative isolate per person.
  • MonoRx indicates that the sequenced isolates were obtained from persons who received no other drug of the same drug class.
Susceptibility data on lab isolates
  • Laboratory (lab) isolates are isolates created by site directed mutagenesis or during in vitro passage experiments.
  • A column indicating the susceptibility test method will soon be added.
  • Only two categories of mutations ('NRTI' and 'NNRTI' for RT mutations, and 'Major' and 'Minor' for protease mutations) are shown for lab isolates as the remaining positions in lab isolates are presumed to be wild type.
  • The 'Result' column generally indicates IC50's and less commonly indicates IC90's and IC95's. A column indicating which measure is reported will be added.
  • The 'Fold Resistance' column shows the fold resistance relative to susceptible wildtype isolates.
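
The 'Fold Resistance' value can be read as a simple ratio of the isolate's result to that of a susceptible wild-type control. The following is a minimal sketch under that assumption; the function name and the example values are hypothetical and are not taken from the database.

```python
def fold_resistance(isolate_ic50, wildtype_ic50):
    """Return the isolate IC50 divided by the wild-type control IC50.

    Both values are assumed to be in the same units and to come from
    the same susceptibility test method.
    """
    if wildtype_ic50 <= 0:
        raise ValueError("wild-type IC50 must be positive")
    return isolate_ic50 / wildtype_ic50


# Hypothetical example values (same units for both measurements).
print(fold_resistance(isolate_ic50=2.0, wildtype_ic50=0.25))
```

A value above 1 indicates reduced susceptibility relative to the control, and a value below 1 indicates increased susceptibility.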
Susceptibility data on clinical isolates:
  • Clinical isolates are isolates obtained from untreated and treated individuals.
  • This table is similar in format to the 'Susceptibility data on lab isolates' table.
  • The 'Other Mutations' column includes mutations not included in either the 'NRTI' or 'NNRTI' columns or in the 'Major' or 'Minor' columns.
  • A column indicating the susceptibility test method will be added.

5. Summary Statistics is a dynamically generated page which contains the following three sections:
  1. A summary of the number of individuals, virus isolates, RT and protease sequences, and gene mutations within the HIV RT and Protease Sequence Database. Approximately 75% of the sequences were obtained from GenBank. Approximately 25% of the sequences are amino acid sequences which were obtained from published papers.
  2. A summary of the number of individuals with RT and protease sequences according to HIV-1 subtype and country. Subtype B is the most common HIV-1 subtype in the USA and in Europe. However, non-B subtypes predominate in the rest of the world. The subtype of an isolate was considered to be confirmed if env and/or gag sequences were available on the same isolate.
  3. A summary of individuals with RT and protease sequences according to anti-HIV therapy. Several rows from the table generated on 9/15/2000 are shown below. Under the column 'Received No Other Drug' are the numbers of individuals who received the indicated drug treatment but no other drugs of the same class. Under the column 'Received Other Drugs' are the numbers of individuals who received the indicated drug treatment and who may or may not have received other drugs of the same class. Under 'AA' are the numbers of individuals from whom just amino acid sequences are available. Under 'NA' are the numbers of individuals from whom nucleic acid and amino acid sequences are available.

6. Slides
Links to slide set presentations that can be viewed or downloaded. These are not updated over time. Therefore, some of the data may become dated or inaccurate.

7. Credits and Acknowledgements
People and organizations that contributed to the HIV RT and Protease Sequence Database; email addresses of the major current contributors; and links to map directions to Stanford Medical Center and to the Medical School Office Building.

8. Citation
Articles which should be cited when referring to different parts of the database.





Database Queries

Queries enable users to retrieve a theoretically limitless number of different sequence sets matching selection criteria based on specific drug treatments, RT and protease mutations, drug susceptibility patterns, and references. Retrieved data are initially returned in a tabular format ('Query Result'). Retrieved data can then be viewed in a variety of formats that are often independent of the original query ('Sequence Alignment', 'Composite Alignment'). The 'Sequence Alignment' page returns raw sequence data in a variety of formats. The 'Composite Alignment' page returns a summarized version of the sequence data. The table below summarizes the various query pages and the formats of the retrieved data and their associated sequences.


Queries
  1. Protease Inhibitors
  2. RT Inhibitors
  3. Protease Mutations
  4. RT Mutations
  5. Insertions
  6. References

Query Result
  Reference (Medline Link)
  Patient
  Isolate
  GenBank Link
  Rx Data
  Mutation Data
  Subtype
  Seq. Method

Sequence Alignment
  Aligned AA (compared to consensus B)
  FASTA (NA/AA)

Composite Alignment
  %AA variants by position



1. Protease Inhibitor Query Page

 

  Abbreviations: APV: amprenavir, IDV: indinavir, RTV: ritonavir, NFV: nelfinavir, SQV: saquinavir

  1. '# of PI': the total number of protease inhibitors received. This is a required field. The default value is '0'.
  2. 'PI received': the specific PIs received. This is an optional field. The number of PIs selected here may be less than, but cannot exceed, the number indicated in '# of PI'.
  3. 'First PI': the first PI received. The default value is 'ANY'.
  4. 'Subtype': specifies phylogenetic criteria
    • Species: HIV1, HIV2 (includes SIVsooty mangabey and SIVmacaque), and AGM (primate immunodeficiency viruses infecting African green monkeys and other primates). If 'HIV1' is selected, the 'Group' must be specified. If HIV2 or AGM is selected, the 'Group' and 'Subtype' will be automatically unselected.
    • Groups: Main, O, - (not applicable). If 'Main' is selected, then a 'Subtype' must be specified.
    • Subtype: B, All, NonB, A, C, D, F, G, H, J, - (not applicable).
  5. 'Filter': filters the retrieved data (the default mode) to exclude sequences that are redundant (e.g., multiple sequences obtained from the same person at the same point in time), questionable (e.g., sequences of poor quality, possible intralaboratory contaminants), or short sequence fragments. Filtering is recommended except for users who plan to do their own analysis. About 5-10% of sequences in the database are filtered.
  6. 'Additional Output Options':
    1. Complete Rx (treatment) history including drug regimens, the order in which they were received, and duration in weeks.
    2. Complete mutation list. Lists all differences from the consensus B sequence. The default is to just show 'Major' and 'Minor' mutations. The current classification is as follows: Major positions: 8, 30, 32, 46, 47, 48, 50, 54, 82, 84, 90. Minor positions: 10, 20, 24, 33, 36, 53, 63, 71, 73, 74, 77, 88, 93.
    3. Subtype
    4. Sequence Method - Dideoxynucleoside sequencing (Dideoxy) vs DNA chip (Affymetrix)
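
The 'Major' and 'Minor' classification listed under 'Complete mutation list' can be expressed as a simple position lookup. The sketch below is illustrative only: the function names, the 'Other' label, and the mutation notation (wild-type residue, position, mutant residue, e.g. 'L90M') are assumptions, not the database's own implementation.

```python
import re

# Position lists copied from the classification given in this guide.
MAJOR_POSITIONS = {8, 30, 32, 46, 47, 48, 50, 54, 82, 84, 90}
MINOR_POSITIONS = {10, 20, 24, 33, 36, 53, 63, 71, 73, 74, 77, 88, 93}

# Assumed notation: one letter, a position, one letter (e.g. "L90M").
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def classify_protease_position(position):
    """Return 'Major', 'Minor', or 'Other' for a protease position (1-99)."""
    if not 1 <= position <= 99:
        raise ValueError("protease positions run from 1 to 99")
    if position in MAJOR_POSITIONS:
        return "Major"
    if position in MINOR_POSITIONS:
        return "Minor"
    return "Other"


def group_protease_mutations(mutations):
    """Group mutation strings by the class of their position."""
    groups = {"Major": [], "Minor": [], "Other": []}
    for mutation in mutations:
        match = MUTATION_RE.match(mutation)
        if match is None:
            raise ValueError("unrecognized mutation: " + mutation)
        position = int(match.group(2))
        groups[classify_protease_position(position)].append(mutation)
    return groups


print(group_protease_mutations(["L90M", "M36I", "V3I"]))
```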

2. RT Inhibitor Query Page

 

  Abbreviations: AZT: zidovudine, DDI: didanosine, DDC: zalcitabine, 3TC: lamivudine, D4T: stavudine, ABC: abacavir, ADV: adefovir

  1. '# NRTI': the total number of NRTIs received. This is a required field. The default value is '0'.
  2. 'NRTI received': choose the specific NRTIs and/or NRTI combinations received. These selections are optional. The number of NRTIs selected here may be less than, but cannot exceed, the number indicated in '# NRTI'.
  3. '# NNRTI': the total number of NNRTIs received. This is a required field. The default value is '0-3'. You may enter the total number of NNRTIs and/or select specific NNRTIs. The number of drugs named cannot exceed the total number indicated.
  4. 'Subtype': (explained above 'Protease Inhibitor Query')
  5. 'Filter': (explained above 'Protease Inhibitor Query')
  6. 'Additional Output Options': (explained above 'Protease Inhibitor Query')
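
The rule that the specific drugs named may be fewer than, but may not exceed, the total number received applies to the PI, NRTI, and NNRTI fields alike. A minimal sketch of such a consistency check follows; the function name and the returned messages are hypothetical, and the query forms may enforce the rule differently.

```python
# Abbreviations listed in this guide for the RT inhibitor query page.
NRTI_ABBREVIATIONS = {"AZT", "DDI", "DDC", "3TC", "D4T", "ABC", "ADV"}


def check_drug_selection(total_received, named_drugs, known_drugs):
    """Return a list of problems with a drug selection (empty if consistent)."""
    problems = []
    if total_received < 0:
        problems.append("total number of drugs cannot be negative")
    unknown = sorted(set(named_drugs) - set(known_drugs))
    if unknown:
        problems.append("unknown drug abbreviations: " + ", ".join(unknown))
    # Naming fewer drugs than the total is allowed; naming more is not.
    if len(set(named_drugs)) > total_received:
        problems.append("more drugs named than the total number received")
    return problems


# Two NRTIs received, both named: consistent.
print(check_drug_selection(2, ["AZT", "3TC"], NRTI_ABBREVIATIONS))
# One NRTI received but two named: inconsistent.
print(check_drug_selection(1, ["AZT", "3TC"], NRTI_ABBREVIATIONS))
```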

3. Protease Mutations Query Page

        and

4. RT Mutations Query Page

Queries that enable retrieval of database sequences according to specific codon and/or mutated amino acid definitions.

 

  1. 'Codon Amino Acid': select 1-5 codons. At least one codon must be selected. Amino acid selections are optional. For the protease, the codon must be between 1 and 99. For the RT, the codon must be between 1 and 250.
  2. 'Subtype': (explained above 'Protease Inhibitor Query')
  3. 'Filter': (explained above 'Protease Inhibitor Query')
  4. 'Additional Output Options': (explained above 'Protease Inhibitor Query')
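
The codon limits for the two mutation query pages can be summarized as a small validation routine. This is a sketch under the limits stated above (1-5 codons; 1-99 for the protease, 1-250 for the RT); the function name, gene labels, and messages are hypothetical.

```python
# Highest codon accepted by each mutation query page, per this guide.
MAX_CODON = {"protease": 99, "RT": 250}


def check_codon_selection(gene, codons):
    """Return a list of problems with a codon selection (empty if valid)."""
    if gene not in MAX_CODON:
        raise ValueError("gene must be 'protease' or 'RT'")
    problems = []
    if not 1 <= len(codons) <= 5:
        problems.append("select between 1 and 5 codons")
    upper = MAX_CODON[gene]
    for codon in codons:
        if not 1 <= codon <= upper:
            problems.append(f"codon {codon} is outside 1-{upper} for {gene}")
    return problems


print(check_codon_selection("protease", [30, 90]))  # valid selection
print(check_codon_selection("protease", [184]))     # out of range for protease
```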

Tabular Query Results

  1. Header: a summary of the query parameters, the number of isolates, patients, and references satisfying the query parameters, and a dialog box 'View Alignments'. The 'View Alignments' dialog box allows users to retrieve either nucleic acid or amino acid sequences in FASTA format or to view the sequences in alignment with the consensus B sequence. In the alignment format, positions having the same residue as consensus B are represented by a '-'. For isolates having multiple clones, users have the option of retrieving all clones or representative consensus sequences.
 


  2. Main table:

  • 'Author (yr)' column: a link to MEDLINE.
  • 'Isolate' column: isolate names, and a link to additional isolate data, including a list of the isolate mutations divided into 5 groups for PIs (major mutations, minor mutations, common polymorphisms, rare polymorphisms, and conserved residues), and 6 groups for NRTIs/NNRTIs (NRTI mutations, NNRTI mutations, NonCanonical mutations, common polymorphisms, rare polymorphisms, and conserved residues); and the amino acid sequence(s) of that isolate.
  • 'Acc#' column: a link to GenBank.
  • 'Wks' (for PI's only): total number of weeks PIs were received. This number is the sum of the durations of each PI from the 'PIs' column.
  • Complete treatment history, one of the four 'Additional Output Options' available, is shown above, and includes drug regimens, the order in which they were received, and the duration in weeks of each course.

Sequence Alignment Query Results

  1. The page returned contains a header that provides a summary of the query parameters, the number of isolates, patients, and references satisfying the query parameters, and a dialog box 'View Composite Alignment'. The 'View Composite Alignment' dialog box allows users to retrieve a presentation of the sequences with mutation frequencies shown as percentages or counts. Users have the option of excluding single occurrences of the different mutations.
 


  2. The main section of this page contains the retrieved sequences in one of two viewing options:
 
  1. Interleaved amino acids: the isolate's name appears to the left of the sequence. The top line represents the consensus of this group of isolates. A dash means the amino acid at that position matches the consensus. An 'X' stands for an undetermined amino acid at that position. A dot stands for an unsequenced position.
 


 
  2. FASTA: the isolate's name appears in the top row of each sequence, followed by the nucleic acids (shown here) or amino acids, as chosen.
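
The interleaved display convention (dash for a match, 'X' for an undetermined residue, dot for an unsequenced position) can be sketched as follows. The function name is hypothetical, the input is assumed to be already aligned to the reference line and to use 'X' and '.' in the same way, and the short sequences are illustrative fragments only.

```python
def render_against_consensus(consensus, sequence):
    """Show a sequence relative to a consensus of the same length.

    Matching residues become '-', differing residues are shown as-is,
    and 'X' (undetermined) and '.' (unsequenced) are passed through.
    """
    if len(consensus) != len(sequence):
        raise ValueError("sequence must be aligned to the consensus")
    shown = []
    for reference, residue in zip(consensus, sequence):
        if residue in ("X", "."):
            shown.append(residue)
        elif residue == reference:
            shown.append("-")
        else:
            shown.append(residue)
    return "".join(shown)


consensus = "PQITLW"  # illustrative fragment
print(consensus)
print(render_against_consensus(consensus, "PQVTLW"))
print(render_against_consensus(consensus, "..VTXW"))
```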
 

Composite Alignment Query Results

  • A summarized version for viewing the results of queries.
  • The first line shows the numbered consensus sequence.
  • The second line contains the total number of isolates in the data set at each position.
  • The remaining lines show the frequency of variation at each position in the sequence data set. Different colors are used to represent different frequencies according to the following legend:
    • Red: >=5%
    • Blue: 1-4.5%
    • Grey: <1%
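
A composite alignment of this kind can be approximated by counting, at each position, the residues that differ from the consensus. The sketch below makes several assumptions that the guide does not confirm: unsequenced ('.') and undetermined ('X') positions are left out of the per-position total, frequencies between 4.5% and 5% are treated as blue, and all function names are hypothetical.

```python
from collections import Counter


def color_for(percent):
    """Map a variant frequency (in percent) to the legend colors."""
    if percent >= 5:
        return "red"
    if percent >= 1:
        return "blue"  # assumption: covers everything from 1% up to 5%
    return "grey"


def composite_alignment(consensus, sequences):
    """Return (position, consensus residue, total, variants) per position.

    'variants' maps each non-consensus residue to (percent, color).
    """
    rows = []
    for index, reference in enumerate(consensus):
        residues = [s[index] for s in sequences if s[index] not in (".", "X")]
        total = len(residues)
        counts = Counter(r for r in residues if r != reference)
        variants = {}
        for residue, count in counts.items():
            percent = 100 * count / total
            variants[residue] = (percent, color_for(percent))
        rows.append((index + 1, reference, total, variants))
    return rows


# Illustrative aligned fragments, all the same length as the consensus.
for row in composite_alignment("PQI", ["PQI", "PQV", "PQV", ".QI"]):
    print(row)
```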

5. Insertions Query Page

Retrieves sequences containing insertions. In the initial query page, users may specify gene and species parameters.

 


The tabular query result is similar to the one described above, with additional columns including: the codon where the insertion exists, the consensus amino acid in that codon, and the amino acid and nucleic acid sequences of the insertions. In the 'View Alignments' dialog box users may specify a codon of interest from the available list, and a preferred viewing format, similar to the choices mentioned above.


6. References Query Page

Retrieves sequences according to an article or a first author. Users may retrieve data in 2 ways, both of which will retrieve all articles in the database of the specified author:

    1. Specifying a first author in the available drop-down list.
    2. Specifying a first author and year of publication from the References table. The author's name serves as a link to MEDLINE.
 

References Query Results Page

Data retrieved are presented in a table that contains articles by the author specified and links to isolate data. Users may choose the gene of interest ('RT' or 'PR' in the 'Isolate data' column).

 

References Isolate Data Results Page

  1. Data retrieved are presented in a table similar to the table in the other 'Tabular Query Results'. The 'View Alignments' dialog box enables viewing of the actual sequence alignments in ways similar to those described above. The 'View Additional Data' dialog box provides four additional types of data:
      1. Complete treatment (Rx) history.
      2. Isolate Data: body source of the sample (plasma, peripheral blood mononuclear cells - PBMCs, lymph node, cerebrospinal fluid - CSF, spleen or other); whether it has been cultured; the cloning method, if relevant (biological cloning - BC, genomic cloning - GC, molecular cloning - MC, and multiple molecular clones - MMC); and the sequencing method (dideoxy, DNA chip, or Unknown).
      3. Mutation lists of the isolates, divided into the previously described categories.
      4. Susceptibility data