|
The HIV RT and Protease Sequence Database is an on-line
relational database that catalogs evolutionary and drug-related sequence
variation in the human immunodeficiency virus (HIV) reverse transcriptase
(RT) and protease enzymes, the molecular targets of anti-HIV therapy.
The database contains a compilation of most published HIV RT and protease
sequences, including submissions from International Collaboration
databases (e.g. GenBank), and sequences published in journal articles.
Sequences are linked to data about the source of the sequence sample
and the antiretroviral drug treatment history of the individual from
whom the isolate was obtained. During the past year, 3,500 sequences
have been added and the data model has been expanded to include drug
susceptibility data on sequenced isolates. This guide provides an
explanation for each of the features of the database, as well as instructions
on using each feature. The following table summarizes the key parts
of the database, grouped into three sections (Database Documents,
Database Query Forms, and Sequence Analysis Programs), with hyperlinks
to each part: |
| Web
Page |
Description |
| Database Documents |
|
| Background; Primer |
Database rationale for lay audience and novice users |
| Data Model for Understanding HIV Drug Resistance Mutations
|
Description of the sources of knowledge of drug resistance
mutations |
| Resistance Notes |
Overview of HIV drug resistance with links to relevant database
entries |
| Summary Statistics |
Dynamically updated summary of database content |
| User Guide |
Description of database schema and explanation of specific
web site features |
| Database Query Forms |
|
| Drug therapy |
Retrieve sequences of isolates from persons receiving selected
drugs or drug combinations |
| Mutations |
Retrieve sequences containing selected mutations |
| Drug susceptibility |
Retrieve published drug susceptibility data for isolates with
selected mutations (in progress) |
| References |
Retrieve sequences and data summaries from published studies
|
| Sequence Analysis
Programs |
|
| HIV-SEQ |
Compare new RT and protease sequences to previously published
sequences with the same mutations |
| Drug Resistance Interpretation (beta test version) |
Infer drug resistance to 15 available drugs using rules hyperlinked
to data within the database |
|
Database Documents
1.
Background and Rationale is a brief bulleted description
of the database rationale targeted towards a lay audience.
|
2.
Primer is a detailed description of the database rationale,
particularly with respect to the problem of HIV drug resistance. The
primer also describes how the HIV RT and Protease Sequence Database
differs from other databases containing HIV sequences, including the
International Collaboration databases and the Los Alamos HIV Sequence
Database.
|
3.
Data Model for Understanding HIV Drug Resistance describes
an ontology of HIV drug resistance based on four types of correlations
between genetic sequences and other types of data. The four types
of data include:
- Drug treatment histories of persons from whom sequenced HIV
isolates are available
- In vitro drug susceptibility data on laboratory HIV isolates
- In vitro drug susceptibility data on clinical HIV isolates
- Clinical outcome data on persons receiving specific drug regimens
|
4.
Drug Resistance Notes contains a graphical and text overview
of the major resistance mutations for each of the three drug classes:
NRTI, NNRTI, and PI. The graphical images contain colored rectangles
that are hyperlinked to data relating mutations and drugs. The example
below is the graphical overview for the NRTI mutations.
|
| 5.
Summary Statistics is a dynamically generated page which
contains the following three sections: |
- A summary of the number of individuals, virus isolates, RT and
protease sequences, and gene mutations within the HIV RT and Protease
Sequence Database. Approximately 75% of the sequences were obtained
from GenBank. Approximately 25% of the sequences are amino acid
sequences which were obtained from published papers.
- A summary of the number of individuals with RT and protease sequences
according to HIV-1 subtype and country. Subtype B is the most common
HIV-1 subtype in the USA and in Europe. However, non-B subtypes
predominate throughout the world. The subtype of an isolate was
considered to be confirmed if env and/or gag sequences
were available on the same isolate.
- A summary of individuals with RT and protease sequences according
to anti-HIV therapy. Several rows from the table generated on 9/15/2000
are shown below. Under the column 'Received No Other Drug' are the
numbers of individuals who received the indicated drug treatment
but no other drugs of the same class. Under the column 'Received
Other Drugs' are the numbers of individuals who received the indicated
drug treatment and who may or may not have received other drugs
of the same class. Under 'AA' are the numbers of individuals from
whom just amino acid sequences are available. Under 'NA' are the
numbers of individuals from whom nucleic acid and amino acid sequences
are available.
|
 |
| 6.
Slides |
Links to slide set presentations that can be viewed or downloaded.
These are not updated over time. Therefore, some of the data may become
dated or inaccurate.
|
| 7.
Credits and
Acknowledgements |
People and organizations that contributed to the HIV RT and Protease
Sequence Database; email addresses of the major current contributors;
and links to map directions to Stanford Medical Center and to the
Medical School Office Building.
|
| 8.
Citation |
| Articles which should be cited when referring to different parts
of the database. |
Database Queries
| Queries enable users to retrieve a theoretically limitless
number of different sequence sets matching selection criteria based
on specific drug treatments, RT and protease mutations, drug susceptibility
patterns, and references. Retrieved data are initially returned in
a tabular format ('Query Result'). Retrieved data can then be viewed
in a variety of formats that are often independent of the original
query ('Sequence Alignment', 'Composite Alignment'). The 'Sequence
Alignment' returns raw sequence data in a variety of formats. The
'Composite Alignment' page returns a summarized version of the sequence
data. The table below summarizes the various query pages and the formats
of the retrieved data and their associated sequences. |
1.
Protease Inhibitor
Query Page
| |

Abbreviations: APV:amprenavir, IDV:indinavir, RTV:ritonavir, NFV:nelfinavir,
SQV:saquinavir
- '# of PI': the total number of protease inhibitors received.
This is a required field. The default value is '0'.
- 'PI received': the specific PIs received. This is an optional
field. The number of PIs selected here may be less than, but cannot
exceed, the number given in '# of PI'.
- 'First PI': the first PI received. The default value is 'ANY'.
- 'Subtype': specifies phylogenetic criteria
- Species: HIV1, HIV2 (includes SIVsooty mangabey and SIVmacaque),
and AGM (primate immunodeficiency viruses infecting African
green monkeys and other primates). If 'HIV1' is selected,
the 'Group' must be specified. If HIV2 or AGM are selected,
the 'Group' and 'Subtype' will be automatically unselected.
- Groups: Main, O, - (not applicable). If 'Main' is selected,
then a 'Subtype' must be specified.
- Subtype: B, All, NonB, A, C, D, F, G, H, J, - (not applicable).
- 'Filter': when enabled (the default), excludes sequences that are
redundant (e.g., multiple sequences obtained from the same person at
the same point in time), questionable (e.g., sequences of poor
quality, possible intralaboratory contaminants), or short fragments.
Filtering is recommended except for users who plan to do their own
analysis. About 5-10% of sequences in the database are excluded by
the filter.
- 'Additional Output Options':
- Complete Rx (treatment) history including drug regimens,
the order in which they were received, and duration in weeks.
- Complete mutation list. Lists all differences from the consensus
B sequence. The default is to just show 'Major' and 'Minor'
mutations. The current classification is as follows: Major
positions: 8, 30, 32, 46, 47, 48, 50, 54, 82, 84, 90. Minor
positions: 10, 20, 24, 33, 36, 53, 63, 71, 73, 74, 77, 88,
93.
- Subtype
- Sequence Method - Dideoxynucleotide sequencing (Dideoxy)
vs DNA chip (Affymetrix)
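As an illustration of the Major/Minor classification listed above, the following minimal Python sketch labels protease codon positions and reproduces the default behavior of showing only 'Major' and 'Minor' mutations. The position lists are copied from this guide; the function names and the `(position, amino_acid)` input format are assumptions made for the example, not part of the database.

```python
# Position lists copied from the guide's current protease classification.
MAJOR_POSITIONS = frozenset({8, 30, 32, 46, 47, 48, 50, 54, 82, 84, 90})
MINOR_POSITIONS = frozenset({10, 20, 24, 33, 36, 53, 63, 71, 73, 74, 77, 88, 93})


def classify_position(position):
    """Return 'Major', 'Minor', or 'Other' for a protease codon position."""
    if not 1 <= position <= 99:
        raise ValueError("protease codon positions run from 1 to 99")
    if position in MAJOR_POSITIONS:
        return "Major"
    if position in MINOR_POSITIONS:
        return "Minor"
    return "Other"


def default_mutation_list(mutations):
    """Keep only Major and Minor mutations, as the default output does.

    `mutations` is a hypothetical list of (position, amino_acid) pairs.
    """
    return [(pos, aa) for pos, aa in mutations
            if classify_position(pos) != "Other"]
```

Selecting 'Complete mutation list' corresponds to skipping the `default_mutation_list` step and keeping every difference from the consensus B sequence.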
|
2.
RT Inhibitor Query Page
| |

Abbreviations:
AZT:zidovudine, DDI:didanosine, DDC:zalcitabine, 3TC: lamivudine,
D4T:stavudine,
ABC:abacavir, ADV:adefovir
- '# NRTI': the total number of NRTIs received. This is a required
field. The default value is '0'.
- 'NRTI received': choose the specific NRTIs and/or NRTI combinations
received. These selections are optional. The number of NRTIs selected
here may be less than, but cannot exceed, the number given in '# NRTI'.
- '# NNRTI': the total number of NNRTIs received. This is a required
field. The default value is '0-3'. You may enter the total number
of NNRTIs and/or specific NNRTIs. The number of drugs named
cannot exceed the total number given.
- 'Subtype': (explained above under 'Protease Inhibitor Query')
- 'Filter': (explained above under 'Protease Inhibitor Query')
- 'Additional Output Options': (explained above under 'Protease
Inhibitor Query')
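The count rule shared by the drug query forms (the drugs named may number fewer than, but not more than, the total given) can be sketched in Python as follows. This is only an illustration of the rule as stated in this guide: the function name and the representation of a range default such as '0-3' as a `(low, high)` tuple are assumptions, and the actual web form may validate differently.

```python
def check_drug_selection(total, named):
    """Check a '# of drugs' value against the specific drugs named.

    `total` is either an int or an inclusive (low, high) range, such as
    the '0-3' default for NNRTIs. `named` is the list of specific drugs
    chosen on the form. Both formats are assumptions of this sketch.
    """
    low, high = (total, total) if isinstance(total, int) else total
    if low < 0 or high < low:
        raise ValueError("invalid drug count")
    unique = set(named)  # naming the same drug twice counts once
    if len(unique) > high:
        raise ValueError(
            f"{len(unique)} drugs named but at most {high} allowed"
        )
    return True
```

For example, `check_drug_selection(2, ["AZT", "3TC"])` passes, while `check_drug_selection(0, ["AZT"])` raises a `ValueError` because one drug is named although the total is zero.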
|
3.
Protease Mutations Query Page
and
4.
RT Mutations Query Page
|
Queries that enable retrieval of database sequences according to specific codon and/or mutated amino acid definitions.
|
| |

- 'Codon Amino Acid': select 1-5 codons. At least one codon must
be selected. Amino acid selections are optional. For the protease,
the codon must be between 1 and 99. For the RT, the codon must be
between 1 and 250.
- 'Subtype': (explained above under 'Protease Inhibitor Query')
- 'Filter': (explained above under 'Protease Inhibitor Query')
- 'Additional Output Options': (explained above under 'Protease
Inhibitor Query')
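The constraints on the 'Codon Amino Acid' field can be summarized in a short Python sketch. The codon limits and the 1-5 selection rule come from this guide; the function name, the gene labels, and the `(codon, amino_acid)` tuple format are hypothetical and chosen only for the example.

```python
# Upper codon limits stated in this guide for each gene.
CODON_LIMITS = {"protease": 99, "RT": 250}


def validate_codon_query(gene, selections):
    """Validate a mutation query.

    `selections` is a list of (codon, amino_acid) pairs, where
    amino_acid may be None because amino acid selections are optional.
    """
    if gene not in CODON_LIMITS:
        raise ValueError(f"unknown gene: {gene}")
    if not 1 <= len(selections) <= 5:
        raise ValueError("select between 1 and 5 codons")
    limit = CODON_LIMITS[gene]
    for codon, _amino_acid in selections:
        if not 1 <= codon <= limit:
            raise ValueError(f"{gene} codon must be between 1 and {limit}")
    return True
```

A call such as `validate_codon_query("protease", [(82, "A"), (90, None)])` passes, while a protease codon of 120 is rejected.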
|
Tabular
Query Results
- Header: a summary of the query parameters, the number
of isolates, patients, and references satisfying the query parameters,
and a dialog box 'View Alignments'. The 'View Alignments' dialog
box allows users to retrieve either nucleic acid or amino acid
sequences in FASTA format or to view the sequences in alignment
with the consensus B sequence. In the alignment format, positions
having the same residue as consensus B are represented by a '-'.
For isolates having multiple clones, users have the option of
retrieving all clones or representative consensus sequences.
|
| |

|
|

- 'Author (yr)' column: a link to MEDLINE.
- 'Isolate' column: isolate names, and a link to additional isolate
data, including a list of the isolate mutations divided into 5
groups for PIs (major mutations, minor mutations, common polymorphisms,
rare polymorphisms, and conserved residues), and 6 groups for
NRTIs/NNRTIs (NRTI mutations, NNRTI mutations, NonCanonical mutations,
common polymorphisms, rare polymorphisms, and conserved residues);
and the amino acid sequence(s) of that isolate.
- 'Acc#' column: a link to GenBank.
- 'Wks' (for PIs only): total number of weeks PIs were received.
This number is the sum of the durations of each PI listed in the 'PIs'
column.
- Complete treatment history, one of the four 'Additional Output
Options' available, is shown above, and includes drug regimens,
the order in which they were received, and the duration in weeks
of each course.
|
Sequence
Alignment Query Results
- The page returned contains a header that provides a summary
of the query parameters, the number of isolates, patients, and
references satisfying the query parameters, and a dialog box 'View
Composite Alignment'. The 'View Composite Alignment' dialog box
allows users to view the sequences summarized as mutation frequencies,
given as percentages or counts. Users have the option of excluding
mutations that occur only once.
|
- The main section of this page contains the retrieved sequences
in one of two viewing options:
|
| |
- Interleaved amino acids: the isolate's name appears to
the left of the sequence. The top line represents the consensus
of this group of isolates. A dash means the amino acid at that
position matches the consensus. An 'X' stands for an undetermined
amino acid at that position. A dot stands for an unsequenced position.
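The interleaved display convention can be illustrated with a small Python sketch. It assumes, purely for the example, that the consensus and each isolate are already aligned strings of equal length in which 'X' marks an undetermined residue and '.' an unsequenced position; the function names and the row layout are hypothetical and do not describe the database's own code.

```python
def render_row(consensus, sequence):
    """Render one aligned sequence against the group consensus."""
    if len(consensus) != len(sequence):
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for ref, res in zip(consensus, sequence):
        if res in ("X", "."):
            out.append(res)   # undetermined or unsequenced: keep the marker
        elif res == ref:
            out.append("-")   # matches the consensus
        else:
            out.append(res)   # differs from the consensus: show the residue
    return "".join(out)


def render_interleaved(consensus, isolates):
    """Return the consensus line followed by one line per isolate.

    `isolates` maps an isolate name to its aligned sequence.
    """
    width = max(len(name) for name in ["consensus", *isolates])
    lines = [f"{'consensus':<{width}}  {consensus}"]
    for name, seq in isolates.items():
        lines.append(f"{name:<{width}}  {render_row(consensus, seq)}")
    return "\n".join(lines)
```

With a consensus of `PQITLW`, an isolate reading `PQVTLW` is rendered as `--V---`.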
|
| |
|
| |
- FASTA: the isolate's name appears in the top row of each
sequence, followed by the nucleic acids (shown here) or amino
acids, as chosen.
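For readers who want to produce the same kind of file locally, here is a minimal Python sketch of FASTA output. It follows the common FASTA convention of a '>' header line followed by wrapped sequence lines; the function name, the 60-character line width, and the `(name, sequence)` input format are assumptions of the example and are not taken from the database.

```python
def to_fasta(records, line_width=60):
    """Format (name, sequence) pairs as FASTA text.

    Each record starts with a '>' header line carrying the isolate name,
    followed by the sequence wrapped at `line_width` characters.
    """
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        for start in range(0, len(seq), line_width):
            lines.append(seq[start:start + line_width])
    return "".join(line + "\n" for line in lines)
```

The same function works for nucleic acid and amino acid sequences, since FASTA does not distinguish between them in its layout.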
|
| |

|
Composite
Alignment Query Results
|

- A summarized version for viewing the results of queries.
- The first line shows the numbered consensus sequence.
- The second line contains the total number of isolates in the
data set at each position.
- The remaining lines show the frequency of variation at each
position in the sequence data set. Different colors are used to
represent different frequencies according to the following legend:
- Red: >=5%
- Blue: 1-4.5%
- Grey: <1%
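The composite alignment can be approximated with the Python sketch below, which counts the isolates sequenced at each position and assigns each variant a color from the legend above. Several points are assumptions of this sketch rather than documented behavior: the input is a list of aligned strings, '.' and 'X' positions are left out of the per-position total, and frequencies between 4.5% and 5%, which the legend does not cover, are treated as blue.

```python
from collections import Counter


def color_for(percent):
    """Map a variant frequency (in percent) to the legend color."""
    if percent >= 5:
        return "red"
    if percent >= 1:
        return "blue"   # assumption: covers the legend's 4.5-5% gap
    return "grey"


def composite(consensus, sequences):
    """Summarize variation at each position of an aligned data set.

    Returns one (total, variants) pair per position, where `total` is
    the number of isolates sequenced there and `variants` maps each
    non-consensus residue to (percent, color).
    """
    rows = []
    for i, ref in enumerate(consensus):
        observed = [s[i] for s in sequences if s[i] not in (".", "X")]
        total = len(observed)
        counts = Counter(res for res in observed if res != ref)
        variants = {
            res: (100 * n / total, color_for(100 * n / total))
            for res, n in counts.items()
        }
        rows.append((total, variants))
    return rows
```

Each returned pair corresponds to one column of the composite view: the total matches the second line of the display, and the variants correspond to the colored entries below it.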
|
5.
Insertions
Query Page
Retrieves sequences containing insertions. In the initial query page,
users may specify gene and species parameters.
The tabular query result is similar to the one described above, with
additional columns including: the codon where the insertion exists, the
consensus amino acid in that codon, and the amino acid and nucleic acid
sequences of the insertions. In the 'View Alignments' dialog box users
may specify a codon of interest from the available list, and a preferred
viewing format, similar to the choices mentioned above.
6.
References
Query Page
Retrieves sequences according to an article or a first author. Users
may retrieve data in two ways, both of which retrieve all articles
in the database by the specified author:
- Specifying a first author in the available drop-down list.
- Specifying a first author and year of publication from the References
table. The author's name serves as a link to MEDLINE.
References Query
Results Page
Data retrieved are presented in a table that contains articles by the
author specified and links to isolate data. Users may choose the gene
of interest ('RT' or 'PR' in the 'Isolate data' column).
| |

|
References Isolate
Data Results Page
- Data retrieved are presented in a table similar to the table in
the other 'Tabular Query Results'. The 'View Alignments' dialog box
enables viewing of the actual sequence alignments in ways similar to
those described above. The 'View Additional Data' dialog box provides
four additional types of data:
- Complete treatment (Rx) history.
- Isolate Data: body source of the sample (plasma, peripheral
blood mononuclear cells - PBMCs, lymph node, cerebrospinal fluid
- CSF, spleen or other); whether it has been cultured; the cloning
method, if relevant (biological cloning - BC, genomic cloning
- GC, molecular cloning - MC, and multiple molecular clones -
MMC); and the sequencing method (dideoxy, DNA chip, or Unknown).
- Mutation lists of the isolates, divided into the previously
described categories.
- Susceptibility data
|