Stanford University HIV Drug Resistance Database - A curated public database designed to represent, store, and analyze the divergent forms of data underlying HIV drug resistance.

HIV RT and Protease Analysis (HIV-SEQ)


Sample Sequence Descriptions

HMDR3-197
Multi drug resistant RT and protease sequence (Accession# AF047318).

RF
WT RT and protease treatment naive sequence (Accession# M17451).

INS-PT2
RT sequence with insertion (Accession# AF096882).
 
 
HIV-1 RT and Protease - Search Engine For Queries (HIV-SEQ) Release Notes (last updated: 9/15/00)
A program providing analysis of HIV RT and protease sequences in the context of existing published sequence data on these genes. Written by Robert Shafer, Duane Jung, and Brad Betts.
Input:
Nucleic acid sequence of HIV-1 RT and/or protease: a single sequence or a set of multiple sequences. Sequence name and date are optional, but they should be supplied to obtain ACTG/SDAC-formatted output.
Input format:
Paste the sequence into the text box, or use the file upload button to choose a text file containing the sequence. Sets of multiple sequences must be in non-interleaved FASTA format. Note: Submitted sequences are not added to the database or stored on the server.
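As an illustration of the expected input, the following Python sketch reads non-interleaved FASTA text into name/sequence pairs. It is not the HIV-SEQ parser; the function name parse_fasta and the choice to accept a bare sequence pasted without a header line are assumptions made for this example.

```python
def parse_fasta(text):
    """Parse non-interleaved FASTA text into a list of (name, sequence) pairs.

    Hypothetical helper, not part of HIV-SEQ. A sequence pasted without a
    '>' header line is accepted and given an empty name.
    """
    records = []
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name = line[1:].strip()
            chunks = []
        else:
            if name is None:
                name = ""  # bare sequence with no header
            chunks.append(line.upper())
    if name is not None:
        records.append((name, "".join(chunks)))
    return records
```

For example, parse_fasta(">s1\nCCTCAA\n>s2\nccccat\n") yields two records, with the second sequence converted to upper case.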
Output:
(1) Formatted and aligned nucleic acid and amino acid sequence.
(2) List of differences from the consensus B amino acid sequence ("mutations") including mixtures.
(3) Insertions and/or deletions.
(4) Data on the frequency with which each mutation occurs in individuals according to HIV-1 subtype and type of drug therapy.
(5) Hyperlinks to references associated with each mutation.
(6) Comparison with reference sequences of known subtype.
(7) Background information on each mutation.
Notes: User feedback to improve the program is appreciated. To save your results, print them from the screen or save them to your own computer. Future versions will include an option to access multiple sequences and their reports for the duration of a working session. Analyzed sequences and results will not be stored or examined locally.
Sequence translation and alignment: The program identifies the sample sequence's correct reading frame by determining which reading frame contains the following conserved motifs:
Protease: "PQ(I|V)T", "DTG", "GCTLN";
RT: "PISP", "WPLT", "D(V|I)GDA", "QY(.)DDL", "WMG(Y|F)";
The program then uses the positions of these motifs to infer the starting positions of the protease and RT.
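The motif search described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the HIV-SEQ code: the function names are hypothetical, unrecognized codons are translated to "X", and the frame with the most motif hits is taken as the correct one. The motif patterns are copied from the list above and happen to be valid regular expressions.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Conserved motifs from the text: protease first, then RT.
MOTIFS = ["PQ(I|V)T", "DTG", "GCTLN",
          "PISP", "WPLT", "D(V|I)GDA", "QY(.)DDL", "WMG(Y|F)"]

def translate(nt, frame):
    """Translate nt starting at offset frame (0, 1, or 2)."""
    codons = (nt[i:i + 3] for i in range(frame, len(nt) - 2, 3))
    return "".join(CODON_TABLE.get(c, "X") for c in codons)

def motif_hits(nt):
    """Return {frame: [(motif, amino_acid_index), ...]} for frames with hits."""
    hits = {}
    for frame in range(3):
        aa = translate(nt.upper(), frame)
        found = [(m, match.start()) for m in MOTIFS
                 for match in re.finditer(m, aa)]
        if found:
            hits[frame] = found
    return hits

def best_frame(nt):
    """Frame with the most motif hits, or None if no motif is found."""
    hits = motif_hits(nt)
    return max(hits, key=lambda f: len(hits[f])) if hits else None
```

Short motifs such as "DTG" can match by chance in a wrong frame, which is why this sketch counts hits per frame rather than accepting the first match.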
Frame shifts: The presence of motifs in more than one reading frame indicates a reading frame shift. When this occurs, the program locates, records, and removes the shift by applying an optimal sequence alignment algorithm to the region between the closest motifs in different reading frames. Although reading frame shifts in HIV-1 RT and protease have not been reported, they may conceivably result from defective virions or sequencing errors. If a reading frame shift is detected, a warning is highlighted in red at the beginning of the program output.
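A minimal Python sketch of the first step, finding the region that must contain a shift, is given here. It assumes motif hits have already been collected as (motif, nucleotide start) pairs across all three frames; the function name and input layout are hypothetical, and the optimal alignment step that actually removes the shift is not shown.

```python
def locate_frame_shifts(hits):
    """Find neighbouring motifs that lie in different reading frames.

    hits: list of (motif, nt_start) pairs, where nt_start is the nucleotide
    index at which the motif's first codon begins (assumed input layout).
    Returns a list of (left_hit, right_hit) pairs; a frame shift must lie
    somewhere between the two motifs of each pair.
    """
    ordered = sorted(hits, key=lambda h: h[1])
    shifts = []
    for left, right in zip(ordered, ordered[1:]):
        # Two motifs are in the same frame when their starts agree modulo 3.
        if left[1] % 3 != right[1] % 3:
            shifts.append((left, right))
    return shifts
```

Because the hits are sorted by position, each reported pair is the closest pair of motifs in different frames, which bounds the region that an alignment step would then need to examine.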
Amino acid insertions and deletions: Amino acid insertions or deletions cause the distances between conserved motifs to differ from the inter-motif distances in the consensus reference sequence. When an amino acid insertion or deletion is detected, the program locates, records, and removes it using an optimal sequence alignment algorithm. RT insertions and deletions have been reported in 1%-2% of isolates from heavily treated persons.
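The inter-motif distance check can be illustrated with the protease consensus sequence and its three motifs. The sketch below is only an assumption about how such a check might look: the function names are hypothetical, only the first match of each motif is used, and the result is the net length change between motifs, not the exact location of the insertion or deletion.

```python
import re

# Consensus B protease sequence as listed in this document.
CONSENSUS_PR = (
    "PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAI"
    "GTVLVGPTPVNIIGRNLLTQIGCTLNF"
)
PR_MOTIFS = ["PQ(I|V)T", "DTG", "GCTLN"]

def motif_starts(aa, motifs=PR_MOTIFS):
    """Start index of the first match of each motif, or None if absent."""
    starts = []
    for motif in motifs:
        match = re.search(motif, aa)
        starts.append(match.start() if match else None)
    return starts

def length_changes(sample, consensus=CONSENSUS_PR):
    """Net residues gained (+) or lost (-) between consecutive motifs."""
    ref = motif_starts(consensus)
    obs = motif_starts(sample)
    changes = []
    for i in range(len(ref) - 1):
        if obs[i] is None or obs[i + 1] is None:
            changes.append(None)  # a flanking motif was not found
        else:
            changes.append((obs[i + 1] - obs[i]) - (ref[i + 1] - ref[i]))
    return changes
```

A result of [0, 1], for instance, would indicate one extra residue somewhere between the "DTG" and "GCTLN" motifs.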
Representation of amino acid mixtures: Nucleotide triplets containing ambiguities are translated into each of the possible amino acids they encode. For example, WMC is translated to NTYS (N for AAC, T for ACC, Y for TAC, S for TCC).
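The expansion of an ambiguous codon can be sketched in Python as follows. The function name translate_mixture is hypothetical; the ambiguity codes are the standard IUPAC nucleotide codes and the codon table is the standard genetic code. Amino acids are reported in order of first appearance, with duplicates dropped.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def translate_mixture(codon):
    """Return every distinct amino acid an ambiguous codon can encode."""
    seen = []
    for combo in product(*(IUPAC[base] for base in codon.upper())):
        aa = CODON_TABLE["".join(combo)]
        if aa not in seen:
            seen.append(aa)
    return "".join(seen)

print(translate_mixture("WMC"))  # NTYS
```

The call reproduces the example in the text: W expands to A or T and M to A or C, giving AAC, ACC, TAC, and TCC.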
Comparison to the consensus B sequence: Genetic analysis of HIV-1 isolates has revealed multiple different group M (main) subtypes as well as several divergent group O (outlier) and group N (non-M, non-O) isolates. The group M subtypes differ from each other by 10%-30% along their genomes (usually about 10% in RT and protease). The most common subtype in the United States and Europe is subtype B. The subtype B consensus sequence is derived from an alignment of subtype B sequences maintained at the Los Alamos HIV Sequence Database (hiv-web.lanl.gov). The consensus B sequence is therefore a commonly used reference sequence to which new sequences are compared.
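Once a sample has been aligned to the consensus, listing the differences is a position-by-position comparison. The Python sketch below assumes the sample is supplied as one string per aligned position, each holding one or more amino acids (a mixture); the function name and the output notation (consensus residue, position, observed residues) are assumptions for this example.

```python
def list_mutations(consensus, sample, first_position=1):
    """List differences from the consensus, including mixtures.

    consensus: string of consensus amino acids.
    sample: sequence of strings, one per aligned position, each holding the
        amino acid or mixture of amino acids observed at that position.
    first_position: residue number of consensus[0].
    """
    mutations = []
    for offset, (ref, obs) in enumerate(zip(consensus, sample)):
        # A position is reported unless the only residue seen is the consensus.
        if set(obs) != {ref}:
            mutations.append(f"{ref}{first_position + offset}{obs}")
    return mutations

# RT positions 183-186 of the consensus B sequence are Y, M, D, D.
print(list_mutations("YMDD", ["Y", "V", "D", "D"], first_position=183))
# ['M184V']
```

A mixture that still includes the consensus residue, such as "MV" at position 184, is reported as well (as "M184MV" in this notation).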
Comparison to sequences with known subtype: Following alignment, the nucleotide sequence is compared to a set of reference sequences of different subtypes. The list is obtained from the National Center for Biotechnology Information's HIV-1 Subtyping Tool (www.ncbi.nlm.nih.gov/retroviruses/HIV1). The genetic distance of the sample sequence to each of the reference sequences is calculated and shown in a table towards the end of the report. Note: Determination of HIV subtype using a single-sequence comparison is not robust and may be misleading in the case of isolates resulting from intersubtype recombination. A reference guide to HIV-1 classification can be found at the Los Alamos HIV Sequence Database web site and in the 4/6/2000 issue of Science.
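The text does not say which distance measure is used. As one simple possibility, the sketch below computes the uncorrected p-distance (the fraction of compared sites that differ) between aligned nucleotide sequences and ranks references from nearest to farthest. The function names and the reference names in the usage note are hypothetical, and sites with gaps or ambiguity codes are skipped by assumption.

```python
def p_distance(a, b):
    """Fraction of differing sites among sites where both have A, C, G, or T.

    Both sequences must already be aligned to the same length. Returns None
    if no site can be compared.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            compared += 1
            if x != y:
                differing += 1
    return differing / compared if compared else None

def rank_references(sample, references):
    """Return [(name, distance), ...] sorted from nearest to farthest.

    references: dict mapping a reference name to its aligned sequence.
    """
    scored = [(name, p_distance(sample, ref)) for name, ref in references.items()]
    return sorted((s for s in scored if s[1] is not None), key=lambda s: s[1])
```

For example, rank_references("ACGTACGT", {"refA": "ACGTACGA", "refB": "ACGTTTTT"}) places refA first. As the note above cautions, the nearest reference is only a rough indication of subtype.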

Mutation frequency data: Each sample mutation is used as a query parameter to interrogate the HIV RT and Protease Sequence Database. Within the database, mutation frequency tables contain data on the frequency with which each mutation occurs in different categories of HIV-1 isolates.
For protease sequences, these categories include:
(i) Subtype B protease isolates from 'untreated' individuals (those who have not received a protease inhibitor (PI))
(ii) Subtype B isolates from individuals who have received at least one PI ('treated')
(iii) Subtype B isolates from individuals who have received at least three PIs ('heavily treated')
(iv) Non-subtype B isolates from untreated individuals (this category is optional; to view these data, the "Subtyping Data" box must be checked)
For RT sequences, these categories include:
(i) Subtype B RT isolates from 'untreated' individuals (those who have not received an RT inhibitor (RTI))
(ii) Subtype B RT isolates from individuals who have received at least one nucleoside RTI (NRTI) (but no non-nucleoside RTI (NNRTI))
(iii) Subtype B RT isolates from individuals who have received at least four NRTIs (but no NNRTI)
(iv) Subtype B RT isolates from individuals who have received an NNRTI (with or without one or more NRTIs). Note: These individuals have often received multiple NRTIs; these data are therefore most relevant to positions associated with NNRTI resistance.
(v) Non-subtype B RT isolates from untreated individuals (this category is optional; to view these data, the "Subtyping Data" box must be checked)
Note: To minimize reporting bias, the mutation frequency tables contain one sequence per individual. For individuals in whom sequences from multiple isolates were published, the mutation tables include the earliest sequence from untreated persons and the latest sequence from persons receiving antiretroviral therapy. To exclude technical sequencing errors and cases of circulating virus containing unusual variants, the mutation tables include only mutations present as the predominant form whenever multiple clones from the same isolate were sequenced. Sequences of poor quality and those considered to be possible laboratory contaminants are excluded from the data sets.
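The one-sequence-per-individual rule can be sketched in Python as follows. The record layout (dictionaries with "patient", "date", "treated", and "sequence" keys), the use of ISO-format date strings, and the assumption that treatment status is constant across one person's records are all choices made for this illustration; they do not describe the database schema.

```python
def one_per_individual(records):
    """Keep one record per person.

    Untreated persons contribute their earliest sequence; persons receiving
    therapy contribute their latest. Dates are ISO strings (YYYY-MM-DD), so
    string comparison orders them chronologically.
    """
    chosen = {}
    for rec in records:
        current = chosen.get(rec["patient"])
        if current is None:
            chosen[rec["patient"]] = rec
            continue
        if rec["treated"]:
            keep_new = rec["date"] > current["date"]  # latest on therapy
        else:
            keep_new = rec["date"] < current["date"]  # earliest untreated
        if keep_new:
            chosen[rec["patient"]] = rec
    return list(chosen.values())
```

Mutation frequencies for a category would then be computed over the retained records only, so that heavily sampled individuals do not dominate the counts.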


I. Consensus B Sequences
Protease:
PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAI
GTVLVGPTPVNIIGRNLLTQIGCTLNF
RT:
PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKIGPENPYNTPVFAIKKKDSTKWR
KLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLDKDFRKYTAFTIPSINNETPGIRY
QYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDLEIGQHRTKIEELRQHLLRWGFTT
PDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWTVNDIQKLVGKLNWASQIYAGIKVKQLCKLLRGTKA
LTEVIPLTEEAELELAENREILKEPVHGVYYDPSKDLIAEIQKQGQGQWTYQIYQEPFKNLKTGKYARMRGA
HTNDVKQLTEAVQKIATESIVIWGKTPKFKLPIQKETWEAWWTEYWQATWIPEWEFVNTPPLVKLWYQLEKE
PIVGAETFYVDGAANRETKLGKAGYVTDRGRQKVVSLTDTTNQKTELQAIHLALQDSGLEVNIVTDSQYALG
IIQAQPDKSESELVSQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDKLVSAGIRKVL
II. Summary of Data Sets (9/15/00)
Gene      Subtype  Treatment  # Isolates  # References
Protease  B        None       509         57
Protease  B        >=1 PI     730         43
Protease  B        >=3 PI     96          14
Protease  Non-B    None       163         39
RT        B        None       253         45
RT        B        >=1 NRTI   495         42
RT        B        >=3 NRTI   102         17
RT        B        NNRTI      138         15
RT        Non-B    None       117         31


Features that are pending:
(1) Drug-specific data sets: As the number of representative sequences from patients receiving specific drugs or drug combinations increases, the program will provide additional categories of mutation frequency data.
(2) Drug susceptibility data: The HIV RT and Protease Sequence Database has been expanded and now includes nearly all published primary drug susceptibility data on isolates of known sequence. Future versions of HIV-SEQ will extract susceptibility data on sequences that have mutations most closely resembling those of a submitted sample.