Run Program Using:
Sample Sequence Descriptions
HMDR3-197: Multidrug-resistant RT and protease sequence (Accession# AF047318).
RF: Wild-type (WT) RT and protease sequence, treatment naive (Accession# M17451).
INS-PT2: RT sequence with insertion (Accession# AF096882).
| Input: |
| Nucleic acid sequence of HIV-1 RT and/or protease:
a single sequence or a set of multiple sequences.
The sequence name and date are optional, but both should be
supplied to obtain ACTG/SDAC-formatted output. |
| Input format: |
| Paste the sequence into the text box, or use the file upload button
to choose a text file containing the sequence. Sets of multiple sequences
must be in non-interleaved FASTA format. Note: Submitted sequences
are not added to the database or stored on the server. |
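As an illustration of the expected input, the following minimal sketch reads a set of sequences in non-interleaved FASTA format. The function name `parse_fasta` is hypothetical and is not part of the program; the sketch assumes that "non-interleaved" means each record's residues follow its own `>` header contiguously (possibly wrapped over several lines) before the next header begins.

```python
def parse_fasta(text):
    """Parse non-interleaved FASTA text into a list of (name, sequence) pairs.

    Assumption: each record is a '>' header followed by its complete
    sequence, which may be wrapped over several lines.
    """
    records = []
    name, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks).upper()))
            name, chunks = line[1:].strip(), []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks).upper()))
    return records


if __name__ == "__main__":
    example = ">sample1\nCCTCAG\natcact\n>sample2\nCCCATTAGT\n"
    for seq_name, seq in parse_fasta(example):
        print(seq_name, seq)
```

A single pasted sequence without a header would need separate handling; this sketch covers only the multi-sequence case.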
| Output: |
| (1) Formatted and aligned nucleic acid and amino acid sequence. |
| (2) List of differences from the consensus B amino acid sequence ("mutations") including mixtures. |
| (3) Insertions and/or deletions. |
| (4) Data on the frequency with which each mutation occurs
in individuals according to HIV-1 subtype and type of
drug therapy. |
| (5) Hyperlinks to references associated with each mutation. |
| (6) Comparison with reference sequences of known subtype. |
| (7) Background information on each mutation. |
| Notes: User feedback that helps improve the program
is appreciated. To save your results, print
them from the screen or save them on your own computer.
Future versions will offer an option to access multiple
sequences and their reports for the duration
of a working session. Analyzed sequences and results will
not be stored or examined locally. |
| Sequence translation and alignment: The program
identifies the sample sequence's correct reading frame by
determining which reading frame contains the following
conserved motifs: |
| Protease: "PQ(I|V)T", "DTG", "GCTLN"; |
| RT: "PISP", "WPLT", "D(V|I)GDA", "QY(.)DDL", "WMG(Y|F)"; |
| The program then uses the positions of these motifs to
infer the starting positions of the protease and RT. |
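The motif search described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the program's actual code: the function names (`translate`, `motif_hits`, `best_frame`) are hypothetical, the motifs are rewritten as Python regular expressions, codons containing ambiguity codes are translated to `X`, and the frame with the most motif matches is taken as the reading frame.

```python
import re
from itertools import product

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

# Conserved motifs from the text, written as regular expressions.
MOTIFS = [
    "PQ[IV]T", "DTG", "GCTLN",                          # protease
    "PISP", "WPLT", "D[VI]GDA", "QY.DDL", "WMG[YF]",    # RT
]


def translate(nt, frame):
    """Translate a nucleotide string starting at offset `frame` (0, 1 or 2)."""
    nt = nt.upper()
    return "".join(
        CODON_TABLE.get(nt[i:i + 3], "X")  # 'X' for codons with ambiguity codes
        for i in range(frame, len(nt) - 2, 3)
    )


def motif_hits(nt):
    """Return {frame: [(motif, amino_acid_index), ...]} for the three frames."""
    hits = {}
    for frame in range(3):
        protein = translate(nt, frame)
        hits[frame] = [
            (pattern, match.start())
            for pattern in MOTIFS
            for match in re.finditer(pattern, protein)
        ]
    return hits


def best_frame(nt):
    """Pick the frame containing the most conserved motifs (assumed criterion)."""
    hits = motif_hits(nt)
    return max(hits, key=lambda frame: len(hits[frame]))
```

The amino acid index of each hit, compared with the motif's position in the consensus sequence, gives the offset needed to infer where the protease or RT begins.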
| Frame shifts: The presence of motifs in more than
one reading frame indicates a reading frame shift.
When this occurs, the program locates, records, and
removes the shift by applying an optimal sequence
alignment algorithm to the region between the closest
motifs in different reading frames. Although reading
frame shifts in HIV-1 RT and protease have not been
reported, they may conceivably result from defective
virions or sequencing errors. If a reading frame shift
is detected, a warning is highlighted in red at
the beginning of the program output. |
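To make the frame-shift check concrete, here is a minimal sketch of how the region to be realigned might be located. It assumes the motif search has already produced a list of `(motif, frame, nucleotide_start)` tuples; that data layout and the function name `locate_frame_shift` are hypothetical. The alignment step itself is not shown.

```python
def locate_frame_shift(hits):
    """Find the closest pair of motif hits that lie in different frames.

    `hits` is a list of (motif, frame, nucleotide_start) tuples.
    Returns (left_hit, right_hit) bounding the suspected shift,
    or None when every motif lies in the same reading frame.
    """
    ordered = sorted(hits, key=lambda hit: hit[2])
    best = None
    # The closest pair in different frames is always adjacent in sorted order.
    for left, right in zip(ordered, ordered[1:]):
        if left[1] != right[1]:
            gap = right[2] - left[2]
            if best is None or gap < best[0]:
                best = (gap, left, right)
    if best is None:
        return None
    return best[1], best[2]


if __name__ == "__main__":
    example_hits = [("PQ[IV]T", 0, 0), ("DTG", 0, 72), ("GCTLN", 1, 283)]
    print(locate_frame_shift(example_hits))
```

In this example the shift would be sought between the `DTG` hit and the `GCTLN` hit, since those are the nearest motifs found in different frames.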
| Amino acid insertions and deletions: Amino acid insertions
or deletions will cause the distances between conserved motifs to
be different from the inter-motif distances in the consensus
reference sequence. When an amino acid insertion or deletion is
detected, the program locates, records, and removes it using an
optimal sequence alignment algorithm. RT insertions and deletions
have been reported in 1%-2% of isolates from heavily treated
persons. |
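The inter-motif distance comparison can be illustrated with a short sketch for protease. It only detects that the spacing between neighbouring motifs has changed and by how many residues; locating the exact insertion or deletion would still require the alignment step. The function names are hypothetical, the consensus string is the protease sequence listed on this page, and only the first occurrence of each motif is used.

```python
import re

CONSENSUS_PR = (
    "PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAI"
    "GTVLVGPTPVNIIGRNLLTQIGCTLNF"
)
PR_MOTIFS = ["PQ[IV]T", "DTG", "GCTLN"]


def motif_positions(protein, motifs):
    """Return the index of the first match of each motif (None if absent)."""
    positions = []
    for pattern in motifs:
        match = re.search(pattern, protein)
        positions.append(match.start() if match else None)
    return positions


def intermotif_length_changes(sample, consensus=CONSENSUS_PR, motifs=PR_MOTIFS):
    """List (left_motif, right_motif, delta) where the spacing differs.

    delta > 0 suggests an insertion between the two motifs,
    delta < 0 suggests a deletion.
    """
    s = motif_positions(sample, motifs)
    c = motif_positions(consensus, motifs)
    changes = []
    for i in range(len(motifs) - 1):
        if None in (s[i], s[i + 1], c[i], c[i + 1]):
            continue  # cannot compare spacing if a motif is missing
        delta = (s[i + 1] - s[i]) - (c[i + 1] - c[i])
        if delta:
            changes.append((motifs[i], motifs[i + 1], delta))
    return changes
```

For example, a sample with two extra residues between `DTG` and `GCTLN` would be reported with a delta of 2 for that motif pair.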
| Representation of amino acid mixtures: Nucleotide triplets
containing ambiguities are translated into each of the possible
amino acids they encode. For example, WMC is translated to NTYS
(N for AAC, T for ACC, Y for TAC, S for TCC). |
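The WMC example can be reproduced with a small sketch that expands each ambiguity code into its possible bases and translates every resulting codon. The function name `translate_ambiguous_codon` is hypothetical; the ambiguity codes are the standard IUPAC nucleotide codes and the translation uses the standard genetic code.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate_ambiguous_codon(codon):
    """Return every amino acid an ambiguous codon can encode, without repeats."""
    amino_acids = []
    for bases in product(*(IUPAC[b] for b in codon.upper())):
        aa = CODON_TABLE["".join(bases)]
        if aa not in amino_acids:
            amino_acids.append(aa)
    return "".join(amino_acids)


if __name__ == "__main__":
    print(translate_ambiguous_codon("WMC"))  # NTYS
```

W expands to A or T and M expands to A or C, giving the four codons AAC, ACC, TAC, and TCC listed in the text.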
| Comparison to the consensus B sequence: Genetic analysis of
HIV-1 isolates has revealed multiple different group M (main)
subtypes as well as several divergent group O (outlier) and group N
(non-M, non-O) isolates. The group M subtypes differ from each other
by 10%-30% along their genomes (usually about 10% in RT and protease).
The most common subtype in the United States and Europe is subtype B.
The subtype B consensus sequence is derived from an alignment of
subtype B sequences maintained at the Los Alamos HIV Sequence Database
(hiv-web.lanl.gov). The consensus B sequence is therefore a commonly
used reference sequence to which new sequences are compared. |
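Once a sample has been aligned to the consensus B sequence, listing the differences is a position-by-position comparison. The sketch below assumes the sample is supplied as one string of possible amino acids per position (so that mixtures can be represented) and that insertions and deletions have already been removed. The function name `list_mutations` and the output notation (consensus residue, position, observed residues) are assumptions made for illustration.

```python
def list_mutations(consensus, sample_positions):
    """List differences from the consensus, including mixtures.

    `consensus` is the consensus amino acid string; `sample_positions`
    holds, for each position, a string of the amino acids observed there
    (more than one letter for a mixture).
    """
    mutations = []
    for position, (ref, observed) in enumerate(
        zip(consensus, sample_positions), start=1
    ):
        # Report the position unless the only residue observed is the consensus one.
        if set(observed) != {ref}:
            mutations.append(f"{ref}{position}{observed}")
    return mutations


if __name__ == "__main__":
    consensus_start = "PQITLWQRPL"  # first 10 residues of consensus B protease
    sample = ["P", "Q", "V", "T", "L", "W", "Q", "R", "P", "IL"]
    print(list_mutations(consensus_start, sample))  # ['I3V', 'L10IL']
```

In this notation a mixture that still contains the consensus residue, such as `L10IL`, is reported as a difference, which matches the statement that mixtures are included in the list.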
| Comparison to sequences with known subtype: Following alignment,
the nucleotide sequence is compared to a set of reference sequences
of different subtypes. The list is obtained from the National Center
for Biotechnology Information's HIV-1 Subtyping Tool
(www.ncbi.nlm.nih.gov/retroviruses/HIV1). The genetic distance of the
sample sequence to each of the reference sequences is calculated and
shown in a table towards the end of the report. Note: Determination
of HIV subtype using a single-sequence comparison is not robust
and may be misleading in the case of isolates resulting from
intersubtype recombination. A reference guide to HIV-1 classification
can be found at the Los Alamos HIV Sequence Database web site and in
the 4/6/2000 issue of Science. |
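The page does not say which genetic distance measure the program uses. As a hedged illustration, the sketch below uses the simplest choice, the uncorrected proportion of differing sites (p-distance), computed over aligned positions where both sequences carry an unambiguous base. The function names and the reference sequences in the usage example are hypothetical placeholders, not real subtype references.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned nucleotide sequences.

    Positions with a gap or an ambiguity code in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            differences += a != b
    if compared == 0:
        raise ValueError("no comparable positions")
    return differences / compared


def rank_references(sample, references):
    """Return (name, distance) pairs sorted from closest to most distant."""
    distances = [(name, p_distance(sample, seq)) for name, seq in references.items()]
    return sorted(distances, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy aligned fragments standing in for real subtype reference sequences.
    toy_references = {"ref_B": "CCTCAGATCACT", "ref_C": "CCTCAAATCACC"}
    for ref_name, dist in rank_references("CCTCAGATCACT", toy_references):
        print(ref_name, round(dist, 3))
```

As the note above cautions, the closest reference in such a table is only a rough indication of subtype, particularly for recombinant isolates.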
| Mutation frequency data: Each sample mutation is used as a
query parameter to interrogate the HIV RT and Protease Sequence
Database. Within the database, mutation frequency tables contain
data on the frequency with which each mutation occurs in different
categories of HIV-1 isolates. |
| For protease sequences, these categories include: |
| (i) Subtype B protease isolates from 'untreated' individuals
(those who have not received a protease inhibitor (PI)) |
| (ii) Subtype B isolates from individuals who have received at
least one PI ('treated') |
| (iii) Subtype B isolates from individuals who have received at
least three PIs ('heavily treated') |
| (iv) Non-subtype B isolates from untreated individuals
(this category is optional; to view these data, the
"Subtyping Data" box must be checked) |
| For RT sequences, these categories include: |
| (i) Subtype B RT isolates from 'untreated' individuals
(those who have not received an RT inhibitor (RTI)) |
| (ii) Subtype B RT isolates from individuals who have received
at least one nucleoside RTI (NRTI) (but no non-nucleoside
RTI (NNRTI)) |
| (iii) Subtype B RT isolates from individuals who have received
at least four NRTI (but no NNRTI). |
| (iv) Subtype B RT isolates from individuals who have received
an NNRTI (+/- one or more NRTIs). Note: Because these
individuals have often received multiple NRTIs, these
data are most relevant to positions
associated with NNRTI resistance. |
| (v) Non-subtype B RT isolates from untreated individuals
(this category is optional; to view these data, the
"Subtyping Data" box must be checked) |
| Note: To minimize reporting bias, the mutation frequency tables
contain one sequence per individual. For individuals in whom
sequences from multiple isolates were published, the mutation
tables include the earliest sequence from untreated persons and
the latest sequence from persons receiving antiretroviral therapy.
To exclude technical sequencing errors and cases of circulating
virus containing unusual variants, the mutation tables include
only mutations present as the predominant form whenever multiple
clones from the same isolate were sequenced. Sequences of poor
quality and those considered to be possible laboratory
contaminants are excluded from the data sets. |
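The one-sequence-per-individual rule can be sketched as a simple selection over sequence records. The record layout (dictionaries with `patient`, `date`, and `treated` keys) and the function name are hypothetical. The sketch also assumes that the rule is applied separately to the untreated and treated sequences of the same person, which the text does not state explicitly.

```python
from datetime import date


def one_sequence_per_individual(records):
    """Keep the earliest untreated and the latest treated sequence per person."""
    chosen = {}
    for record in records:
        key = (record["patient"], record["treated"])
        current = chosen.get(key)
        if current is None:
            chosen[key] = record
        elif record["treated"] and record["date"] > current["date"]:
            chosen[key] = record  # later sequence under therapy wins
        elif not record["treated"] and record["date"] < current["date"]:
            chosen[key] = record  # earlier untreated sequence wins
    return list(chosen.values())


if __name__ == "__main__":
    example = [
        {"patient": "P1", "date": date(1997, 3, 1), "treated": False},
        {"patient": "P1", "date": date(1996, 1, 15), "treated": False},
        {"patient": "P2", "date": date(1998, 5, 1), "treated": True},
        {"patient": "P2", "date": date(1999, 9, 30), "treated": True},
    ]
    for kept in one_sequence_per_individual(example):
        print(kept["patient"], kept["date"], kept["treated"])
```

Excluding non-predominant clonal variants, poor-quality sequences, and possible contaminants would be separate filtering steps not shown here.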
I. Consensus B Sequences
Protease:
PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAI
GTVLVGPTPVNIIGRNLLTQIGCTLNF
RT:
PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKIGPENPYNTPVFAIKKKDSTKWR
KLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLDKDFRKYTAFTIPSINNETPGIRY
QYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDLEIGQHRTKIEELRQHLLRWGFTT
PDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWTVNDIQKLVGKLNWASQIYAGIKVKQLCKLLRGTKA
LTEVIPLTEEAELELAENREILKEPVHGVYYDPSKDLIAEIQKQGQGQWTYQIYQEPFKNLKTGKYARMRGA
HTNDVKQLTEAVQKIATESIVIWGKTPKFKLPIQKETWEAWWTEYWQATWIPEWEFVNTPPLVKLWYQLEKE
PIVGAETFYVDGAANRETKLGKAGYVTDRGRQKVVSLTDTTNQKTELQAIHLALQDSGLEVNIVTDSQYALG
IIQAQPDKSESELVSQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDKLVSAGIRKVL
II. Summary of Data Sets (9/15/00)
| Gene | Subtype | Treatment | # Isolates | # References |
| Protease | B | None | 509 | 57 |
| Protease | B | >=1 PI | 730 | 43 |
| Protease | B | >=3 PI | 96 | 14 |
| Protease | Non-B | None | 163 | 39 |
| RT | B | None | 253 | 45 |
| RT | B | >=1 NRTI | 495 | 42 |
| RT | B | >=3 NRTI | 102 | 17 |
| RT | B | NNRTI | 138 | 15 |
| RT | Non-B | None | 117 | 31 |
| Features that are pending: |
| (1) Drug-specific data sets: As the number of
representative sequences from patients receiving
specific drugs or drug combinations increases,
the program will provide additional categories
of mutation frequency data. |
| (2) Drug susceptibility data: The HIV RT and
Protease Sequence Database has been expanded and
now includes nearly all published primary drug
susceptibility data on isolates of known sequence.
Future versions of HIV-SEQ will extract
susceptibility data on sequences that have mutations
most closely resembling those of a submitted sample.
|
|