2.1 User Interface
- For each of the three programs, sequences can be entered by the following methods:
- Pasting one or more (up to 100 at a time) nucleotide sequences in Fasta format: The first line contains a greater-than-sign ('>') followed by a sequence name, optionally followed by additional characteristics separated by pipes ('|'). The remaining lines contain the nucleotide sequence.
- Uploading a file containing one or more non-interleaved sequences in Fasta or GRF (Bayer Diagnostics) format.
- Selecting mutations using drop-down boxes or entering them in a text box, separated by spaces or commas.
- An optional identifier and date.
- A list of output options customizable by checkboxes.
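The pasted Fasta input described above could be read along the following lines. This is a minimal sketch, not the parser used by the programs; the function name `parse_fasta`, the constant `MAX_SEQUENCES`, and the record keys `name`, `extra`, and `seq` are assumptions made for illustration.

```python
MAX_SEQUENCES = 100  # limit on pasted sequences stated above


def parse_fasta(text):
    """Parse Fasta text into a list of records (hypothetical helper)."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header: sequence name, optionally followed by '|'-separated fields.
            fields = [f.strip() for f in line[1:].split("|")]
            records.append({"name": fields[0], "extra": fields[1:], "seq": ""})
        elif records:
            # Remaining lines are concatenated onto the current sequence.
            records[-1]["seq"] += line.upper()
        else:
            raise ValueError("sequence data found before the first '>' header")
    if len(records) > MAX_SEQUENCES:
        raise ValueError("more than %d sequences submitted" % MAX_SEQUENCES)
    return records
```

For example, `parse_fasta(">seq1|2000-01-01\nCCTCAG\nGTCACT")` returns a single record whose `seq` is the two sequence lines joined together.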
2.2 Sequence Alignment and Mutation List
For all of the programs, users can submit either a list of sequences or a list of specific mutations. If users enter sequences, these are aligned to the consensus amino acid sequence using the program LAP2. Based on this alignment, frame shifts, insertions, deletions, and mutations are determined for each sequence.
Mutations are defined as differences from the consensus B reference sequence (PR and RT).
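As an illustration of this definition, the sketch below lists the positions at which an already aligned amino acid sequence differs from a reference. It is not the programs' implementation: the function name `list_mutations` and the reference-position-observed labels (such as `I3V`) are assumptions, the example strings are toy data, and insertions, deletions, and frame shifts are not handled.

```python
def list_mutations(consensus, query, first_position=1):
    """Return differences between two aligned amino acid strings.

    Both strings are assumed to be aligned and of equal length;
    gaps and insertions are outside the scope of this sketch.
    """
    if len(consensus) != len(query):
        raise ValueError("aligned sequences must have equal length")
    mutations = []
    for offset, (ref, obs) in enumerate(zip(consensus, query)):
        if obs != ref:
            # Label format is an assumption: reference, position, observed.
            mutations.append("%s%d%s" % (ref, first_position + offset, obs))
    return mutations


# Toy strings, not real reference data.
print(list_mutations("PQITLW", "PQVTLW"))
```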
In each of the three programs, mutations are divided into those associated with drug resistance ('Resistance Mutations') and those that have not been associated with drug resistance ('Other Mutations'). This separation, however, is not always sharp. There are some mutations that appear to be associated with drug therapy but which are not generally considered drug-resistance mutations. In our programs these mutations are not listed with the accepted drug-resistance mutations.
Nucleotide triplets containing ambiguities are translated into each of the possible amino acids they encode. However, when the resulting list contains more than four possible amino acids, we replace the list with an 'X'. For example, WMC is translated to NTYS (N for AAC, T for ACC, Y for TAC, S for TCC), but WMS is translated to X instead of NTYSK* (N for AAC, T for ACC, Y for TAC, S for TCC, K for AAG, T for ACG, * for TAG, and S for TCG). All possible translations are explicitly defined in the triplets-table.txt file.
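The translation rule can be sketched as follows. This is an illustration only: it computes translations from the standard genetic code and the IUPAC ambiguity codes instead of reading triplets-table.txt (whose layout is not described here), and the names `IUPAC`, `CODON_TABLE`, and `translate_codon` are assumptions. The order of amino acids in the output follows the enumeration order of this sketch and may differ from the order used in the table file.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon(codon, max_amino_acids=4):
    """Translate one (possibly ambiguous) triplet; 'X' if too many outcomes."""
    amino_acids = []
    # Expand every ambiguity code into the unambiguous codons it covers.
    for bases in product(*(IUPAC[n] for n in codon.upper())):
        aa = CODON_TABLE["".join(bases)]
        if aa not in amino_acids:
            amino_acids.append(aa)
    if len(amino_acids) > max_amino_acids:
        return "X"
    return "".join(amino_acids)


print(translate_codon("WMC"))  # NTYS
print(translate_codon("WMS"))  # X
```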
2.3 Quality Control Analysis
The quality control analysis reports three types of problem positions: a. A list of positions containing stop codons or frame shifts; b. A list of positions containing highly ambiguous nucleotides: N (cannot distinguish between A, C, G, or T), B (contains a combination of C, G, and T), D (contains a combination of A, G, and T), H (contains a combination of A, C, and T), and V (contains a combination of A, C, and G). Whereas mixtures of two nucleotides occur commonly and do not reflect sequencing artifacts, mixtures of three or more nucleotides at the same position occur rarely in high-quality sequences; c. A list of positions with atypical mutations. Mutations are considered atypical if they have been observed in <0.1% of published group M HIV-1 sequences. These three lists are accompanied by a summary figure containing blue lines for each difference from consensus B and red lines for each problem position.
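Parts of these checks can be sketched in a few lines. The code below is an illustration under stated assumptions, not the programs' implementation: it assumes the sequence is in frame and starts at the first base of a codon, it does not detect frame shifts (which require the alignment), and the function names are hypothetical. The prevalence passed to `is_atypical` must be supplied by the caller, since the underlying sequence data are not reproduced here.

```python
HIGHLY_AMBIGUOUS = set("NBDHV")       # codes for three or more nucleotides
STOP_CODONS = {"TAA", "TAG", "TGA"}   # standard genetic code


def highly_ambiguous_positions(seq):
    """1-based nucleotide positions holding N, B, D, H, or V."""
    return [i for i, nt in enumerate(seq.upper(), start=1)
            if nt in HIGHLY_AMBIGUOUS]


def stop_codon_positions(seq):
    """1-based codon positions whose unambiguous triplet is a stop codon."""
    seq = seq.upper()
    return [i // 3 + 1
            for i in range(0, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS]


def is_atypical(prevalence_percent, threshold_percent=0.1):
    """Apply the <0.1% rule to a caller-supplied prevalence (in percent)."""
    return prevalence_percent < threshold_percent


print(highly_ambiguous_positions("ACNGTB"))
print(stop_codon_positions("ATGTAGCCC"))
```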
2.4 Subtyping
Each sequence is compared to a set of reference sequences from the Main (M) group of HIV-1, representing subtypes A, B, C, D, F, G, H, J, K, CRF01_AE, and CRF02_AG. The subtype of the closest reference sequence is assigned to the submitted sequence. This method will generally be accurate (3); however, it will not accurately characterize uncommon inter-subtype recombinants. In addition, subtype B protease sequences are occasionally misclassified as belonging to subtype D because these subtypes are very similar and the protease contains fewer phylogenetically informative positions than RT. If subtype analysis is checked, the program will report the uncorrected nucleotide distance between a submitted sequence and each of the reference sequences used for subtyping. A reference guide to HIV-1 classification can be found at the Los Alamos HIV Sequence Database web site and in the 4/6/2000 issue of Science.
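The closest-reference rule can be sketched as follows, taking the uncorrected distance to be the proportion of compared sites that differ. This is an illustration only: the function names are hypothetical, the sequences are assumed to be already aligned to equal length, skipping gaps and ambiguity codes is an assumption of this sketch, and the reference strings in the example are toy data, not real subtype references.

```python
def uncorrected_distance(a, b):
    """Proportion of compared sites at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # assumption: skip gaps and ambiguity codes
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def assign_subtype(query, references):
    """references: iterable of (subtype, aligned_sequence) pairs.

    Returns the subtype of the closest reference and all
    (distance, subtype) pairs sorted from closest to farthest.
    """
    scored = sorted((uncorrected_distance(query, ref), subtype)
                    for subtype, ref in references)
    return scored[0][1], scored


# Toy data for illustration only.
toy_references = [("B", "ACGTACGT"), ("C", "ACGTTTTT")]
print(assign_subtype("ACGTACGA", toy_references))
```

Because only the single closest reference is used, a recombinant sequence is simply assigned whichever subtype is nearest overall, which is consistent with the limitation noted above.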
2.5 Drug Resistance Interpretation
Both HIVdb and HIValg contain rules-based sequence interpretation programs. HIVdb contains an algorithm developed initially for use by the Stanford University Hospital Diagnostic Virology Laboratory, which was made available online in September 2000. A detailed description of HIVdb can be found in section 4 of this document. HIValg allows users to obtain the results of multiple rules-based interpretations, including HIVdb, other publicly available algorithms, and user-submitted algorithms. HIValg, however, does not provide the complete output associated with HIVdb (e.g. quality control and comments) and therefore should be used primarily for research purposes. HIValg is described in detail in section 5.