Last updated on Jul 27, 2017

HIV Subtyping Program

The HIVDB subtyping program assigns a subtype to a submitted sequence based on its pol sequence. The assigned subtype can be a pure subtype, an established circulating recombinant form (CRF), a unique recombinant form (URF), a non-group M HIV-1 sequence, or an HIV-2 sequence.

Following alignment of a submitted sequence, the program creates a concatenated sequence comprising PR +/- RT +/- IN (protease, reverse transcriptase, and integrase). This sequence is then compared to a set of subtype-specific reference sequences. The comparison generates a table of uncorrected distances between the submitted sequence and each of the reference sequences. Each distance is computed after masking codons at which the submitted sequence contains a surveillance drug-resistance mutation (SDRM). Nucleotide ambiguities that partially match a reference sequence nucleotide are counted as complete matches.
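The distance calculation described above can be sketched in Python as follows. This is a minimal illustration, not the HIVDB implementation: the function name `uncorrected_distance`, the `masked_codons` parameter, and the decision to skip gap characters are assumptions made for the example. It assumes the two sequences are already aligned, of equal length, and in codon frame.

```python
# IUPAC nucleotide codes mapped to the bases each one can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def uncorrected_distance(query, reference, masked_codons=frozenset()):
    """Return the fraction of compared positions at which the sequences differ.

    query, reference: aligned nucleotide strings of equal length.
    masked_codons: 0-based codon indices to ignore, e.g. codons at which
        the query carries an SDRM (hypothetical parameter).
    """
    if len(query) != len(reference):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for i, (q, r) in enumerate(zip(query.upper(), reference.upper())):
        if i // 3 in masked_codons:
            continue  # masked codon: excluded from the distance
        if q not in IUPAC or r not in IUPAC:
            continue  # assumption: gaps and unknown symbols are skipped
        compared += 1
        # A partial match between ambiguity codes counts as a full match.
        if not set(IUPAC[q]) & set(IUPAC[r]):
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return mismatches / compared
```

For example, `uncorrected_distance("ATGRCC", "ATGACT")` counts the ambiguous `R` (A or G) as matching `A`, so only the final position differs and the distance is 1/6. Masking the second codon with `masked_codons={1}` removes that difference from the comparison.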

The distance table is sorted in ascending order. The subtype of the submitted sequence is determined by examining the closest matching reference sequence and applying a set of rules that make use of known properties of the different subtypes and CRFs:

  1. If the distance to the closest matching reference sequence is greater than 11%, the subtype is reported as unknown.
  2. If the distance to the closest matching reference sequence is below the distance upper-limit for that subtype and the closest matching reference is a pure subtype, CRF01_AE, CRF02_AG, or other non-simple CRF (as defined in the subtype properties table), then this subtype is reported to the user.
  3. If the distance to the closest matching reference sequence is below the distance upper-limit for that subtype and the closest matching reference is a simple CRF, then the program determines whether the submitted sequence contains sufficient coverage on either side of the established breakpoint(s) for that CRF. If there is sufficient coverage, then the program reports this CRF. Otherwise, the program reports the parent subtype for the region covered by the submitted sequence.
  4. If the distance to the closest matching reference sequence is above the distance upper-limit for that subtype and the closest matching reference is CRF01_AE, CRF02_AG, or a simple CRF, the program examines the original distance table for the next-closest matching parent sequence. If the distance to the parent sequence is within 1% of the distance to the closest matching sequence, then the parent subtype is reported. If the distance to the parent sequence is more than 1% greater than the distance to the recombinant, then the program considers the sequence to be a URF and reports the parents of the recombinant separated by plus signs. Subtype A is considered to be the parent of CRF01_AE, and subtypes A and G are considered to be the parents of CRF02_AG.
  5. If the distance to the closest matching reference sequence is above the distance upper-limit for that subtype and the closest matching reference is a pure subtype, then this pure subtype is reported to the user with a warning that further analysis using a more sophisticated subtyping program is recommended.
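
The five rules above can be sketched as a single decision function. This is an illustrative reading of the rules, not the program's source code. The names `SubtypeProps`, `Call`, and `assign_subtype`, the `kind` labels, and the `covers_breakpoints` and `covered_parent` inputs are all hypothetical, and the breakpoint coverage check of rule 3 is assumed to be done by the caller. Two further assumptions: "within 1%" is read as an absolute difference of 0.01 in distance, and a distance exactly equal to the upper-limit is treated as not below it. Cases the rules do not mention (for example, another non-simple CRF above its upper-limit) are returned as unknown with a note.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubtypeProps:
    upper_limit: float   # distance upper-limit for this subtype (fraction)
    kind: str            # "pure", "simple_crf", or "complex_crf"
    parents: tuple = ()  # parent subtypes, for recombinants


@dataclass
class Call:
    subtype: str
    warning: Optional[str] = None


MAX_DISTANCE = 0.11   # rule 1 threshold
PARENT_MARGIN = 0.01  # rule 4 margin, read as an absolute difference
PARENT_CHECKED = {"CRF01_AE", "CRF02_AG"}


def assign_subtype(distances, props, covers_breakpoints=True, covered_parent=None):
    """distances: (reference subtype, distance) pairs; props: name -> SubtypeProps."""
    ranked = sorted(distances, key=lambda pair: pair[1])
    name, dist = ranked[0]
    if dist > MAX_DISTANCE:                      # rule 1
        return Call("Unknown")
    p = props[name]
    if dist < p.upper_limit:
        if p.kind != "simple_crf":               # rule 2
            return Call(name)
        if covers_breakpoints:                   # rule 3
            return Call(name)
        if covered_parent is None:
            raise ValueError("covered_parent is required without breakpoint coverage")
        return Call(covered_parent)
    if p.kind == "simple_crf" or name in PARENT_CHECKED:   # rule 4
        for other, other_dist in ranked[1:]:
            if other in p.parents:               # next-closest parent reference
                if other_dist - dist <= PARENT_MARGIN:
                    return Call(other)
                break
        return Call(" + ".join(p.parents))       # reported as a URF
    if p.kind == "pure":                         # rule 5
        return Call(name, warning="further analysis with a more sophisticated "
                                  "subtyping program is recommended")
    return Call("Unknown", warning="case not covered by the listed rules")


# Usage with made-up upper-limits, for illustration only.
props = {
    "A": SubtypeProps(0.06, "pure"),
    "G": SubtypeProps(0.06, "pure"),
    "B": SubtypeProps(0.06, "pure"),
    "CRF02_AG": SubtypeProps(0.05, "complex_crf", ("A", "G")),
}
table = [("CRF02_AG", 0.07), ("B", 0.08), ("A", 0.10), ("G", 0.105)]
print(assign_subtype(table, props).subtype)  # -> A + G
```

In a real setting the upper-limits, CRF classifications, and parent assignments would come from the subtype properties table rather than from hard-coded values.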