Last updated on Feb 13, 2017

HIV Genotyping Program

The HIVDB genotyping program assigns a genotype to a submitted sequence based solely on its pol sequence. The assigned genotype can be a pure subtype, a sub-subtype, an established circulating recombinant form (CRF), a unique recombinant form (URF), a non-group M HIV-1 sequence, or an HIV-2 sequence.

Following alignment of a submitted sequence, the program creates a concatenated sequence containing protease (PR) +/- reverse transcriptase (RT) +/- integrase (IN), depending on which genes the submission covers. This sequence is then compared to each of the reference sequences in this FASTA file. The comparison generates a “distances table” containing the uncorrected distance (the proportion of compared nucleotide positions that differ) between the submitted sequence and each reference sequence. The distance is calculated after masking codons at which the submitted sequence contains a surveillance drug-resistance mutation (SDRM). Nucleotide ambiguities that partially match a reference sequence nucleotide are counted as complete matches.
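The distance calculation described above can be sketched in Python as follows. This is a minimal illustration, not the HIVDB implementation: the function name, the `masked_codons` argument (0-based codon indices at which an SDRM was found), and the choice to skip gaps and unrecognized characters are all assumptions made for the example.

```python
# IUPAC nucleotide codes mapped to the bases each code can represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def uncorrected_distance(query, reference, masked_codons=()):
    """Return the fraction of compared positions at which the two
    aligned sequences differ (no substitution-model correction).

    masked_codons is a hypothetical input: 0-based codon indices that
    are excluded from the comparison (e.g. codons carrying an SDRM).
    """
    masked = set(masked_codons)
    compared = 0
    mismatches = 0
    for i, (q, r) in enumerate(zip(query.upper(), reference.upper())):
        if i // 3 in masked:
            continue  # position belongs to a masked codon
        if q not in IUPAC or r not in IUPAC:
            continue  # assumption: gaps and unknown characters are skipped
        compared += 1
        # An ambiguity code that shares any base with the reference
        # nucleotide is counted as a complete match.
        if not set(IUPAC[q]) & set(IUPAC[r]):
            mismatches += 1
    return mismatches / compared if compared else float("nan")


# R (A or G) partially matches A, so only the last position differs.
d_all = uncorrected_distance("ACGTRA", "ACGTAC")
# Masking the second codon removes that difference from the comparison.
d_masked = uncorrected_distance("ACGTRA", "ACGTAC", masked_codons=[1])
```

Computing this value against every reference sequence and sorting the results in ascending order would give a table of the kind the text calls the distances table.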

The distances table is sorted in order of ascending distance. The genotype of the submitted sequence is then determined by examining the closest matching reference sequence and applying a set of rules that are represented in a genotype properties table.

  1. If the distance to the closest matching reference sequence is greater than 11%, the subtype is reported as unknown.
  2. If the distance to the closest matching reference sequence is below the distance upper-limit for that genotype and the closest matching reference is a pure subtype, CRF01_AE, CRF02_AG, or other non-simple CRF (as defined in the genotype properties table), then this genotype is reported to the user.
  3. If the distance to the closest matching reference sequence is below the distance upper-limit for that genotype and the closest matching reference is a simple CRF, then the program determines whether the submitted sequence contains sufficient coverage on either side of the established breakpoint(s) for that CRF. If there is sufficient coverage, then the program reports this CRF. Otherwise, the program reports the parent subtype for the region covered by the submitted sequence.
  4. If the distance to the closest matching reference sequence is above the distance upper-limit for that genotype and the closest matching reference is CRF01_AE, CRF02_AG, or a simple CRF, the program examines the distances table for the next-closest matching parent sequence. If the distance to the parent sequence is within 1% of the distance to the closest matching sequence, then the parent subtype is reported. If the distance to the parent sequence is more than 1% greater than the distance to the recombinant, then the genotype is considered to be a URF and the parents of the recombinant are reported, separated by a plus sign. Subtype A is considered to be the parent of CRF01_AE, and subtypes A and G are considered to be the parents of CRF02_AG.
  5. If the distance to the closest matching reference sequence is above the distance upper-limit for that genotype and the closest matching reference is a pure subtype, then this pure subtype is reported to the user with a warning recommending further analysis using a more sophisticated subtyping program.
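
The five rules above can be sketched as a single decision function. This is a hedged illustration only: the `GenotypeProps` structure stands in for the genotype properties table, the upper-limit values in the usage example are made up, and `covers_breakpoints` and `covered_parent` are hypothetical inputs representing the outcome of the breakpoint-coverage check in rule 3. The text does not say how a distance exactly equal to the upper-limit is treated, nor what happens when a non-simple CRF other than CRF01_AE or CRF02_AG exceeds its limit; the sketch treats equality as "above" and raises an error for the uncovered case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenotypeProps:
    upper_limit: float   # distance upper-limit for the genotype (fraction)
    kind: str            # "pure", "simple_crf", or "other_crf" (assumed labels)
    parents: tuple = ()  # parent subtypes of a recombinant


UNKNOWN_CUTOFF = 0.11    # rule 1: more than 11% from every reference
PARENT_MARGIN = 0.01     # rule 4: the 1% window around the closest match
# CRFs that rule 2 reports directly but rule 4 resolves through parents.
SPECIAL_CRFS = {"CRF01_AE", "CRF02_AG"}


def assign_genotype(distances, props, covers_breakpoints=True, covered_parent=None):
    """distances: list of (genotype, distance) pairs.
    Returns (reported_genotype, warning_or_None)."""
    ranked = sorted(distances, key=lambda pair: pair[1])
    closest, dist = ranked[0]
    if dist > UNKNOWN_CUTOFF:                              # rule 1
        return "Unknown", None
    info = props[closest]
    if dist < info.upper_limit:
        if info.kind != "simple_crf":                      # rule 2
            return closest, None
        # rule 3: report the CRF only with coverage around its breakpoint(s)
        return (closest if covers_breakpoints else covered_parent), None
    if info.kind == "simple_crf" or closest in SPECIAL_CRFS:   # rule 4
        parent_hit = next(((g, d) for g, d in ranked[1:] if g in info.parents), None)
        if parent_hit and parent_hit[1] - dist <= PARENT_MARGIN:
            return parent_hit[0], None
        # Assumption: a missing parent entry is handled like a distant parent.
        return " + ".join(info.parents), None
    if info.kind == "pure":                                # rule 5
        return closest, "further analysis with another subtyping program is advised"
    raise ValueError("case not covered by the rules described in the text")


# Illustrative properties table; the limits are invented for this example.
props = {
    "A": GenotypeProps(0.06, "pure"),
    "B": GenotypeProps(0.06, "pure"),
    "G": GenotypeProps(0.06, "pure"),
    "CRF02_AG": GenotypeProps(0.05, "other_crf", ("A", "G")),
}
near_parent = assign_genotype([("CRF02_AG", 0.07), ("G", 0.075), ("A", 0.10)], props)
far_parent = assign_genotype([("CRF02_AG", 0.07), ("G", 0.09), ("A", 0.10)], props)
```

In the usage example, `near_parent` resolves to the parent subtype G because its distance is within 1% of the closest match, while `far_parent` is reported as the recombinant's parents joined by a plus sign.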