This page lists the change logs for the current and previous versions of the Sierra program since December 2017. For algorithm change logs (scoring tables and comments), please see the Algorithm Updates page.
This update upgraded the ANRS and Rega algorithms to their latest versions.
- ANRS Algorithm was upgraded to v27 (Sep 2017): XML, PDF
- Rega Institute Algorithm was upgraded to 10.0.0 (May 2017): XML, PDF
The ANRS algorithm was updated to version 27 (see PDF) on January 9. We have since made two modifications to this update.
The ANRS algorithm has different scores for dolutegravir (DTG) 50mg QD and DTG 50mg BID as well as for darunavir/r (DRV/r) 600mg/100mg BID and DRV/r 800mg QD. We have modified the ASI implementation of the algorithm as follows. The rules for low and high dose DTG are indicated by DTGQD and DTGBID, respectively. The rules for low and high dose DRV/r are indicated by DRV/r_QD and DRV/r_BID, respectively.
For users of HIValg, DTG will contain the same rules as DTGQD and DRV/r will contain the same rules as those for DRV/r_QD.
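The naming convention described above can be pictured as a small lookup. The following is only a sketch: the two dictionaries and the function `resolve_drug_rules` are hypothetical and are not part of ASI or HIValg. Only the four rule names and the two HIValg equivalences are taken from the text.

```python
# Rule names taken from the text above; the data structures are hypothetical.
ASI_RULE_NAMES = {"DTGQD", "DTGBID", "DRV/r_QD", "DRV/r_BID"}

# For HIValg users, the plain drug name carries the low-dose (QD) rules.
HIVALG_EQUIVALENTS = {"DTG": "DTGQD", "DRV/r": "DRV/r_QD"}


def resolve_drug_rules(name: str) -> str:
    """Return the dose-specific rule name whose rules apply to `name`."""
    resolved = HIVALG_EQUIVALENTS.get(name, name)
    if resolved not in ASI_RULE_NAMES:
        raise KeyError(f"no dose-specific ANRS rule for {name!r}")
    return resolved
```

Under these assumptions, `resolve_drug_rules("DTG")` and `resolve_drug_rules("DTGQD")` both resolve to the low-dose rules, while `resolve_drug_rules("DTGBID")` resolves to the high-dose rules.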
This is a major update to the HIVDB genotypic resistance interpretation program software Sierra. The main part of this update is a change from the alignment program LAP (Local Alignment Program) to an open source alignment program NucAmino. This change has allowed us to open source the complete software pipeline for the HIVDB genotypic resistance interpretation program (github.com/hivdb/sierra). The new version of Sierra which uses NucAmino is now called 2.2 and the older version of Sierra which used LAP is now called 2.1.
This change from Sierra 2.1 to Sierra 2.2 provides additional impetus for users of the old Sierra 1.1 SOAP web service to migrate to the Sierra 2 GraphQL web service. Although the interpretation system has not been changed, the change in alignment, quality control, and warnings will affect the interpretations of a small subset of sequences. Therefore, if you are still using the SOAP version of Sierra, please visit this page. Feel free to contact us if you need any help with migration.
The switch from LAP to NucAmino has been accompanied by several additional changes to the process of aligning submitted sequences to HIV-1 pol:
- Each submitted sequence is aligned to the complete HIV-1 pol reference subtype B sequence rather than to PR, RT, and IN separately. This results in slightly different handling of the small number of sequences that previously exhibited alignment anomalies at the PR/RT and RT/IN boundaries.
- We have also introduced stricter criteria for quality control at the beginning and end of sequences.
- We have introduced a new set of warnings for sequences (i) containing very short regions of PR, RT, or IN; (ii) containing poor quality at their boundaries; and (iii) containing inappropriate concatenations of noncontiguous regions of pol.
As with any change to an alignment algorithm, there will be minor differences in the placement of frameshifts and unusual indels and in the handling of low-quality regions at sequence boundaries. To determine the effect of the change from LAP to NucAmino, we ran Sierra versions 2.1 and 2.2 on 115,118 HIV-1 pol-containing GenBank sequences. For each sequence we recorded the GenBank accession number, pol sequence coverage, the list of mutations (including indels), and frameshifts. We have created a spreadsheet that contains 5,501 sequences for which Sierra versions 2.1 and 2.2 yielded non-identical results:
- For 3,639 sequences, Sierra 2.2 applied stricter filtering of amino acids found at the sequence’s 5’ and 3’ ends. For these sequences, the median number of filtered amino acids was 3 (range: 1 to 145). For 34 of these sequences, a DRM (defined as a mutation that receives a drug-resistance mutation penalty) was filtered. The filtered mutations included: PR10F (2), PR20T (2), PR82L (1), RT41L (1), RT108I_115F_138G (1), RT184I (1), RT188F (1), RT215A_210R (1), RT215S_227L (1), RT219Q (1), RT221Y (1), RT225H (1), RT227L (2), RT230I (3), RT238T (1), RT318F (3), RT348I (9), IN155H (1), IN263K (1). Each of these cases was reviewed manually to ensure that filtering was appropriate. Additionally, a warning is provided to indicate the number of amino acids filtered at sequence boundaries. For four sequences, Sierra 2.2 applied a less strict filter, retaining a median of 2.5 additional amino acids (range: 1 to 4).
- For 1,045 sequences, there was a difference in how frameshifts were handled. This occurs due to a largely unavoidable amount of stochasticity in the alignment process when frameshifts (which are usually technical artifacts) are present. This resulted in 11 sequences with different DRMs. For eight sequences, Sierra 2.1 detected DRMs not detected by Sierra 2.2 including RT41L (1), RT65R (1), RT100I (1), RT100I_103N (1), RT103N (2), and RT215N (2). For three sequences, Sierra 2.2 detected DRMs not detected by Sierra 2.1 including RT65E (1), RT67N (1), and RT70R (1). In each of these examples, the version that did not detect the DRM placed the frameshift at the DRM position or detected a highly unusual mutation at the DRM position. In each of these cases, it could not be inferred which version produced the preferable alignment. Additionally, a warning is provided to indicate the presence of the frameshift.
- There were 695 sequences containing a protease sequence concatenated to a section of RT that did not include the 5’ part of its sequence. For example, the RT sequence would often begin at about position 35. This was common in sequences generated by the TruGene Assay. Sierra 2.1 occasionally found that the end of PR could align to the beginning of RT, resulting in an inappropriate sequence overlap. This has now been corrected. This change did not result in any difference in detectable DRMs.
- For 47 sequences, the placement of an indel (an insertion or deletion of one or more nucleotide triplets) differed between Sierra 2.1 and Sierra 2.2, largely as a result of unavoidable stochasticity: (i) For 24 of these sequences, there was an insertion in the PR codon 33/41 loop region. In two cases this resulted in the detection of PR33F by Sierra 2.2 but not Sierra 2.1; (ii) For 20 of these sequences, there was an indel in either PR, RT, or IN. The changed placement of the indel did not result in a change in detectable DRMs; and (iii) For 3 sequences, Sierra 2.1 detected a deletion at RT codon 69, whereas Sierra 2.2 placed the deletion at RT codon 70. Each of these deletions receives a similar mutation penalty. Of note, Sierra 2.1 and 2.2 agreed on the placement of deletions in this region in the remaining 61 sequences containing a deletion at each of these positions. The two versions always agreed on the placement of insertions in this region (placing them at codon 69) as a result of post-processing performed after the alignment.
- There were 38 sequences with (i) short gene fragments, which were generally of low quality; (ii) deletions of more than 30 bp; or (iii) an unusual concatenation of multiple partial genes that were filtered by Sierra 2.2 but not Sierra 2.1.
- There were 33 sequences containing concatenations of multiple partial genes, separated by ‘N’s. This resulted in some partial gene sequences being detected by Sierra 2.2 but not Sierra 2.1. Fifteen of these 33 sequences contained PI-resistance mutations. These sequences were accompanied by a warning indicating that the complete gene had not been sequenced.
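The counts in the list above can be cross-checked against the totals quoted earlier. The sketch below copies the numbers from the text. It assumes that the categories are mutually exclusive and that the four sequences with a less strict filter are counted separately from the 3,639 with stricter filtering. The category labels are shorthand introduced here for illustration.

```python
# Sequence counts per category, copied from the list above.
categories = {
    "stricter boundary filtering": 3639,
    "less strict boundary filtering": 4,
    "frameshift handling": 1045,
    "PR/RT concatenation overlap": 695,
    "indel placement": 47,
    "fragments or long deletions filtered by 2.2": 38,
    "partial genes separated by N": 33,
}

# Number of sequences per filtered DRM pattern (first list item).
filtered_drms = {
    "PR10F": 2, "PR20T": 2, "PR82L": 1, "RT41L": 1,
    "RT108I_115F_138G": 1, "RT184I": 1, "RT188F": 1,
    "RT215A_210R": 1, "RT215S_227L": 1, "RT219Q": 1,
    "RT221Y": 1, "RT225H": 1, "RT227L": 2, "RT230I": 3,
    "RT238T": 1, "RT318F": 3, "RT348I": 9, "IN155H": 1,
    "IN263K": 1,
}

total_differing = sum(categories.values())
total_filtered_drm_sequences = sum(filtered_drms.values())

# Both sums agree with the figures quoted in the text.
assert total_differing == 5501
assert total_filtered_drm_sequences == 34

# Share of the 115,118 tested sequences with non-identical results.
share = total_differing / 115118
print(f"{share:.1%} of sequences differed between Sierra 2.1 and 2.2")
```

Under these assumptions the categories account for all 5,501 non-identical sequences, which is a little under 5% of the sequences tested.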
The raw comparison result can be found in this Excel spreadsheet: download.
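A comparison of this kind can be sketched as follows. This is not the script used to produce the spreadsheet. The record type `SequenceResult`, the function `non_identical`, and the example accessions and values are all hypothetical; only the recorded fields (accession number, pol coverage, mutations, and frameshifts) come from the description above.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Tuple


class SequenceResult(NamedTuple):
    """Hypothetical per-sequence record from one Sierra version."""
    coverage: Tuple[int, int]     # first and last covered pol position
    mutations: FrozenSet[str]     # mutations, including indels
    frameshifts: FrozenSet[str]   # frameshift descriptions


def non_identical(
    old: Dict[str, SequenceResult],
    new: Dict[str, SequenceResult],
) -> List[str]:
    """Return accessions whose records differ between two versions."""
    shared = old.keys() & new.keys()
    return sorted(acc for acc in shared if old[acc] != new[acc])


# Made-up example: SEQ_B loses three positions at its 5' end in the
# newer version, so its coverage differs while its mutations do not.
old_results = {
    "SEQ_A": SequenceResult((1, 300), frozenset({"RT184V"}), frozenset()),
    "SEQ_B": SequenceResult((1, 300), frozenset({"RT103N"}), frozenset()),
}
new_results = {
    "SEQ_A": SequenceResult((1, 300), frozenset({"RT184V"}), frozenset()),
    "SEQ_B": SequenceResult((4, 300), frozenset({"RT103N"}), frozenset()),
}

differing = non_identical(old_results, new_results)
```

Using sets for mutations and frameshifts makes the comparison independent of the order in which each version reports them.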