HIVdb version 8.1.1 (last updated 2016-09-23)

Release Notes

For HIVdb, HIVseq, and HIValg

Introduction

The presence of HIV-1 drug resistance before starting a new antiretroviral (ARV) drug treatment regimen is an independent predictor of the virological response to that regimen. Several studies have shown that the use of genotypic resistance testing before the start of a new treatment regimen increases the likelihood of virological response to that regimen. However, interpreting the results of HIV-1 drug resistance tests is difficult. First, there are many different drug resistance mutations (DRMs). Second, these DRMs cause varying levels of reduced susceptibility to different ARVs. Third, standard genotypic resistance tests fail to detect DRMs present at low levels within a patient's virus population.

The HIVdb program assesses how active an ARV is likely to be against a particular mutant virus compared with that ARV's activity against a wildtype virus. When combined with a sound understanding of the principles of ARV therapy, the interpretations and associated comments help health care providers better understand the results of HIV-1 genotypic resistance tests. However, because HIVdb does not consider other relevant clinical data such as previous drug-resistance test results, ARV treatment history, plasma HIV-1 RNA levels, CD4 counts, and drug toxicity, it does not have the logical power to instruct clinicians on which ARVs should be used when constructing a treatment regimen.

HIV-1 drug resistance is rarely an all-or-none phenomenon. Clinicians treating infected patients usually need the answers to the following two questions: (i) Does the genotypic resistance test suggest that the patient will respond to a drug in a manner comparable to a patient with a wild-type isolate? (ii) Does the test suggest that the patient will obtain any antiviral benefit from the drug? To answer these questions, it is necessary to grade the extent of inferred resistance relative to both wild-type isolates and the most resistant isolates.

There are 3 programs in the HIV Drug Resistance Database which share a common code base: HIVdb, HIVseq, and HIValg. HIVdb is an expert system that accepts user-submitted HIV-1 pol sequences and returns inferred levels of resistance to 22 FDA-approved ARV drugs including 8 protease inhibitors (PIs), 7 nucleoside RT inhibitors (NRTIs), 4 non-nucleoside RT inhibitors (NNRTIs), and 3 integrase strand transfer inhibitors (INSTIs). In the HIVdb system, each DRM is assigned a drug penalty score and a comment; the total score for a drug is derived by adding the scores of each DRM associated with resistance to that drug. Using the total drug score, the program reports one of the following 5 levels of inferred drug resistance: susceptible, potential low-level resistance, low-level resistance, intermediate resistance, and high-level resistance.

HIVseq accepts user-submitted protease, RT, and integrase sequences, compares them to the consensus subtype B reference sequence, and uses the differences as query parameters for interrogating the HIV Drug Resistance database. HIVseq then displays the prevalence of PR, RT and IN mutations according to subtype and ARV class treatment history.

HIValg is designed for users who want to compare the results of different algorithms, or to evaluate newly developed algorithms against existing ones. New algorithms can be run on the HIV Drug Resistance Database using the Algorithm Specification Interface (ASI) compiler software.

User Interfaces

Input | Output | No. samples | Input format | Output format
Mutation List | Mutation classification; Predicted ARV activity; Mutation comments; Mutation penalty scores | 1 | Text box; Drop-down menu | HTML
DNA Sequence | Mutation classification; Predicted ARV activity; Mutation comments; Mutation penalty scores; Sequence quality control | 1 to 2000 | Text box; Drop-down menu | HTML; Spreadsheet; XML
Sierra Webservice | Mutation classification; Predicted ARV activity; Mutation comments | 1 to 500 | User script | XML

Mutation List Interface

The mutation list interface was developed to help HIV care providers who typically do not have the complete DNA sequence of a patient's virus sample but instead have an external genotypic resistance report generated by the laboratory used by their clinic. This external report will usually list the DRMs, and it will often also list all mutations that differ from the laboratory's reference sequence, even those that are not DRMs. Although most external reports contain predictions of drug resistance, many care providers are also interested in the HIVdb predictions and comments. In addition, this interface allows care providers to enter different combinations of mutations, such as a composite list of the mutations detected by more than one genotypic test.

To use the Mutation List Form, select mutations using the drop-down menus or enter them into the text boxes. When using the text boxes, amino acid mutations should preferably be entered in uppercase, and insertions and deletions should be entered as "Insertion" or "Deletion". Mutations must be separated by one or more spaces or commas. If there is a mixture of more than one amino acid at a position, include each amino acid, with or without forward slashes. Including the consensus amino acid before the position is optional.
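The text-box rules above can be illustrated with a small parser. This is a minimal sketch, not HIVdb's own parser: the token grammar, the Mutation type, and the function name parse_mutation_list are hypothetical, and only uppercase amino acids are accepted.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Hypothetical token grammar inferred from the form instructions (not HIVdb's code):
# optional consensus amino acid, position, then amino acid(s) with optional "/"
# separators, or the words "Insertion" / "Deletion".
TOKEN = re.compile(
    r"^(?P<cons>[A-Z])?(?P<pos>\d+)(?P<aas>Insertion|Deletion|[A-Z](?:/?[A-Z])*)$"
)


class Mutation(NamedTuple):
    consensus: Optional[str]   # None when the user omitted the consensus residue
    position: int
    aas: Tuple[str, ...]       # ("N", "S") for a mixture, ("Insertion",) for an insertion


def parse_mutation_list(text: str) -> list:
    """Split on runs of spaces and/or commas and parse each token into a Mutation."""
    mutations = []
    for token in re.split(r"[\s,]+", text.strip()):
        if not token:
            continue  # only happens for empty input
        match = TOKEN.match(token)
        if match is None:
            raise ValueError(f"unrecognized mutation: {token!r}")
        aas = match.group("aas")
        if aas in ("Insertion", "Deletion"):
            residues = (aas,)
        else:
            residues = tuple(aas.replace("/", ""))  # "N/S" and "NS" both become ("N", "S")
        mutations.append(Mutation(match.group("cons"), int(match.group("pos")), residues))
    return mutations
```

For example, parse_mutation_list("M41L, K103N/S 215Y") returns three mutations: the second is a mixture of N and S, and the third has no consensus residue.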

When using the drop-down menu, choose the amino acid present in the sequence. If the amino acid is not listed, select the asterisk, which opens a text box in which you can enter an amino acid that is not on the drop-down list.

Sequence Interface

To use the Sequence Analysis Form, paste one or more non-interleaved sequences in FASTA format into the text box or upload a file containing up to 2000 non-interleaved FASTA sequences. In accordance with the FASTA format, each sequence should be preceded by a line containing ">" followed by a sequence name and, optionally, additional descriptors separated by pipes ("|").
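A minimal sketch of reading such input, assuming that the text before the first pipe is the sequence name and that any remaining pipe-separated fields are descriptors. FastaRecord and parse_fasta are hypothetical names, not part of HIVdb.

```python
from dataclasses import dataclass, field
from typing import List

MAX_SEQUENCES = 2000  # limit stated for the Sequence Analysis Form


@dataclass
class FastaRecord:
    name: str
    descriptors: List[str] = field(default_factory=list)
    sequence: str = ""


def _make_record(header: str, chunks: List[str]) -> FastaRecord:
    # Header is "name|descriptor1|descriptor2|..."; descriptors are optional.
    name, *descriptors = header.split("|")
    return FastaRecord(name.strip(), [d.strip() for d in descriptors], "".join(chunks).upper())


def parse_fasta(text: str) -> List[FastaRecord]:
    """Parse non-interleaved FASTA: all lines of a record appear together."""
    records: List[FastaRecord] = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append(_make_record(header, chunks))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line)
    if header is not None:
        records.append(_make_record(header, chunks))
    if len(records) > MAX_SEQUENCES:
        raise ValueError(f"more than {MAX_SEQUENCES} sequences")
    return records
```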

Web Service

In addition to its HTML interface, HIVdb can be accessed via a Web service called Sierra. Sierra is a computer-to-computer programmatic interface designed for research and clinical labs that typically upload large numbers of sequences and wish to automate and individualize the manner in which data are extracted from HIVdb's output. Sequences submitted to HIVdb either via the Web interface or Sierra are not stored.

Output Options

There are three types of output: HTML, spreadsheet, and XML. As indicated in the table above, HTML is the only option for those using the Mutation List form and XML is the only option for those using the Sierra web service. However, those using the Sequence Analysis form can specify HTML, XML, or one or more spreadsheet outputs.

HTML Output

HTML output contains the report for either one sequence or multiple sequences. Reports for multiple sequences contain a menu bar that allows the user to choose the report for a specific sequence. The HTML output includes the following information:

  1. Header: This contains the SequenceID, which is the FASTA header, and a Date field containing the date the program was run.
  2. Summary Data: This section shows which residues in PR, RT, and/or IN were present in the submitted sequence and the closest matching subtype. This section also contains two buttons. The "Pretty pairwise" button displays how each gene of a sequence aligns to the consensus reference sequence. The "SDRMs" button indicates the surveillance DRMs present in the sequence.
  3. Sequence Quality Assessment: This section contains figures for each gene in which each mutation is indicated by a bar. Blue bars indicate DRMs, black bars indicate differences from the consensus amino acid sequence, and red bars indicate problematic mutations. Hovering over the bar displays the mutation text. This section will also contain warnings if there are indicators of overall or localized poor sequence quality including the presence of stop codons, frame shifts, unusual insertions or deletions, APOBEC-mediated G-to-A hypermutation, and an excess of highly unusual mutations.
  4. Mutation Classification: PR mutations are classified into Major DRMs, Accessory DRMs, and mutations that do not receive mutation penalty scores (Other). RT mutations are classified into NRTI DRMs, NNRTI DRMs, and Other. IN mutations are classified into Major DRMs, Accessory DRMs, and Other.
  5. Drug Resistance Interpretation: For PR, drug-resistance interpretations are provided for each of the ritonavir-boosted PIs. For RT, interpretations are provided for seven NRTIs and four NNRTIs. For IN, interpretations are provided for the three FDA-approved INSTIs.
  6. Comments: Comments are provided for (i) All DRMs with a mutation penalty score, (ii) Unscored mutations that have been associated with drug resistance but are considered to have minimal or no impact on currently used ARVs, and (iii) Highly unusual mutations at known drug-resistance positions that are not established DRMs.
  7. Scoring Table: There is one table for each ARV class. The first column lists each of the DRMs and DRM combinations that contributed to the overall penalty score for one or more ARVs. The remaining columns contain the penalty scores for the ARVs indicated in the column headers. The total penalty score for each ARV, obtained by adding the individual scores, is shown in the column header.

Spreadsheet output files

There are three types of spreadsheet (tabular) output files for the HIVdb program: (i) Sequence summary; (ii) Resistance summary; and (iii) Formatted amino acid alignments for each gene. These files are useful for users submitting sets of sequences. They are tab-delimited text files that can readily be opened in Excel or compatible spreadsheet software. The files are downloaded into the user's download directory; if more than one output file is requested, they are downloaded as a zip file.

Sequence summary

After the header row, each row contains one sequence. The fields are organized into the following types of information:

  1. SequenceID: The FASTA headers of the submitted sequences.
  2. Gene coverage: The first and last residue of PR, RT, and/or IN.
  3. Subtype: Subtype information including the closest matching subtype and its genetic distance from one of 200 reference sequences.
  4. Percentage of ambiguities (Pcnt Mix): Percentage of nucleotides with R (A/G), Y (C/T), M (A/C), W (A/T), S (G/C), or K (G/T).
  5. Mutation Classification: PR mutations are classified into Major DRMs, Accessory DRMs, and mutations that do not receive mutation penalty scores (Other). RT mutations are classified into NRTI DRMs, NNRTI DRMs, and Other. IN mutations are classified into Major DRMs, Accessory DRMs, and Other. For each gene in a sequence, there are three comma-separated lists of mutations. Columns contain 'None' when there are no mutations belonging to the relevant classification. Columns contain 'NA' when the relevant gene was not sequenced.
  6. Surveillance Drug Resistance Mutations (SDRMs): The SDRMs present in PR and RT.
  7. Additional treatment-selected mutations (TSMs): TSMs are mutations that are non-polymorphic in ARV-naive individuals but occur with significantly increased frequency in ARV-experienced individuals. The most common TSMs are also DRMs. However, many TSMs are not established DRMs because they are uncommon, because they usually occur in sequences containing multiple DRMs and therefore have not been well studied, or both.
  8. Sequence Quality Assessment: For each gene, the frame shifts, insertions and deletions, stop codons, mutations indicative of APOBEC-mediated G-to-A hypermutation, highly ambiguous nucleotides (B, D, H, V, N), and highly unusual amino acids present in the sequence.
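The two nucleotide-ambiguity measures above (Pcnt Mix and the count of highly ambiguous nucleotides) can be sketched as follows. The denominator used here, all submitted nucleotides excluding alignment gaps, is an assumption; HIVdb's exact calculation may differ, and ambiguity_summary is a hypothetical name.

```python
# Two-base IUPAC codes counted in "Pcnt Mix", and the three- or four-base
# codes reported as highly ambiguous nucleotides.
MIXTURE_CODES = frozenset("RYMWSK")
HIGHLY_AMBIGUOUS_CODES = frozenset("BDHVN")


def ambiguity_summary(seq: str) -> tuple:
    """Return (percentage of two-base mixtures, count of highly ambiguous bases).

    Assumption: the denominator is every nucleotide in the submitted sequence,
    ignoring alignment gaps ('-') and letter case.
    """
    bases = seq.upper().replace("-", "")
    if not bases:
        return 0.0, 0
    n_mix = sum(b in MIXTURE_CODES for b in bases)
    n_high = sum(b in HIGHLY_AMBIGUOUS_CODES for b in bases)
    return 100.0 * n_mix / len(bases), n_high
```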

Resistance summary

After the header row, each row contains one sequence. The fields are organized into the following types of information:

  1. SequenceID: The FASTA headers of the submitted sequences.
  2. Gene coverage: The first and last residue of PR, RT, and/or IN.
  3. DRM Classification: PR DRMs are classified into Major DRMs and Accessory DRMs. RT DRMs are classified into NRTI DRMs and NNRTI DRMs. IN DRMs are classified into Major DRMs and Accessory DRMs. For each gene in a sequence, there are two comma-separated lists of mutations. Columns contain ‘None’ when there are no mutations belonging to the relevant classification.
  4. Drug resistance levels and scores: For PR, drug-resistance interpretations are provided for each of the ritonavir-boosted PIs. For RT, interpretations are provided for seven NRTIs and four NNRTIs. For IN, interpretations are provided for three INSTIs. There are 5 drug resistance levels: 1 indicates susceptible, 2 potential low-level resistance, 3 low-level resistance, 4 intermediate resistance, and 5 high-level resistance. Each score is the sum of the mutation penalty scores for a drug. Scores less than 10 indicate susceptible; scores from 10 to 14, potential low-level resistance; scores from 15 to 29, low-level resistance; scores from 30 to 59, intermediate resistance; and scores of 60 or greater, high-level resistance.
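The score bands above translate directly into a lookup. This sketch (resistance_level is a hypothetical name) returns both the numeric level used in the Resistance summary and its label.

```python
from typing import Tuple

# Lower bound of each band, checked from the highest band down.
_BANDS = (
    (60, 5, "High-level resistance"),
    (30, 4, "Intermediate resistance"),
    (15, 3, "Low-level resistance"),
    (10, 2, "Potential low-level resistance"),
)


def resistance_level(score: int) -> Tuple[int, str]:
    """Map a total drug penalty score to the 1-5 level and its label."""
    for lower_bound, level, label in _BANDS:
        if score >= lower_bound:
            return level, label
    return 1, "Susceptible"  # below 10, including negative totals
```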

Formatted amino acid alignments

Separate amino acid alignment files are created for each gene. The header contains two rows. The first contains each amino acid position for the gene: 1 to 99 for PR, 1 to 560 for RT, and 1 to 288 for IN. The second contains the consensus B amino acid at each position. Each subsequent row contains the amino acids for one sequence. A '.' indicates positions that were not present in the sequence, and a '-' indicates positions matching the consensus. Amino acids represent differences from the consensus sequence. If a codon translates to a mixture of 2 to 4 amino acids, each amino acid is shown; if it translates to a mixture of 5 or more amino acids, an 'X' is shown. Insertions are shown using an underscore (e.g., 'S_SS'), and deletions are indicated by 'del'.
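The cell conventions above can be sketched as a small formatter. The input representation (None for an unsequenced position, 'del', an insertion string such as 'S_SS', or a string of amino acids) and the names format_cell and format_row are hypothetical, and mixture amino acids are printed in input order, which may not match HIVdb's output.

```python
from typing import Dict, List, Optional


def format_cell(consensus_aa: str, observed: Optional[str]) -> str:
    """Render one alignment cell.

    observed is None for an unsequenced position, 'del' for a deletion, an
    insertion string such as 'S_SS', or the amino acid(s) the codon encodes.
    """
    if observed is None:
        return "."                          # position not present in the sequence
    if observed == "del" or "_" in observed:
        return observed                     # deletions and insertions pass through
    aas = "".join(dict.fromkeys(observed))  # de-duplicate while keeping input order
    if aas == consensus_aa:
        return "-"                          # matches consensus B
    if len(aas) >= 5:
        return "X"                          # mixture of 5 or more amino acids
    return aas                              # single difference or 2-4 amino acid mixture


def format_row(consensus: str, observed_by_pos: Dict[int, str]) -> List[str]:
    """Format one sequence row; positions are 1-based, as in the header row."""
    return [format_cell(aa, observed_by_pos.get(pos)) for pos, aa in enumerate(consensus, start=1)]
```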

XML Output

This output contains information similar to that provided by the HTML output. A complete description of this output can be found on this page: hivdb.stanford.edu/page/xml-2009-spec.

Drug Resistance Mutations (DRMs) and Sequence Interpretation

DRM classification

A DRM can be characterized according to the following criteria:

  1. Polymorphism frequency: its prevalence in virus isolates from ART-naïve patients in regions with low levels of transmitted drug resistance (TDR). Polymorphic DRMs may occur in the absence of selective drug pressure. Polymorphic DRMs usually have little effect on ARV susceptibility when they occur without other DRMs.
  2. Treatment prevalence: its prevalence in virus isolates from patients receiving ART compared with its prevalence in virus isolates from ART-naïve patients. Nonpolymorphic DRMs that occur frequently in patients receiving an ARV are usually associated with clinically significant resistance to that ARV.
  3. In vitro phenotype: its contribution to reduced in vitro susceptibility either alone or in combination with other DRMs.
  4. Association with virological failure (VF): its association with a reduced virological response to an ARV in a new treatment regimen.

The HIVDB report groups mutations within each gene into 3 lists:

  • RT
    • NRTI: Mutations in this list nearly always have a penalty score for one or more NRTIs. Rarely, this list may contain an unusual amino acid at a position that is associated with NRTI resistance.
    • NNRTI: Mutations in this list nearly always have a penalty score for one or more NNRTIs. Rarely, this list may contain an unusual amino acid at a position that is associated with NNRTI resistance.
    • Other: Mutations that do not have a penalty score. This category will occasionally have rare non-polymorphic treatment-selected mutations (TSMs) that have not been shown to contribute to drug resistance.
  • PR
    • Major: Nonpolymorphic DRMs that make a major contribution to reduced susceptibility to one or more PIs. These usually have a penalty score of 30 to 60.
    • Accessory (formerly Minor): Nonpolymorphic or minimally polymorphic mutations that contribute to reduced susceptibility in combination with major DRMs. Highly unusual and poorly characterized mutations at major drug-resistance positions are also included in this list.
    • Other: Mutations that do not receive penalty scores. Some of these are highly polymorphic mutations that may be weakly associated with drug resistance or increased virus fitness when they occur in combination with Major DRMs. It may also include rare nonpolymorphic TSMs that have not been shown to contribute to drug resistance.
  • IN
    • Major: Primarily nonpolymorphic DRMs that make a major contribution to reduced susceptibility to one or more INSTIs. These usually have a penalty score of 30 to 60.
    • Accessory (formerly Minor): Nonpolymorphic or minimally polymorphic mutations that contribute to reduced susceptibility in combination with major DRMs. Highly unusual and poorly characterized mutations at major drug-resistance positions are also included in this list.
    • Other: Mutations that do not receive penalty scores. These may include highly polymorphic mutations that are weakly associated with drug resistance but act primarily as accessory mutations. This list may also include rare nonpolymorphic INSTI-selected mutations that have not been studied for their effects on drug susceptibility.

DRM penalty scores and resistance interpretation

  • The estimated level of resistance to a drug is determined by adding up the penalty scores associated with each of the DRMs present in a submitted sequence.
  • Once the total score is calculated, the estimated level of resistance is determined as follows:
    • Susceptible: Total score 0 to 9
    • Potential low-level resistance: Total score 10 to 14
    • Low-level resistance: Total score 15 to 29
    • Intermediate resistance: Total score 30 to 59
    • High-level resistance: Total score >= 60
  • "Susceptible" indicates no evidence of reduced ARV susceptibility compared with a wild-type virus. "Potential low-level resistance" indicates that the sequence may contain mutations indicating previous ARV exposure or mutations that are associated with drug resistance only when they occur with additional mutations. "Low-level resistance" indicates that the virus encoded by the submitted sequence may have reduced in vitro ARV susceptibility or that patients harboring viruses with the submitted mutations may have a suboptimal virological response to treatment with the ARV. "Intermediate resistance" indicates a high likelihood that a drug's activity will be reduced but that the drug will likely retain significant antiviral activity. "High-level resistance" indicates that the predicted level of resistance is similar to that observed in viruses with the highest levels of in vitro drug resistance or that clinical data exist demonstrating that patients infected with viruses having such mutations usually have little or no virological response to treatment with the ARV.
  • Some combinations of DRMs receive penalty scores that are added to the total score for a drug. For example:
    • The RT mutations L74I/V (L74I or L74V) and M184I/V (M184I or M184V) have penalty scores of 30 and 15, respectively, for abacavir (ABC). In addition, L74I/V + M184I/V has a penalty score of 15 for ABC. Therefore, a sequence with L74V + M184V will have a total penalty score of 60 (30 + 15 + 15), which is translated into high-level ABC resistance.
    • The PR mutations M46I/L, I54V, and V82A have penalty scores of 10, 15, and 30, respectively, for lopinavir/r (LPV/r). In addition, I54A/L/M/S/T/V + V82A/C/F/M/L/S/T/V has a penalty score of 10 for LPV/r. Therefore, a sequence with M46I/L + I54V + V82A will have a total penalty score of 65 (10 + 15 + 30 + 10), which is translated into high-level LPV/r resistance.
  • When there is a mixture of two mutations at the same position, the mutation associated with the largest penalty is scored. Therefore, if a mutation associated with a negative penalty score is present in a mixture with the wildtype amino acid at that position, there will be no negative penalty score.
  • Some DRMs have negative penalty scores for certain drugs. For example:
    • The RT mutations M184I/V have scores of -10 for AZT, d4T, and tenofovir (TDF).
    • The PR mutation I50L has scores of -10 for LPV/r and darunavir/r (DRV/r).
  • The HIVdb output contains a table in which each of the individual and combination scores associated with a drug is listed. Each scored DRM is hyperlinked to a set of entries in HIVDB that support the DRM's association with reduced susceptibility. The table can indicate whether a drug with intermediate resistance is just above the low-level resistance threshold (e.g., has a score of 30 or 35) or close to the high-level resistance threshold (e.g., has a score of 50 or 55).
  • The most recent scores are available as tab-delimited files or tables sortable by position or drug:
  • Comparisons of the rules and patterns between the two most recent major versions (currently 7.0 and 8.1.1) are also available as tab-delimited files (Download all):

    Rules Comparison

    Patterns Comparison
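The scoring rules described above (individual scores, combination scores, the mixture rule, and negative scores) can be sketched as follows. The score tables contain only the example values quoted in this section, not the full HIVdb rule set; the data layout, the name total_score, and the way combination rules treat mixtures are assumptions.

```python
from typing import Dict, List, Set, Tuple

# A sequence is position -> amino acid(s) observed there; more than one amino
# acid means a mixture. Positions that are not listed are wild type.
Sequence = Dict[int, Set[str]]

# Individual scores: drug -> position -> {amino acid: score}.
# Only the example values quoted in the text; NOT the full HIVdb rule set.
SINGLE_SCORES: Dict[str, Dict[int, Dict[str, int]]] = {
    "ABC": {74: {"I": 30, "V": 30}, 184: {"I": 15, "V": 15}},
    "TDF": {184: {"I": -10, "V": -10}},
    "LPV/r": {46: {"I": 10, "L": 10}, 54: {"V": 15}, 82: {"A": 30}},
}

# Combination scores: drug -> [([(position, allowed amino acids), ...], score)].
COMBO_SCORES: Dict[str, List[Tuple[List[Tuple[int, Set[str]]], int]]] = {
    "ABC": [([(74, set("IV")), (184, set("IV"))], 15)],
    "LPV/r": [([(54, set("ALMSTV")), (82, set("ACFMLSTV"))], 10)],
}


def total_score(drug: str, seq: Sequence) -> int:
    total = 0
    for pos, by_aa in SINGLE_SCORES.get(drug, {}).items():
        observed = seq.get(pos)
        if observed:
            # Mixture rule: only the largest penalty at a position is scored.
            # Unscored residues (e.g. wild type) count as 0, so a wild-type/
            # mutant mixture never contributes a negative score.
            total += max(by_aa.get(aa, 0) for aa in observed)
    for components, score in COMBO_SCORES.get(drug, []):
        # Assumption: a combination applies when every component position
        # carries at least one of its listed amino acids.
        if all(seq.get(pos, set()) & allowed for pos, allowed in components):
            total += score
    return total
```

With these tables, total_score("ABC", {74: {"V"}, 184: {"V"}}) gives 60, which falls in the high-level band, while the TDF score of -10 for M184V becomes 0 when V is present in a mixture with the wild-type M.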

Comments

HIVdb output contains 3 types of comments:

  • Comments on ARV resistance mutations that receive mutation penalty scores. These comments are designed to justify the score and to provide additional information about a mutation that may be clinically relevant, depending on the clinical scenario.
  • Comments on mutations that have been potentially associated with reduced ARV susceptibility but which do not have mutation penalty scores because they are either highly polymorphic or have a minimal, if any, effect on drug susceptibility. Comments on these mutations are designed to alert the user to the presence of these mutations while at the same time justifying the absence of mutation penalty scores.
  • Comments on highly unusual mutations at known drug resistance positions.
  • The most recent comments are available as tab-delimited files or web pages:

Program Updates

The scoring tables, comments, and programs are frequently updated; these updates are tracked in the Updates page. Below is a listing of our current and previous versions linking to the specific improvements since January 2003.

HIVseq

HIVseq accepts user-submitted RT, PR, and IN mutations or sequences. For each gene, HIVseq returns a table, described below, with the prevalence of each mutation (whether submitted directly or derived from the sequence) in previously published virus sequences belonging to 8 subtypes from ARV-class-naive and ARV-class-experienced individuals. Reports for multiple sequences contain a menu bar that allows the user to choose the report for a specific sequence. Reports for sequences also contain a header section with a sequence summary and sequence quality assessment.

The mutation prevalence tables contain 18 columns. The first column contains an ordered list of all the mutations in a sequence. The second column contains the nucleotides encoding each mutation. Columns 3 to 10 contain the prevalence of each mutation in viruses belonging to subtypes A, B, C, D, F, G, CRF01_AE, and CRF02_AG in ARV-class-naive individuals. Columns 11 to 18 contain the prevalence of each mutation in viruses belonging to the same subtypes in ARV-class-experienced individuals.

The table header contains the number of individuals in HIVDB with gene sequences belonging to the appropriate subtype and treatment category. However, not all of those sequences encompass all gene positions. This is particularly relevant for RT, for which the number of sequenced viruses encompassing positions beyond roughly 300 to 400 is much lower than the number in the table header. The body of the table contains the prevalence of the mutations present in >= 0.1% of HIVDB sequences belonging to the subtype and treatment category. Each mutation prevalence is also a hyperlink to a web page with detailed information on each virus belonging to the subtype and treatment category. Each mutation row can be expanded to show the prevalence of each amino acid at that mutation's position.
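The coverage issue described above matters when computing prevalence. The sketch below assumes that prevalence is calculated only over sequences that span the position; GeneSequence, prevalence, and is_reported are hypothetical names. It shows how a late-RT position can have a much smaller denominator than the header count.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set, Tuple


@dataclass
class GeneSequence:
    first: int                                                    # first sequenced codon
    last: int                                                     # last sequenced codon
    mutations: Set[Tuple[int, str]] = field(default_factory=set)  # e.g. {(184, "V")}


def prevalence(position: int, aa: str, seqs: Iterable[GeneSequence]) -> float:
    """Percentage of sequences covering `position` that carry `aa` there.

    Assumption: only sequences spanning the position form the denominator.
    """
    covering = [s for s in seqs if s.first <= position <= s.last]
    if not covering:
        return 0.0
    hits = sum((position, aa) in s.mutations for s in covering)
    return 100.0 * hits / len(covering)


def is_reported(pct: float, threshold: float = 0.1) -> bool:
    """Mutations present in >= 0.1% of sequences appear in the table body."""
    return pct >= threshold
```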

To minimize reporting bias, the mutation frequency tables contain one sequence per individual. For individuals in whom sequences from multiple isolates were published, the mutation tables include the earliest sequence from untreated persons and the latest sequence (while on therapy) from persons receiving antiretroviral therapy. To exclude technical sequencing errors and cases of circulating virus containing unusual variants, the mutation tables include only mutations present as the predominant form whenever multiple clones from the same isolate were sequenced. Sequences of poor quality and those considered to be possible laboratory contaminants are excluded from the data sets.
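The one-sequence-per-individual rule can be sketched as below. The Isolate record and the function name are hypothetical, and the assumption that a person with both untreated and on-therapy isolates contributes one isolate to each treatment category is not stated in the text.

```python
from datetime import date
from typing import Dict, List, NamedTuple, Tuple


class Isolate(NamedTuple):
    person: str
    sample_date: date
    on_therapy: bool


def one_per_individual(isolates: List[Isolate]) -> Tuple[List[Isolate], List[Isolate]]:
    """Return (untreated set, on-therapy set) with at most one isolate per person.

    Untreated: the earliest isolate. On therapy: the latest isolate.
    """
    earliest_untreated: Dict[str, Isolate] = {}
    latest_treated: Dict[str, Isolate] = {}
    for iso in isolates:
        if iso.on_therapy:
            best = latest_treated.get(iso.person)
            if best is None or iso.sample_date > best.sample_date:
                latest_treated[iso.person] = iso
        else:
            best = earliest_untreated.get(iso.person)
            if best is None or iso.sample_date < best.sample_date:
                earliest_untreated[iso.person] = iso
    return list(earliest_untreated.values()), list(latest_treated.values())
```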

HIValg

HIValg makes it possible to compare the results of different ASI-encoded algorithms. Like HIVdb and HIVseq, HIValg accepts user-submitted RT, PR, and IN mutations or sequences. Additionally, the user can select algorithms from a list of three commonly used genotypic drug resistance interpretation systems or can upload their own ASI files.

Reports for multiple sequences contain a menu bar that allows the user to choose the report for a specific sequence. Reports for sequences also contain a header section with a sequence summary and sequence quality assessment. The main output is a table in which the column headers are interpretation systems, the row names are drugs, and each cell contains three elements: (i) SIR: A simplified drug resistance interpretation scheme provided by the ASI that indicates whether the virus sequence is considered to be susceptible (S), intermediately or possibly resistant (I), or fully resistant (R). Interpretations from algorithms with more than three levels are simplified from the algorithm's original scheme. (ii) Interpretation (Intrp): The original interpretation scheme provided by the algorithm; and (iii) Explanation (Expln): A list of the rules triggered by the submitted mutations. For systems using mutation penalty scores, the total score and detailed mutation penalty scores are shown.

Algorithms

The following algorithms are available online in their XML form in the "Algorithm Specification Interface page". They are all encoded using the ASI format, which is also described in the same page.

  • ANRS: Agence Nationale de Recherches sur le SIDA 4,5.
  • HIVDB: The current version of the drug-resistance interpretation program on this site is referred to as the "HIVdb" algorithm. The latest version of HIVdb algorithm ASI file is downloadable at hivdb.stanford.edu/downloads/HIVDB_8.1.1.xml.
  • Rega Institute: Courtesy of Professor Anne-Mieke Vandamme 7.

Each of the algorithms reports its results differently. The table below shows how the results of each algorithm are normalized for comparison by the program. Users of HIValg can select whether they prefer to receive output with the original interpretation or with the normalized interpretation ('SIR' option).

Algorithm | S | I | R
ANRS | Susceptible | Possible resistance | Resistance
HIVDB | Susceptible; Potential low-level resistance | Low-level resistance; Intermediate resistance | High-level resistance
Rega Institute | Susceptible GSS 1; Susceptible GSS 1.5 | Intermediate Resistant GSS 0.75; Intermediate Resistant GSS 0.5; Intermediate Resistant GSS 0.25 | Resistant GSS 0
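The normalization can be expressed as a set of lookups. In this sketch, the placement of the HIVDB levels (Susceptible and Potential low-level resistance as S, Low-level and Intermediate resistance as I, High-level resistance as R) is inferred from the table and should be checked against the HIVDB ASI file; SIR_MAP and to_sir are hypothetical names.

```python
from typing import Dict, Union

# Original interpretation -> SIR category, following the normalization table.
# The HIVDB split (levels 1-2 -> S, 3-4 -> I, 5 -> R) is inferred from the
# table layout; verify it against the HIVDB ASI file before relying on it.
SIR_MAP: Dict[str, Dict[Union[str, float], str]] = {
    "ANRS": {"Susceptible": "S", "Possible resistance": "I", "Resistance": "R"},
    "HIVDB": {
        "Susceptible": "S",
        "Potential low-level resistance": "S",
        "Low-level resistance": "I",
        "Intermediate resistance": "I",
        "High-level resistance": "R",
    },
    # Rega levels keyed by the GSS value shown in the table.
    "Rega Institute": {1.5: "S", 1.0: "S", 0.75: "I", 0.5: "I", 0.25: "I", 0.0: "R"},
}


def to_sir(algorithm: str, original: Union[str, float]) -> str:
    """Normalize one algorithm-specific interpretation to S, I, or R."""
    try:
        return SIR_MAP[algorithm][original]
    except KeyError:
        raise ValueError(f"no SIR mapping for {algorithm!r} level {original!r}") from None
```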

Appendices

Appendix 1. Consensus B Sequences

The subtype B consensus sequence is derived from an alignment of subtype B sequences maintained at the Los Alamos HIV Sequence Database (hiv.lanl.gov). It is a commonly used reference sequence to which new sequences are compared. Files containing the consensus PR, consensus RT, and consensus IN are also available.

Consensus B amino acid sequences
Protease
PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF
RT
PISPIETVPVKLKPGMDGPKVKQWPLTEEKIKALVEICTEMEKEGKISKIGPENPYNTPVFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIPHPAGLKKKKSVTVLDVGDAYFSVPLDKDFRKYTAFTIPSINNETPGIRYQYNVLPQGWKGSPAIFQSSMTKILEPFRKQNPDIVIYQYMDDLYVGSDLEIGQHRTKIEELRQHLLRWGFTTPDKKHQKEPPFLWMGYELHPDKWTVQPIVLPEKDSWTVNDIQKLVGKLNWASQIYAGIKVKQLCKLLRGTKALTEVIPLTEEAELELAENREILKEPVHGVYYDPSKDLIAEIQKQGQGQWTYQIYQEPFKNLKTGKYARMRGAHTNDVKQLTEAVQKIATESIVIWGKTPKFKLPIQKETWEAWWTEYWQATWIPEWEFVNTPPLVKLWYQLEKEPIVGAETFYVDGAANRETKLGKAGYVTDRGRQKVVSLTDTTNQKTELQAIHLALQDSGLEVNIVTDSQYALGIIQAQPDKSESELVSQIIEQLIKKEKVYLAWVPAHKGIGGNEQVDKLVSAGIRKVL
Integrase
FLDGIDKAQEEHEKYHSNWRAMASDFNLPPVVAKEIVASCDKCQLKGEAMHGQVDCSPGIWQLDCTHLEGKIILVAVHVASGYIEAEVIPAETGQETAYFLLKLAGRWPVKTIHTDNGSNFTSTTVKAACWWAGIKQEFGIPYNPQSQGVVESMNKELKKIIGQVRDQAEHLKTAVQMAVFIHNFKRKGGIGGYSAGERIVDIIATDIQTKELQKQITKIQNFRVYYRDSRDPLWKGPAKLLWKGEGAVVIQDNSDIKVVPRRKAKIIRDYGKQMAGDDCVASRQDED
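As described for HIVseq, mutations are identified as differences from consensus B. A minimal sketch for protease, assuming the submitted sequence has already been translated and aligned one residue per position ('.' marking unsequenced positions), lists the differences in consensus-position-amino acid notation. The function name list_mutations is hypothetical.

```python
from typing import List

# Consensus B protease amino acid sequence, copied from the table above (99 residues).
CONSENSUS_B_PR = (
    "PQITLWQRPLVTIKIGGQLKEALLDTGADDTVLEEMNLPGRWKPKMIGGIGGFIKVRQYDQILIEICGHKAIGTVLVGPTPVNIIGRNLLTQIGCTLNF"
)


def list_mutations(aligned: str, consensus: str = CONSENSUS_B_PR) -> List[str]:
    """Return differences from consensus in the form 'L90M'.

    Assumes `aligned` is already aligned to the consensus, one character per
    position, with '.' for positions that were not sequenced.
    """
    if len(aligned) != len(consensus):
        raise ValueError("sequence must be aligned to the consensus, one residue per position")
    return [
        f"{ref}{pos}{aa}"
        for pos, (ref, aa) in enumerate(zip(consensus, aligned), start=1)
        if aa not in (ref, ".")
    ]
```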

Appendix 2. Sample Data Sets

Several FASTA-format sample data sets have been compiled to provide users with sample inputs for running our programs. To view the results for these sequences, copy and paste them into the input form.