HBV Site Release Notes
HBVseq accepts user-submitted HBV RT sequences, determines their genotypes, and compares them to the genotype consensus reference sequences. The mutations defined as differences between the submitted sequence and the consensus reference sequence are used as query parameters for interrogating a local HBV RT drug resistance database (HBVrt DB) to retrieve the prevalence of each mutation according to genotype and treatment. Each mutation in the HBVseq output is linked to the publications from which the data output were obtained.
HBVrt DB was constructed by annotating the body of publicly available HBV RT sequences, which were obtained from a BLAST search using an HBV RT consensus amino acid sequence against GenBank viral sequence files.
To browse the prevalence of mutations at all 344 HBV RT positions in HBVrt DB, use the 'HBV RT Mutations According to Genotype and Treatment' page. To browse a list of all HBV RT sequences in GenBank grouped by reference, use the 'Blast Hits Database' page.
For more information on each page and database, please read the sections below:
Table of Contents
- Sequence alignment, genotyping and amino acid translation
- Sequence quality assessment
- Automated database lookup
- Phylogenetic analysis
- Summary data
- Sequence quality assessment
- Mutation prevalence according to genotype and treatment
- Phylogenetic tree
- HBV RT Mutations According to Genotype and Treatment
- Blast Hits DB and HBVrt DB
- Consensus amino acid reference sequences
- Genotype-specific nucleotide reference sequences
- A list of absolutely conserved RT positions
- Blast Hits DB Stats
- HBVrt DB Stats
- Sample sets of HBV RT sequences
Like HIVseq, HBVseq demonstrates that published sequence data on a gene can be made available in real time to researchers sequencing new isolates of that gene. HBVseq's output includes the genotype-specific prevalence of each mutation and the frequency with which the mutation has been reported in patients receiving (i) L-nucleosides (3TC, FTC or telbivudine) and/or entecavir, (ii) acyclic nucleoside phosphonates (adefovir or tenofovir), or (iii) L-nucleosides and/or entecavir plus acyclic nucleoside phosphonates, compared with its frequency in untreated patients.
Sequences can be entered using either the Sequence Analysis Form or the Mutation List Form. To use the Sequence Analysis Form, paste one or more non-interleaved sequences in FASTA format into the textbox or upload a file containing up to 100 non-interleaved FASTA sequences (character limit: 600,000). Consistent with the FASTA format, each sequence should be preceded by a line containing ">" followed by a sequence name and optionally followed by additional descriptors separated by pipes ("|").
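As an illustration of the input format described above, the following Python sketch splits FASTA text into records and separates the sequence name from any pipe-delimited descriptors. The function name parse_fasta and the constants are hypothetical and are not part of HBVseq. The limits simply mirror the numbers quoted above, and applying the character limit to pasted text as well as uploaded files is an assumption.

```python
MAX_SEQUENCES = 100   # upload limit quoted above
MAX_CHARS = 600_000   # character limit quoted above


def parse_fasta(text):
    """Return a list of (name, descriptors, sequence) tuples."""
    if len(text) > MAX_CHARS:
        raise ValueError("input exceeds the character limit")
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: name, optionally followed by "|"-separated descriptors.
            fields = [field.strip() for field in line[1:].split("|")]
            records.append([fields[0], fields[1:], []])
        elif records:
            # Sequence line belonging to the most recent header.
            records[-1][2].append(line)
        else:
            raise ValueError("sequence data found before the first '>' header")
    if len(records) > MAX_SEQUENCES:
        raise ValueError("too many sequences")
    return [(name, desc, "".join(parts)) for name, desc, parts in records]
```

A header such as ">seq1|genotypeA|2009" would yield the name "seq1" and the descriptors "genotypeA" and "2009".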
To use the Mutation List Form, enter a list of mutations (RT position followed by an amino acid, e.g. 180L, 204V) into the textbox. Amino acid mutations must be entered in UPPERCASE, whereas an insertion or deletion is entered by typing lowercase "i" or lowercase "d" after the RT position. If there is a mixture of more than one amino acid at a position, enter both amino acids (intervening forward slashes are optional). RT mutations must be separated by either spaces or commas. Consensus amino acids placed before the RT position are optional.
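The entry rules above can be expressed as a single regular expression. The sketch below is one possible reading of those rules, not the parser used by HBVseq. The names MUTATION_RE and parse_mutation_list are hypothetical, and restricting input to the 20 standard amino acid letters is an assumption.

```python
import re

AA = "ACDEFGHIKLMNPQRSTVWY"   # assumed set of accepted amino acid letters
RT_LENGTH = 344               # number of HBV RT positions

# Optional consensus residue, RT position, then either one or more
# uppercase amino acids (slashes optional) or a lowercase "i" / "d".
MUTATION_RE = re.compile(rf"^([{AA}])?(\d+)([{AA}](?:/?[{AA}])*|i|d)$")


def parse_mutation_list(text):
    """Return a list of (position, consensus_or_None, change) tuples."""
    mutations = []
    for token in re.split(r"[,\s]+", text.strip()):
        if not token:
            continue
        match = MUTATION_RE.match(token)
        if match is None:
            raise ValueError(f"cannot parse mutation: {token!r}")
        consensus, position, change = match.groups()
        position = int(position)
        if not 1 <= position <= RT_LENGTH:
            raise ValueError(f"position out of range: {token!r}")
        # Mixtures such as "V/I" and "VI" are normalized to "VI".
        mutations.append((position, consensus, change.replace("/", "")))
    return mutations
```

For example, the input "L180M, 204V/I 181d" would be read as a substitution at position 180, a V/I mixture at position 204, and a deletion at position 181.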
Sequences submitted to HBVseq are not stored on the server.
a. Sequence alignment, genotyping, and amino acid translation
Each submitted nucleotide sequence is aligned to the consensus genotype A HBV RT amino acid sequence (appendix 1) using a nucleotide to amino acid sequence local alignment program ("Lap.c" by X Huang, Genomics 1996).
A sequence that does not include RT positions 180 to 240, that contains fewer than 100 amino acids, or that has <50% identity to each of the consensus genotype sequences yields an error report, and HBVseq stops processing that sequence. A sequence containing multiple insertions, deletions, and frame-shifts may not align successfully and will yield a warning.
Once a sequence has been aligned successfully, it is compared to a list of genotype-specific reference sequences (appendix 2). The genotype of the closest reference sequence is assigned to the submitted sequence.
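The three rejection criteria above can be sketched as a small check. This is only an illustration under stated assumptions: the function name alignment_errors and its parameters are hypothetical, and the identities are assumed to be supplied as fractions between 0 and 1, one per consensus genotype sequence.

```python
def alignment_errors(first_pos, last_pos, n_amino_acids, identity_by_genotype):
    """Return a list of reasons for rejecting an aligned sequence.

    first_pos / last_pos: first and last RT positions covered by the alignment.
    identity_by_genotype: dict mapping genotype -> fractional identity (0..1).
    """
    errors = []
    if first_pos > 180 or last_pos < 240:
        errors.append("does not include RT positions 180 to 240")
    if n_amino_acids < 100:
        errors.append("contains fewer than 100 amino acids")
    # Rejected only if identity is below 50% for every consensus sequence.
    if all(identity < 0.50 for identity in identity_by_genotype.values()):
        errors.append("<50% identity to each consensus genotype sequence")
    return errors
```

An empty list means the sequence passes all three checks and processing can continue.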
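A minimal sketch of closest-reference genotyping follows. The text does not state how closeness is measured, so percent nucleotide identity over aligned, non-gap positions is an assumption here, and the function names are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity over aligned positions where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def assign_genotype(query, references):
    """Return (genotype, percent identity) of the closest reference.

    references: list of (genotype, aligned_sequence) tuples, all aligned
    to the same coordinates as the query.
    """
    genotype, ref_seq = max(references, key=lambda ref: percent_identity(query, ref[1]))
    return genotype, percent_identity(query, ref_seq)
```

The returned percentage corresponds to the kind of similarity value that the output reports alongside the assigned genotype.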
The nucleotide sequence generated by local alignment is gap-stripped and translated in the correct reading frame using the standard genetic code. The resulting amino acid is considered a mutation if it differs from the consensus genotype reference sequence (appendix 1).
Nucleotide triplets containing IUPAC ambiguities (e.g. R indicates a mixture of A and G) are translated into each of the possible amino acids they encode. For example, ATR indicates a mixture of ATA (Isoleucine; I) and ATG (Methionine; M). However, when a nucleotide triplet contains too many mixtures (i.e. the translated codon would contain >4 different amino acids) an "X" is substituted for the list of mutations.
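The expansion of ambiguous triplets can be sketched as follows. This is an illustration rather than the HBVseq implementation. The function name translate_codon is hypothetical, and counting a stop codon ("*") as one of the possible translations is an assumption.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide codes and the bases they represent.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon, max_amino_acids=4):
    """Translate a possibly ambiguous triplet into all amino acids it encodes."""
    choices = [IUPAC[base] for base in codon.upper()]
    amino_acids = sorted({CODON_TABLE["".join(c)] for c in product(*choices)})
    if len(amino_acids) > max_amino_acids:
        return "X"   # too many possible translations
    return "".join(amino_acids)
```

With this sketch, ATR expands to ATA and ATG and is reported as the mixture "IM", whereas NNN exceeds the limit and is reported as "X".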
b. Sequence quality assessment
For each sequence, the HBVseq output flags (i) stop codons, frame shifts, and amino acid insertions and deletions; (ii) positions containing highly ambiguous nucleotides (B, D, H, V, N); and (iii) highly unusual mutations at absolutely conserved RT positions (appendix 3) that have rarely, if ever, been reported previously.
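As a sketch of check (ii), the function below reports codons that contain any of the highly ambiguous codes B, D, H, V or N. The function name and the first_rt_position parameter are hypothetical, and the input is assumed to be an in-frame, gap-stripped nucleotide sequence.

```python
HIGHLY_AMBIGUOUS = set("BDHVN")


def highly_ambiguous_codons(nucleotides, first_rt_position=1):
    """Return (RT position, codon) for each codon with a B, D, H, V or N."""
    flagged = []
    for i in range(0, len(nucleotides) - 2, 3):
        codon = nucleotides[i:i + 3].upper()
        if HIGHLY_AMBIGUOUS & set(codon):
            flagged.append((first_rt_position + i // 3, codon))
    return flagged
```

Two-base mixtures such as R or Y are not flagged by this check.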
c. Automated database lookup
Each of the mutations identified in a sequence is used to interrogate HBVrt DB to ascertain the mutation's prevalence according to genotype and treatment.
d. Phylogenetic analysis
A phylogenetic tree is created from the submitted sequence(s) and a set of representative sequences belonging to each of the eight genotypes and recently reported genotype I. The tree is created by the PAUP program using the neighbor-joining method applied to a matrix of genetic distances calculated using the HKY85 substitution model and a gamma distribution at variable sites.
HBVseq output for a submitted sequence is divided into four sections: (i) Summary Data, (ii) Sequence Quality Assessment, (iii) a tabular display of Mutation Prevalence According to Genotype and Treatment, and (iv) a phylogenetic tree containing the submitted sequences in the context of representative sequences belonging to each of the genotypes.
a. Summary data
This section reports the RT amino acid positions present in the submitted sequence and indicates whether the sequence contains frameshifts, insertions, deletions or well-characterized drug-resistance mutations (Lok 2007). It also reports the genotype of the sequence and the percent similarity to the genotype reference nucleotide sequence. Sequences with a percent similarity < 95% may be recombinants and should be genotyped by one of the following alternate programs: Oxford HBV Automated Subtyping Tool, STAR.
b. Sequence quality assessment (QA)
This section indicates positions that contain stop codons, frame shifts, insertions, deletions, highly ambiguous nucleotides, and mutations at absolutely conserved positions. The accompanying figure indicates positions with QA problems as short red lines, positions with polymorphic mutations as short blue lines, and drug-resistance positions as tall blue lines.
c. Mutation prevalence according to genotype and treatment
This section contains a table listing the frequency (%) of each mutation in nucleos(t)ide RT inhibitor (N(t)RTI)-naive individuals according to genotype and in N(t)RTI-treated individuals according to drug class: (i) L-nucleosides and/or entecavir, (ii) acyclic nucleoside phosphonates, or (iii) L-nucleosides and/or entecavir and one or more acyclic nucleoside phosphonates. The table contains one row for each mutation in the submitted sequence. The total numbers of individuals (by category) from whom sequences were available in HBVrt DB are listed in the table header.
Columns 1 to 3 list the position, the submitted codon, and the submitted codon's translation. Columns 4 to 12 contain the consensus amino acid followed by the frequency of each mutation previously reported in N(t)RTI-naive individuals according to genotype (A, B, C, D, E, F, G, H and I). Column 13 contains pooled data for sequences from N(t)RTI-naive individuals belonging to all genotypes. Amino acids which represent the consensus for one or more genotypes are listed at the top, followed by the frequency of all other reported mutations in sequences from N(t)RTI-naive individuals.
Columns 14 to 16 list the consensus amino acids followed by the frequency of each mutation in pooled genotypes from N(t)RTI-treated individuals: (i) L-nucleosides (3TC, FTC or telbivudine) and/or entecavir, (ii) acyclic nucleoside phosphonates (adefovir or tenofovir), (iii) L-nucleosides and/or entecavir + acyclic nucleoside phosphonates.
For the purposes of calculating mutation prevalence, we performed the following analyses: (i) when sequences were available before and after an individual received N(t)RTIs, the pre-therapy sequences were used for calculating prevalence in untreated individuals, whereas the post-therapy sequences were used for calculating prevalence in treated individuals; (ii) individuals having more than one isolate with the same mutation were counted only once; (iii) when sequences of multiple clones of the same virus isolate were present, we used the consensus sequence of the multiple clones.
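Rule (ii) above amounts to counting distinct individuals rather than isolates. The sketch below illustrates that counting. The function name mutation_prevalence and the record layout are hypothetical, and using every individual in a group as the denominator, regardless of which positions were sequenced, is a simplifying assumption.

```python
from collections import defaultdict


def mutation_prevalence(records):
    """Return {(group, mutation): percent of individuals in the group}.

    records: iterable of (patient_id, group, mutations), one per isolate,
    where group is a treatment category such as "naive" and mutations
    is a set of strings such as "204V".
    """
    patients = defaultdict(set)   # group -> individuals with any sequence
    carriers = defaultdict(set)   # (group, mutation) -> individuals with it
    for patient_id, group, mutations in records:
        patients[group].add(patient_id)
        for mutation in mutations:
            # Sets ensure an individual is counted once per mutation.
            carriers[(group, mutation)].add(patient_id)
    return {
        key: 100.0 * len(ids) / len(patients[key[0]])
        for key, ids in carriers.items()
    }
```

An individual who contributes two untreated isolates with the same mutation is counted once in the untreated group, and the same individual's post-therapy isolate is counted separately in the treated group.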
Each non-consensus mutation is a hyperlink to a web page containing detailed information for each reported sequence, including its references, GenBank accession number, complete sequence, and treatment.
d. Phylogenetic tree
The last part of the output contains a phylogenetic tree built from the submitted sequence(s) and a set of representative sequences belonging to each of the eight genotypes. Submitted sequence names are indicated in red. The nexus tree file can be downloaded for viewing using other tree viewing tools.
2. HBV RT Mutations According To Genotype and Treatment
Whereas HBVseq reports a table containing the prevalence of submitted mutations (or mutations that were identified in a submitted sequence) in HBVrt DB according to genotype and treatment, the "HBV RT Mutations According to Genotype and Treatment" program outputs a table containing the prevalence of mutations at all 344 HBV RT positions in HBVrt DB.
Users can choose to show either all mutations regardless of their prevalence or only mutations that occur in >0.5% of viruses. Users can also choose to ignore mutations that occur as part of electrophoretic mixtures. The table has the same columns as the table described above in section HBVseq: 3.c. The cells in the table contain the consensus amino acid for each genotype followed by the prevalence of non-consensus variants. Each mutation is linked to a page containing a table listing the publications in which the mutation had previously been reported.
3. Blast Hits DB and HBVrt DB
A new release of GenBank flatfiles containing viral sequences has been downloaded from NCBI. A BLAST search using a consensus HBV RT sequence against GenBank sequences was performed to collect HBV RT sequences. The search results were grouped by reference and added to Blast Hits DB. The reference may include a published paper or the title provided with the GenBank submission. The number of sequences and references included in the BLAST search results are summarized in the Blast Hits DB Stats table (appendix 4).
The sequences in Blast Hits DB were curated to identify those sequences from individuals with known nucleoside RT inhibitor treatment from whom codons 180 to 240 were sequenced. These sequences were annotated with the year and country of origin and were used to populate HBVrt DB. Sequences of poor quality, virus constructs, and defective integrated virus genomes were excluded. When sequences from multiple clones were reported, we created a consensus amino acid sequence to use to calculate prevalences for the HBVseq and the "HBV RT Mutation Prevalence According to Genotype and Treatment" page. When multiple sequences from the same patient were obtained, we used one sequence before therapy and one following therapy for the HBVseq and "HBV RT Mutation Prevalence According to Genotype and Treatment" page. The numbers of individuals in HBVrt DB according to genotype and treatment are summarized in the HBVrt DB Stats table (appendix 5).
The "Blast Hits DB" page contains a table with a list of all HBV RT sequences in GenBank grouped by reference. The first three fields contain the author, year of publication, title, and citation of the primary reference. The fourth field is the e-value of the sequence in the study with the lowest e-value. The '# in GB' field indicates the number of sequences in GenBank that cite this reference as the primary reference. The '# in HBVDB' field indicates the number of sequences from this GenBank reference that are in HBVrt DB. The Annotation field indicates whether the sequences from the study are in HBVrt DB ('HBVDB') or are not in HBVrt DB, either because sufficient annotation is not available ('Unpublished', 'HBV Rx and/or other data NA') or because the sequences failed to meet the HBVrt DB inclusion criteria (e.g. 'Gene fragments').
A list of consensus HBV RT amino acid sequences for each of nine genotypes ('A','B','C','D','E','F','G', 'H' and 'I') used for defining mutations.
A list of HBV RT nucleotide reference sequences for each of eight genotypes for defining the genotype of a submitted sequence.
A list of HBV RT positions which are absolutely conserved in all eight genotypes. An amino acid followed by a position is the consensus amino acid at the position. Any non-consensus amino acids at these positions in submitted sequences will be indicated as "Unusual Residues" in HBVseq reports.
The number of sequences and references in the recent BLAST search results
The numbers of individuals available in the HBVrt DB according to genotype and treatment
Appendix 6. Sets of sample HBV RT sequences
Set 1: five sequences.
Set 2: sequences from Margeridon-Thermet, 2009.