The Phylogeny of HIV

(Based on wEMBOSS & eBioKit)

Teacher: Hans-Henrik Fuxelius



Overview

During this exercise you will:
  1. Perform a multiple alignment of gp120 protein sequences from HIV and SIV (using Clustal)
  2. Construct an unrooted tree from the alignment of gp120 sequences (using the "neighbor joining" algorithm in Clustal)
  3. Visualize the gp120-based tree using the program Jalview
  4. Consider the evolutionary implications of the gp120-based tree
  5. Investigate the robustness of your tree by bootstrapping
  6. Perform a second multiple alignment based on POL protein sequences from HIV and SIV
  7. Construct a new neighbor joining tree from the POL alignment
  8. Investigate whether the POL-based tree supports the conclusions from the gp120-based analysis
  9. Perform a multiple alignment of the same POL sequences and a POL sequence from HTLV-1
  10. Root the POL-based tree using the HTLV sequence as an outgroup

Background: AIDS, HIV-1, HIV-2, and SIV

Acquired Immune Deficiency Syndrome (AIDS) is caused by two divergent viruses, Human Immunodeficiency Virus type 1 (HIV-1) and Human Immunodeficiency Virus type 2 (HIV-2). HIV-1 is responsible for the global pandemic, while HIV-2 has, until recently, been restricted to West Africa and appears to be less virulent. Viruses related to HIV have been found in many species of non-human primates (monkeys, apes, ...) and have been named Simian Immunodeficiency Virus (SIV).

These primate viruses are lentiviruses, a genus within the retrovirus family. Retroviruses have RNA genomes but are unique among RNA viruses because their replication cycle involves the reverse transcription of the RNA genome into DNA (the reverse of the usual flow of information from DNA to RNA). The reverse-transcribed viral DNA is stably incorporated into the genomic DNA of an infected cell, and subsequent transcription can then create multiple copies of mRNA encoding new viral material.

Like other retroviruses, particles of HIV are made up of two copies of the single-stranded RNA genome packaged inside a protein core, or capsid. The core particle also contains viral proteins that are essential for the early steps of the virus life cycle, such as reverse transcription and integration. A lipid envelope, derived from the infected cell, surrounds the core particle. Embedded in this envelope are the surface glycoproteins of HIV: gp120 and gp41. The gp120 protein is crucial for binding of the virus particle to target cells. It is the specific affinity of gp120 for the CD4 protein that targets HIV to those cells of the immune system that express CD4 on their surface (e.g., T-helper lymphocytes, monocytes, and macrophages).

Purpose of exercise, description of data:

In this exercise you are going to investigate the phylogenetic relationship between HIV and SIV and consider what it implies about their evolution.
You will do this using two different data sets:
  1. A set consisting of 27 different gp120 protein sequences from isolates of HIV-1, HIV-2, chimpanzee SIV and macaque monkey SIV: gp120.fasta
  2. A set consisting of 20 different POL-polyprotein sequences from HIV-1, HIV-2, chimpanzee SIV and sooty mangabey SIV: hiv-siv-pol.fasta
    and the same set with the HTLV-1 sequence added: htlv-hiv-siv-pol.fasta
(Note for enthusiasts: a number of lines of evidence have indicated that macaques are not naturally infected with SIV and that they have acquired their SIV infection while in captivity by cross-species transmission of SIV from sooty mangabeys. This means that both the macaque SIVs and the sooty mangabey SIVs originate from sooty mangabeys).


Finally - The Exercise:

First, we will use the file gp120.fasta, which contains 27 gp120 envelope protein sequences from isolates of HIV-1, HIV-2, and SIV in FASTA format.
In this file, all HIV-1 sequences have names starting with HV1. All HIV-2 sequences have names starting with HV2. SIVCZ was isolated from chimpanzee. SIVMK, SIVM1, and SIVML were isolated from macaques.
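
As a quick check of the naming scheme described above, the sequence names in a FASTA file can be grouped by prefix. The following is a minimal Python sketch, not part of the wEMBOSS workflow. The function names and the example headers are hypothetical placeholders, and it assumes the sequence name is the first word after ">" on each header line.

```python
from collections import Counter

def read_fasta_names(text):
    """Return the sequence names (first word after '>') found in FASTA text."""
    names = []
    for line in text.splitlines():
        if line.startswith(">"):
            header = line[1:].strip()
            names.append(header.split()[0] if header else "")
    return names

def classify(name):
    """Assign a sequence name to a virus group using the prefixes in the text."""
    if name.startswith("HV1"):
        return "HIV-1"
    if name.startswith("HV2"):
        return "HIV-2"
    if name.startswith("SIVCZ"):
        return "SIV (chimpanzee)"
    if name.startswith(("SIVMK", "SIVM1", "SIVML")):
        return "SIV (macaque)"
    return "other"

def count_groups(fasta_text):
    """Count how many sequences fall into each group."""
    return Counter(classify(n) for n in read_fasta_names(fasta_text))

# Made-up headers and residues, for illustration only (not real data).
example = """>HV1_DEMO1
MRVKEK
>HV1_DEMO2
MRVKGI
>HV2_DEMO1
MMNQLL
>SIVCZ_DEMO
MKVMEK
>SIVMK_DEMO
MGCLGN
"""

counts = count_groups(example)
for group, n in sorted(counts.items()):
    print(group, n)
```

To run this on the real data, pass the contents of gp120.fasta to count_groups; the counts should add up to the 27 sequences mentioned above.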

Multiple alignment

We will use the program Clustal (named emma in wEMBOSS) to make a multiple alignment of the virus sequences.
  1. Load the sequences into ClustalW (wEMBOSS->emma):
  2. Computing an unrooted tree:
    In this part of the exercise we will use Jalview with the gp120_emma.fas file (from above) to produce a phylogenetic tree. The tree is built with the neighbor joining algorithm, and is based on distances computed from the multiple alignment you just constructed.
  3. View a plot of the unrooted tree:
    There are several programs for visualizing tree files such as gp120_emma.tree. Today we will use the Java version of the program Dendroscope, which can be downloaded for OS X or Windows. Jalview is very nice for viewing trees, but Dendroscope is the better choice for advanced features, editing, and publication-ready printing.

  4. Think for a minute about the implications!
    What does this tree tell us about the phylogenetic relationship of HIV-1, HIV-2 and SIV? Notice especially where the two different groups of SIV cluster compared to the two different groups of HIV.
    When you've thought about the problem, you can read a brief explanation. Additionally, you can find a good description of HIV evolution here: http://evolution.berkeley.edu/evolibrary/article/0_0_0/medicine_04
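
The tree building in step 2 (distances computed from the alignment, then neighbor joining) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by Clustal or Jalview. It uses the uncorrected fraction of differing residues (p-distance) and ignores columns with gaps, whereas the real programs may apply distance corrections and other options. All function names and the toy sequences are hypothetical.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of differing residues, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(aln):
    """aln: dict name -> aligned sequence. Returns a dict of dicts of distances."""
    return {i: {j: p_distance(aln[i], aln[j]) for j in aln if j != i} for i in aln}

def neighbor_joining(dist):
    """Build an unrooted neighbor joining tree; returns a Newick string."""
    if len(dist) < 3:
        raise ValueError("need at least three sequences")
    d = {i: dict(row) for i, row in dist.items()}   # work on a copy
    label = {i: str(i) for i in d}                  # node -> Newick text of its subtree
    counter = 0
    while len(d) > 3:
        n = len(d)
        r = {i: sum(d[i].values()) for i in d}      # row sums
        # Join the pair that minimises the neighbor joining criterion Q.
        i, j = min(combinations(d, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        dij = d[i][j]
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))   # branch from i to new node
        lj = dij - li                                    # branch from j to new node
        u = ("internal", counter)
        counter += 1
        label[u] = f"({label[i]}:{li:.4f},{label[j]}:{lj:.4f})"
        # Distances from the new node to every remaining node.
        new_row = {k: 0.5 * (d[i][k] + d[j][k] - dij) for k in d if k not in (i, j)}
        del d[i], d[j]
        for k, row in d.items():
            del row[i], row[j]
            row[u] = new_row[k]
        d[u] = new_row
    # Connect the last three nodes to a central node.
    a, b, c = d
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    lb = 0.5 * (d[a][b] + d[b][c] - d[a][c])
    lc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
    return f"({label[a]}:{la:.4f},{label[b]}:{lb:.4f},{label[c]}:{lc:.4f});"

# Made-up toy alignment, for illustration only (not real data).
toy = {"seq1": "MKV-LA", "seq2": "MKVALA", "seq3": "MRVALG", "seq4": "MRIALG"}
print(neighbor_joining(distance_matrix(toy)))
```

The printed Newick string can be saved to a file and opened in a tree viewer. Note that neighbor joining can produce small negative branch lengths on real data; this sketch does not correct for that.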

Bootstrapping a neighbor joining tree

  1. wEMBOSS can also bootstrap your neighbor joining tree:
  2. View the bootstrapped tree:
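
The idea behind bootstrapping can be sketched as follows: alignment columns are resampled with replacement to make many pseudo-alignments of the same length, a tree is built from each, and the support for a grouping is the percentage of those trees that contain it. The Python below is a minimal sketch of the resampling and counting steps only. The function names are hypothetical, the tree-building step is left out, and wEMBOSS may implement the details differently.

```python
import random

def bootstrap_alignment(aln, rng):
    """Resample alignment columns with replacement.

    aln: dict name -> aligned sequence (all of equal length).
    """
    length = len(next(iter(aln.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in aln.items()}

def bootstrap_replicates(aln, n_replicates=100, seed=0):
    """Return a list of resampled alignments (seeded for reproducibility)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(aln, rng) for _ in range(n_replicates)]

def split_support(replicate_splits, split):
    """Percentage of replicate trees that contain the given split.

    replicate_splits: one set of splits per replicate tree, where each split
    is a frozenset of the sequence names on one side of a branch.
    """
    hits = sum(split in splits for splits in replicate_splits)
    return 100.0 * hits / len(replicate_splits)

# Made-up toy alignment, for illustration only (not real data).
toy = {"seq1": "MKV-LA", "seq2": "MKVALA", "seq3": "MRVALG"}
replicates = bootstrap_replicates(toy, n_replicates=5, seed=1)
print(replicates[0])
```

A grouping that appears in most replicate trees is well supported by the data; one that appears in only a few depends on a small number of alignment columns.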



Rooting a tree using an outgroup

In this part of the exercise you will use a data set of 20 different POL-polyprotein sequences isolated from HIV-1, HIV-2, chimpanzee SIV, and sooty mangabey SIV. (The pol gene encodes three different polypeptides: integrase, reverse transcriptase, and protease. It is expressed as a single polyprotein and is subsequently cleaved by protease into its three separate parts.)

First, you will construct a neighbor-joining tree as before and investigate whether this new, independent data set confirms the conclusions you drew from the alignment of gp120 sequences. Then you will add a POL-polyprotein sequence from HTLV-1 to the data set and construct a new tree that you can then root using the HTLV sequence as an outgroup. (HTLV-1 is also a retrovirus but not a lentivirus, so it is more distantly related to HIV and SIV than they are to each other. Incidentally, HIV was originally named HTLV-3.)


  1. Download and have a look at the POL sequence file:
    Download the aligned hiv-siv-pol.aln file to the working directory, and inspect the alignment with a text editor or an alignment viewer such as Jalview. As mentioned, this file contains POL-polyprotein sequences from HIV-1, HIV-2, chimpanzee SIV, and sooty mangabey SIV.

  2. Construct a neighbor-joining tree with no outgroup:
    Re-open the Jalview window and load the sequence file hiv-siv-pol.fasta. Now, start the alignment by choosing:
  3. Inspect the unrooted tree in Dendroscope:
  4. Construct a neighbor-joining tree with an added outgroup:
  5. Inspect the unrooted tree in Dendroscope:
  6. Define outgroup:

    We will now use the same data for constructing a rooted tree, using the HTLV sequence as a way of defining where to place the root.

    Open "htlv-hiv-siv-pol.phb" in Dendroscope if it is not already open.

    To do this, select the HTLV branch of the tree with the mouse (it will be highlighted in red) and choose "EDIT" -> "Reroot".

    The outgroup will be used to place the root of the tree. The rationale is as follows: our data set consists of sequences from HIV-1, HIV-2, SIV and HTLV. We know from other evidence that the lineage leading to HTLV branched off before any of the remaining viruses diverged from each other. The root of the tree connecting the viruses investigated here must therefore be located between the HTLV sequence (the "outgroup") and the rest (the "ingroup"). This way of finding a root is called "outgroup rooting", and it produces a tree in which the outgroup is the sister group of a monophyletic ingroup.

    The results from the rooting service show the original tree(s) first and the constructed rooted tree(s) at the bottom.

    Test: On the sketch you made before, indicate which branch the root is located on. Was this where you expected it?

  7. What can now be said in general about the evolution of HIV, in particular in relation to humans?
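
The outgroup rooting described in step 6 can also be sketched in code. The Python below is a minimal illustration and does not reproduce how Dendroscope implements "Reroot". It assumes the unrooted tree is stored as a dictionary of neighbors with branch lengths, and it places the root at the midpoint of the outgroup branch, which is only one possible convention. The function name, node names, and branch lengths are hypothetical placeholders.

```python
def root_with_outgroup(adj, outgroup):
    """Root an unrooted tree on the branch leading to the outgroup leaf.

    adj: undirected tree as node -> {neighbour: branch_length}.
    Returns a rooted Newick string with the outgroup as one child of the root.
    """
    if len(adj[outgroup]) != 1:
        raise ValueError("outgroup must be a leaf")
    (neighbour, length), = adj[outgroup].items()

    def subtree(node, parent):
        # Walk away from the parent; leaves are nodes with no other neighbours.
        children = [n for n in adj[node] if n != parent]
        if not children:
            return str(node)
        parts = [f"{subtree(c, node)}:{adj[node][c]:g}"
                 for c in sorted(children, key=str)]
        return "(" + ",".join(parts) + ")"

    # Assumption: split the outgroup branch in half to place the root.
    half = length / 2
    return f"({outgroup}:{half:g},{subtree(neighbour, outgroup)}:{half:g});"

# Made-up unrooted tree with four ingroup leaves and one outgroup leaf (OUT).
# n1, n2 and n3 are internal nodes; all branch lengths are arbitrary.
toy_tree = {
    "A": {"n1": 1}, "B": {"n1": 1},
    "C": {"n2": 1}, "D": {"n2": 1},
    "OUT": {"n3": 4},
    "n1": {"A": 1, "B": 1, "n3": 2},
    "n2": {"C": 1, "D": 1, "n3": 2},
    "n3": {"n1": 2, "n2": 2, "OUT": 4},
}
rooted = root_with_outgroup(toy_tree, "OUT")
print(rooted)
```

In the rooted result, everything except the outgroup hangs from a single ingroup node, which is what makes it possible to read the order of branching events within the ingroup.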