GRCh37/hg19 GRCh38/hg38 Multi-Genome Tutorial



Goal: This tutorial illustrates how the multi-genome mode of GenPlay can be used to simultaneously display data aligned on different reference genomes. We will compare gene annotation data aligned on GRCh37/hg19 with gene annotation data aligned on GRCh38/hg38.

Prerequisite: GenPlay needs to be installed on your computer. If you haven't installed GenPlay yet, please visit the Downloads page and follow the instructions to download and install GenPlay.

Note: The final result of this tutorial is available as a project that can be loaded from the Projects page of this website.

Getting started

To set up and manage a Multi-Genome Project in GenPlay, please refer to the following sections of the documentation:

Downloading Files

Starting a New Project

Selecting the Reference Assembly

After starting GenPlay you will be prompted to select a name, a clade, a genome and an assembly for your project. You can enter hg19 - hg38 Tutorial for the name. Select the mammal clade, the human genome and the hg38 assembly (figure 1).

Figure 1: New Project Window

Then, click on the tool box button on the assembly line. A new window will appear allowing you to select chromosomes. For this tutorial we will work only on the basic chromosomes (chr1 to chr22 plus chrX and chrY). You can select the basic chromosomes by clicking on the Basics button (figure 2).

Figure 2: Project Chromosomes
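
If you want to check outside GenPlay which chromosome names fall into the basic set described above, the following minimal Python sketch lists them. The function names are hypothetical and are not part of GenPlay; the set is simply chr1 to chr22 plus chrX and chrY, as stated in this tutorial.

```python
def basic_chromosomes():
    """Return the 24 basic human chromosome names: chr1..chr22, chrX, chrY."""
    return [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]


def is_basic(name):
    """Tell whether a chromosome name belongs to the basic set."""
    return name in set(basic_chromosomes())


if __name__ == "__main__":
    # Print the names, one per line, e.g. to compare with a data file.
    for name in basic_chromosomes():
        print(name)
```

Names such as chrM or unplaced contigs are not in this set, so `is_basic` returns False for them.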

Setting the Multi-Genome Parameters

Manually

Next we need to set up a multi-genome project. To do so, click on the Multi Genome Project radio button at the bottom of the screen and click on Select VCF. Click on the Add... label of the File column to select the VCF file to load. Select the VCF downloaded earlier. Only one VCF file is going to be loaded for this tutorial. The VCF file contains the differences between the reference genome GRCh37/hg19 and the reference genome GRCh38/hg38.

File column

Click on the Add... label and then on the Add... menu and select the VCF file downloaded earlier (hg19ToHg38.vcf.gz).

Raw name(s) column

The raw name is filled in automatically. In this tutorial there is only one genome besides the hg38 assembly: hg19.

Nickname column

The nickname can be used to differentiate samples that have the same raw name. In this example we can keep the default nickname.

Group column

Since this tutorial is about comparing reference genomes, a generic group name can be Reference genome. Click on the Group 1 text of the Group column and then click on the pencil to edit the group name.

The result is shown in figure 3.

Figure 3: VCF Loader Window

Automatically

You can automatically set up the multi-genome project by clicking on the Import Config button at the bottom of the project screen and selecting the XML file downloaded earlier. When you choose this option, make sure that the VCF file and the XML file are in the same directory.
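
The same-directory requirement can be checked before launching GenPlay with a short Python sketch such as the one below. This is only an illustration: the function name and the example XML file name are hypothetical, and the check compares locations only, not file contents.

```python
from pathlib import Path


def in_same_directory(vcf_path, xml_path):
    """Return True when both files sit in the same directory."""
    vcf_dir = Path(vcf_path).resolve().parent
    xml_dir = Path(xml_path).resolve().parent
    return vcf_dir == xml_dir


if __name__ == "__main__":
    # "multi_genome_config.xml" is a placeholder name for the downloaded XML file.
    ok = in_same_directory("hg19ToHg38.vcf.gz", "multi_genome_config.xml")
    print("Same directory:", ok)
```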

Displaying SNPs, Insertions and Deletions

Once you're done with the previous step, click on Create to initialize the project. This should only take a few seconds.

We can now display variants. Let's start by loading SNPs. To do so, right click on the handler of the first track (the blue part of the track with a number on it) and then select the Add Variant Layer option (figure 4).

Figure 4: Add Variant Layer

Then click on the SNPs check box (figure 5).

Figure 5: VCF Select Variants to Add

Using the same method, load insertions on track 2 and deletions on track 3. The result should be similar to what is shown in figure 6.

Figure 6: Variant Layers Added
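
To get a rough idea of how many variants of each kind the VCF file holds, the Python sketch below counts them from the REF and ALT columns. It is a simplified illustration under stated assumptions, not GenPlay's own logic: a variant is called a SNP when both alleles are one base long, an insertion when ALT is longer than REF, and a deletion when ALT is shorter. Symbolic alleles such as <DEL> are counted as "other". The function names are hypothetical.

```python
import gzip
from collections import Counter


def classify(ref, alt):
    """Classify one REF/ALT pair by allele length (simplified rule)."""
    if alt.startswith("<") or alt in (".", "*"):
        return "other"  # symbolic or missing alleles are not handled here
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "other"  # e.g. multi-base substitutions of equal length


def count_variants(vcf_gz_path):
    """Count variant types in a gzip-compressed VCF file."""
    counts = Counter()
    with gzip.open(vcf_gz_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                continue  # skip malformed lines
            ref, alts = fields[3], fields[4]
            for alt in alts.split(","):
                counts[classify(ref, alt)] += 1
    return counts
```

Calling `count_variants("hg19ToHg38.vcf.gz")` on the file used in this tutorial returns a counter keyed by variant type, which can be compared with what the three variant layers display.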

Displaying Gene Annotation Layers

Let's start by loading the hg38 gene annotation.

Right click on the handler of track 4 and select the Add Layer(s) option (figure 7).

Figure 7: Add Layer

Then select the hg38 gene annotation file downloaded at the beginning of this tutorial. On the next screen select Gene Annotation Layer (figure 8).

Figure 8: Load Gene Annotation

Then we need to tell GenPlay that the data were aligned on the hg38 reference genome (figure 9).

Figure 9: Select hg38

We now need to repeat the same operation for hg19. You will need to select the other gene annotation file and then select hg19 as the genome used for the alignment (figure 10).

Figure 10: Select hg19

The result of this step is shown in figure 11.

Figure 11: Gene Layers

Adding DNA Sequence Layers

We are going to insert two blank tracks. To do so, right click on the track handler of track 1 and select the Insert option of the contextual menu. Repeat this operation to insert a second empty track.

Now right click on the first track handler and select Add Layer(s). Then, select the hg38 DNA file downloaded at the beginning of this tutorial. When asked which genome was used for the alignment, select hg38.

Repeat this operation for the hg19 DNA sequence file. Make sure to select hg19/Maternal allele as the genome used for the alignment.

You should now be able to visualize DNA sequences. Please note that you might need to zoom in to see them. This can easily be done with the mouse wheel.

The final result of this tutorial is shown in figure 12.

Figure 12: Final Result