GRCh37/hg19 GRCh38/hg38 Multi-Genome Tutorial
From GenPlay, Einstein Genome Analyzer
Goal: This tutorial illustrates how the multi-genome mode of GenPlay can be used to simultaneously display data aligned to different reference genomes. In this tutorial we will compare gene annotation data aligned to GRCh37/hg19 with gene annotation data aligned to GRCh38/hg38.
Prerequisite: GenPlay needs to be installed on your computer. If you haven't installed GenPlay yet, please visit the Downloads page and follow the instructions to download and install GenPlay.
Note: The final result of this tutorial is available as a project that can be loaded from the Projects page of this website.
To set up and manage a multi-genome project in GenPlay, please refer to the documentation. This tutorial uses the following files, which should be downloaded before starting:
- XML settings file (right click on the link and select Save Link As...)
- VCF file
- Indexed VCF file (Tabix)
- RefSeq BED file for GRCh38/hg38 (right click on the link and select Save Link As...)
- RefSeq BED file for GRCh37/hg19 (right click on the link and select Save Link As...)
- DNA sequence file for GRCh38/hg38
- DNA sequence file for GRCh37/hg19
Starting a New Project
Selecting the Reference Assembly
After starting GenPlay you will be prompted to select a name, a clade, a genome and an assembly for your project. You can enter hg19 - hg38 Tutorial for the name. Select the mammal clade, the human genome and the hg38 assembly (figure 1).
Then, click on the toolbox button on the assembly line. A new window will appear allowing you to select chromosomes. For this tutorial we will work only on the basic chromosomes (chr1 to chr22, plus chrX and chrY). You can select the basic chromosomes by clicking on the Basics button (figure 2).
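If you want to check which records of a downloaded file fall on the basic chromosomes, the selection above can be sketched in a few lines of Python. This is only an illustration outside GenPlay: it assumes UCSC-style names with a chr prefix, and the example names in the list are made up for the demonstration.

```python
# Names of the "basic" chromosomes used in this tutorial: chr1..chr22, chrX, chrY.
# Assumption: UCSC-style naming with a "chr" prefix.
BASIC_CHROMOSOMES = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]


def is_basic_chromosome(name: str) -> bool:
    """Return True for chr1..chr22, chrX and chrY only."""
    return name in BASIC_CHROMOSOMES


# Illustrative input: anything that is not one of the 24 basic names is dropped.
names = ["chr1", "chr22", "chrX", "chrY", "chrM", "chr1_example_random"]
basic = [n for n in names if is_basic_chromosome(n)]
print(basic)
```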
Setting the Multi-Genome Parameters
Next we need to set up a multi-genome project. To do so, click on the Multi Genome Project radio button at the bottom of the screen and click on Select VCF. Only one VCF file is loaded in this tutorial: it contains the differences between the reference genome GRCh37/hg19 and the reference genome GRCh38/hg38. Each column is filled in as follows.
In the File column, click on the Add... label and then on the Add... menu, and select the VCF file downloaded earlier (hg19ToHg38.vcf.gz).
Raw name(s) column
The raw name is filled in automatically. In this tutorial there is only one genome besides the hg38 assembly: hg19.
The nickname can be used to differentiate samples having the same raw name. In this example we can keep the default nickname.
Since this tutorial is about comparing reference genomes, a generic group name such as Reference genome can be used. Click on the Group 1 text of the Group column and then click on the pencil to edit the group name.
The result is shown in figure 3.
Alternatively, you can set up the multi-genome project automatically by clicking on the Import Config button at the bottom of the project screen and selecting the XML file downloaded earlier. If you choose this option, make sure that the VCF file and the XML file are in the same directory.
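Before using Import Config, it can help to verify the file layout from outside GenPlay. The sketch below is a hypothetical helper, not part of GenPlay: it checks that both files exist and share a directory, and it also looks for a Tabix index next to the VCF, assuming the usual tabix naming convention (the data file name followed by .tbi). The XML file name settings.xml is a placeholder, and the demonstration uses empty files in a temporary directory.

```python
import tempfile
from pathlib import Path


def check_import_config_inputs(vcf_path, xml_path) -> list:
    """Return a list of problems found; an empty list means all checks passed."""
    vcf = Path(vcf_path)
    xml = Path(xml_path)
    problems = []
    for path in (vcf, xml):
        if not path.is_file():
            problems.append(f"missing file: {path}")
    if vcf.resolve().parent != xml.resolve().parent:
        problems.append("the VCF file and the XML file are not in the same directory")
    # Assumption: the index follows the tabix convention <data file name>.tbi.
    index = vcf.with_name(vcf.name + ".tbi")
    if not index.is_file():
        problems.append(f"missing Tabix index: {index}")
    return problems


# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    vcf = Path(tmp) / "hg19ToHg38.vcf.gz"
    xml = Path(tmp) / "settings.xml"  # hypothetical name for the XML settings file
    vcf.touch()
    xml.touch()
    problems_without_index = check_import_config_inputs(vcf, xml)
    (Path(tmp) / "hg19ToHg38.vcf.gz.tbi").touch()
    problems_with_index = check_import_config_inputs(vcf, xml)

print(problems_without_index)
print(problems_with_index)
```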
Displaying SNPs, Insertions and Deletions
Once you're done with the previous step, click on Create to initialize the project. This should only take a few seconds.
We can now display variants. Let's start by loading SNPs. To do so, right click on the handler of the first track (the blue part of the track with a number on it) and then select the Add Variant Layer option (figure 4).
Then click on the SNPs check box (figure 5).
Using the same method, load insertions on track 2 and deletions on track 3. The result should be similar to what is shown in figure 6.
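To get a feel for what the three variant layers contain, VCF records can be sorted into SNPs, insertions and deletions by comparing the lengths of the REF and ALT alleles. The following Python sketch uses that common simplification; it is not necessarily how GenPlay classifies variants internally, and the example records are invented for illustration.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair by comparing allele lengths."""
    if alt in (".", "*") or alt.startswith("<"):
        return "other"  # missing, spanning-deletion or symbolic alleles
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "other"  # e.g. multi-base substitutions of equal length


def count_variant_types(vcf_lines) -> dict:
    """Count variant types over an iterable of VCF text lines."""
    counts = {"SNP": 0, "insertion": 0, "deletion": 0, "other": 0}
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]
        for alt in alts.split(","):  # one record may list several ALT alleles
            counts[classify_variant(ref, alt)] += 1
    return counts


# Invented records, for illustration only.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t.\t.\t.",
    "chr1\t200\t.\tA\tAT\t.\t.\t.",
    "chr1\t300\t.\tACG\tA\t.\t.\t.",
    "chr1\t400\t.\tA\tC,ATT\t.\t.\t.",
]
counts = count_variant_types(example)
print(counts)
```

To run this on the tutorial file, the lines could be read with gzip.open("hg19ToHg38.vcf.gz", "rt") from the Python standard library, assuming the file is in the current directory.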
Displaying Gene Annotation Layers
Let's start by loading the hg38 gene annotation.
Right click on the handler of track 4 and select the Add Layer(s) option (figure 7).
Then select the hg38 gene annotation file downloaded at the beginning of this tutorial. On the next screen select Gene Annotation Layer (figure 8).
We then need to tell GenPlay that the data were aligned to the hg38 reference genome (figure 9).
We now need to repeat the same operation for hg19. You will need to select the other gene annotation file and then select hg19 as the genome used for the alignment (figure 10).
The result of this step is shown in figure 11.
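The comparison that the two annotation layers show visually can also be approximated on the BED files themselves. The sketch below is a rough illustration, not part of GenPlay: it reads the first four BED columns (chromosome, start, end, name) and reports how far the start of each shared name moved between the two assemblies. The gene names and coordinates are invented, and it assumes the name column identifies the same annotation in both files.

```python
def read_bed_genes(lines) -> dict:
    """Map name -> (chrom, start, end) using the first four BED columns."""
    genes = {}
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines, comments and header lines
        fields = line.split()
        chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
        genes.setdefault(name, (chrom, start, end))  # keep the first record per name
    return genes


def coordinate_shifts(genes_a: dict, genes_b: dict) -> dict:
    """For names found in both sets on the same chromosome, return start_b - start_a."""
    shifts = {}
    for name, (chrom_a, start_a, _end_a) in genes_a.items():
        if name in genes_b and genes_b[name][0] == chrom_a:
            shifts[name] = genes_b[name][1] - start_a
    return shifts


# Invented records, for illustration only.
hg19_lines = [
    "chr1\t1000\t2000\tGENE_A",
    "chr2\t5000\t7000\tGENE_B",
    "chr3\t100\t900\tGENE_C",
]
hg38_lines = [
    "chr1\t1150\t2150\tGENE_A",
    "chr2\t5000\t7000\tGENE_B",
]
shifts = coordinate_shifts(read_bed_genes(hg19_lines), read_bed_genes(hg38_lines))
print(shifts)
```

A shift of zero means the start coordinate is unchanged between the two assemblies; names present in only one file are left out of the result.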
Adding DNA Sequence Layers
We are going to insert two blank tracks. To do so, right click on the track handler of track 1 and select the Insert option of the contextual menu. Repeat this operation to insert a second empty track.
Now right click on the first track handler and select Add Layer(s). Then, select the hg38 DNA file downloaded at the beginning of this tutorial. When asked which genome was used for the alignment, select hg38.
Repeat this operation for the hg19 DNA sequence file. Make sure to select hg19/Maternal allele as the genome used for the alignment.
You should now be able to visualize DNA sequences. Please note that you might need to zoom in to see them, which is easily done with the mouse wheel.
The final result of this tutorial is shown in figure 12.