Multi-Genome Tutorial
From GenPlay, Einstein Genome Analyzer
Getting started
To set up and manage a Multi-Genome Project in GenPlay, please refer to the following sections of the documentation:
Conversion between NCBI36/hg18 and GRCh37/hg19
Description
This tutorial describes how to concurrently display tracks mapped on the genome assembly NCBI36/hg18 and tracks mapped on the genome assembly GRCh37/hg19. In the example, the user will be able to see all the modifications to the NCBI36/hg18 genome that lead to the GRCh37/hg19 reference genome.
Note: The final result of this tutorial is available as a project that can be loaded from the Projects page of this website.
Files
- XML settings file
- VCF file
- Indexed VCF file (Tabix)
- Refseq BED file for GRCh37/hg19
- Refseq BED file for NCBI36/hg18
Steps
Project settings
Project name
The first thing to do is to choose a name for the new project; here the name is GenPlay-MG – Reference genome tutorial (Figure 1).
Project assembly
The reference genome for this tutorial is GRCh37/hg19. The mammal clade and human genome need to be selected (Figure 2).
Chromosome selection
The VCF file contains Structural Variants for chromosomes 1 to 22 and chromosomes X and Y. The list of chromosomes available in the project can be set by clicking on the settings button (toolbox image) next to the assembly name (Figure 3).
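Before choosing the chromosome list, it can help to check which chromosomes actually appear in the VCF file. The following is a minimal sketch, not part of GenPlay: it relies only on the VCF convention that the first tab-separated column of each data line is the chromosome name. The function names and the sample lines are made up for illustration.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain-text or gzip/bgzip-compressed VCF file in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def count_variants_per_chromosome(lines):
    """Count data records per chromosome (first tab-separated column)."""
    counts = Counter()
    for line in lines:
        # Skip blank lines and header lines (they start with '#').
        if not line.strip() or line.startswith("#"):
            continue
        chromosome = line.split("\t", 1)[0]
        counts[chromosome] += 1
    return counts


# Made-up sample lines standing in for the real file content.
sample_lines = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNCBI36\n",
    "chr1\t100\t.\tA\tATTG\t.\t.\t.\tGT\t1/1\n",
    "chr1\t200\t.\tACGT\tA\t.\t.\t.\tGT\t1/1\n",
    "chrX\t50\t.\tG\tGAA\t.\t.\t.\tGT\t1/1\n",
]
for chromosome, count in count_variants_per_chromosome(sample_lines).items():
    print(chromosome, count)
```

For a real file, pass the handle returned by `open_vcf(path)` to `count_variants_per_chromosome` instead of the sample list.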
VCF Loading
Manually
Next, we need to set up a multi-genome project. To do so, click on the Multi Genome Project radio button at the bottom of the screen and click on Select VCF. Click on the Add... label of the File column and select the VCF file downloaded earlier. Only one VCF file is loaded for this tutorial. It contains the differences between the reference genome NCBI36/hg18 and the reference genome GRCh37/hg19.
Group column
Since this tutorial is about comparing reference genomes, a suitable generic group name is Reference genome. Click on the Group 1 text of the Group column and then click on Add... to enter the group name (Figure 4).
The Group name editor should look like Figure 5 below.
Once the value has been added to the list, it can be saved by closing the Group name editor window.
value: Reference genome
Genome column
The genome name is an alias for the selected raw name. In this tutorial, the genome name is Hg18. In the Genome name list editor, click on the plus button to open the input text box and enter the name (Figure 6).
The Genome name list editor should look like Figure 7 below.
Once the value has been added to the list, it can be saved by closing the Genome name list editor window.
value: Hg18
Type column
This field does not accept typed input; its value has to be chosen from the available types. Because the provided VCF file contains structural variants, choose SV (Figure 8).
value: SV
File column
Once the VCF file has been downloaded, open the File list editor, click on the plus button to show the file chooser dialog, and select the VCF file from its location.
value: VCF path
Raw name(s) column
The raw name list is automatically filled. In the case of this tutorial there is only one genome: NCBI36 (Figure 10).
value: NCBI36
Again, the value is saved by closing the window.
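The raw names shown in this column correspond to what the VCF format calls sample names: the columns that follow FORMAT on the `#CHROM` header line. The sketch below shows how those names can be read from a VCF header. It assumes, without confirmation from the GenPlay source, that this header line is where the raw name list comes from. The function name and the sample header are illustrative only.

```python
def sample_names(lines):
    """Return the sample (genome) names listed on the #CHROM header line."""
    for line in lines:
        if line.startswith("#CHROM"):
            columns = line.rstrip("\n").split("\t")
            # The first 9 columns are fixed by the VCF format:
            # CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
            return columns[9:]
        if not line.startswith("#"):
            break  # data records reached without finding the header line
    return []


# Made-up header resembling a single-genome VCF such as the tutorial file.
header_lines = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNCBI36\n",
]
print(sample_names(header_lines))
```

With a header like the one above, the list contains a single name, which matches the single raw name expected in this tutorial.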
Automatically
You can set up the multi-genome project automatically by clicking on the Import Config button at the bottom of the project screen and selecting the XML file downloaded earlier. With this option, make sure that the VCF file and the XML file are in the same directory.
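The same-directory requirement can be checked before importing the configuration. This is a small helper sketch, independent of GenPlay; the function name and the file paths are hypothetical and should be replaced with the actual locations of the downloaded files.

```python
from pathlib import Path


def same_directory(xml_path, vcf_path):
    """Return True if both files live in the same directory."""
    xml_dir = Path(xml_path).resolve().parent
    vcf_dir = Path(vcf_path).resolve().parent
    return xml_dir == vcf_dir


# Hypothetical paths, for illustration only.
print(same_directory("/data/tutorial/settings.xml", "/data/tutorial/variants.vcf"))
print(same_directory("/data/tutorial/settings.xml", "/data/other/variants.vcf"))
```

The helper only compares directories; it does not check that either file exists.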
Conclusion
Finally, the screen should look like the one in Figure 11.
Conclusion
The welcome screen should now look similar to Figure 12.
Clicking on the "Create" button creates the project and runs the synchronization.
GRCh37/hg19 genes loading
Files can be loaded by right-clicking on a track handler (the left part of the track, which displays the track number). Right-click on a track handler and then choose "Add Layer(s)". Select the hg19 gene annotation BED file downloaded at the beginning of the tutorial. Select Gene Annotation Layer when prompted. The window shown in Figure 13 will appear.
You need to specify which genome the data in the file were aligned to. Here, we need to choose "Feb 2009 (GRCh37/hg19)" because the BED file contains data aligned on that genome. The gene file for the GRCh37/hg19 reference is now loaded.
NCBI36/hg18 genes loading
Repeat the same operation for the gene annotation from hg18. This time you will need to select "Reference genome - hg18 (NCBI36)" (Figure 14).
Conclusion
You can now navigate through the different chromosomes and visualize the differences between the two genomes using the stripes. All genes are synchronized and are displayed according to the meta-genome coordinates.
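To give an intuition for what displaying tracks in meta-genome coordinates involves, here is a deliberately simplified model. It assumes that the meta-genome is the project reference extended with every insertion, so that a reference position is shifted right by the total length of the insertions anchored before it. This is an illustrative assumption, not GenPlay's actual implementation; the class name and the insertion values are made up.

```python
from bisect import bisect_left
from itertools import accumulate


class MetaGenomeMap:
    """Map reference positions to positions on a simplified meta-genome."""

    def __init__(self, insertions):
        """insertions: iterable of (reference_position, inserted_length)."""
        ordered = sorted(insertions)
        self._positions = [position for position, _ in ordered]
        # Running total of inserted bases, in reference order.
        self._cumulative = list(accumulate(length for _, length in ordered))

    def to_meta(self, reference_position):
        """Shift a reference position by the insertions anchored before it."""
        # Number of insertions anchored strictly before this position.
        index = bisect_left(self._positions, reference_position)
        shift = self._cumulative[index - 1] if index > 0 else 0
        return reference_position + shift


# Made-up insertions: 5 bases after position 100, 10 bases after position 200.
mapping = MetaGenomeMap([(100, 5), (200, 10)])
for position in (50, 100, 101, 250):
    print(position, mapping.to_meta(position))
```

In this model a gene located after both insertions is drawn 15 bases further right than its reference coordinate, which is why annotations from both assemblies can line up on one display.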
Figure 15 shows an example of the result of this tutorial. It is possible to see deletions (in red) and insertions (in green) in the NCBI36/hg18 reference genome compared to the GRCh37/hg19 reference genome.
Chromosome: chr1
Position: 143,822,670
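The red and green stripes correspond to deletions and insertions recorded in the VCF file. As a rough sketch of how a VCF record can be sorted into one of these two classes, the function below compares the REF and ALT alleles and also recognizes the symbolic alleles `<DEL>` and `<INS>` defined by the VCF format. The function name is hypothetical, the color table simply restates the colors mentioned above, and GenPlay's own classification logic may differ.

```python
# Stripe colors as described in this tutorial.
STRIPE_COLORS = {"deletion": "red", "insertion": "green"}


def classify_variant(ref, alt):
    """Classify a single-ALT VCF record as insertion, deletion, or other."""
    # Symbolic alleles such as <DEL> or <INS:ME> name the variant type directly.
    if alt.startswith("<") and alt.endswith(">"):
        symbolic = alt[1:-1].upper()
        if symbolic.startswith("DEL"):
            return "deletion"
        if symbolic.startswith("INS"):
            return "insertion"
        return "other"
    # Otherwise compare allele lengths.
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "other"


# Made-up alleles, for illustration only.
for ref, alt in [("A", "ATTG"), ("ATTG", "A"), ("A", "<DEL>"), ("A", "G")]:
    kind = classify_variant(ref, alt)
    print(ref, alt, kind, STRIPE_COLORS.get(kind, "n/a"))
```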