Difference between revisions of "Multi-Genome Tutorial"

From GenPlay, Einstein Genome Analyzer

(VCF files loading)
(Getting started)
 
(26 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
== Getting started ==
 
== Getting started ==
=== Introduction ===
+
In order to set up and manage a Multi-Genome Project in GenPlay, please refer to the following sections of the documentation:
To create a multi-genome session, users must first load VCF files, which describe all the differences between a reference genome and a particular genome.
+
* [[Documentation#Multi Genome Project|Loading a Multi Genome Project]]
 
+
* [[Documentation#Loading a Variant Layer|Loading a Variant Layer]]
The first part of this tutorial presents how to set up a multi-genome project, especially how to load VCF files.
+
* [[Documentation#Variant Layer Operations|Variant Layer operations]]
The second part concerns loading data layers onto tracks. Data layers must be mapped to one of the loaded genomes.
 
Finally, we will describe how to highlight information from VCF files.
 
 
 
=== The Welcome screen ===
 
The welcome screen is the first screen of GenPlay-MG and allows users to create or load a project.
 
 
 
==== New Project ====
 
In order to create a new project, users must give it a name as shown in Figure 1.
 
[[image:mg_basics_project name.png|center|frame|Figure 1: Text field to define the project name]]
 
<br/>
 
The second step is to choose a reference genome. Users select it from the lists for the clade, the genome and the assembly (Figure 2).
 
[[image:mg_basics_assembly_chooser.png|center|frame|Figure 2: Assembly chooser]]
 
<br/>
 
Several chromosomes are available for each assembly, but users can select only some of them.<br/>
 
To open the chromosome chooser (Figure 3), users have to click on the tools button next to the assembly name.
 
[[image:mg_basics_chromosome_chooser.png|center|frame|Figure 3: Chromosome chooser]]
 
<br/>
 
The third and last step is to choose between a ''Simple Genome Project'' and a ''Multi Genome Project''. This tutorial covers multi-genome projects. Once this option is checked, the welcome screen should look like the one shown in Figure 4.
 
[[image:mg_basics_empty_welcome_screen.png|center|frame|Figure 4: Empty welcome screen for multi-genome project]]
 
<br/>
 
 
 
==== Load Project ====
 
Coming soon<br/>
 
(Option unavailable in the beta version)
 
 
 
=== VCF Files ===
 
==== Description ====
 
VCF files describe differences between genomes. Usually, these are differences between one or several genomes of interest and the reference genome used for the mapping process. VCF files define multiple types of variation; GenPlay is able to read and represent the following:
 
* InDels
 
* SNPs
 
* SV (Structural Variation)
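
As a rough illustration of how the three categories above can be told apart, the following Python sketch classifies a single REF/ALT allele pair from a VCF record. It is a simplified rule of thumb based on allele lengths and on the VCF convention of symbolic alleles (for example <code><DEL></code>); the function name <code>classify_variant</code> is hypothetical and this is not GenPlay's own logic.

```python
def classify_variant(ref, alt):
    """Classify one REF/ALT pair (simplified; not GenPlay's implementation)."""
    # Symbolic alleles such as <DEL> or <INS> describe structural variants.
    if alt.startswith("<") and alt.endswith(">"):
        return "SV"
    # A single base replaced by another single base is a SNP.
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    # A longer ALT allele adds bases; a shorter one removes bases (InDels).
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    # Same length but more than one base: outside the three categories above.
    return "other"


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("A", "ACGT"), ("ACGT", "A"), ("N", "<DEL>")]:
        print(ref, alt, classify_variant(ref, alt))
```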
 
<br/>
 
A complete description of VCF files is given on the 1000 genomes project website:<br/>
 
[http://www.1000genomes.org/wiki/analysis/variant-call-format/vcf-variant-call-format-version-42 Variant Call Format specification]<br/>
 
 
 
==== Tabix ====
 
===== Introduction =====
 
VCF files contain a lot of information, which makes the scanning (loading) process slow.<br/>
 
In order to increase the scanning efficiency, VCF files have to be compressed and indexed. The compression is done using BGZip and the indexing with Tabix.<br/>
 
[http://samtools.sourceforge.net/tabix.shtml Tabix manual reference pages]<br/>
 
[http://sourceforge.net/projects/samtools/files/tabix/ Tabix download]
 
 
 
===== VCF files indexing methods =====
 
====== Using GenPlay ======
 
GenPlay is now able to compress and index VCF files using the VCF Loader.<br/>
 
That process is detailed in the VCF files loading section below.<br/>

It is fully automatic and platform independent (it works on Windows, Linux and Mac).
 
====== Manually ======
 
Note that the following process must be performed in a Linux or Mac environment.<br/>

Each VCF file must first be compressed to the BGZF format, which produces a .gz file. Tabix provides a tool (bgzip) to perform the compression.

After compression, each VCF file must be indexed using the tabix command.

Once Tabix is installed, these two commands are all that is needed to index a file.
 
<br/><br/>
 
 
 
Available commands from the Tabix folder:<br/>
 
''bgzip -f VCF_PATH;''<br/>
 
''tabix -p vcf VCF_PATH.gz;''
 
<br/><br/>
 
 
 
For example, a VCF file named my_vcf.vcf located in the same folder as Tabix can be indexed with the following commands (Figure 5):<br/>
 
''bgzip -f ./my_vcf.vcf;''<br/>
 
''tabix -p vcf ./my_vcf.vcf.gz;''
 
[[image:mg_basics_indexation_commands.png|center|frame|Figure 5: VCF file indexation command]]
 
<br/><br/>
 
 
 
'''Note:''' the first command '''replaces''' the current VCF file with the compressed VCF file (.vcf.gz). The second command '''creates''' the index file (.vcf.gz.tbi) next to the compressed file.<br/>
 
More options are described in the [http://samtools.sourceforge.net/tabix.shtml Tabix manual reference pages].
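
The two manual commands can also be scripted. The Python sketch below only builds the command lines and the names of the files they are expected to produce; the helper names (<code>indexing_commands</code>, <code>expected_outputs</code>, <code>compress_and_index</code>) are hypothetical, and it assumes that <code>bgzip</code> and <code>tabix</code> are installed and available on the PATH.

```python
import subprocess


def indexing_commands(vcf_path):
    """Return the bgzip and tabix command lines for one uncompressed VCF file."""
    compressed = vcf_path + ".gz"  # bgzip -f replaces the input with <name>.gz
    return (["bgzip", "-f", vcf_path], ["tabix", "-p", "vcf", compressed])


def expected_outputs(vcf_path):
    """Return the compressed file name and the Tabix index file name."""
    return (vcf_path + ".gz", vcf_path + ".gz.tbi")


def compress_and_index(vcf_path):
    """Run both commands in order; raises CalledProcessError if one fails."""
    for command in indexing_commands(vcf_path):
        subprocess.run(command, check=True)
    return expected_outputs(vcf_path)


if __name__ == "__main__":
    # Only print the commands here; call compress_and_index() to execute them.
    for command in indexing_commands("./my_vcf.vcf"):
        print(" ".join(command))
```

Building the commands separately from running them makes it possible to review what will be executed before the original VCF file is replaced.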
 
 
 
=== VCF files loading ===
 
 
 
==== The VCF Loader ====
 
===== Introduction =====
 
The VCF Loader is the most important part of multi-genome project settings. It allows users to load all necessary VCF files and to define how to extract information from them. It appears when users click on the "Edit" button from the welcome screen.<br/>
 
Figure 6 shows an empty VCF Loader screen.
 
[[image:Mg_welcome_screen_vcf_loader.png|center|frame|Figure 6: VCF loader]]
 
<br/>
 
 
 
GenPlay-MG does not use the VCF file directly; it uses a compressed version of it (.gz). GenPlay-MG also needs the compressed VCF file to be indexed with Tabix. The compressed file and its index must be in the '''same folder''' and must have the '''same name'''; only the file extensions differ (.gz and .gz.tbi).
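
Before opening the VCF Loader, it can be useful to verify that each compressed file has its index next to it. The Python sketch below performs that check on the file system only; the function names are hypothetical and the check does not validate the content of either file.

```python
from pathlib import Path


def index_file_for(compressed_vcf):
    """Return the expected Tabix index path: same folder, same name plus .tbi."""
    path = Path(compressed_vcf)
    return path.with_name(path.name + ".tbi")


def is_ready_for_loading(compressed_vcf):
    """True if the .gz file and its .tbi index both exist in the same folder."""
    path = Path(compressed_vcf)
    return (
        path.name.endswith(".gz")
        and path.is_file()
        and index_file_for(path).is_file()
    )


if __name__ == "__main__":
    print(index_file_for("my_vcf.vcf.gz"))
    print(is_ready_for_loading("my_vcf.vcf.gz"))
```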
 
 
 
By right clicking, the user can add or remove rows.
 
 
 
===== Columns description =====
 
'''''File'''''<br/>
 
This column refers to the VCF file path. Once loaded, the raw name column is automatically filled with every raw genome name contained in the selected VCF file.<br/>
 
'''''Raw name(s)'''''<br/>
 
The ''Raw name(s)'' column list is automatically filled when a VCF file has been chosen. That list contains every genotype header found in the selected VCF file. Because genome names might be difficult to remember, GenPlay-MG offers users the option of adding another name (an alias) using the ''Genome'' column.<br/>
 
'''''Nickname'''''<br/>
 
The ''Nickname'' column allows users to associate an alias with the selected genome. This alias will appear in GenPlay-MG and can be useful because genome names in VCF files are often non-descriptive numbers that are hard to remember.<br/>
 
'''''Group'''''<br/>
 
Users can gather genomes into groups. Group names are used to distinguish genomes and by some group-specific functions.<br/><br/>
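
The genotype headers listed in the ''Raw name(s)'' column described above come from the <code>#CHROM</code> header line of the VCF file: the first nine columns are fixed by the VCF format, and every column after <code>FORMAT</code> names one genome. The Python sketch below shows how these names can be read; the function names are hypothetical and this is not GenPlay's own parser.

```python
import gzip

# CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO, FORMAT
FIXED_COLUMNS = 9


def genotype_headers(header_line):
    """Return the genome names found on a '#CHROM ...' header line."""
    fields = header_line.rstrip("\n").split("\t")
    if fields[0] != "#CHROM":
        raise ValueError("not a VCF column header line")
    return fields[FIXED_COLUMNS:]


def raw_names(vcf_gz_path):
    """Read the genome names from a compressed (.vcf.gz) file."""
    # BGZF files are valid gzip files, so the gzip module can read them.
    with gzip.open(vcf_gz_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                return genotype_headers(line)
    return []


if __name__ == "__main__":
    header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNCBI36\n"
    print(genotype_headers(header))
```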
 
 
 
 
 
===== Columns edition =====
 
The ''Group'', ''Nickname'' and ''File'' columns each have their own editable list. To edit a cell, click on it, hover over the item you want to edit and choose one of the following actions:<br/>
 
- Add (green symbol on empty item)<br/>
 
- Edit (pen symbol on an item)<br/>
 
- Delete (red symbol on an item)<br/>
 
 
 
That way, users can set up all the column lists before, or while, filling in the table.<br/>
 
 
 
'''Note:''' The ''Raw name(s)'' column is automatically filled with the genome names from the selected VCF file; that column cannot be edited manually.
 
 
 
==== Import/Export ====
 
Once a project has been set up, it can be saved using the import/export function. Pressing the export button saves an XML file to the hard drive. This XML file can then be imported to reload the project.



The XML file structure is simple. Each row is stored in a ''row'' element that carries attributes such as ''group'', ''genome'', ''file'' and ''raw_name''. The settings file is formatted as shown in Figure 9.
 
[[image:mg_basics_xml_settings.png|center|frame|Figure 9: XML file settings]]
 
<br/>
 
 
 
 
 
'''Note:''' If the user moves a VCF file or changes one of its genotype headers, the XML file will no longer work. The user has to update the ''file'' and/or ''raw_name'' attribute values accordingly.<br/>
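
When a VCF file has been moved, the ''file'' attribute can be updated with a short script instead of by hand. The Python sketch below is written under assumptions: the root element name (<code>settings</code>) and the exact layout of the sample document are hypothetical, since only the ''row'' element and its attribute names are described above. The code looks up ''row'' elements wherever they are, so it does not depend on the root name.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; only <row> and its attribute names come from the text.
SAMPLE = """<settings>
  <row group="Reference genome" genome="Hg18" raw_name="NCBI36"
       file="/old/folder/hg18tohg19_tutorial_sv.vcf.gz"/>
</settings>"""


def read_rows(xml_text):
    """Return one dictionary of attributes per <row> element."""
    root = ET.fromstring(xml_text)
    return [dict(row.attrib) for row in root.iter("row")]


def relocate_file(xml_text, old_path, new_path):
    """Return the XML text with every matching 'file' attribute replaced."""
    root = ET.fromstring(xml_text)
    for row in root.iter("row"):
        if row.get("file") == old_path:
            row.set("file", new_path)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    updated = relocate_file(
        SAMPLE,
        "/old/folder/hg18tohg19_tutorial_sv.vcf.gz",
        "/new/folder/hg18tohg19_tutorial_sv.vcf.gz",
    )
    print(read_rows(updated))
```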
 
 
 
=== Tracks loading ===
 
Once a multi-genome project has been created, GenPlay creates a meta-genome that is the sum of all the loaded genomes and can convert the coordinates of any data file into meta-genome coordinates. GenPlay can therefore load data files (tracks) mapped in the coordinates of any of the loaded genomes. When loading a file, the user must specify which genome was used for the mapping.
 
 
 
When loading a track, GenPlay displays a list of every loaded genome below the file selection window. Once the genome is selected, GenPlay transforms the coordinates in the file into meta-genome coordinates using the variant information of the specified genome.
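
To give an intuition for the coordinate conversion, the Python sketch below uses a deliberately simplified model: the meta-genome is the current genome plus every insertion found in the other genomes, so a position is shifted to the right by the total length of the foreign insertions located before it. This model, the function name and the tuple layout are assumptions made for illustration; GenPlay's actual algorithm is not described here.

```python
def to_meta_genome(position, foreign_insertions):
    """Convert a position of the current genome into meta-genome coordinates.

    foreign_insertions is a list of (start, length) tuples: 'start' is the
    position, in the current genome, after which another genome carries an
    insertion of 'length' bases.
    """
    shift = sum(length for start, length in foreign_insertions if start < position)
    return position + shift


if __name__ == "__main__":
    insertions = [(100, 5), (200, 3)]
    for pos in (50, 150, 250):
        print(pos, "->", to_meta_genome(pos, insertions))
```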
 
 
 
=== Displaying information ===
 
In order to display variant information related to genome differences, each track has its own settings window. When the user right-clicks on the track handler (on the left of the track) of an empty track and clicks on ''Multi Genome Stripes'', the window below appears.
 
[[image:mg_basics_unset_mg_selector.png|center|frame|Figure 10: Multi-genome stripes selection on an empty track]]
 
<br/>
 
 
 
In this example, there are three families with one member each. Genomes have been mapped on the reference genome GRCh37/hg19. The selected track can show information such as insertions, deletions, SNPs and structural variants. The user can define the colors (by clicking on the colored squares) and the stripe transparency (using the slider).
 
 
 
If a data file concerning Person 01 of the first family has been loaded, the window looks like the one in Figure 11.
 
[[image:mg_basics_set_mg_selector.png|center|frame|Figure 11: Multi-genome stripes selection]]
 
<br/>
 
 
 
Firstly, all stripes related to variants of the selected genome appear (in this case, the insertion and the deletion for person 1, family 1). Secondly, all insertions in the genome of family 1 are shown as black stripes on all other genomes. Black stripes are synchronization markers introduced in the meta-genome so that multiple genomes can be displayed at the same time; they represent insertions in genomes other than the current genome. The user can modify the stripes visualization using this panel.
 
  
 
== Conversion between NCBI36/hg18 and GRCh37/hg19 ==
 
== Conversion between NCBI36/hg18 and GRCh37/hg19 ==
 
=== Description ===
 
=== Description ===
This tutorial will explain how to display at the same time tracks mapped on genome assembly NCBI36/hg18 or GRCh37/hg19. In the example, user will be able to see all the modifications on the NCBI36/hg18 genome leading to the GRCh37/hg19 reference genome.
+
This tutorial describes how to display concurrently tracks mapped on the genome assembly NCBI36/hg18 and tracks mapped on the genome assembly GRCh37/hg19. In the example, the user will be able to see all the modifications on the NCBI36/hg18 genome leading to the GRCh37/hg19 reference genome.
 +
 
 +
'''Note:''' The final result of this tutorial is available as a project that can be loaded from the [[Projects#Multi-Genome Tutorial| Projects]] page of this website.
  
 
=== Files ===
 
=== Files ===
*[http://www.genplay.net/library/Human/Multi-Genome/hg18tohg19_tutorial_settings.xml XML settings file]
+
*[http://genplay.einstein.yu.edu/library/tutorials/MG-Reference_Genome_Tutorial/hg18tohg19_tutorial_settings.xml XML settings file]
*[http://www.genplay.net/library/Human/Multi-Genome/hg18tohg19_tutorial_sv.vcf.gz VCF file]
+
*[http://genplay.einstein.yu.edu/library/tutorials/MG-Reference_Genome_Tutorial/hg18tohg19_tutorial_sv.vcf.gz VCF file]
*[http://www.genplay.net/library/Human/Multi-Genome/hg18tohg19_tutorial_sv.vcf.gz.tbi Indexed VCF file (Tabix)]
+
*[http://genplay.einstein.yu.edu/library/tutorials/MG-Reference_Genome_Tutorial/hg18tohg19_tutorial_sv.vcf.gz.tbi Indexed VCF file (Tabix)]
*[http://www.genplay.net/library/Human/hg19/Gene_Annotation/RefSeq_From_UCSC_04-23-10(hg19).bed Refseq BED file for GRCh37/hg19]
+
*[http://genplay.einstein.yu.edu/library/Human/hg19/Gene_Annotation/Genes_RefSeq_hg19_09.20.2013.bed Refseq BED file for GRCh37/hg19]
*[http://www.genplay.net/library/Human/hg18/Gene_Annotation/RefSeq_From_UCSC_04-23-10(hg18).bed Refseq BED file for NCBI36/hg18]
+
*[http://genplay.einstein.yu.edu/library/Human/hg18/Gene_Annotation/Genes_RefSeq_hg18_09.20.2013.bed Refseq BED file for NCBI36/hg18]
  
 
=== Steps ===
 
=== Steps ===
 
==== Project settings ====
 
==== Project settings ====
 
===== Project name =====
 
===== Project name =====
User must choose a name for a new project; here the name is ''GenPlay-MG – Reference genome tutorial'' (Figure 1).
+
The first thing to do is to choose a name for the new project; here the name is ''GenPlay-MG – Reference genome tutorial'' (Figure 1).
 
[[image:mg_hg18tohg19_project_name.png|center|frame|Figure 1: Project name]]
 
[[image:mg_hg18tohg19_project_name.png|center|frame|Figure 1: Project name]]
  
 
===== Project assembly =====
 
===== Project assembly =====
According to bed files provided in this tutorial, the reference genome is GRCh37/hg19. User has to select the ''mammal'' clade, the ''human'' genome and the ''Feb 2009 (GRCh37/hg19)'' assembly as in Figure 2.
+
The reference genome for this tutorial is GRCh37/hg19.  
 +
The ''mammal'' clade and ''human'' genome need to be selected (Figure 2).
 
[[image:mg_hg18tohg19_project_assembly.png|center|frame|Figure 2: Project assembly]]
 
[[image:mg_hg18tohg19_project_assembly.png|center|frame|Figure 2: Project assembly]]
  
 
===== Chromosome selection =====
 
===== Chromosome selection =====
The VCF file is about Structural Variants and contains information for chromosomes 1 to 22 and chromosomes X and Y. User can select to load one or more chromosomes by clicking on the settings button next to the assembly name (Figure 3).
+
The VCF file contains Structural Variants for chromosomes 1 to 22 and chromosomes X and Y. The list of chromosomes available in the project can be set by clicking on the settings button (toolbox image) next to the assembly name (Figure 3).
 
[[image:mg_hg18tohg19_chromosome_chooser.png|center|frame|Figure 3: Chromosome chooser]]
 
[[image:mg_hg18tohg19_chromosome_chooser.png|center|frame|Figure 3: Chromosome chooser]]
  
 
===== VCF Loading =====
 
===== VCF Loading =====
'''''Manually'''''<br/>
+
====== Manually ======
To load VCF files , users must first fill the column lists and then select from the list the appropriate data. The VCF Loader appears after clicking on the ''Edit'' button from the welcome screen. The bottom left part of the VCF Loader contains the ''Column list edition'' section. User has to select a column and click on ''Edit'' button in order to show the associated list.
+
Next we need to set up a multi-genome project. To do so, click on the ''Multi Genome Project'' radio button at the bottom of the screen and click on ''Select VCF''.
Only one VCF file is going to be loaded for this tutorial. The VCF file contains differences between the reference genome NCBI36/hg18 and the reference genome GRCh37/hg19.
+
Click on the ''Add...'' label of the File column to select the VCF file to load. Select the VCF downloaded earlier. Only one VCF file is going to be loaded for this tutorial. The VCF file contains differences between the reference genome NCBI36/hg18 and the reference genome GRCh37/hg19.
<br/><br/>
+
 
 +
 
 +
''Group column''
 +
 
 +
Since this tutorial is about comparing reference genomes, a generic group name can be ''Reference genome''.
 +
Click on the ''Group 1'' text of the ''Group'' column and then click on ''Add...'' to enter the group name (Figure 4).
 +

 +
The ''Group name editor'' should look like Figure 5 below.
 +

 +
Once the value has been added to the list, it can be saved by closing the ''Group name editor'' window.
  
''Group'' column<br/>
 
This tutorial compares reference genome; a generic group name can be ''Reference genome''.
 
On the ''Group name list editor'', user clicks on the plus button to show the input text box and fills it (Figure 4).<br/>
 
The ''Group name list editor'' should looks like the Figure 5 below.<br/>
 
Once the values has been added to the list, it can be saved by closing the "Group name list editor window"<br/>
 
 
''value:'' ''' Reference genome'''
 
''value:'' ''' Reference genome'''
 
<gallery widths=350px perrow=2>
 
<gallery widths=350px perrow=2>
Line 175: Line 53:
 
image:mg_hg18tohg19_group_editor.png|Figure 5: Group name editor
 
image:mg_hg18tohg19_group_editor.png|Figure 5: Group name editor
 
</gallery>
 
</gallery>
<br/><br/>
 
  
''Genome'' column<br/>
+
 
 +
''Genome column''
 +
 
 
The genome name is an Alias for the selected raw name. In this tutorial, the genome name is going to be '''Hg18'''.
 
The genome name is an Alias for the selected raw name. In this tutorial, the genome name is going to be '''Hg18'''.
On the ''Genome name list editor'', user clicks on the plus button to invoke the input text box and fills it (Figure 6).<br/>
+
In the ''Genome name list editor'', click the plus button to show the input text box and fill it in (Figure 6).
 +


The ''Genome name list editor'' should look like Figure 7 below.<br/>

The ''Genome name list editor'' should look like Figure 7 below.<br/>
Once the value has been added to the list, it can be saved by closing the ''Genome name list editor'' window.<br/>
+
Once the value has been added to the list, it can be saved by closing the ''Genome name list editor'' window.
 +
 
 
''value:'' '''Hg18'''
 
''value:'' '''Hg18'''
 
<gallery widths=350px perrow=2>
 
<gallery widths=350px perrow=2>
Line 187: Line 68:
 
image:mg_hg18tohg19_genome_editor.png|Figure 7: Genome name editor
 
image:mg_hg18tohg19_genome_editor.png|Figure 7: Genome name editor
 
</gallery>
 
</gallery>
<br/><br/>
 
  
''Type'' column<br/>
+
 
This field cannot be edited by the users. The provided VCF file is a Structural Variant type, user therefore has to choose '''SV''' (Figure 8).<br/>
+
''Type column''
 +
 
 +
The list of types cannot be edited by users. The provided VCF file contains structural variants, so '''SV''' must be chosen (Figure 8).
 +
 
 
''value:'' '''SV'''
 
''value:'' '''SV'''
 
[[image:mg_hg18tohg19_type.png|center|frame|Figure 8: VCF type list]]
 
[[image:mg_hg18tohg19_type.png|center|frame|Figure 8: VCF type list]]
<br/><br/>
 
  
''File'' column<br/>
+
 
 +
''File column''
 +
 
 
Once the VCF file is downloaded, open the ''File list editor'', click the plus button to show the file chooser dialog, and choose the VCF file from its location.<br/>

Once the VCF file is downloaded, open the ''File list editor'', click the plus button to show the file chooser dialog, and choose the VCF file from its location.<br/>
 
''value:'' '''VCF path'''
 
''value:'' '''VCF path'''
 
[[image:mg_hg18tohg19_file_editor.png|center|frame|Figure 9: VCF File editor]]
 
[[image:mg_hg18tohg19_file_editor.png|center|frame|Figure 9: VCF File editor]]
<br/><br/>
 
  
''Raw name(s)'' column<br/>
+
 
 +
''Raw name(s) column''
 +
 
 
The raw name list is automatically filled. In the case of this tutorial there is only one genome: '''NCBI36''' (Figure 10).<br/>
 
The raw name list is automatically filled. In the case of this tutorial there is only one genome: '''NCBI36''' (Figure 10).<br/>
 
''value:'' '''NCBI36'''
 
''value:'' '''NCBI36'''
 
[[image:mg_hg18tohg19_raw_name.png|center|frame|Figure 10: Raw name list]]
 
[[image:mg_hg18tohg19_raw_name.png|center|frame|Figure 10: Raw name list]]
 
Again, the value is saved by closing the window.

Again, the value is saved by closing the window.
<br/><br/>
 
  
'''''Import XML settings'''''<br/>
+
====== Automatically ======
In order to set the project with ease, user can import the settings using the XML file above. Please be careful about the VCF path, user must changes it directly on the xml file if he wants to use the import function.
+
You can automatically set up the multi-genome project by clicking on the ''Import Config'' button at the bottom of the project screen and selecting the XML file downloaded earlier. Make sure that the VCF file and the XML file are in the same directory when you choose this option.
<br/>
 
  
'''''Conclusion'''''<br/>
+
====== Conclusion======
 
Finally, the screen should look like the one in Figure 11.

Finally, the screen should look like the one in Figure 11.
 
[[image:mg_hg18tohg19_vcf_loader.png|center|frame|Figure 11: VCF loader]]
 
[[image:mg_hg18tohg19_vcf_loader.png|center|frame|Figure 11: VCF loader]]
Line 223: Line 106:
  
 
==== GRCh37/hg19 genes loading ====
 
==== GRCh37/hg19 genes loading ====
To load a file, user has to do a right click on the left part of the track. Then to choose "Load Gene Track", a file chooser appears to select the file given in this tutorial. After having chosen the BED file, a new selection box appears (Figure 13).
+
Files can be loaded by right-clicking on the track handler (the left part of the track, which displays the track number).
 +
Right-click on a track handler and then choose "Add Layer(s)". Select the hg19 gene annotation BED file downloaded at the beginning of the tutorial. Select ''Gene Annotation Layer'' when prompted. The window shown in Figure 13 will appear.
 
[[image:mg_hg18tohg19_genome_selector_01.png|center|frame|Figure 13: Genome selection dialog for GRCh37/hg19 genes file]]
 
[[image:mg_hg18tohg19_genome_selector_01.png|center|frame|Figure 13: Genome selection dialog for GRCh37/hg19 genes file]]
 
   
 
   
This box asks which genome is related to the BED file. Here, the user has to choose the "Feb 2009 (GRCh37/hg19)" option because the BED file contains information about that genome.
+
You need to specify the genome to which the data in the file were aligned. Here, we need to choose "Feb 2009 (GRCh37/hg19)" because the BED file contains data aligned to that genome.
Gene file for GRCh37/hg19 reference has been loaded.
+
The gene file for GRCh37/hg19 reference is now loaded.
  
 
==== NCBI36/hg18 genes loading ====
 
==== NCBI36/hg18 genes loading ====
The same operation as loading a gene files for GRCh37/hg19 reference genome. The only step changing is to choose the "Reference genome - hg18 (NCBI36)" option after the BED file selection (Figure 14)
+
Repeat the same operation for the gene annotation from hg18. This time you will need to select "Reference genome - hg18 (NCBI36)" (Figure 14).
 
[[image:mg_hg18tohg19_genome_selector_02.png|center|frame|Figure 14: Genome selection dialog for NCBI36/hg18 genes file]]
 
[[image:mg_hg18tohg19_genome_selector_02.png|center|frame|Figure 14: Genome selection dialog for NCBI36/hg18 genes file]]
 
   
 
   
 
==== Conclusion ====
 
==== Conclusion ====
The user can navigate through the different chromosomes and visualize differences between both genomes using the stripes. All genes are perfectly synchronized and are displayed according to the meta-genome coordinates.
+
You can now navigate through the different chromosomes and visualize differences between both genomes using the stripes. All genes are perfectly synchronized and are displayed according to the meta-genome coordinates.
  
 
Figure 15 shows an example of the result of this tutorial. It is possible to see deletions (in red) and insertions (in green) in the NCBI36/hg18 reference genome compared to the GRCh37/hg19 reference genome.<br/>

Figure 15 shows an example of the result of this tutorial. It is possible to see deletions (in red) and insertions (in green) in the NCBI36/hg18 reference genome compared to the GRCh37/hg19 reference genome.<br/>

Latest revision as of 12:56, 25 June 2014

Getting started

In order to set up and manage a Multi-Genome Project in GenPlay, please refer to the following sections of the documentation:

* Loading a Multi Genome Project
* Loading a Variant Layer
* Variant Layer operations

Conversion between NCBI36/hg18 and GRCh37/hg19

Description

This tutorial describes how to display concurrently tracks mapped on the genome assembly NCBI36/hg18 and tracks mapped on the genome assembly GRCh37/hg19. In the example, the user will be able to see all the modifications on the NCBI36/hg18 genome leading to the GRCh37/hg19 reference genome.

Note: The final result of this tutorial is available as a project that can be loaded from the Projects page of this website.

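Files

* XML settings file: http://genplay.einstein.yu.edu/library/tutorials/MG-Reference_Genome_Tutorial/hg18tohg19_tutorial_settings.xml
* VCF file: http://genplay.einstein.yu.edu/library/tutorials/MG-Reference_Genome_Tutorial/hg18tohg19_tutorial_sv.vcf.gz
* Indexed VCF file (Tabix): http://genplay.einstein.yu.edu/library/tutorials/MG-Reference_Genome_Tutorial/hg18tohg19_tutorial_sv.vcf.gz.tbi
* Refseq BED file for GRCh37/hg19: http://genplay.einstein.yu.edu/library/Human/hg19/Gene_Annotation/Genes_RefSeq_hg19_09.20.2013.bed
* Refseq BED file for NCBI36/hg18: http://genplay.einstein.yu.edu/library/Human/hg18/Gene_Annotation/Genes_RefSeq_hg18_09.20.2013.bed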

Steps

Project settings

Project name

The first thing to do is to choose a name for the new project; here the name is GenPlay-MG – Reference genome tutorial (Figure 1).

Figure 1: Project name
Project assembly

The reference genome for this tutorial is GRCh37/hg19. The mammal clade and human genome need to be selected (Figure 2).

Figure 2: Project assembly
Chromosome selection

The VCF file contains Structural Variants for chromosomes 1 to 22 and chromosomes X and Y. The list of chromosomes available in the project can be set by clicking on the settings button (toolbox image) next to the assembly name (Figure 3).

Figure 3: Chromosome chooser
VCF Loading
Manually

Next we need to set up a multi-genome project. To do so, click on the Multi Genome Project radio button at the bottom of the screen and click on Select VCF. Click on the Add... label of the File column and select the VCF file downloaded earlier. Only one VCF file is loaded in this tutorial; it contains the differences between the reference genome NCBI36/hg18 and the reference genome GRCh37/hg19.


Group column

Since this tutorial is about comparing reference genomes, a generic group name can be Reference genome. Click on the Group 1 text of the Group column and then click on Add... to enter the group name (Figure 4).

The Group name editor should look like Figure 5 below.

Once the value has been added to the list, it can be saved by closing the Group name editor window.

value: Reference genome


Genome column

The genome name is an alias for the selected raw name. In this tutorial, the genome name is going to be Hg18. In the Genome name list editor, click the plus button to show the input text box and fill it in (Figure 6).

The Genome name list editor should look like Figure 7 below.
Once the value has been added to the list, it can be saved by closing the Genome name list editor window.

value: Hg18


Type column

The list of types cannot be edited by users. The provided VCF file contains structural variants, so SV must be chosen (Figure 8).

value: SV

Figure 8: VCF type list


File column

Once the VCF file is downloaded, open the File list editor, click the plus button to show the file chooser dialog, and choose the VCF file from its location.
value: VCF path

Figure 9: VCF File editor


Raw name(s) column

The raw name list is automatically filled. In the case of this tutorial there is only one genome: NCBI36 (Figure 10).
value: NCBI36

Figure 10: Raw name list

Again, the value is saved by closing the window.

Automatically

You can automatically set up the multi-genome project by clicking on the Import Config button at the bottom of the project screen and selecting the XML file downloaded earlier. Make sure that the VCF file and the XML file are in the same directory when you choose this option.

Conclusion

Finally, the screen should look like the one in Figure 11.

Figure 11: VCF loader
Conclusion

The welcome screen should now look similar to Figure 12.

Figure 12: Welcome screen

The "Create" button will create the project and will run the synchronization.

GRCh37/hg19 genes loading

Files can be loaded by right-clicking on the track handler (the left part of the track, which displays the track number). Right-click on a track handler and then choose "Add Layer(s)". Select the hg19 gene annotation BED file downloaded at the beginning of the tutorial. Select Gene Annotation Layer when prompted. The window shown in Figure 13 will appear.

Figure 13: Genome selection dialog for GRCh37/hg19 genes file

You need to specify the genome to which the data in the file were aligned. Here, we need to choose "Feb 2009 (GRCh37/hg19)" because the BED file contains data aligned to that genome. The gene file for the GRCh37/hg19 reference is now loaded.

NCBI36/hg18 genes loading

Repeat the same operation for the gene annotation from hg18. This time you will need to select "Reference genome - hg18 (NCBI36)" (Figure 14).

Figure 14: Genome selection dialog for NCBI36/hg18 genes file

Conclusion

You can now navigate through the different chromosomes and visualize differences between both genomes using the stripes. All genes are perfectly synchronized and are displayed according to the meta-genome coordinates.

Figure 15 shows an example of the result of this tutorial. It is possible to see deletions (in red) and insertions (in green) in the NCBI36/hg18 reference genome compared to the GRCh37/hg19 reference genome.
Chromosome: chr1
Position: 143,822,670

Figure 15: GenPlay-MG (chr1:143,822,670)