From GenPlay, Einstein Genome Analyzer

__FORCETOC__
== Getting started ==

=== Starting GenPlay ===
GenPlay is freely available at http://www.genplay.net/wiki/index.php/Web_Start. To start the software, click the button corresponding to the amount of memory that you wish to allocate to the Java virtual machine.
  
The amount of memory determines how many layers you can load simultaneously. The programming philosophy behind GenPlay is to provide fast performance once the data is loaded. To achieve that goal, the entire genome needs to be loaded in memory for multiple layers at the same time. This results in high performance but requires a lot of memory. The amount of memory needed per layer depends on the genome, the layer type, the window size, the data precision, etc.
  
You should generally choose as much memory as your system can afford (typically about 70% of the total RAM of your system). For mammalian genomes we recommend allocating at least 4 GB of RAM, although you should be able to load a couple of genome-wide layers with 1 GB or 1.5 GB of RAM. Restricting the analysis to one chromosome at a time drastically reduces the memory requirement and should allow you to load many layers at very high resolution. Layers loaded in GenPlay can also be compressed, as explained later in this documentation.
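To get a feel for these numbers, here is a rough back-of-the-envelope sketch; it is not GenPlay's actual memory model. It assumes that a binned layer stores one score per bin in a dense array, using 4 bytes per score in high precision and 2 bytes in low precision (the 32-bit and 16-bit project precisions described in the New Project section), and it ignores Java object overhead. The genome size and RAM figures are illustrative, and the helper names are hypothetical.

```python
# Bytes per stored score, assuming the 32-bit (high) and 16-bit (low)
# project precisions; real per-layer overhead is ignored.
BYTES_PER_SCORE = {"high": 4, "low": 2}


def estimate_layer_bytes(chromosome_lengths, bin_size, precision="high"):
    """Rough size of one binned layer: one score per bin, no overhead counted."""
    # Ceiling division: a partial bin at the end of a chromosome still costs one slot.
    n_bins = sum(-(-length // bin_size) for length in chromosome_lengths)
    return n_bins * BYTES_PER_SCORE[precision]


def suggested_heap_bytes(total_ram_bytes, fraction=0.7):
    """Rule of thumb from this page: allocate about 70% of the total RAM."""
    return int(total_ram_bytes * fraction)


if __name__ == "__main__":
    # Illustrative numbers: a ~3.1 Gb genome and a machine with 8 GB of RAM.
    per_layer = estimate_layer_bytes([3_100_000_000], bin_size=100)
    budget = suggested_heap_bytes(8 * 1024**3)
    print(f"~{per_layer / 1024**2:.0f} MB per layer at 100 bp bins")
    print(f"~{budget // per_layer} such layers in a {budget / 1024**3:.1f} GB heap")
```

Halving the bin size roughly doubles the estimate, and switching to low precision halves it, which is why bin size and precision are the main levers when memory runs short.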
  
The amount of RAM available to GenPlay is displayed in the lower right corner of the screen.

=== The Welcome screen ===
The welcome screen is the first screen of GenPlay-MG. It allows users to create a new project or to load an existing one.

==== New Project ====
In order to create a new project, users must give it a name.
[[image:mg_basics_project name.png|center|frame|Text field to define the project name]]

The precision of the project determines the number of bits used to encode numbers.
* High-Precision: numbers are encoded using 32 bits, which offers the highest precision level in GenPlay.
* Low-Precision: numbers are encoded using 16 bits, which lowers memory usage. However, the maximum score is 65504, and decimals may be rounded differently (see [http://en.wikipedia.org/wiki/Half-precision_floating-point_format here] for more information).
[[image:precision.png|center|frame|Project precision]]
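The rounding behaviour of the two precisions can be checked with Python's standard `struct` module. This sketch assumes that low precision corresponds to the IEEE 754 half-precision format (which the 65504 limit and the linked page suggest) and high precision to 32-bit single precision; `as_low_precision` and `as_high_precision` are illustrative helpers, not GenPlay functions.

```python
import struct


def round_trip(value, fmt):
    """Store a float with the given struct format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def as_low_precision(value):
    return round_trip(value, "<e")  # IEEE 754 half precision, 16 bits


def as_high_precision(value):
    return round_trip(value, "<f")  # IEEE 754 single precision, 32 bits


if __name__ == "__main__":
    print(as_low_precision(65504.0))  # largest finite 16-bit value, stored exactly
    print(as_low_precision(0.1), as_high_precision(0.1))  # the rounding differs
    try:
        as_low_precision(70000.0)
    except OverflowError as err:
        print("70000 cannot be stored in 16 bits:", err)
```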

The second step is to choose a reference genome. Users can select it using the lists for the clade, the genome and the assembly.
[[image:mg_basics_assembly_chooser.png|center|frame|Assembly chooser]]

Several chromosomes are available for each assembly, but users can choose to select only some of them.

To open the chromosome chooser, click the tools button next to the assembly name.
[[image:mg_basics_chromosome_chooser.png|center|frame|Chromosome chooser]]

The third and last step is to choose between a ''Single Genome Project'' and a ''Multi Genome Project''. If the multi-genome project option is selected, the welcome screen looks like the one shown in the figure below.
[[image:mg_basics_empty_welcome_screen.png|center|frame|Empty welcome screen for multi-genome project]]

===== Single Genome Project =====
The Single Genome Project is the most common type of project in GenPlay. If you do not yet know what a Multi Genome Project is, please use the Single Genome Project.

===== Multi Genome Project =====

====== Introduction ======

====== VCF Files ======
VCF files describe differences between genomes, usually between one or several genomes of interest and the reference genome used for the mapping process. VCF files define multiple types of variations; GenPlay is able to read and represent the following:
* InDels
* SNPs
* SVs (Structural Variations)

A complete description of VCF files is given on the 1000 Genomes Project website:

[http://www.1000genomes.org/wiki/analysis/variant-call-format/vcf-variant-call-format-version-42 Variant Call Format specification]
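As an illustration of these three categories, the sketch below classifies the ALT alleles of one tab-separated VCF data line. The rules are a deliberate simplification and an assumption, not GenPlay's own parser: a single-base REF and ALT is a SNP, a symbolic allele such as `<DEL>` or breakend notation is a structural variant, and a change in length is an InDel. The function names are hypothetical.

```python
def classify_alt(ref, alt):
    """Roughly classify one ALT allele relative to REF (simplified rules)."""
    if alt in (".", "*"):
        return "other"  # missing allele or spanning deletion
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "SV"  # symbolic allele (e.g. <DEL>) or breakend notation
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "InDel"
    return "other"  # e.g. multi-base substitutions


def classify_line(line):
    """Classify each ALT allele of a tab-separated VCF data line."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alts = fields[:5]
    return [(chrom, int(pos), classify_alt(ref, alt)) for alt in alts.split(",")]


if __name__ == "__main__":
    record = "chr1\t1000\t.\tA\tG,AT,<DEL>\t50\tPASS\t.\tGT\t0/1"
    for chrom, pos, kind in classify_line(record):
        print(chrom, pos, kind)
```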

====== Tabix ======
: 1. Introduction
VCF files contain a lot of information, which makes scanning (loading) them slow.

In order to scan them efficiently, VCF files have to be compressed and indexed. The compression is done with BGZip and the indexing with Tabix.

[http://samtools.sourceforge.net/tabix.shtml Tabix manual reference pages]

[http://sourceforge.net/projects/samtools/files/tabix/ Tabix download]

: 2. VCF files indexing methods
:: 2.1. Using GenPlay
GenPlay is able to compress and index VCF files using the VCF Loader.

The way the VCF Loader works is explained below. When you are asked to select the compressed file (.vcf.gz), simply select the VCF file (.vcf) instead. You may need to change the file extension filter in the file chooser in order to see .vcf files.

GenPlay will then look for compressed and indexed files in the same location; if none are found, it will offer to compress and index the selected VCF file (Figure 1).

[[image:mg_vcf_loader_compress_index.png|center|frame|Figure 1: VCF Loader compress/index]]

The process is fully automatic and platform independent (it works on Windows, Linux and Mac).

:: 2.2. Manually
First, please note that the following process must be performed in a Linux or Mac environment.

Each VCF file must first be compressed to the BGZF format (.gz file). Tabix provides a tool, bgzip, to perform the compression.
After compression, each VCF file must be indexed using the tabix command.
Once Tabix is installed, two commands are therefore necessary.

Available commands from the Tabix folder:

''bgzip -f VCF_PATH''

''tabix -p vcf VCF_PATH.gz''

For example, a VCF file named my_vcf.vcf located in the same folder as Tabix can be compressed and indexed with the following commands (Figure 2):

''bgzip -f ./my_vcf.vcf''

''tabix -p vcf ./my_vcf.vcf.gz''
[[image:mg_basics_indexation_commands.png|center|frame|Figure 2: VCF file indexation command]]

'''Note:''' the first command '''replaces''' the current VCF file with the compressed VCF file (.vcf.gz). The second command '''creates''' the index file (.vcf.gz.tbi) in the current folder.
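If you script the manual procedure, the sketch below builds the same two commands and checks that both expected files exist afterwards. It is a hypothetical helper (`index_commands`, `expected_outputs` and `compress_and_index` are illustrative names), it assumes `bgzip` and `tabix` are on the `PATH`, and, like the commands above, it is meant for Linux or Mac.

```python
import os
import shutil
import subprocess


def index_commands(vcf_path):
    """The two commands from the text: compress, then index the .gz file."""
    compressed = vcf_path + ".gz"
    return [
        ["bgzip", "-f", vcf_path],           # replaces FILE.vcf with FILE.vcf.gz
        ["tabix", "-p", "vcf", compressed],  # writes FILE.vcf.gz.tbi next to it
    ]


def expected_outputs(vcf_path):
    """Files GenPlay-MG needs side by side: the .gz file and its .tbi index."""
    compressed = vcf_path + ".gz"
    return compressed, compressed + ".tbi"


def compress_and_index(vcf_path):
    """Run both commands; requires bgzip and tabix on the PATH."""
    for tool in ("bgzip", "tabix"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} not found on PATH")
    for command in index_commands(vcf_path):
        subprocess.run(command, check=True)
    missing = [p for p in expected_outputs(vcf_path) if not os.path.exists(p)]
    if missing:
        raise RuntimeError(f"expected files not created: {missing}")
    return expected_outputs(vcf_path)


if __name__ == "__main__":
    # Only prints the commands; call compress_and_index() to actually run them.
    for command in index_commands("./my_vcf.vcf"):
        print(" ".join(command))
```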

More options are available in the [http://samtools.sourceforge.net/tabix.shtml Tabix manual reference pages].

====== The VCF Loader ======
: 1. Introduction
The VCF Loader is the most important part of the multi-genome project settings. It allows users to load all the necessary VCF files and to define how to extract information from them. It appears when users click the "Edit" button on the welcome screen.

Figure 3 shows an empty VCF Loader screen.
[[image:Mg_welcome_screen_vcf_loader.png|center|frame|Figure 3: VCF loader]]

GenPlay-MG does not use the VCF file directly; it uses a compressed version of it (.gz). GenPlay-MG also needs the compressed VCF file to be indexed with Tabix. Both files must be in the '''same folder''' and must have the '''same name'''; only the file extensions differ (.gz and .tbi).
To have GenPlay generate these additional files, please refer to the [[#Tabix|section above]].

Users can add or remove rows by right-clicking on the table.

: 2. Columns description
'''''File'''''

This column refers to the VCF file path. Once a file is loaded, the ''Raw name'' column is automatically filled with every raw genome name contained in the selected VCF file.

'''''Raw name'''''

The ''Raw name'' list is automatically filled when a VCF file has been chosen. That list contains every genotype header of the selected VCF file. Because genome names might be difficult to remember, GenPlay-MG offers users the option of adding another name (an alias) using the ''Nickname'' column.
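The genotype headers listed in the ''Raw name'' column are the sample columns of the VCF `#CHROM` header line, after its nine fixed columns. The sketch below reads them; the function name is hypothetical, and it relies on the fact that BGZF files are gzip-compatible, so Python's `gzip` module can read a bgzipped .vcf.gz file.

```python
import gzip
import os
import tempfile


def raw_genome_names(vcf_path):
    """Return the genotype (sample) column headers of a .vcf or .vcf.gz file."""
    opener = gzip.open if vcf_path.endswith(".gz") else open
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#CHROM"):
                # The first 9 columns are fixed: CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
                return line.rstrip("\n").split("\t")[9:]
            if not line.startswith("#"):
                break  # data lines reached without a #CHROM header
    raise ValueError(f"no #CHROM header line in {vcf_path}")


if __name__ == "__main__":
    header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA001\tNA002\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.vcf.gz")
        with gzip.open(path, "wt") as handle:  # plain gzip stands in for bgzip here
            handle.write("##fileformat=VCFv4.2\n" + header)
        print(raw_genome_names(path))
```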

'''''Nickname'''''

The ''Nickname'' column allows users to associate an alias with the selected genome. This alias will appear in GenPlay-MG and can be useful because genome names in VCF files are often non-descriptive numbers that can be hard to remember.

'''''Group'''''

Users can gather genomes into groups. Group names are used to distinguish genomes and to perform some group-specific functions.

: 3. Columns edition

The ''Group'', ''Nickname'' and ''File'' columns have their own editable lists. To edit a cell, click on it, move the mouse over the item you want to edit and choose one of the following actions:
* Add (green symbol on an empty item)
* Edit (pen symbol on an item)
* Delete (red symbol on an item)

That way, users can set up all the columns before (or while) filling the table.

'''Note:''' The ''Raw name(s)'' column is automatically filled with the genome names from the selected VCF file; that column cannot be edited manually.

====== Import/Export ======
Once a project has been set up, it can be saved using the import/export function. Pressing the export button saves an XML file to the hard drive. This XML file can then be imported to reload the project.

The XML file structure is simple. Each row of the table is stored in a ''row'' element whose attributes include ''group'', ''genome'', ''file'' and ''raw_name''. The settings file is formatted as shown in Figure 4.
[[image:mg_basics_xml_settings.png|center|frame|Figure 4: XML file settings]]

'''Note:''' If the user moves the VCF files or changes one of their genotype headers, the XML file will no longer work. The user then has to update the ''file'' and/or ''raw_name'' attribute values.
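After moving VCF files, the ''file'' attributes can be updated with a short script instead of by hand. This sketch assumes, as described above, that each row is a `row` element carrying its values as XML attributes; the root element name, the example paths and the helper name `relocate_vcf_files` are hypothetical, and only `/`-separated paths are handled.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def relocate_vcf_files(settings_in, settings_out, old_dir, new_dir):
    """Point every row's ``file`` attribute at new_dir after the VCF files moved.

    Returns the number of rows that were updated.
    """
    old_prefix = old_dir.rstrip("/") + "/"
    new_prefix = new_dir.rstrip("/") + "/"
    tree = ET.parse(settings_in)
    updated = 0
    for row in tree.getroot().iter("row"):
        path = row.get("file")
        if path is not None and path.startswith(old_prefix):
            row.set("file", new_prefix + path[len(old_prefix):])
            updated += 1
    tree.write(settings_out, encoding="utf-8", xml_declaration=True)
    return updated


if __name__ == "__main__":
    # Hypothetical settings file: the root element name and paths are made up.
    sample = (
        "<settings>"
        '<row group="family" genome="mother" raw_name="NA001" file="/old/data/fam.vcf.gz"/>'
        "</settings>"
    )
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "settings.xml")
        dst = os.path.join(tmp, "settings_moved.xml")
        with open(src, "w", encoding="utf-8") as handle:
            handle.write(sample)
        print(relocate_vcf_files(src, dst, "/old/data", "/new/data"), "row(s) updated")
        print(ET.parse(dst).getroot().find("row").get("file"))
```

Only the ''file'' attribute is touched; if genotype headers were renamed, the ''raw_name'' values still have to be edited to match.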

==== Load Project ====
[[image:load_project.png|right|thumb|150px|Load an existing project]]
In order to load a project, the user has to select the "Load an existing project" option.

The 5 most recent projects are listed in the lower part of the dialog. An additional option, "Other", lets the user select a GenPlay project file to load.

The upper part updates automatically when a project is selected, in order to show the following information:
* Name: the name of the project.
* Precision: the precision of the project, either high or low.
* Genome: the genome used.
* Project type: the type of project, either single or multi-genome.
* Last modified: the last time the project was modified.
* Track number: the number of tracks in the project.

== GUI Overview ==
# Control Panel
# Status Bar
<br style="clear: both" />

=== Ruler ===
[[image:ruler.png|left|thumb|500px|Ruler 1.Option Button 2.Absolute Positions 3.Relative Positions]]
<br style="clear: both" />

==== General Option Button ====

The value on the right is the last displayed position. This value ranges from 1 to 2*(chromosome length).

==== Relative Positions ====
The numbers written in black on the second line represent the distance from the middle of the view, in base pairs.

=== Track List ===
The track list is the cornerstone of the GUI. From here you can load layers and execute operations.

The tracks are divided into two parts.

On the left is the track handler, which becomes highlighted when the mouse is over it. Right-clicking on the track handler opens a contextual menu with all the operations that can be executed on the track and its layer(s).

On the right, the data is displayed.

=== Control Panel ===
# Stop button: allows users to stop the current operation. If the button is not bright red, the operation cannot be stopped.
# Operation description: displays a short text describing the current operation, as well as the time elapsed since the operation started.
# Memory bar: shows the amount of memory used and the amount of memory available. Make sure that you have enough memory before starting a new operation. You can delete or compress layers to free up memory.

== Browsing the Genome ==

=== Changing the Position ===
You can change the position of the displayed window by:
# Using the keyboard left and right arrows
# Double-clicking on a track where you want to center the view

=== Switching Chromosome ===
You can switch the selected chromosome by:
# Changing the selection in the chromosome box on the control panel
# Changing the text of the position text field on the control panel

=== Changing the Zoom ===
The zoom level can be modified by:
# Using the zoom bar on the control panel
# Changing the text of the position text field on the control panel

== Loading a Layer ==

=== Introduction ===
Layers are the way to display information from files. They can represent this information in different ways.

A layer is always created within a track, and each track can contain one or several layers.

To load a layer in a track, right-click on the track handler (the blue part on the left of the track). This opens a contextual menu with the different actions available for the track.

The menu of a track that does not contain any layer looks like the one in Figure 1.

Clicking "Add Layer" opens a dialog to select one of the layer types offered by GenPlay (Figure 2).

Examples of layers that can be loaded in GenPlay are available for download from the GenPlay Library, accessible from the GenPlay.net website.

<gallery widths=350px perrow=2>
image:add_layer.png|Figure 1: Track Contextual Menu
image:layer_type.png|Figure 2: Layer Types
</gallery>

=== Loading a Sequencing/Microarray Layer ===
The Sequencing/Microarray layer allows the visualization of windows of variable or fixed size, with a score associated to each window.
Select the "Sequencing/Microarray Layer" option. This opens a file chooser dialog box. Select the file of your choice from the list of available window files and click the Open button.

Please refer to the [[#File formats|File formats]] section if you want to know what kinds of files can be loaded as a sequencing/microarray layer.

Once the file is selected, a new dialog opens to set the parameters of the new layer (as shown in the figure). The dialog is divided into 6 sections, detailed below.
[[image:Add_layer_seq_micro.png|right|thumb|300px|New Layer Settings Dialog]]

==== Layer Name ====
Gives a name to the layer.

==== Bin ====
By default, the windows generated in a sequencing/microarray layer have a variable size, which represents the content of the file very precisely.

For other purposes, users may want windows of a fixed size. Fixed windows are useful to represent the results of many types of experiments including, but not limited to, ChIP-seq, RNA-seq and TimEX-seq. Files containing the results of alignments (SAM, Bowtie, Eland) and files containing already created bin lists (bed, bgr, etc.) can be loaded using this option. In the case of alignment files, bin lists will be created on the fly as described below. Files containing the results of microarray experiments can also be loaded as long as they are in one of the accepted formats.

Fixed windows lower the resolution but usually reduce memory usage.

Fixed windows are enabled with the "Bin Data" option. The "Bin Size" field then becomes available to set the size of the windows in base pairs.

'''Important Note:''' A bin size of 1 bp uses a lot of memory. Depending on the experiment, it may be more efficient to disable the "Bin Data" option and stay in variable window size mode.
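To make the bin size trade-off concrete, here is a sketch of summing read scores into fixed-size bins (the addition method). It assumes bin `i` covers positions `[i * bin_size, (i + 1) * bin_size)`; GenPlay's exact coordinate convention is not stated on this page, and the helper names are hypothetical.

```python
from collections import defaultdict


def bin_reads(reads, bin_size):
    """Sum read scores into fixed-size bins.

    reads: iterable of (chromosome, position, score); bin i of a chromosome
    covers positions [i * bin_size, (i + 1) * bin_size).
    """
    if bin_size < 1:
        raise ValueError("bin_size must be at least 1 bp")
    bins = defaultdict(float)
    for chrom, position, score in reads:
        bins[(chrom, position // bin_size)] += score
    return dict(bins)


def bin_count(chromosome_length, bin_size):
    """Number of bins needed to cover a chromosome (ceiling division)."""
    return -(-chromosome_length // bin_size)


if __name__ == "__main__":
    reads = [("chr1", 5, 1), ("chr1", 99, 1), ("chr1", 100, 1), ("chr1", 250, 1)]
    print(bin_reads(reads, 100))
    # Memory grows with the number of bins: compare 1 bp and 100 bp bins
    # for an illustrative 250 Mb chromosome.
    print(bin_count(250_000_000, 1), bin_count(250_000_000, 100))
```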
<br style="clear: both" />

==== Score Calculation ====
[[image:score_calculation_methods.png|right|thumb|100px|Name and Score Calculation]]
Files can contain overlapping windows. In this case, GenPlay splits them into smaller windows using a simple algorithm.
<br style="clear: both" />

The method used to calculate the score of the resulting windows can be chosen in this section. The following options are available:
* Addition
* Average
* Maximum
* Minimum

Some examples are shown in the sections below for both [[#For non bined layer|non-binned]] and [[#For binned layer|binned]] layers.
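One way to implement the splitting described above is to cut the windows at every start and end coordinate and combine the scores of all the windows covering each piece. This is a sketch of that idea, assuming half-open `[start, end)` windows; it is not GenPlay's actual algorithm, the function name is hypothetical, and it is quadratic in the number of windows, which is fine for illustration.

```python
from statistics import mean

# The four score calculation methods listed above.
COMBINE = {
    "addition": sum,
    "average": mean,
    "maximum": max,
    "minimum": min,
}


def split_overlaps(windows, method="addition"):
    """Split overlapping (start, end, score) windows into non-overlapping ones.

    Windows are half-open [start, end). Each resulting piece gets the scores of
    every input window covering it, combined with the chosen method.
    """
    combine = COMBINE[method]
    boundaries = sorted({p for start, end, _ in windows for p in (start, end)})
    pieces = []
    for left, right in zip(boundaries, boundaries[1:]):
        scores = [s for start, end, s in windows if start <= left and right <= end]
        if scores:  # skip gaps not covered by any window
            pieces.append((left, right, combine(scores)))
    return pieces


if __name__ == "__main__":
    windows = [(0, 100, 2.0), (50, 150, 4.0)]
    for method in COMBINE:
        print(method, split_overlaps(windows, method))
```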

==== Strand ====
If your input file contains strand information, you can choose to load the data from both strands or from only one of them.

You can also shift the reads from both strands. To do so, enter a value in the "Shift" input box.

The value you enter is added to the position of the data on the 5' strand and subtracted from the position of the data on the 3' strand.
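The shift rule and the strand selection can be written as two tiny functions. This sketch assumes that the 5' strand corresponds to `+` and the 3' strand to `-` in the input file; `shift_position` and `keep_read` are hypothetical names.

```python
def shift_position(position, strand, shift):
    """Apply the "Shift" value: add it on the "+" strand, subtract it on the "-" strand."""
    if strand == "+":
        return position + shift
    if strand == "-":
        return position - shift
    raise ValueError(f"unknown strand: {strand!r}")


def keep_read(strand, selection):
    """Strand selection: 'both', '+' or '-'."""
    return selection == "both" or strand == selection


if __name__ == "__main__":
    reads = [("chr1", 1000, "+"), ("chr1", 1200, "-"), ("chr1", 1500, "+")]
    kept = [(c, shift_position(p, s, 100), s) for c, p, s in reads if keep_read(s, "both")]
    print(kept)
```

With a shift of 100, a read at 1000 on the "+" strand and a read at 1200 on the "-" strand both move to 1100, which is how shifting can bring the two strands' signals together.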

==== Fragment Length ====

==== Selected Chromosomes ====
By default, all the chromosomes of the project are selected. If you want to change this selection, click the "modify selection" button and uncheck the undesired chromosomes. Working on fewer chromosomes saves memory and loading time.

'''Important Note:''' GenPlay can accelerate loading if your file is sorted by chromosome. If you answer Yes when GenPlay asks whether the file is sorted but your file is actually not sorted, the file may load incompletely, leading to a loss of valuable information. The chromosomes must be ordered the same way as in the chromosome selection combo box.

==== Examples of Score Calculations ====

===== For non bined layer =====

====== Example 1 ======

''' Input file '''
{|  cellpadding="4" cellspacing="0" border="1"
| 1
|}

''' Result '''
[[image:loadVWT_ex1.png|center|frame|Loading of an alignment file as a variable window layer]]

----

====== Example 2 ======
{|  cellpadding="4" cellspacing="0" border="1"
! Chr

[[image:loadVWT_ex2.png|center|thumb|400px|Loading of an interval file as a variable window layer]]

| 100
|}

===== For binned layer =====

====== Example 1 ======
Loading of an alignment file as a fixed window layer with a window size of 100:

(each line represents one read position; the score is always one)

[[image:loadFWT_ex1.png|center|frame|Loading of an alignment file as a fixed window layer with a window size of 100]]

----

====== Example 2 ======
Loading of an alignment file as a fixed window layer with a window size of 100:

(each line represents one read position; the score varies)

''' Input file '''

[[image:loadFWT_ex2.png|center|frame|Loading of an alignment file as a fixed window layer with a window size of 100]]

----

====== Example 3 ======
Loading of an interval file as a fixed window layer with a window size of 100:

''' Input file '''

[[image:loadFWT_ex3.png|center|frame|Loading of an interval file as a fixed window layer with a window size of 100]]
|}

=== Loading a Gene Annotation Layer ===
[[image:gene_track.png|left|thumb|A Gene Layer]]
[[image:score_color.png|right|thumb|40px|Score Color]]
Select the "Gene Layer" option. This opens a file chooser dialog box that allows you to select the file that you want to load. Please refer to the [[#File formats|File formats]] section if you want to know what kinds of files can be loaded as a gene layer.

Once the file is selected, wait until the loading is complete; the gene layer will then appear in the track you selected.

Note that the genes on the plus strand are shown in red and the genes on the minus strand in blue. If the file contains expression values, the exons are color-coded to represent the expression (red = high, blue = low, as shown on the right).
<br style="clear: both" />
  
=== Loading a sequence track ===
=== Loading a Repeat Family Layer ===
After right-clicking on the empty track handler, select the “Load Sequence Track” option. This opens a file chooser dialog box that allows you to select the file that you want to load. Please refer to the [[#File formats|File formats]] section if you want to know what kind of file can be loaded as a sequence track.
Select the “Repeat Layer” option. This opens a file chooser dialog box that allows you to select the file that you want to load. Please refer to the [[#File formats|File formats]] section if you want to know what kind of file can be loaded as a repeat layer.
[[image:sequence_track.png|center|thumb|300px|A Sequence Track]]

Sequence tracks show DNA sequences from .2bit files.
This layer type displays repeats organized by family or class.

=== Loading a DNA Sequence Layer ===
Select the “DNA Sequence Layer” option. This opens a file chooser dialog box that allows you to select the file that you want to load. Please refer to the [[#File formats|File formats]] section if you want to know what kind of file can be loaded as a sequence layer.
[[image:sequence_track.png|center|thumb|300px|A Sequence Layer]]
Sequence layers show DNA sequences from .2bit files.

The hg18, hg19, mm8 and mm9 sequence files can be downloaded from the [http://129.98.70.162/wiki/index.php/Library library] of GenPlay.
<br/>
 
<br/>
 
<br/>
 
  
=== Loading a SNP Track ===
=== Loading a Mask Layer ===
First, select the “Load SNP Track” option on the track contextual menu. This opens a file chooser dialog box that allows you to select the file that you want to load. Please refer to the [[#File formats|File formats]] section if you want to know what kind of file can be loaded as a SNP track.
Select the “Mask Layer” option. Masks are displayed as stripes, which can be useful to show regions of interest such as CpG islands or repeat regions.

See the [[#File Formats|File Formats]] section to find out what kind of file can be loaded as stripes.

=== Loading a Variant Layer ===
[[image:Mg_add_layer_variant_selection.png|right|thumb|200px|Add a Variant Layer]]

==== Add a Variant Layer ====
Select the “Variant Layer” option; this option is only available in multi-genome projects. A new dialog then pops up, in which you select the sample to load and the variation(s) to display.
A variant layer always corresponds to a single sample. The color of each variation can be changed independently by clicking on the colored square next to the variation checkbox.

==== Multi-Genome Features ====

===== Select Coordinate System =====
[[image:mg_coordinate_chooser.png|left|thumb|150px|Coordinate System chooser]]
The coordinate system used by GenPlay can be changed by selecting one from the list at the bottom right of the main frame. The default is the coordinate system of the Meta Reference Genome; the Reference Genome coordinate system, as well as that of any loaded genome, is also available. Changing the coordinate system does not affect operations; it only changes the red position numbers at the top of the frame and the position search bar at the bottom.

A SNP track shows the Single-Nucleotide Polymorphisms (SNPs).
===== Multi-Genome Project Properties =====
<br/>
[[image:mg_option_button.png|right|thumb|200px|Properties Dialog Button]]
<br/>
In multi-genome projects only, an additional button appears at the bottom left of the frame. This button opens the Multi-Genome Project Properties dialog, where the user can view and manage the project settings. Right-clicking on the button opens a contextual menu offering shortcuts to the different sections of the properties dialog.
<br/>
 
  
=== Loading a Repeat Track ===
====== General ======
Select the “Load Repeat Track” option on the track contextual menu. This opens a file chooser dialog box that allows you to select the file that you want to load. Please refer to the [[#File formats|File formats]] section if you want to know what kind of file can be loaded as a repeat track.
[[image:mg_general.png|right|thumb|300px|General Section]]
The General section gives an overview of how the project has been loaded. Projects can be very complex, using many files and samples; this section reminds the user how the project has been set up.
  
This track type displays repeats organized by family or class.
====== Settings ======
<br/>
[[image:mg_settings.png|right|thumb|300px|Settings Section]]
<br/>
The Settings section lets the user configure various multi-genome options.
<br/>
* Properties Dialog
** Default section to open: the section of the Multi-Genome Project Properties dialog that opens by default when clicking the button.
* VCF Loader
** Default group text name: the default name given to groups.
* Stripes transparency: sets the transparency of the stripes representing variations.
* Global display settings
** Show legend: shows the enabled variations and their colors in the track.
* Variant stripes settings
** Show filtered variation: filtered variations can be shown, in which case a cross is drawn over their stripes.
** Show border of insertion: insertion stripes get a specific border, which helps to recognize them regardless of their color when many layers are loaded.
** Show border of deletion: deletion stripes get a specific border, which helps to recognize them regardless of their color when many layers are loaded.
** Show nucleotides of insertion stripes: added nucleotides are retrieved from the VCF files if possible.
** Show nucleotides of deletion stripes: deleted nucleotides are retrieved from the VCF files if possible.
** Show nucleotides of SNP stripes: SNP nucleotides are retrieved from the VCF files if possible.
* Reference stripes settings
** Show reference stripes: stripes representing the reference genome can be either shown or hidden.
** Reference stripes color: defines the color of reference stripes.

====== Files ======
The Files section lists all the VCF files loaded into GenPlay. Their information is divided into two categories:
* Information: shows the name and the location of the file. It also organizes the header of the VCF file into sections for easier reading and interpretation.
* Statistics: gives various descriptive statistics for the file and for each sample. All tables can be copied and pasted as regular tab-delimited text.
<gallery widths=350px heights=150px perrow=2>
image:mg_file_info.png|File Information
image:mg_file_stat.png|File Statistics
</gallery>

====== Filters ======
The Filters section is covered in the [[#MGFiltersExplanation|section below]].
  
 
=== Loading Data From a DAS Server ===
 
  
 
[[image:DAS_dialog.png|left|thumb||DAS Dialog]]
The “Load from DAS Server” option from the track contextual menu will show the DAS Dialog.
The “Add Layer from DAS Server” option from the track handler menu will show the DAS Dialog.

Select the server from which you want to retrieve the data in the "Server" box.
 
Next, select the data that you want to retrieve in the "Data Type" box.

GenPlay can either generate a gene track or a variable window track from the retrieved data. You can select the type of output track you want in the "Generate" option.
GenPlay can either generate a gene layer or a variable window layer from the retrieved data. You can select the type of output layer you want in the "Generate" option.

Finally, you can also choose to download data for only part of the genome. This can be useful because retrieving data from a DAS server can be time-consuming.

'''Note:''' The [[#DAS server|DAS server]] section shows how to add new servers to the list of available servers in the DAS dialog.
<br/>
 
<br/>
 
<br/>
 
 
=== Generating a Multi Curves Track ===
 
[[image:Multi_curves_track.png|center|thumb|200px|A Multi Curves Track]]

If more than one fixed or variable window track is loaded, you can overlay them in a multi curves track. To do so, first select the "Generate Multi Curves Track" option in the track contextual menu.

[[image:load_mct1.png|left|thumb|100px|Multi Curves Dialog]]
 
Then a dialog will appear asking you which tracks you want to see in the multi curves track.
 
 
The available tracks are listed on the left of the dialog and the selected tracks appear in the list on the right. Select a track by clicking on its name and use the left and right arrows in the middle of the dialog to move it from one list to the other. Double-clicking on a track has the same effect.


The order of the tracks in the right list determines the order in which the tracks are drawn: the track at the top of the list is drawn on top of the other tracks. You can change the order by clicking on the name of a track in the right list and using the up and down arrows in the middle of the dialog.


'''Note:''' To change the appearance of the multi curves track, change the appearance of the tracks that it contains.
 
<br/>
 
<br/>
 
<br/>
 
 
=== Loading Stripes ===
 
[[image:Stripes.png|right|thumb|200px|CPG Islands Shown As Stripes On a Refseq Gene Track]]
 
By clicking on the "Load Stripes" option of the track contextual menu you can load transparent stripes superimposed on a track. The stripes can be useful to show regions of interest such as CpG Islands or repeat regions.
 
 
See the [[#File Formats|File Formats]] section to find out what kind of file can be loaded as stripes.
 
<br/>
 
<br/>
 
<br/>
 
  
 
== Main Menu ==

[[image:main_menu.png|right|thumb|200px|Main Menu]]

On GenPlay’s main screen, click on the top left button (shown by a little hammer and wrench) to pop up the main menu.
<br/>

<br/>
=== New Project ===
<br/>
This option opens the welcome screen to start a new project. Any unsaved work will be lost.
  
 
=== Load / Save Project ===

This menu allows you to load or save a whole GenPlay project in a space-efficient, compressed binary format. When you load a GenPlay project, all the tracks of your current project are replaced by the ones from the loaded project, and any unsaved information is lost.
This menu allows you to load or save a whole GenPlay project in a space-efficient, compressed binary format. When you load a GenPlay project, all the tracks and layers of your current project are replaced by the ones from the loaded project, and any unsaved information is lost.

'''Important Note:''' GenPlay project files may depend on the version of GenPlay you are using. Remember which version of GenPlay you used to save a project, and use the same version the next time you load it.

'''Important Note 2:''' In the current GenPlay version, the genome selected in the configuration file is not saved with the project. A project will generally fail to load, with an error message, if the genome kept in memory in the GenPlay temp file differs from the genome used when the project was saved. To change the genome, go to the upper left corner and open the configuration screen through the Option item of the main menu.
 
 
<br/>
 
<br/>
 
<br/>
 
 
=== Full Screen ===

Click on this item from the main menu to toggle full screen mode. When full screen mode is on, the control panel and the status bar are hidden.

You can also toggle full screen mode by pressing the F11 key.

=== Warnings report ===
This option opens the Warnings report dialog, where you can review previous and current alerts.
  
 
=== Option ===

The option menu item allows you to modify the configuration of GenPlay. Please refer to the section [[#Changing the Configuration of GenPlay|Changing the configuration of GenPlay]] for further information.
<br/>
 
<br/>
 
<br/>
 
  
 
=== RNA To DNA Reference ===

This option allows you to transform the coordinates of the results of an RNA-Seq experiment based on alignment to a transcriptome (for instance, all RefSeq genes) into a genomic coordinate system.

You need two files in order to use this functionality.
 
And the result as a GdpGene file is:
 
  NM_000016 chr1 + 76190042 76229353 76190042,76194085,76198328,76198537,76199212,76200475,76205664,76211490,76215103,76216135,76226806,76228376 76190502,76194173,76198426,76198607,76199313,76200556,76205795,76211599,76215244,76216231,76227055,76229353 667888.95,1506024.1,0,0,0,0,0,0,0,0,0,0
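The GdpGene line above can be read field by field, and its exon lists are what a transcript-to-genome conversion has to walk through. The following Python sketch is not GenPlay's implementation. It parses one such line, assuming a column order inferred from this single example: name, chromosome, strand, start, stop, then comma-separated exon starts, exon stops and exon scores. It then maps a position on the spliced transcript back to a genomic coordinate, assuming 0-based, half-open exon bounds. The [[GenPlay File Formats|file formats page]] remains the authoritative definition, and every name below is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GdpGene:
    """One gene record; field names are illustrative, not GenPlay's own."""
    name: str
    chromosome: str
    strand: str
    start: int
    stop: int
    exon_starts: List[int]
    exon_stops: List[int]
    exon_scores: List[float]


def _split_csv(field: str) -> List[str]:
    # Exon lists are comma separated; drop an empty trailing item if present.
    return [item for item in field.split(",") if item]


def parse_gdp_gene_line(line: str) -> GdpGene:
    """Parse one whitespace-separated GdpGene line (column order assumed)."""
    fields = line.split()
    if len(fields) < 7:
        raise ValueError(f"expected at least 7 fields, got {len(fields)}")
    starts = [int(v) for v in _split_csv(fields[5])]
    stops = [int(v) for v in _split_csv(fields[6])]
    # Assumption: the exon score column may be absent; default scores to 0.
    scores = ([float(v) for v in _split_csv(fields[7])]
              if len(fields) > 7 else [0.0] * len(starts))
    if not len(starts) == len(stops) == len(scores):
        raise ValueError("exon start, stop and score lists differ in length")
    return GdpGene(fields[0], fields[1], fields[2], int(fields[3]),
                   int(fields[4]), starts, stops, scores)


def transcript_to_genome(gene: GdpGene, offset: int) -> int:
    """Map a 0-based position on the spliced transcript to a genomic position.

    Assumes exons are sorted by start and use half-open [start, stop) bounds.
    """
    lengths = [stop - start
               for start, stop in zip(gene.exon_starts, gene.exon_stops)]
    total = sum(lengths)
    if not 0 <= offset < total:
        raise ValueError(f"offset {offset} outside transcript of length {total}")
    if gene.strand == "-":
        # On the minus strand the transcript starts at the highest coordinate.
        offset = total - 1 - offset
    for start, length in zip(gene.exon_starts, lengths):
        if offset < length:
            return start + offset
        offset -= length
    raise AssertionError("unreachable: offset was range-checked")


EXAMPLE = (
    "NM_000016 chr1 + 76190042 76229353 "
    "76190042,76194085,76198328,76198537,76199212,76200475,"
    "76205664,76211490,76215103,76216135,76226806,76228376 "
    "76190502,76194173,76198426,76198607,76199313,76200556,"
    "76205795,76211599,76215244,76216231,76227055,76229353 "
    "667888.95,1506024.1,0,0,0,0,0,0,0,0,0,0"
)

if __name__ == "__main__":
    gene = parse_gdp_gene_line(EXAMPLE)
    expressed = [i for i, s in enumerate(gene.exon_scores) if s > 0]
    print(gene.name, len(gene.exon_starts), "exons, non-zero scores at", expressed)
    print("transcript position 460 ->", transcript_to_genome(gene, 460))
```

Under the half-open assumption, the first exon of the example record spans 460 bp. Transcript position 460 is therefore the first base of the second exon (76194085).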
<br/>
 
<br/>
 
<br/>
 
  
 
=== Help and About GenPlay ===

The Help and About GenPlay options open a browser showing, respectively, the documentation page and the about page of the GenPlay website.
<br/>
 
<br/>
 
<br/>
 
 
=== Exit ===

This option closes the application after asking for confirmation.
  
 
== Changing the Configuration of GenPlay ==

Click on the option item of the main menu to open the configuration screen.
[[image:main_menu.png|right|thumb|250px|Option Menu]]
[[image:changing_configuration.png|center|thumb|100px|Option Menu]]
Click on the option item of the [[#Main Menu|main menu]] to open the configuration screen.
  
 
=== General Options ===

The following screen lets you set the general options:
The following screen lets you set the general options.
[[image:general_options.png|center|thumb|100px|General Options]]

The Default Directory lets you specify where the files containing GenPlay tracks will be stored in your file system.

The Log File is a text file that contains a time-stamped history of the files extracted and loaded in GenPlay.
The Default Directory lets the user choose which folder any file chooser within GenPlay opens by default.
  
 
From this screen, you can also modify the appearance of the software by changing the look & feel.
<br/>
 
<br/>
 
<br/>
 
 
=== Configuration Files ===
 
[[image:changing_configuration.png|right|thumb|100px|Configuration Files]]
 
The configuration files screen allows you to change the zoom file as well as the genome configuration file. It is necessary to restart GenPlay after modifying this option in order for the changes to take effect.
 
<br/>
 
<br/>
 
<br/>
 
==== Zoom File ====
 
The Zoom configuration file contains the predefined zoom levels. To change the zoom levels, create a text file with one zoom level (in bp) per line, ordered from the smallest to the greatest. Here is an example:
 
<br style="clear: both" />
 
 
10
 
100
 
1000
 
10000
 
100000
 
1000000
 
10000000
 
100000000
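The zoom file rules above require one positive level in bp per line, ordered from the smallest to the greatest. A custom file can be checked against them before you point GenPlay at it. This Python sketch is not part of GenPlay. Skipping blank lines and rejecting duplicate levels are assumptions of the sketch, not documented behavior, and the function names are hypothetical.

```python
from typing import List


def parse_zoom_file(text: str) -> List[int]:
    """Return the zoom levels (bp) listed one per line, smallest first."""
    levels: List[int] = []
    for line_number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # assumption: blank lines are tolerated
        level = int(line)  # raises ValueError on non-integer content
        if level <= 0:
            raise ValueError(f"line {line_number}: zoom level must be positive")
        if levels and level <= levels[-1]:
            raise ValueError(
                f"line {line_number}: levels must go from smallest to greatest")
        levels.append(level)
    return levels


def write_zoom_file(levels: List[int]) -> str:
    """Serialize levels in the one-per-line layout, sorted and de-duplicated."""
    return "\n".join(str(level) for level in sorted(set(levels))) + "\n"


if __name__ == "__main__":
    # The levels from the example above: 10 bp up to 100,000,000 bp.
    content = write_zoom_file([10 ** k for k in range(1, 9)])
    print(parse_zoom_file(content))
```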
 
 
<br/>
 
<br/>
 
<br/>
 
==== Genome File ====
 
Once GenPlay is started, a configuration file describing the genome that you want to analyze is loaded (the default is human hg19).
 
Configurations are simple text files that specify the name and length of the chromosomes or scaffolds of the current genome. Configuration files for recent human and mouse genome assemblies can be downloaded from the GenPlay [http://www.GenPlay.net/wiki/index.php/Library library]. Genome configuration files for human and mouse come in two versions, full and basic. The basic version only contains the standard chromosomes; the full version also allows the display of chromosome variants.
 
 
Configuration files for any genome can easily be created in a plain text editor, using the provided examples as a model.
 
Here is an example of a genome file:
 
chr1 249250621
 
chr5 180915260
 
chr13 115169878
 
chrX 155270560
 
chrY 59373566
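A genome configuration file holds only a chromosome name and a length on each line, so it is easy to validate or summarize. The sketch below assumes whitespace (spaces or a tab) between the two fields. Ignoring blank lines and rejecting duplicate names are also assumptions, and the function name is hypothetical. For the five-chromosome example above, it returns the chromosomes in file order, which lets you total their length.

```python
from typing import Dict


def parse_genome_file(text: str) -> Dict[str, int]:
    """Map chromosome (or scaffold) name to length, keeping file order."""
    chromosomes: Dict[str, int] = {}
    for line_number, raw in enumerate(text.splitlines(), start=1):
        fields = raw.split()
        if not fields:
            continue  # assumption: blank lines are ignored
        if len(fields) != 2:
            raise ValueError(f"line {line_number}: expected 'name length'")
        name, length = fields[0], int(fields[1])
        if length <= 0:
            raise ValueError(f"line {line_number}: length must be positive")
        if name in chromosomes:
            raise ValueError(f"line {line_number}: duplicate chromosome {name}")
        chromosomes[name] = length
    return chromosomes


EXAMPLE = """chr1 249250621
chr5 180915260
chr13 115169878
chrX 155270560
chrY 59373566
"""

if __name__ == "__main__":
    genome = parse_genome_file(EXAMPLE)
    print(list(genome), sum(genome.values()), "bp")
```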
 
 
<br/>
 
<br/>
 
<br/>
 
  
 
=== Track Option ===
[[image:track_option.png|right|thumb|100px|Track Option]]
 
 
The Number of Tracks text box defines the maximum number of tracks that can be loaded on GenPlay.
  
 
The Undo Count text box defines the number of operations that can be undone. Note that the higher the number of undos selected, the more memory will be required.

<br/>

<br/>
The reset option allows the user to easily reset a layer to the state it was in right after being loaded.
<br/>

The legend showing the layer names in the upper right corner of a track can also be enabled or disabled.
  
 
=== DAS Server ===
[[image:das_server_option.png|right|thumb|100px|DAS Server Option]]
 
 
The DAS server option shows the list of existing DAS servers along with the URL where these servers are located. It also provides the options to add new servers and remove existing servers.

GenPlay can communicate with, and retrieve data from, servers implementing the [http://www.biodas.org/wiki/DAS/1 DAS/1 protocol].
<br/>
 
<br/>
 
<br/>
 
  
 
=== Restore Default ===

The Restore Default configuration restores everything back to the factory settings.
<gallery widths=220px heights=100px perrow=3>
image:Options_general.png|General Options
image:options_track.png|Track Option
image:das_server_option.png|DAS Server
</gallery>
  
<br/><br/><br/>
 
 
== File Formats ==

The different file formats used in GenPlay are described on this [[GenPlay File Formats|page]].
  
<br/><br/><br/>
== Using Tracks ==
== Manipulating tracks ==
[[image:add_layer.png|right|thumb|150px|Track Menu]]
[[image:manipulating_tracks.png|right|thumb|150px|Track Menu]]

=== Moving a Track ===
=== Handling Tracks ===

==== Moving a Track ====

To move a track up or down in the track list, just click on the track handler (the left part of the track with the track number) and drag the track to the desired position.
<br/>

<br/>
==== Inserting a Track ====
<br/>
 
=== Inserting a Track ===
 
 
To insert a track, right-click on the handler of the track just below the position where you want to insert the new track and choose the "Insert" option.
<br/>
 
<br/>
 
<br/>
 
=== Copying, Cutting and Pasting a Track ===
 
To copy a track, select the desired track and click on the copy option in the contextual menu or press CTRL+C.

To cut a track, select the desired track and click on the cut option in the contextual menu or press CTRL+X.
==== Deleting a Track ====
To delete a track, select it and click on the delete option of the contextual menu or press Delete on the keyboard.

==== Copying, Cutting and Pasting a Layer ====
[[image:paste_layer.png|right|thumb|200px|Track Menu]]
To copy layers, select the track containing the layers and click on the copy option in the contextual menu or press CTRL+C.
A new window appears showing all the layers that can be copied. Select the layers you want to copy and then click "Ok".

To cut layers, select the track containing the layers and click on the cut option in the contextual menu or press CTRL+X.

To paste a track, select the empty track where you want to paste and click on the paste option in the contextual menu or press CTRL+P.
To paste a track, select the track where you want to paste and click on the paste option in the contextual menu or press CTRL+P.
<br/>

<br/>
A track can be pasted into a text file, in which case the data of the active layer is pasted as text (limited to the genomic range currently displayed). It can be pasted into an image editor, in which case an image of the track is pasted. It can also be pasted into a file explorer, in which case the layer is saved as a GPTF (GenPlay Track File), or into another GenPlay track, in which case the copied layers are added to that track.
<br/>

=== Deleting a Track ===
==== Taking a Screenshot of the Track ====
To delete a track, select it and click on the delete option of the contextual menu or press Delete on the keyboard.
 
<br/>
 
<br/>
 
<br/>
 
=== Renaming a Track ===
 
To rename a track, select it and click on the rename option of the contextual menu or press the F2 key.
 
<br/>
 
<br/>
 
<br/>
 
=== Setting the Height of a Track ===
 
To set the height of a track, select it and click on the set height option of the contextual menu, or click on the bottom of the track handler and drag the mouse up or down.
 
<br/>
 
<br/>
 
<br/>
 
=== Changing the Appearance of a Track ===
 
[[image:track_appearance.png|left|thumb|150px|Track Appearance]]
 
To change the appearance of a variable or fixed window track, click on the appearance option of the contextual menu.  For any other type of track you can set the number of vertical lines displayed from the contextual menu.
 
<br style="clear: both" />
 
<br/>
 
<br/>
 
<br/>
 
=== Taking a Screenshot of the Track ===
 
 
To take a screenshot, select a track and choose the "Save as Image" option in the contextual menu.
<br/>

<br/>
==== Saving an Entire Track ====
<br/>
To save an entire track with all its layers in the GenPlay format (GPTF, GenPlay Track File), select a track and choose the "Save Track" option in the contextual menu.
=== Showing / Hiding the Stripes ===

To show stripes on a track, select a track and choose the "Load Stripes" option in the contextual menu. Choose the "Remove Stripes" option to hide the stripes.
Please note that a saved track can only be loaded into a project with exactly the same assembly (in a multi-genome project, this means the same meta reference genome).
<br/>

<br/>
==== Using the Undo / Redo / Reset Options ====
<br/>
The undo, redo and reset options are only available for the Variable and Fixed Window layers. They are accessible from the contextual menu when you right-click on the track handler.
=== Using the Undo / Redo / Reset Options ===

The undo, redo and reset options are only available for the Variable and Fixed Window tracks. They are accessible from the contextual menu when you right-click on the track handler.

The number of undo and redo operations available can be specified as described in the [[#Track Option|Track Option]] section. Note that these operations consume memory, so reducing the number of available undo / redo operations can save memory.

The reset operation restores the track to the way it was right after being loaded. A reset operation can also be undone.
<br/>

<br/>
=== Track/Layer Settings ===
<br/>

=== Compressing a Fixed Window Track ===
==== General ====
The Fixed Window tracks can also be compressed. To compress a Fixed Window track you need to click on the Compression option of the contextual menu. Compressing a track frees memory but it is not possible to use an operation on a compressed track. Therefore, you need to uncompress the track before using any operation.
[[image:track_settings_track.png|right|thumb|300px|Track Settings - General]]
<br/>

<br/>
===== Basic Options =====
<br/>
*Name: The name of the track.
*Height: The height of the track.

===== Axis Options =====
*Show horizontal lines: Split the track horizontally.
*Horizontal line count: Number of horizontal lines, equally spaced.
*Show vertical lines: Split the track vertically.
*Vertical line count: Number of vertical lines, equally spaced.

===== Score Options =====
*Minimum Score: The minimum score to show.
*Maximum Score: The maximum score to show.
*Auto-rescaled: Enable automatic score rescaling.
*Score Position: Choose where the score is shown (top/bottom).
*Score Color: Set the font color of the score.

==== Layers ====
[[image:track_settings_layer.png|right|thumb|300px|Track Settings - Layers]]
*Name: Click on the name to edit it.
*Type: The type of layer.
*Color: Click to edit the color of the layer.
*Graph Type: Click to change the graph type:
**Curve
**Points
**Bar
**Dense
*Visible: Show/hide the layer.
*Active: Set the layer as "active". The active layer is the one that interacts directly with the mouse pointer and clicks.
*Set For Deletion: If set, the layer(s) will be deleted when clicking "Ok".
  
 
== Operations ==

Once a track is loaded, right-clicking on the track handler opens a popup menu, as shown in the figure below.
Once a layer is loaded, right-clicking on the track handler opens a popup menu, as shown in the figure below.
[[image:operation_menu.png|center|thumb|100px|Operation Menu]]
[[image:operation_menu.png|center|thumb|600px|Operation Menu]]
The Operation sub-menu of the popup menu contains all the actions that you can use on the selected track.
The Operation sub-menu of the popup menu contains all the actions that you can use on the selected layer.
<br/>

<br/>
=== Sequencing/Microarray Layer Operations ===
<br/>
Bin-ed and non bin-ed layers do not have exactly the same operations: they share most of them, but some are specific to one type.
=== Variable Window Track Operations ===
<gallery widths=250px heights=400px perrow=2>
image:micro_seq_operations.png|Non bin-ed Microarray/Sequencing Layer Operations
image:micro_seq_bin_operations.png|Bin-ed Microarray/Sequencing Layer Operations
</gallery>
==== Operations With a Constant (Addition, Subtraction, Multiplication, Division, Invert) ====

[[image:operation_constante.png|right|thumb|100px|Operation With Constant]]
==== Common operations ====
These operations apply a constant value to the score of each window by addition, subtraction, multiplication or division. The invert function inverts the score of each window. Clicking on any of these operations opens a dialog box where the user can enter the value of the constant in a text field, as shown in the figure (example for addition).

<br style="clear: both" />
===== Show History =====
<br/>
Shows the history of the layer: every change made since it was loaded.
<br/>

<br/>
===== Constant Operation =====
==== Two Tracks Operation ====
[[image:Micro_seq_constant_operation.png|right|thumb|250px|Operation With Constant]]
[[image:operation_2tracks.png|left|thumb|100px|Two Tracks Operation]]
These operations use one constant in the following ways:
This allows basic operations between tracks (fixed and variable window tracks only). It can be useful to subtract background, normalize data with a control track or perform many other track manipulations.
* Addition: adds the constant to the score of each window (F(x) = x + constant).
The available operations between two tracks are addition, subtraction, multiplication, division, average, minimum and maximum.
* Subtraction: subtracts the constant from the score of each window (F(x) = x - constant).
* Multiplication: multiplies the score of each window by the constant (F(x) = x * constant).
* Division: divides the score of each window by the constant (F(x) = x / constant).
* Inversion: inverts the score of each window (F(x) = constant / x).
* Unique Score: sets all windows to a single score (F(x) = constant).
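Each constant operation listed above is a simple per-window function F(x). The Python sketch below applies them to a list of window scores. GenPlay also offers a checkbox to apply the function to null windows. Two details here are assumptions rather than documented GenPlay behavior. First, a "null window" is taken to be a window whose score is 0, and it is left unchanged unless that option is enabled. Second, inverting a zero score leaves it at 0 instead of dividing by zero. The function and operation names are hypothetical.

```python
from typing import Callable, Dict, List

# F(x) for each constant operation, where c is the user-supplied constant.
CONSTANT_OPERATIONS: Dict[str, Callable[[float, float], float]] = {
    "addition": lambda x, c: x + c,
    "subtraction": lambda x, c: x - c,
    "multiplication": lambda x, c: x * c,
    "division": lambda x, c: x / c,
    "inversion": lambda x, c: c / x,
    "unique_score": lambda x, c: c,
}


def apply_constant(scores: List[float], operation: str, constant: float,
                   apply_to_null: bool = False) -> List[float]:
    """Return a new list with F(x) applied to every window score."""
    if operation == "division" and constant == 0:
        raise ZeroDivisionError("cannot divide scores by a zero constant")
    f = CONSTANT_OPERATIONS[operation]
    result: List[float] = []
    for x in scores:
        if x == 0 and not apply_to_null:
            result.append(x)  # null window left untouched
        elif x == 0 and operation == "inversion":
            result.append(0.0)  # assumption: avoid c / 0
        else:
            result.append(f(x, constant))
    return result


if __name__ == "__main__":
    windows = [0.0, 2.0, 4.0]
    print(apply_constant(windows, "addition", 10))
    print(apply_constant(windows, "addition", 10, apply_to_null=True))
    print(apply_constant(windows, "inversion", 8))
```

The input list is never modified, which mirrors the fact that an operation can be undone and the layer reset to its loaded state.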
==== Indexation ====
The function can also be applied to null windows by checking the box.
Indexation can be useful to compare multiple tracks at the same scale. Importantly, indexing does not work well in the presence of outliers. Indexing works best if outliers are eliminated or removed first using a filter (see below). To index the scores of a track based on the greatest and the smallest value of the whole genome you need to choose a new minimum and a new maximum value.

<br/>
===== Two Layers Operation =====
<br/>
This allows operations between two Sequencing/Microarray layers, whether bin-ed or non bin-ed.
<br/>

==== Indexation Per Chromosome ====
To set up the operation, several windows appear in the following order:
This operation indexes each chromosome separately. Users enter the new minimum and maximum score values in a text field. When the OK button is clicked, the resulting track is displayed.
# A first window asks you to select the second layer.
# A second window asks in which track the resulting layer should be placed.
# A third and last window offers the algorithms used to complete the operation (x1: score of the first layer; x2: score of the second layer):
* Addition: adds the scores (x = x1 + x2).
==== Log ====
+
* Subtraction: substract scores (x = x1 - x2).
 +
* Multiplication: multiply scores (x = x1 * x2).
 +
* Division: divide scores (x = x1 / x2).
 +
* Average: average score (x = (x1 + x2) / 2).
 +
* Maximum: keeps the highest score.
 +
* Minimum: keeps the lowest score.
 +
 
'''Note:''' The resulting layer is a binned layer only if the operation is performed between two binned layers that have the same bin size. In any other case, the result is a non-binned layer.
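The simplest case is two binned layers with the same bin size. The sketch below represents each layer as an equal-length list of bin scores and combines them bin by bin. The function name and the handling of division by a zero score are assumptions. Non-binned layers would first need their windows aligned, which is not shown.

```python
def two_layer_operation(x1_scores, x2_scores, operation):
    """Combine two binned layers bin by bin (x1: first layer, x2: second layer)."""
    if len(x1_scores) != len(x2_scores):
        raise ValueError("the two layers must have the same bins")
    functions = {
        "addition": lambda x1, x2: x1 + x2,
        "subtraction": lambda x1, x2: x1 - x2,
        "multiplication": lambda x1, x2: x1 * x2,
        # Division by a zero score returns 0 here (an assumption for the sketch).
        "division": lambda x1, x2: x1 / x2 if x2 != 0 else 0.0,
        "average": lambda x1, x2: (x1 + x2) / 2,
        "maximum": max,
        "minimum": min,
    }
    f = functions[operation]
    return [f(x1, x2) for x1, x2 in zip(x1_scores, x2_scores)]


print(two_layer_operation([1, 4, 6], [3, 2, 0], "average"))  # [2.0, 3.0, 3.0]
```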
===== Index =====
Indexation can be useful to compare multiple layers at the same scale. It rescales the existing scores to a new range defined by the user.

For example, if scores go from 10 to 600 but need to be observed between 0 and 100, this operation performs the rescaling.

It first asks for the new minimum and the new maximum. The next dialog asks whether to perform the rescaling for each chromosome independently or genome-wide.

Using the previous example with a new scale of [0; 100], suppose the first chromosome has a maximum score of 600 and the second one has a maximum score of 800. If the operation is processed genome-wide, 800 becomes the reference value for 100 on both chromosomes. If the operation is processed for each chromosome independently, 600 becomes the reference value for 100 on the first chromosome, and 800 on the second chromosome.

Since this operation uses the minimum and maximum scores, indexing does not work well in the presence of outliers. Indexing works best if outliers are removed first using a filter (see below).
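The genome-wide and per-chromosome variants of the example above can be sketched as a linear min-max rescaling. This is an assumption: the text only describes how the maximum is used, and the sketch also maps the old minimum onto the new minimum. The function names are hypothetical.

```python
def rescale(scores, old_min, old_max, new_min, new_max):
    """Linearly map [old_min, old_max] onto [new_min, new_max]."""
    if old_max == old_min:
        return [float(new_min) for _ in scores]  # degenerate range (assumption)
    return [new_min + (x - old_min) * (new_max - new_min) / (old_max - old_min)
            for x in scores]


def index_layer(chromosomes, new_min, new_max, per_chromosome=False):
    """chromosomes maps a chromosome name to the list of its window scores."""
    if per_chromosome:
        # Each chromosome uses its own minimum and maximum as reference.
        return {name: rescale(s, min(s), max(s), new_min, new_max)
                for name, s in chromosomes.items()}
    # Genome-wide: one reference range shared by all chromosomes.
    all_scores = [x for s in chromosomes.values() for x in s]
    low, high = min(all_scores), max(all_scores)
    return {name: rescale(s, low, high, new_min, new_max)
            for name, s in chromosomes.items()}


layer = {"chr1": [0, 300, 600], "chr2": [0, 400, 800]}
print(index_layer(layer, 0, 100))
# {'chr1': [0.0, 37.5, 75.0], 'chr2': [0.0, 50.0, 100.0]}
print(index_layer(layer, 0, 100, per_chromosome=True))
# {'chr1': [0.0, 50.0, 100.0], 'chr2': [0.0, 50.0, 100.0]}
```

A single outlier changes `low` or `high` and therefore compresses every other score. This is why filtering first is recommended.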
===== Log =====
 
[[image:operation_log.png|right|thumb|100px|Logarithm Bases]]
 
For each window, the log operation applies the function f(x) = log(x), where x is the window score. The base of the logarithm can be 2 (binary log), e (natural log) or 10 (common log).
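A minimal sketch of the log operation with the three selectable bases is shown below. The logarithm is undefined for scores less than or equal to zero. Leaving such windows unchanged is an assumption made here, not documented GenPlay behavior.

```python
import math

# One function per base offered in the dialog.
LOG_FUNCTIONS = {"2": math.log2, "e": math.log, "10": math.log10}


def log_layer(scores, base="2"):
    """Apply f(x) = log(x) to each window score in the chosen base.

    Windows with a score <= 0 are left unchanged (assumption).
    """
    log = LOG_FUNCTIONS[base]
    return [log(x) if x > 0 else x for x in scores]


print(log_layer([8, 0, 1], base="2"))  # [3.0, 0, 0.0]
```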
  
===== Normalize =====
[[image:operation_Normalize.png|left|thumb|100px|Normalization Coefficient]]
After a normalize operation, the score of each window is divided by the result of the Score Count operation and multiplied by a specified fixed value. By default, after normalization the scores are expressed per 10 million reads.
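The normalization above amounts to the following calculation. This sketch assumes the Score Count is the plain sum of the window scores, for example read counts per window. It also treats a single list as all selected chromosomes. The function name is hypothetical.

```python
def normalize(scores, factor=10_000_000):
    """Divide each score by the score count (sum of all scores) and multiply
    by a fixed factor; the default expresses scores per 10 million reads."""
    score_count = sum(scores)
    if score_count == 0:
        raise ValueError("cannot normalize a layer whose score count is zero")
    return [x * factor / score_count for x in scores]


print(normalize([1, 3], factor=100))  # [25.0, 75.0]
```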
  
===== Standard Score =====
Calculates the standard score for the selected layer, i.e. (x - avg) / stdev, where x is the score, avg is the average score of the layer and stdev is the standard deviation of the scores of the layer.
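The standard score formula can be sketched with Python's `statistics` module. Two assumptions apply: the population standard deviation is used, and scores are not weighted by window length. The text does not specify either point.

```python
import statistics


def standard_score(scores):
    """Return (x - avg) / stdev for every window score x of the layer."""
    avg = statistics.fmean(scores)
    stdev = statistics.pstdev(scores)  # population standard deviation (assumption)
    return [(x - avg) / stdev for x in scores]


print(standard_score([2, 4, 4, 4, 5, 5, 7, 9]))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```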
  
===== Show Statistics =====
Shows the minimum, maximum and average scores per chromosome and genome-wide. Also shows the number of windows, the sum of the lengths of the windows with a non-zero score, and the sum of the scores (normalized by the window lengths).
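Several of these statistics can be sketched for one chromosome as below. The `Window` type and function name are hypothetical. The sketch assumes half-open [start, stop) coordinates and weights each score by its window length when averaging. The score sum normalized by window length is not reproduced, because the text does not define it precisely.

```python
from dataclasses import dataclass


@dataclass
class Window:
    start: int   # first base of the window
    stop: int    # position just after the last base (half-open interval)
    score: float


def layer_statistics(windows):
    """Minimum, maximum, length-weighted average, window count and
    non-null length for a list of windows."""
    scores = [w.score for w in windows]
    lengths = [w.stop - w.start for w in windows]
    weighted_sum = sum(s * l for s, l in zip(scores, lengths))
    return {
        "minimum": min(scores),
        "maximum": max(scores),
        # Each score is weighted by the length of its window (assumption).
        "average": weighted_sum / sum(lengths),
        "window_count": len(windows),
        "non_null_length": sum(l for s, l in zip(scores, lengths) if s != 0),
    }


windows = [Window(0, 10, 2.0), Window(10, 30, 0.0), Window(30, 40, 5.0)]
print(layer_statistics(windows))
# {'minimum': 0.0, 'maximum': 5.0, 'average': 1.75, 'window_count': 3, 'non_null_length': 20}
```

Genome-wide values follow from calling the same function on the windows of all chromosomes together.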
 
===== Filter =====
GenPlay provides four different filters:
 
====== Percentage Filter ======
[[image:operation_pfilter.png|right|thumb|130px|Percentage Filter]]
 
This option filters out the X% lowest values and the Y% greatest values, where X and Y are two decimal numbers such that X + Y <= 100.

You can choose between removing the filtered values (remove) or setting them to the boundary values (saturate).
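One way to implement the percentage filter is to turn X% and Y% into rank positions and then filter by value. The sketch below does this. It assumes that removed windows become null windows (score 0) and that a percentage is rounded down to a whole number of windows. The function name is hypothetical.

```python
def percentage_filter(scores, low_pct, high_pct, saturate=False):
    """Filter the low_pct % lowest and the high_pct % greatest scores."""
    if low_pct < 0 or high_pct < 0 or low_pct + high_pct > 100:
        raise ValueError("expected X >= 0, Y >= 0 and X + Y <= 100")
    n = len(scores)
    low_count = int(n * low_pct / 100)
    high_count = int(n * high_pct / 100)
    if low_count + high_count >= n:
        return [0.0] * n  # every window is filtered out
    ranked = sorted(scores)
    low_bound = ranked[low_count]            # smallest value that is kept
    high_bound = ranked[n - 1 - high_count]  # greatest value that is kept
    result = []
    for x in scores:
        if low_bound <= x <= high_bound:
            result.append(x)
        elif saturate:
            result.append(low_bound if x < low_bound else high_bound)
        else:
            result.append(0.0)  # "remove": the window becomes a null window
    return result


scores = [5, 1, 9, 3, 7, 2, 8, 4, 6, 10]
print(percentage_filter(scores, 10, 20))                 # [5, 0.0, 0.0, 3, 7, 2, 8, 4, 6, 0.0]
print(percentage_filter(scores, 10, 20, saturate=True))  # [5, 2, 8, 3, 7, 2, 8, 4, 6, 8]
```

Because the bounds are values, tied scores at a bound are all kept. With many ties, slightly fewer than X% or Y% of the windows may be filtered.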
 
====== Threshold Filter ======
[[image:operation_tfilter.png|right|thumb|130px|Threshold Filter]]
 
This option filters out the values that are lower than X OR greater than Y, where X and Y are two specified threshold values.

You can choose between removing the filtered values (remove) or setting them to the boundary values (saturate).
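A sketch of the threshold filter follows, with the same assumption as above: removed windows become null windows with a score of 0. The function name is hypothetical.

```python
def threshold_filter(scores, low, high, saturate=False):
    """Filter the scores lower than low OR greater than high."""
    result = []
    for x in scores:
        if x < low:
            result.append(low if saturate else 0.0)
        elif x > high:
            result.append(high if saturate else 0.0)
        else:
            result.append(x)
    return result


print(threshold_filter([1, 5, 12], 2, 10))                 # [0.0, 5, 0.0]
print(threshold_filter([1, 5, 12], 2, 10, saturate=True))  # [2, 5, 10]
```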
 
====== Band-Stop Filter ======
[[image:operation_bfilter.png|right|thumb|130px|Band-Stop Filter]]
 
This option removes the values that lie between two specified thresholds.
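The band-stop filter is the complement of the threshold filter: it keeps the values outside the band. The sketch below treats both thresholds as part of the band. This inclusive choice, the zero score for removed windows and the function name are assumptions.

```python
def band_stop_filter(scores, low, high):
    """Remove the scores between low and high (bounds included, an assumption)."""
    return [0.0 if low <= x <= high else x for x in scores]


print(band_stop_filter([1, 3, 5, 7], 3, 5))  # [1, 0.0, 0.0, 7]
```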
 
====== Count Filter ======
[[image:operation_cfilter.png|right|thumb|130px|Count Filter]]
 
This option filters out the X lowest values and the Y greatest values, where X and Y are two specified integers.

You can choose between removing the filtered values (remove) or setting them to the boundary values (saturate).
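The count filter can work on ranks rather than values. Sorting the window indices by score filters exactly X and Y windows, even when scores are tied. This sketch is illustrative. The function name and the zero score for removed windows are assumptions.

```python
def count_filter(scores, low_count, high_count, saturate=False):
    """Filter the low_count lowest and the high_count greatest scores."""
    n = len(scores)
    if low_count + high_count >= n:
        return [0.0] * n  # every window is filtered out
    order = sorted(range(n), key=lambda i: scores[i])  # indices, lowest score first
    low_bound = scores[order[low_count]]            # smallest kept value
    high_bound = scores[order[n - 1 - high_count]]  # greatest kept value
    result = list(scores)
    for i in order[:low_count]:
        result[i] = low_bound if saturate else 0.0
    for i in order[n - high_count:]:
        result[i] = high_bound if saturate else 0.0
    return result


print(count_filter([5, 1, 9, 3, 7], 1, 1))                 # [5, 0.0, 0.0, 3, 7]
print(count_filter([5, 1, 9, 3, 7], 1, 1, saturate=True))  # [5, 3, 7, 3, 7]
```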
 
===== Transfrag =====
This operation aggregates the windows of the selected layer that are separated by a gap smaller than a specified size (in bp).  
The score of the new window can be the sum, the average or the maximum of the scores of the aggregated windows.
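The aggregation can be sketched as a single pass over windows sorted by start position. The sketch assumes the input windows do not overlap and all have a non-null score. The `Window` type and the `transfrag` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Window:
    start: int
    stop: int
    score: float


def transfrag(windows, max_gap, method="sum"):
    """Aggregate consecutive windows separated by a gap smaller than max_gap (bp)."""
    combine = {
        "sum": sum,
        "average": lambda values: sum(values) / len(values),
        "maximum": max,
    }[method]

    def merge(group):
        # One new window spanning the whole group, scored with the chosen method.
        return Window(group[0].start, group[-1].stop, combine([w.score for w in group]))

    result, group = [], []
    for window in sorted(windows, key=lambda w: w.start):
        if group and window.start - group[-1].stop >= max_gap:
            result.append(merge(group))
            group = []
        group.append(window)
    if group:
        result.append(merge(group))
    return result


windows = [Window(0, 10, 1.0), Window(15, 20, 2.0), Window(100, 110, 4.0)]
print(transfrag(windows, max_gap=10))
# [Window(start=0, stop=20, score=3.0), Window(start=100, stop=110, score=4.0)]
```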
 
 
===== Score Distribution Histogram =====
This operation generates a graph showing the distribution of the scores of the selected layers. Two types of plot are available: score vs. window count and score vs. base pair count.

You need to choose a size for the score bins. Depending on the selection, the graph shows how many windows or how many base pairs fall into each score bin.
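The data behind such a histogram can be computed as below. Windows are given as (start, stop, score) tuples, and each score bin is identified by its lower edge. The function name and input shape are assumptions for illustration.

```python
import math
from collections import Counter


def score_histogram(windows, bin_size, count="windows"):
    """windows: (start, stop, score) tuples. Returns {lower edge of score bin: count}.

    count="windows" counts windows; count="base_pairs" sums window lengths.
    """
    histogram = Counter()
    for start, stop, score in windows:
        lower_edge = math.floor(score / bin_size) * bin_size
        histogram[lower_edge] += 1 if count == "windows" else stop - start
    return dict(sorted(histogram.items()))


windows = [(0, 10, 0.5), (10, 30, 1.2), (30, 35, 1.9), (35, 100, 3.0)]
print(score_histogram(windows, 1.0))                      # {0.0: 1, 1.0: 2, 3.0: 1}
print(score_histogram(windows, 1.0, count="base_pairs"))  # {0.0: 10, 1.0: 25, 3.0: 65}
```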
===== Convert Layer =====
This operation converts the current layer into one of the following layer types:
* Gene Annotation Layer
* Microarray/Sequencing Layer (binned or non-binned)
* Mask Layer

==== Non-Binned Layers Only ====
===== CG Methylation Profile =====
This operation computes the methylation values on CG sequences by combining the value on the C position and the value on the G position.

The result is a list of windows covering the CG sequences, where the score of each window is the sum of the score on the C base and the score on the G base.

This operation relies on the data of a sequence layer to find the CG sequences.
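The combination step can be sketched as a scan for "CG" in a DNA string. The sketch assumes 0-based positions and per-base scores stored in a dictionary, with missing positions counting as 0. The function name is hypothetical.

```python
def cg_methylation_profile(sequence, scores):
    """sequence: DNA string, 0-based positions; scores: {position: score}.

    Returns (start, stop, score) windows covering each CG, scored with
    score(C) + score(G).
    """
    seq = sequence.upper()
    windows = []
    for i in range(len(seq) - 1):
        if seq[i] == "C" and seq[i + 1] == "G":
            total = scores.get(i, 0.0) + scores.get(i + 1, 0.0)
            windows.append((i, i + 2, total))
    return windows


print(cg_methylation_profile("ACGTTCGA", {1: 0.5, 2: 0.25, 5: 1.0}))
# [(1, 3, 0.75), (5, 7, 1.0)]
```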
==== Binned Layers Only ====
===== Smooth =====
The smooth operation can use one of the 3 following algorithms:

====== Gauss Smoothing ======
[[image:operation_fwt_gaussian.png|right|thumb|150px|Sigma Value]]
This operation applies a [http://en.wikipedia.org/wiki/Gaussian_filter Gaussian filter] to the layer, depending on the sigma value provided by the user.

<math>G(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{x^2}{2\sigma^2}}</math>

Where x is the distance from the window being smoothed and σ is the sigma value provided by the user (the standard deviation of the Gaussian).
  
 
You can choose the extrapolate option to "fill" the windows with a score of zero.
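A Gaussian smoothing of a binned layer can be sketched as follows. Several choices here are assumptions rather than GenPlay's code. The distance x is measured between bin centers in base pairs, and weights beyond 3σ are ignored. The extrapolate option is read as also computing a smoothed score for windows whose score is zero. Dividing by the sum of the weights used makes the 1/(√(2π)σ) factor cancel, which also handles the chromosome edges.

```python
import math


def gaussian_smoothing(scores, bin_size, sigma, extrapolate=False):
    """Gaussian smoothing of a binned layer.

    Bin i is assumed to be centered at i * bin_size; sigma is in base pairs.
    """
    half_width = int(3 * sigma / bin_size)  # ignore weights beyond 3 sigma
    offsets = range(-half_width, half_width + 1)
    weights = {k: math.exp(-((k * bin_size) ** 2) / (2 * sigma ** 2)) for k in offsets}
    n = len(scores)
    result = []
    for i, x in enumerate(scores):
        if x == 0 and not extrapolate:
            result.append(0.0)  # null windows stay null unless extrapolating
            continue
        total = weight_sum = 0.0
        for k in offsets:
            if 0 <= i + k < n:
                total += weights[k] * scores[i + k]
                weight_sum += weights[k]
        result.append(total / weight_sum)  # normalized weighted average
    return result


smoothed = gaussian_smoothing([0, 0, 4, 0, 0], bin_size=1, sigma=1, extrapolate=True)
print([round(v, 3) for v in smoothed])
```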
 
====== Loess Smoothing ======
This operation computes the Loess regression of degree 1 on the selected layer.  
For each x value where a y value is to be calculated, the Loess technique performs a regression on the points in a moving range around the x value, where the values in the moving range are weighted according to their distance from this x value.
 
You can choose the extrapolate option to "fill" the windows with a score of zero.
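A degree-1 Loess can be sketched as a weighted linear least-squares fit around each point. The tricube weighting is the usual choice for Loess, but it is an assumption here. So are the meaning of `span` (half-width of the moving range, in the unit of the positions) and the reading of extrapolate described for the Gaussian filter.

```python
def loess(xs, ys, span, extrapolate=False):
    """Loess regression of degree 1 (local linear fit) with tricube weights.

    xs: window positions, ys: window scores, span: half-width of the moving
    range around each x (same unit as xs).
    """
    fitted = []
    for x0, y0 in zip(xs, ys):
        if y0 == 0 and not extrapolate:
            fitted.append(0.0)
            continue
        sw = swx = swy = swxx = swxy = 0.0
        for x, y in zip(xs, ys):
            d = abs(x - x0) / span
            if d >= 1:
                continue  # outside the moving range
            w = (1 - d ** 3) ** 3  # tricube weight: closer points count more
            sw += w
            swx += w * x
            swy += w * y
            swxx += w * x * x
            swxy += w * x * y
        denominator = sw * swxx - swx * swx
        if denominator == 0:
            fitted.append(swy / sw)  # a single point in range: weighted mean
        else:
            slope = (sw * swxy - swx * swy) / denominator
            intercept = (swy - slope * swx) / sw
            fitted.append(intercept + slope * x0)
    return fitted


# A straight line is reproduced (up to rounding).
print(loess([0, 1, 2, 3, 4], [1, 3, 5, 7, 9], span=2.5))
```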
 
====== Moving Average Smoothing ======
For each window of the layer, this operation computes the average score over a region of a specified size centered on the window, and uses this average as the new score of the window. The half-size of the region is prompted prior to the calculation.
You can choose the extrapolate option to "fill" the windows with a score of zero.
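The moving average can be sketched as follows. Here the half-size is counted in windows (GenPlay may ask for it in base pairs), and the region is truncated at the ends of the chromosome. Extrapolate again means that null windows also receive an averaged score. All three points are assumptions.

```python
def moving_average(scores, half_size, extrapolate=False):
    """Score each window with the average of the windows within half_size
    windows on each side (the window itself included)."""
    n = len(scores)
    result = []
    for i, x in enumerate(scores):
        if x == 0 and not extrapolate:
            result.append(0.0)  # null windows stay null unless extrapolating
            continue
        region = scores[max(0, i - half_size):min(n, i + half_size + 1)]
        result.append(sum(region) / len(region))
    return result


print(moving_average([3, 0, 6, 9], half_size=1))                    # [1.5, 0.0, 5.0, 7.5]
print(moving_average([3, 0, 6, 9], half_size=1, extrapolate=True))  # [1.5, 3.0, 5.0, 7.5]
```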
===== Find Peaks =====
The find peaks operation offers three different algorithms that can be used to find the peaks:

====== Standard Deviation Peak Finder ======
[[image:operation_fwt_sfinder.png|left|thumb|150px|Standard Deviation Peak Finder]]
 
The standard deviation peak finder prompts the user to enter two parameters.
 
For a window to be accepted, its standard deviation needs to be at least ‘T’ times greater than the value of the standard deviation of the chromosome.
 
====== Density Peak Finder ======
[[image:operation_fwt_dfinder.png|right|thumb|150px|Density Peak Finder]]
 
The Density Finder works as follows:
 
For the window under consideration to be accepted, at least ‘P’ percent of the values must be above the high threshold ‘H’, or at least ‘P’ percent of the values must be below the low threshold ‘L’.
====== Island Finder ======
[[image:operation_fwt_ifinder.png|left|thumb|150px|Island Finder]]
 
The Island Finder is based on the algorithm described in the paper  
 
* Island score: depicts the islands by considering their score.
* Island Summit: depicts each island with the summit of the input island as its score.
===== Correlation =====
[[image:operation_fwt_correlation.png|right|thumb|100px|Correlation Report]]
The correlation operation computes the Pearson correlation between the score values of two layers. The two layers need to have the same bin size. The following formula is used to calculate the correlation:

<math>\rho = \frac{\sum_{i} x_i y_i - n\, x' y'}{(n - 1)\, \sigma_x \sigma_y}</math>

Where:
* ρ is the Pearson correlation
* x<sub>i</sub> and y<sub>i</sub> are the scores of the layers
* n is the number of values
* x’ and y’ are the means of the scores of the layers
* σ<sub>x</sub> and σ<sub>y</sub> are the (sample) standard deviations of the scores of the layers

The figure on the right shows a correlation report.

'''Note:''' The correlation is computed only on the windows that are different from zero in both layers. If one of the layers has a zero-value window, the window of the other layer with the same coordinates is skipped as well.
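The formula and the zero-skipping rule translate directly into the sketch below. Two binned layers with the same bin size are given as equal-length lists of scores. The function name is hypothetical. The standard deviations are sample standard deviations, which the (n - 1) factor in the formula requires.

```python
import math


def pearson_correlation(x_scores, y_scores):
    """Pearson correlation between two layers with the same bin size.

    Bins that are zero in either layer are skipped, as described in the note.
    """
    pairs = [(x, y) for x, y in zip(x_scores, y_scores) if x != 0 and y != 0]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two bins that are non-zero in both layers")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sample standard deviations, matching the (n - 1) in the formula.
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x, _ in pairs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for _, y in pairs) / (n - 1))
    sum_xy = sum(x * y for x, y in pairs)
    return (sum_xy - n * mean_x * mean_y) / ((n - 1) * sd_x * sd_y)


# The third bin is skipped because it is zero in the first layer.
print(pearson_correlation([1, 2, 0, 3, 4], [2, 4, 5, 6, 8]))  # close to 1.0
```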
===== Density =====
This operation generates a new fixed window layer where the score of each window represents the density of non-null windows in its neighborhood.

You first need to enter the size S of the neighborhood.

For each window W, the algorithm counts how many of the S windows before W and the S windows after W have a score different from zero. This value is then divided by 2 * S + 1, and the result is the score of W.
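The density calculation can be sketched on a list of bin scores. The text does not say whether W itself is counted. The 2 * S + 1 denominator suggests it is, so the sketch counts it (an assumption). Windows near the chromosome ends are still divided by 2 * S + 1, which is also an assumption.

```python
def density(scores, s):
    """Fraction of non-null windows among the S windows before W,
    W itself and the S windows after W."""
    n = len(scores)
    result = []
    for i in range(n):
        neighborhood = scores[max(0, i - s):min(n, i + s + 1)]
        non_null = sum(1 for x in neighborhood if x != 0)
        result.append(non_null / (2 * s + 1))
    return result


print(density([0, 1, 1, 0, 1], s=1))  # values 1/3, 2/3, 2/3, 2/3, 1/3
```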
===== Intervals Scoring =====
This operation needs two layers:
* The selected layer, which defines the scores
* A second layer, which defines the intervals

This operation generates a new layer containing the intervals of the interval layer. For each interval, the algorithm then looks at the corresponding scores in the score layer and computes either the maximum, the average or the sum of all the scores that fall in the interval. This value is the new score in the result layer.

You can also choose to use only a certain percentage of the greatest scores that fall in the interval.
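A sketch of intervals scoring follows. It makes three assumptions: a window "falls in" an interval when the two overlap, the percentage of greatest scores is rounded up, and intervals without any window get a score of 0. The function name and input shapes are hypothetical.

```python
import math


def score_intervals(windows, intervals, method="average", top_percentage=100.0):
    """windows: (start, stop, score) of the score layer;
    intervals: (start, stop) of the interval layer.
    Returns (start, stop, new_score) for each interval."""
    combine = {
        "maximum": max,
        "sum": sum,
        "average": lambda values: sum(values) / len(values),
    }[method]
    result = []
    for start, stop in intervals:
        # Scores of the windows overlapping the interval, greatest first.
        values = sorted((score for w_start, w_stop, score in windows
                         if w_start < stop and w_stop > start), reverse=True)
        # Optionally keep only a percentage of the greatest scores.
        values = values[:math.ceil(len(values) * top_percentage / 100)]
        result.append((start, stop, combine(values) if values else 0.0))
    return result


windows = [(0, 10, 1.0), (10, 20, 4.0), (20, 30, 2.0), (50, 60, 8.0)]
intervals = [(5, 25), (40, 45)]
print(score_intervals(windows, intervals, "sum"))  # [(5, 25, 7.0), (40, 45, 0.0)]
```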
===== Concatenate =====
[[image:operation_select_tracks.png|right|thumb|150px|Select Layers to Concatenate]]
The concatenate operation allows you to generate a file containing the scores of multiple fixed window layers that have the same bin size.
 
The output file contains the following fields:
 
# chromosome
# start position
# stop position
# score layer 1
# score layer 2
# score layer 3
# ...
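The row layout can be sketched as below. The tab separator and the 0-based start of the first bin are assumptions, since the text lists the fields but not the exact file format. The function name and input shape are hypothetical.

```python
def concatenate(layers_by_chromosome, bin_size):
    """layers_by_chromosome: {chromosome: [scores of layer 1, scores of layer 2, ...]}.

    Returns the lines of a tab-separated file with the fields
    chromosome, start, stop, score layer 1, score layer 2, ...
    """
    lines = []
    for chromosome, layers in layers_by_chromosome.items():
        # zip(*layers) yields, for each bin, the scores of all layers.
        for i, bin_scores in enumerate(zip(*layers)):
            start = i * bin_size
            fields = [chromosome, str(start), str(start + bin_size)]
            fields += [str(score) for score in bin_scores]
            lines.append("\t".join(fields))
    return lines


for line in concatenate({"chr1": [[1.0, 2.0], [0.5, 0.0]]}, bin_size=100):
    print(line)
```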
=== Gene Layer Operations ===
Directly on a gene layer, you can:
# Double click on a gene to open a web page describing the gene. Make sure that your input file contains a geneDBURL line as described in the [[#File Formats|File Formats]] section in order to enable this option.
# Put the mouse over a gene to display its name and score. If the exons of the gene have different scores, you can put the mouse over an exon to display the exon score.

==== Score Count ====
This operation computes the sum of all scores.

A window first asks you to select the chromosomes to include in the calculation (all by default).

==== Average ====
This operation computes the average of all scores.

A window first asks you to select the chromosomes to include in the calculation (all by default).

==== Count Genes ====
This operation counts the total number of genes.

A window first asks you to select the chromosomes to include in the calculation (all by default).

==== Count Genes with Non-Null Score ====
This operation counts the total number of genes, excluding the ones with a score of 0.

A window first asks you to select the chromosomes to include in the calculation (all by default).

==== Count Exons ====
This operation counts the total number of exons.

A window first asks you to select the chromosomes to include in the calculation (all by default).
 
==== Search Gene ====
 
[[image:gene_search_gene.png|left|thumb|100px|Search Gene]]
Use this option to search for a gene in the selected layer by typing the name of the gene.

Check the Match Case option if you want the search to be case sensitive.

Check the Whole Word option if you want to find only the genes whose whole name matches the input.

Press Next or Previous to go to the next or previous matching gene.

You can also open the Find Gene dialog by pressing CTRL+F after selecting a gene layer.
 
 
==== Extract Intervals ====
 
[[image:gene_extract_intervals.png|right|thumb|200px|Extract Intervals]]
This option allows you to extract intervals defined relative to the beginning, the end or the middle of a gene, and to generate a new gene layer showing these intervals.

For example, you can define promoters as regions that start 100bp before the beginning of genes and end 150bp after the beginning of genes. This option allows you to generate a new layer from these parameters.
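The promoter example can be sketched as follows. The sketch assumes that "beginning" and "end" follow the strand of the gene, so the beginning of a minus-strand gene is its highest coordinate. It also assumes that "before" and "after" are oriented along the strand. The text does not state either point. `Gene` and `extract_intervals` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int
    stop: int
    strand: str  # "+" or "-"


def extract_intervals(genes, anchor, before, after):
    """anchor: "start", "middle" or "stop" of the gene; before/after in bp,
    oriented along the strand of the gene (assumption)."""
    result = []
    for g in genes:
        if anchor == "middle":
            ref = (g.start + g.stop) // 2
        elif (anchor == "start") == (g.strand == "+"):
            ref = g.start  # start of a + gene, or end of a - gene
        else:
            ref = g.stop   # end of a + gene, or start of a - gene
        if g.strand == "+":
            result.append(Gene(g.name, ref - before, ref + after, g.strand))
        else:
            result.append(Gene(g.name, ref - after, ref + before, g.strand))
    return result


genes = [Gene("a", 1000, 5000, "+"), Gene("b", 8000, 9000, "-")]
promoters = extract_intervals(genes, "start", before=100, after=150)
print([(p.name, p.start, p.stop) for p in promoters])  # [('a', 900, 1150), ('b', 8850, 9100)]
```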
  
 
 
==== Extract Exons ====
 
[[image:gene_extract_exons.png|right|thumb|150px|Extract Exons]]
This option generates a new gene layer showing only the exons of the genes of the selected layer.

You can choose between the three following options:
# Extract the last exon
# Extract all the exons

==== Unique Score ====
[[image:gene_unique_score.png|right|thumb|150px|Unique Score]]
This operation sets the same score for all exons.

 
==== Score Exons ====
 
[[image:gene_score_exons.png|right|thumb|200px|Score Exons]]
To execute this operation, you need to have at least one microarray/sequencing layer loaded. For each exon of each gene of the selected gene layer, this operation computes a new score based on the scores of the windows of the selected layer that fall into the exon. There are 3 different ways to compute the new score:
* Base Coverage Sum
* Maximum Coverage
* RPKM
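The RPKM option can be illustrated with the standard definition: reads per kilobase of exon per million mapped reads, i.e. reads × 10^9 / (exon length × total mapped reads). The sketch additionally assumes that window scores are read counts and that an exon's read count is the sum of the scores of the overlapping windows. The function names are hypothetical.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads Per Kilobase of exon per Million mapped reads."""
    return read_count * 1_000_000_000 / (exon_length_bp * total_mapped_reads)


def score_exons_rpkm(exons, windows, total_mapped_reads):
    """exons: (start, stop) pairs; windows: (start, stop, read_count) tuples.
    The read count of an exon is the sum of the scores of the windows
    overlapping it (assumption)."""
    scored = []
    for start, stop in exons:
        reads = sum(score for w_start, w_stop, score in windows
                    if w_start < stop and w_stop > start)
        scored.append((start, stop, rpkm(reads, stop - start, total_mapped_reads)))
    return scored


print(rpkm(500, 2000, 10_000_000))  # 25.0
```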
  
 
==== Filter ====
 
This option provides four different filters for gene layers:
 
===== Percentage Filter =====
 
[[image:gene_filter_percentage.png|right|thumb|130px|Percentage Filter]]
 
This option filters out the genes with the X% lowest overall scores and the Y% greatest overall scores, where X and Y are two decimal numbers such that X + Y <= 100.

You can choose between removing the filtered genes (remove) or setting their scores to the boundary values (saturate).
 
 
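A minimal sketch of the percentage filter follows, under several assumptions. The overall scores are held in a dict keyed by gene name. X% and Y% are rounded down to whole genes. "Boundary values" means the lowest and highest scores that survive the filter. `percentage_filter` is a hypothetical name, not part of GenPlay.

```python
import math


def percentage_filter(scores, x_pct, y_pct, mode="remove"):
    """Filter the X% lowest and Y% highest scored genes (hypothetical helper).

    scores: dict mapping gene name -> overall score
    mode: "remove" drops the filtered genes; "saturate" sets them to the
          lowest / highest surviving score (assumed meaning of "boundary values").
    """
    if x_pct < 0 or y_pct < 0 or x_pct + y_pct > 100:
        raise ValueError("X and Y must be non-negative and X + Y <= 100")
    ranked = sorted(scores, key=scores.get)  # gene names, lowest score first
    n = len(ranked)
    n_low = math.floor(n * x_pct / 100)      # whole genes only (assumption)
    n_high = math.floor(n * y_pct / 100)
    low, kept, high = ranked[:n_low], ranked[n_low:n - n_high], ranked[n - n_high:]
    result = {gene: scores[gene] for gene in kept}
    if mode == "remove":
        return result
    if mode != "saturate":
        raise ValueError("mode must be 'remove' or 'saturate'")
    if not kept:
        raise ValueError("no gene left to take boundary values from")
    lower, upper = scores[kept[0]], scores[kept[-1]]
    result.update({gene: lower for gene in low})
    result.update({gene: upper for gene in high})
    return result


scores = {gene: i for i, gene in enumerate("abcdefghij", start=1)}
print(percentage_filter(scores, 10, 20, "saturate"))
```

With 10 genes, X = 10 and Y = 20 select the single lowest gene and the two highest ones. Saturating sets them to 2 and 8, the extreme scores that remain.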
===== Threshold Filter =====
[[image:gene_filter_threshold.png|right|thumb|130px|Threshold Filter]]
This option filters the genes with an overall score lower than X or greater than Y, where X and Y are two specified threshold values.

You can choose between removing the filtered genes (remove) or setting their scores to the boundary values (saturate).

===== Band-Stop Filter =====
[[image:gene_filter_band-stop.png|right|thumb|130px|Band-Stop Filter]]
This option removes the genes with an overall score between two specified thresholds.

===== Count Filter =====
[[image:gene_filter_count.png|right|thumb|130px|Count Filter]]
This option filters the X lowest-scored genes and the Y highest-scored genes, where X and Y are two specified integers.

You can choose between removing the filtered genes (remove) or setting their scores to the boundary values (saturate).

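The count filter selects genes by rank rather than by score value. The sketch below shows the remove mode using the standard `heapq.nsmallest` and `heapq.nlargest` functions. The dict-of-scores input and the helper name are assumptions. Ties at the cut-off are broken arbitrarily here, and GenPlay may handle them differently.

```python
import heapq


def count_filter(scores, x, y):
    """Remove the x lowest and y highest scored genes (hypothetical helper).

    scores: dict mapping gene name -> overall score. Ties at the cut-off are
    broken arbitrarily by heapq.
    """
    lowest = set(heapq.nsmallest(x, scores, key=scores.get))
    highest = set(heapq.nlargest(y, scores, key=scores.get))
    dropped = lowest | highest
    return {gene: score for gene, score in scores.items() if gene not in dropped}


print(count_filter({"a": 5, "b": 1, "c": 9, "d": 3, "e": 7}, 1, 2))  # {'a': 5, 'd': 3}
```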
==== Filter Strand ====
You need to select a strand when prompted. At the end of the operation, the layer contains only the genes on the selected strand; all the other genes are removed.

==== Rename Genes ====

This operation allows you to change the names of the genes. You need to provide a text file where each line contains the current gene name and the new gene name separated by a tab. Every time a gene whose name appears in the first column is found, its name is replaced by the new gene name from the second column.

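Reading the renaming file and applying it can be sketched as below. The helper names are hypothetical. Skipping blank lines and ignoring extra columns are assumptions, not documented GenPlay behavior.

```python
def load_renaming(lines):
    """Parse 'current_name<TAB>new_name' lines into a dict (hypothetical helper)."""
    mapping = {}
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines (assumption)
        columns = line.split("\t")
        if len(columns) < 2:
            raise ValueError(f"line {number}: expected two tab-separated columns")
        mapping[columns[0]] = columns[1]
    return mapping


def rename_genes(gene_names, mapping):
    """Replace every name found in the first column; leave the others unchanged."""
    return [mapping.get(name, name) for name in gene_names]


mapping = load_renaming(["NM_000546\tTP53\n", "\n"])
print(rename_genes(["NM_000546", "NM_000059"], mapping))  # ['TP53', 'NM_000059']
```

With a real file, `with open(path) as f: mapping = load_renaming(f)` works the same way, because a file object yields its lines.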
==== Distance Calculation ====

Development in progress, coming soon.

==== Score Repartition Around Start ====
You first need to select a fixed window layer containing the scores. Then select the chromosomes on which you want to execute the operation. You also need to specify a bin size S, a bin count C and a method for the calculation of the scores.

The operation creates C bins on each side of the start position of each gene. The size S of each bin is in base pairs. Depending on the calculation method chosen, the operation computes the sum, the maximum or the average of the scores of each corresponding bin across all genes, and displays a bar graph of the result. The data can be exported by right-clicking on the graph and using the "save as" function.

Multi-curve graphs can be generated using the following procedure. To compare two fixed window layers:
# Perform the analysis on the first layer as described above.
# Save it to your hard drive.
# Close the graph window.
# Perform the same analysis on the second layer.
# Right-click on the second graph and choose the load data option.
# Load the first analysis.

The colors of the curves, the type of graph (bar, points, curve) and the scale can be adjusted by right-clicking on the graph. The same procedure can be used to load more than two graphs. To produce more complex graphs, we recommend loading the saved data into your favorite spreadsheet software.

Score Repartition Around Start

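The binning described above can be sketched as follows. This is a minimal illustration under stated assumptions, not GenPlay's implementation. Windows are (start, score) pairs from a fixed window layer, and each window is assigned to the bin containing its start. Gene strand is ignored, although GenPlay may orient bins by strand. Empty bins report 0. `score_repartition` is a hypothetical name.

```python
from collections import defaultdict


def score_repartition(gene_starts, windows, bin_size, bin_count, method="sum"):
    """Aggregate window scores in 2 * bin_count bins around gene starts.

    Hypothetical helper. gene_starts: start positions (bp); windows:
    (window_start, score) pairs of a fixed window layer. Strand is ignored.
    Returns (bin_offset_bp, value) pairs from the most upstream bin to the
    most downstream one; empty bins get 0.
    """
    collected = defaultdict(list)  # bin index -> scores gathered over all genes
    span = bin_size * bin_count
    for start in gene_starts:
        for position, score in windows:
            offset = position - start
            if -span <= offset < span:
                # floor division keeps negative offsets in upstream bins
                collected[offset // bin_size + bin_count].append(score)
    aggregate = {"sum": sum, "max": max, "average": lambda v: sum(v) / len(v)}[method]
    return [((i - bin_count) * bin_size, aggregate(collected[i]) if collected[i] else 0)
            for i in range(2 * bin_count)]


windows = [(80, 1), (95, 2), (100, 3), (115, 4), (130, 5)]
print(score_repartition([100], windows, bin_size=10, bin_count=2))
# [(-20, 1), (-10, 2), (0, 3), (10, 4)]
```

The nested loop scans every window for every gene. That is fine for illustration, but it would be slow genome-wide, where looking up windows by index around each start is the natural optimization.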
=== Repeat Layer Operations ===

==== Convert Into Mask ====
This operation can be used to convert a repeat layer into a mask layer. The user is prompted to select the families of repeats that should be included in the conversion. The resulting layer contains all the selected repeat families.

=== DNA Sequence Layer Operations ===

==== Compare Sequences ====
This operation takes two sequence layers as input and generates a variable window layer showing the differences between them. For each position where the sequences differ, the result layer shows a 1 bp window with the following score:

{|
! Nucleotide of the 1st layer
! Nucleotide of the 2nd layer
! Score
|-
| A
| C
! 12
|-
| A
| G
! 13
|-
| A
| T
! 14
|-
| C
| A
! 21
|-
| C
| G
! 23
|-
| C
| T
! 24
|-
| G
| A
! 31
|-
| G
| C
! 32
|-
| G
| T
! 34
|-
| T
| A
! 41
|-
| T
| C
! 42
|-
| T
| G
! 43
|}
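The scores in the table follow a simple pattern. With A = 1, C = 2, G = 3 and T = 4, the score is 10 * (code of the 1st layer) + (code of the 2nd layer), so A against C gives 12. The sketch below reproduces that encoding on two aligned strings. Skipping N and other non-ACGT characters is our assumption, and `compare_sequences` is a hypothetical helper.

```python
# Nucleotide codes read off the score table (A=1, C=2, G=3, T=4).
CODES = {"A": 1, "C": 2, "G": 3, "T": 4}


def compare_sequences(first, second, offset=0):
    """Return (start, stop, score) 1 bp windows where two aligned sequences differ.

    Hypothetical helper: positions whose bases are not A/C/G/T (e.g. N) are skipped.
    """
    windows = []
    for i, (a, b) in enumerate(zip(first.upper(), second.upper())):
        if a != b and a in CODES and b in CODES:
            windows.append((offset + i, offset + i + 1, 10 * CODES[a] + CODES[b]))
    return windows


print(compare_sequences("ACGT", "AGGA"))  # [(1, 2, 23), (3, 4, 41)]
```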

=== Mask Layer Operations ===

==== Apply Mask ====
Applying a mask means filtering out the data that are not inside the windows of the mask.

All information overlapping a mask window is kept; everything else is lost.

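Filtering by a mask amounts to an interval-overlap test. Here is a minimal sketch. It assumes half-open (start, stop) coordinates, and it assumes a window that overlaps a mask window is kept whole rather than trimmed. `apply_mask` is a hypothetical name. On large layers, a real implementation would use sorted lists or an interval tree instead of this quadratic scan.

```python
def apply_mask(windows, mask):
    """Keep only the windows that overlap at least one mask window.

    Hypothetical helper. windows: (start, stop, score); mask: (start, stop).
    Both use half-open coordinates [start, stop).
    """
    return [w for w in windows
            if any(w[0] < m_stop and m_start < w[1] for m_start, m_stop in mask)]


print(apply_mask([(0, 10, 1), (10, 20, 2), (30, 40, 3)], [(15, 35)]))
# [(10, 20, 2), (30, 40, 3)]
```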
==== Invert Mask ====
This operation inverts all the windows of the mask: all current windows become empty spaces, and all empty spaces become windows.

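Inverting a mask means taking the complement of its windows along the chromosome. The sketch below assumes half-open (start, stop) windows on [0, chromosome_length) and tolerates overlapping or unsorted input. `invert_mask` is a hypothetical helper, not GenPlay's API.

```python
def invert_mask(mask, chromosome_length):
    """Return the gaps between mask windows on [0, chromosome_length) (hypothetical helper)."""
    inverted = []
    cursor = 0  # end of the region already covered by mask windows
    for start, stop in sorted(mask):
        if start > cursor:
            inverted.append((cursor, start))  # empty space becomes a window
        cursor = max(cursor, stop)
    if cursor < chromosome_length:
        inverted.append((cursor, chromosome_length))
    return inverted


print(invert_mask([(10, 20), (15, 30), (50, 60)], 100))  # [(0, 10), (30, 50), (60, 100)]
```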
=== Variant Layer Operations ===

==== Edit Variant Layer ====
[[image:Mg_add_layer_variant_selection.png|right|thumb|200px|Edit Variant Layer Dialog]]
This feature opens the same window used to load the Variant Layer, offering the possibility to change the variation types to show.

==== Generate track statistics ====
This operation generates various statistics about the loaded information.

It also compares these statistics before and after applying any filters in order to show their effects.

==== Filters ====
<div id="MGFiltersExplanation"></div>
Filters can be applied to Variant Layers. They act directly on the data found in the VCF in order to select the data of interest.
All filters are set in the Filters section of the Multi-Genome Project Properties dialog.

Simply click on "Add" to create a new filter. As shown below, a new window appears to define the filter.
[[image:mg_filter_selection.png|center|thumb|800px|Filter selection dialog]]
* Layer(s): The layers affected by the filter.
* File: A filter is also file specific; if the data to filter are spread over different files, several filters must be created.
* ID: A filter can be set on any ID defined in the header of the VCF. IDs can be of different types, which affects the choices available in the next steps.
* Genome(s): Any "FORMAT" ID requires specifying which genome(s) the filter applies to.
* Operator: If more than one genome has been selected in the previous step, the operator decides how the results from each genome are combined into a result for the whole line.
** And: The selected ID value from each genome must pass the filter.
** Or: At least one selected ID value must pass the filter.
** Sum: If the selected ID value is an integer, the sum of the values from all genomes is filtered.
** Mean: If the selected ID value is an integer, the mean of the values from all genomes is filtered.
* Filter: This filter panel changes according to the selected ID type.
** String: The input value is tested, and the user chooses whether the value must or must not be present in the ID value.
** Number: The ID value is tested against an input value using one of the given numeric operators. The ID value can also be tested against two input values using the second part of the filter; the user then chooses how both filters are combined.
** Flag: When the ID value is a flag, it acts as a boolean, meaning the value is either present or absent.
** Genotype: The genotype ID has a special filter editor that makes it easier to set up. The regular string editor is also available below it. The genotype can be homozygous/heterozygous/phased/unphased.
<gallery widths=160px heights=160px perrow=4>
image:mg_filter_string.png|String Editor
image:mg_filter_number.png|Number Editor
image:mg_filter_flag.png|Flag Editor
image:mg_filter_genotype.png|Genotype Special Editor
</gallery>

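The four operators can be illustrated with a small function that combines the per-genome values of one VCF line. This is an illustrative sketch. `combine` and its arguments are hypothetical and only mirror the And/Or/Sum/Mean behavior described in the list above.

```python
from statistics import mean


def combine(values, passes, operator):
    """Decide whether one VCF line passes a FORMAT filter across genomes.

    Hypothetical helper. values: the ID value of each selected genome;
    passes: predicate testing a single number (a value, or their sum/mean).
    """
    if operator == "and":
        return all(passes(v) for v in values)
    if operator == "or":
        return any(passes(v) for v in values)
    if operator == "sum":
        return passes(sum(values))
    if operator == "mean":
        return passes(mean(values))
    raise ValueError(f"unknown operator: {operator}")


def at_least_10(value):
    return value >= 10


depths = [5, 20, 12]  # e.g. one FORMAT value per selected genome
for op in ("and", "or", "sum", "mean"):
    print(op, combine(depths, at_least_10, op))
```

With these values, And fails because one genome has 5, while Or, Sum (37) and Mean (about 12.3) all pass.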
==== Export as VCF ====
This operation exports all the visible variations of the layer into a new VCF file. Filters are taken into account, meaning that it exports what can be seen on the layer.

==== Convert into variable window track ====
This operation converts the Variant Layer into a Microarray/Sequencing Layer. The new windows match the positions of the variation stripes. The score of the new windows can be set to any integer value present in the VCF lines. For haploid genomes, only one layer is generated. For diploid genomes, the maternal and paternal alleles are generated as two different layers.

==== Apply Genotype ====
Coming soon...

Latest revision as of 16:13, 27 June 2014


Getting started

Starting GenPlay

GenPlay is freely available at http://www.genplay.net/wiki/index.php/Web_Start. To start the software, click the button corresponding to the amount of memory that you wish to allocate to the Java virtual machine.

The amount of memory determines how many layers you can load simultaneously. The programming philosophy behind GenPlay is to provide fast performance once the data is loaded. To achieve that goal, the entire genome needs to be loaded in memory for multiple layers at the same time. This results in high performance but requires a lot of memory. The amount of memory needed per layer depends on the genome, the layer type, the window size, the data precision, etc.

You should generally allocate as much memory as your system can spare (typically about 70% of its total RAM). For mammalian genomes we recommend allocating at least 4 GB of RAM, although you should be able to load a couple of genome-wide layers with 1 GB or 1.5 GB of RAM. Analyzing only one chromosome at a time drastically reduces the memory requirement and should allow you to load many layers at very high resolution. Layers loaded in GenPlay can also be compressed, as explained later in this documentation.

The amount of RAM memory available to GenPlay is displayed in the lower right corner of the screen.

The Welcome screen

The welcome screen is the first screen of GenPlay-MG and allows users to create or load a project.

New Project

In order to create a new project, users must give it a name.

Text field to define the project name

The precision of the project determines the number of bits used to encode numbers.

  • High-Precision: Numbers are coded using 32 bits, which offers the highest precision level in GenPlay.
  • Low-Precision: Numbers are coded using 16 bits, which can be useful to lower memory usage. However, the maximum score is 65504 (the largest value of a 16-bit half-precision floating-point number) and decimals may be rounded differently.
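The 65504 limit is the largest finite value of an IEEE 754 half-precision (16-bit) float, so the low-precision mode appears to behave like half-precision storage. Assuming that, Python's standard `struct` module (format code `"e"`) can show how values change when stored in 16 bits. `to_low_precision` is a hypothetical helper, not a GenPlay function.

```python
import struct


def to_low_precision(value):
    """Round-trip a float through 16 bits, assuming IEEE 754 half precision."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


print(to_low_precision(65504.0))  # 65504.0, the largest half-precision value
print(to_low_precision(0.1))      # 0.0999755859375
print(to_low_precision(1000.3))   # 1000.5 (steps of 0.5 between 512 and 1024)
```

Values above the limit, such as 70000, cannot be packed at all, and `struct.pack` raises an `OverflowError`.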
Project precision

The second step is to choose a reference genome, using the lists to select the clade, the genome and the assembly.

Assembly chooser

Several chromosomes are available for each assembly but users can choose to select only some of them.

To open the chromosome chooser, users have to click on the tools button next to the assembly name.

Chromosome chooser

The third and last step is to choose between a Simple Genome Project and a Multi Genome Project. If the multi genome project option is selected, the welcome screen should look like the one shown in the figure below.

Empty welcome screen for multi-genome project
Single Genome Project

The Single Genome Project is the most common project type in GenPlay. If you do not yet know what the Multi Genome Project is, please use the Single Genome Project.

Multi Genome Project
Introduction
VCF Files

VCF files describe differences between genomes. Usually, these are differences between one or several genomes of interest and the reference genome used for the mapping process. VCF files define multiple types of variations; GenPlay is able to read and represent the following:

  • InDels
  • SNPs
  • SV (Structural Variation)
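As a rough guide to how these categories map onto VCF data lines, the sketch below classifies a REF/ALT pair by allele length and symbolic notation. The rules are a common simplification, not GenPlay's exact logic. For instance, some callers also report large deletions as SVs through the SVTYPE INFO field. `classify_variant` is a hypothetical helper.

```python
def classify_variant(ref, alt):
    """Roughly classify one REF/ALT allele pair from a VCF data line.

    Simplified rules for illustration (not GenPlay's code):
    symbolic (<DEL>, <INS>, ...) or breakend ([ or ]) alleles -> SV,
    one base on each side -> SNP, different allele lengths -> InDel.
    """
    if alt.startswith("<") or "[" in alt or "]" in alt:
        return "SV"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "InDel"
    return "other"  # e.g. multi-nucleotide substitutions such as AC -> GT


for ref, alt in [("A", "G"), ("A", "AT"), ("ATG", "A"), ("A", "<DEL>")]:
    print(ref, alt, classify_variant(ref, alt))
```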

A complete description of VCF files is given on the 1000 genomes project website:

Variant Call Format specification

Tabix
1. Introduction

VCF files contain a lot of information, which makes the scanning (loading) process slow.

In order to increase the scanning efficiency, VCF files have to be compressed and indexed. The compression is done using BGZip and the indexing with Tabix.

Tabix manual reference pages

Tabix download

2. VCF file indexing methods
2.1. Using GenPlay

GenPlay is now able to compress and index VCF files using the VCF Loader.

The way the VCF Loader works is explained below. When you want to select the compressed file (.vcf.gz), simply select the VCF file (.vcf) instead. You may need to change the file extension filter in the file chooser in order to see .vcf files.

GenPlay then looks for compressed/indexed files at the same location; if none are found, it offers to compress and index the selected VCF file (Figure 1).

Figure 1: VCF Loader compress/index

It is fully automatic and platform independent (it works on Windows, Linux and Mac).

2.2. Manually

Note that the following process must be performed in a Linux or Mac environment.

Each VCF file must first be compressed into the BGZF format (.gz file); Tabix provides the bgzip tool to perform the compression. After compression, the VCF file must be indexed using the associated tabix command. Once Tabix is installed, these two commands are all that is needed to perform the indexing.

Available commands from the Tabix folder:

bgzip -f VCF_PATH;

tabix -p vcf VCF_PATH.gz;

For example, a VCF file named my_vcf.vcf located in the same folder as Tabix can be indexed with the following commands (Figure 2):

bgzip -f ./my_vcf.vcf;

tabix -p vcf ./my_vcf.vcf.gz;

Figure 2: VCF file indexation command

Note: the first command replaces the original VCF file with the compressed VCF file (.vcf.gz). The second command creates the index file (.vcf.gz.tbi) in the same folder as the compressed file.
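The same two steps can be scripted, for example to index many files. The sketch below only wraps the `bgzip` and `tabix` commands shown above with Python's standard `subprocess` module. The helper names are hypothetical. bgzip and tabix must be installed and on the PATH for `run_indexing` to work.

```python
import shlex
import subprocess
from pathlib import Path


def indexing_commands(vcf_path):
    """Build the bgzip/tabix commands for one .vcf file (hypothetical helper).

    Returns the two commands plus the paths of the files they produce:
    bgzip replaces my_vcf.vcf with my_vcf.vcf.gz, and tabix writes
    my_vcf.vcf.gz.tbi next to the compressed file.
    """
    vcf = Path(vcf_path)
    compressed = vcf.with_name(vcf.name + ".gz")
    index = compressed.with_name(compressed.name + ".tbi")
    commands = [
        ["bgzip", "-f", str(vcf)],
        ["tabix", "-p", "vcf", str(compressed)],
    ]
    return commands, compressed, index


def run_indexing(vcf_path):
    """Run both commands; requires bgzip and tabix on the PATH."""
    commands, compressed, index = indexing_commands(vcf_path)
    for command in commands:
        print(shlex.join(command))
        subprocess.run(command, check=True)  # raises CalledProcessError on failure
    return compressed, index


for command in indexing_commands("./my_vcf.vcf")[0]:
    print(shlex.join(command))
# bgzip -f my_vcf.vcf
# tabix -p vcf my_vcf.vcf.gz
```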

More options are available on Tabix manual reference pages.

The VCF Loader
1. Introduction

The VCF Loader is the most important part of the multi-genome project settings. It allows users to load all the necessary VCF files and to define how to extract information from them. It appears when users click on the "Edit" button on the welcome screen.

Figure 3 shows an empty VCF Loader screen.

Figure 3: VCF loader

GenPlay-MG does not use the VCF file directly; it uses a compressed version of it (.gz). GenPlay-MG also needs the compressed VCF file to be indexed with Tabix. Both files must be in the same folder and must have the same name; only the file extensions differ (.vcf.gz and .vcf.gz.tbi). To have GenPlay generate these additional files, please refer to the section above.

The user can add or remove rows by right-clicking on the table.

2. Column description

File

This column refers to the VCF file path. Once loaded, the raw name column is automatically filled with every raw genome name contained in the selected VCF file.

Raw name

The Raw name column list is automatically filled when a VCF file has been chosen. It contains every genotype header found in the selected VCF file. Because genome names might be difficult to remember, GenPlay-MG offers users the option of adding another name (an alias) using the Nickname column.

Nickname

The Nickname column allows users to associate an alias to the selected genome. This alias appears in GenPlay-MG and can be useful because genome names in VCF files are often non-descriptive numbers that can be hard to remember.

Group

Users can gather genomes into groups. Group names are used to distinguish genomes and to enable some specific features.

3. Column editing

The Group, Nickname and File columns each have their own editable list. To edit a cell, click on it, move over the item you want to edit and choose one of the following actions:

- Add (green symbol on empty item)

- Edit (pen symbol on an item)

- Delete (red symbol on an item)

That way, users can set up all the columns before filling the table (or while filling it).

Note: The Raw name(s) column is automatically filled with the genome names from the selected VCF file; it cannot be edited manually.

Import/Export

Once a project has been set up, it can be saved using the import/export function. Pressing the export button saves an XML file to the hard drive. This XML file can then be imported to reload the project.

The XML file structure is simple. Each row is stored in a row element whose attributes are group, genome, file and raw_name. The settings file is formatted as shown in Figure 4.

Figure 4: XML file settings

Note: If the user moves the VCF files or changes one of their genotype headers, the XML file will no longer work. The user then has to modify the file and/or raw_name attribute values.
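Because the settings file is plain XML, you can check it before importing, for instance to spot VCF files that have moved. The sketch below uses the standard `xml.etree.ElementTree` module. Only the row attributes (group, genome, file, raw_name) come from this documentation. The root element name and the sample values are hypothetical, as are the helper names.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical example: root element name and values are made up; only the
# row attributes (group, genome, file, raw_name) come from the documentation.
SAMPLE = """<settings>
  <row group="Family" genome="Mother" file="/data/family.vcf.gz" raw_name="NA12892"/>
  <row group="Family" genome="Father" file="/data/family.vcf.gz" raw_name="NA12891"/>
</settings>"""


def read_rows(xml_text):
    """Return one dict per <row> element, keyed by the documented attributes."""
    root = ET.fromstring(xml_text)
    keys = ("group", "genome", "file", "raw_name")
    return [{key: row.get(key) for key in keys} for row in root.iter("row")]


def missing_files(rows):
    """List the VCF paths that no longer exist, e.g. after the files were moved."""
    return sorted({row["file"] for row in rows if not Path(row["file"]).exists()})


rows = read_rows(SAMPLE)
print(rows[0]["genome"], rows[0]["raw_name"])  # Mother NA12892
```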

Load Project

Load an existing project

In order to load a project, the user has to select the "Load an existing project" option.

The five most recent projects are listed in the lower part of the dialog. An additional "Other" option lets the user select a GenPlay project file to load.

When a project is selected, the upper part updates automatically to show the following information:

  • Name: The name of the project.
  • Precision: The precision of the project, either high or low.
  • Genome: The genome used.
  • Project type: The type of project, either single or multi-genome.
  • Last modified: The last time the project has been modified.
  • Track number: The number of tracks in the project.

GUI Overview

GUI Overview 1.Ruler 2.Track List 3.Control Panel 4.Status Bar

The GenPlay main window is divided into 4 main parts:

  1. Ruler
  2. Track List
  3. Control Panel
  4. Status Bar

Ruler

The ruler shows the coordinates of the current displayed position.

Ruler 1.Option Button 2.Absolute Positions 3.Relative Positions

General Option Button

The button on the left of the ruler opens the pop-up menu with all the general options.

Absolute Positions

The numbers written in red on top of the ruler are the absolute positions on the selected chromosome or scaffold.

The number on the left is the position of the first displayed base. This value can be negative.

The number in the middle is the position of the red line. This value can go from 0 to the length of the current chromosome or scaffold as specified in the chromosome configuration file.

The value on the right is the last displayed position. This value ranges from 1 to 2*(chromosome length).

Relative Positions

The numbers written in black on the second line represent the distance from the middle, in base pairs.

Track List

The track list is the cornerstone of the GUI. From here you can load layers and execute operations.

The tracks are divided into two parts.

On the left, there is the track handler, which is highlighted when the mouse is over it. Right-clicking on the track handler opens a contextual menu with all the operations that can be executed on the track and its layer(s).

On the right, the data can be visualized.

Control Panel

Control Panel 1.Position Bar 2.Zoom Bar 3.Chromosome Box 4.Position Text Field

The control panel is divided into 4 parts:

  1. Position Bar: the position bar allows you to change the position of the currently displayed window
  2. Zoom Bar: use the zoom bar to modify the level of zoom
  3. Chromosome Box: set the selected chromosome with the chromosome box
  4. Position Text Field: the position text field follows the format of the UCSC Genome Browser position field so it is easy to copy and paste the position from one browser to the other
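A position in that format consists of a chromosome, a colon, a start, a dash and a stop, with optional thousands separators. Parsing it can be sketched as below. This is a simplified pattern for illustration. `parse_position` is hypothetical and does not cover every form the UCSC browser accepts, such as gene names or single positions.

```python
import re

# chromosome : start - stop, with optional thousands separators in the numbers
POSITION = re.compile(r"^\s*(\w+):([\d,]+)-([\d,]+)\s*$")


def parse_position(text):
    """Parse a UCSC-style position such as 'chr1:1,000,000-2,000,000' (hypothetical helper)."""
    match = POSITION.match(text)
    if match is None:
        raise ValueError(f"not a chr:start-stop position: {text!r}")
    chromosome, start, stop = match.groups()
    start, stop = int(start.replace(",", "")), int(stop.replace(",", ""))
    if stop < start:
        raise ValueError("stop must not be smaller than start")
    return chromosome, start, stop


print(parse_position("chr1:1,000,000-2,000,000"))  # ('chr1', 1000000, 2000000)
```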

Status Bar

Status Bar 1.Progress Bar 2.Stop Button 3.Operation Description 4.Memory Bar

The status bar helps monitor the progress of the current operation as well as memory usage. It is divided into 4 sub-components:

  1. Progress bar, shows the level of completion of the current operation
  2. Stop button, allows users to stop the current operation. If the button is not bright red, the operation cannot be stopped
  3. Operation description, displays a short text describing the current operation as well as the elapsed time from the beginning of the operation
  4. Memory bar, shows the amount of memory used and the amount of memory available. Make sure that you have enough memory before starting a new operation. You can delete or compress layers to free up memory.

Browsing the Genome

Changing the Position

You can change the position of the displayed window by:

  1. Dragging any track on the left or on the right with the left button of the mouse
  2. Clicking with the middle button of the mouse inside a track and then moving the cursor on the left or on the right of the middle red line
  3. Moving the knob of the position bar on the control panel
  4. Changing the value of the position text field on the control panel
  5. Using the keyboard left and right arrows
  6. Double-clicking on a track where you want to center the view

Switching Chromosome

You can switch the selected chromosome by:

  1. Changing the selection in the chromosome box on the control panel
  2. Changing the text of the position text field on the control panel

Changing the Zoom

The level of the zoom can be modified by:

  1. Scrolling the mouse wheel up or down inside a track
  2. Using the zoom bar on the control panel
  3. Changing the text of the position text field on the control panel

Loading a Layer

Introduction

Layers are the way to show information from files. They can represent information in different ways.

A layer is created inside a track; each track can contain one or several layers.

To load a layer in a track, right click on its handler (the blue part on the left of the track). This opens a contextual menu with the different actions available on the track.

The menu of a track that contains no layer looks like the one in Figure 1.

Clicking "Add Layer" opens a dialog to select one of the different layer types GenPlay offers (Figure 2).

Examples of layers that can be loaded in GenPlay are available for download from the GenPlay Library accessible from the GenPlay.net website.

Loading a Sequencing/Microarray Layer

The Sequencing/Microarray layer allows the visualization of windows of variable or fixed size with a score associated to each window. Select the “Sequencing/Microarray Layer” option. This opens a file chooser dialog box. Select the file of your choice from the list of available window files and click the open button.

Please refer to the File formats section if you want to know what kind of file can be loaded as a sequencing/microarray layer.

This opens a new dialog to set different parameters for the new layer (as shown in the figure below). The dialog is divided into 6 sections, detailed below.

New Layer Settings Dialog

Layer Name

Gives a name to the layer.

Bin

By default, the windows generated in a sequencing/microarray layer have variable sizes, which represents the content of the file very precisely.

For other purposes, users may want fixed window sizes. Fixed windows are useful to represent the results of many types of experiments including, but not limited to: ChIP-seq, RNA-seq, and TimEX-seq. Files containing the results of alignments (SAM, bowtie, Eland) and files containing already created bin lists (bed, bgr, etc.) can be loaded using this option. In the case of alignment files, bin lists are created on the fly as described below. Files containing the results of microarray experiments can also be loaded as long as they are in one of the accepted formats.

Binning lowers the resolution but usually reduces memory usage.

This is implemented here by enabling the "Bin Data" option. The "Bin Size" field will then be available in order to give the size of the windows in base pairs.

Important Note: A bin size of 1 bp uses a lot of memory. Depending on the experiment, it may be more efficient to disable the bin data option and stay in variable window size mode.

Score Calculation

Name and Score Calculation

It can happen that files contain overlapping windows. In this case, GenPlay splits them into smaller windows using a simple algorithm.

The score calculation method used by this algorithm can be chosen in this section, among the following possibilities:

  • Addition
  • Average
  • Maximum
  • Minimum

Some examples are shown in the sections below for both non-binned and binned layers.

Strand

If your input file contains information regarding the strands, you'll be able to choose to load the data from either both or only one strand.

You can also decide to shift the reads from both strands as shown in the figure on the left. To shift the strands just put a value in the "Shift" input box.

The value you enter is added to the position of the data on the 5' strand and subtracted from the position of the data on the 3' strand.
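In code, the shift is a one-line adjustment per read. The sketch assumes that the 5' strand corresponds to reads reported on the + strand and the 3' strand to reads on the - strand. `shift_read` is a hypothetical helper.

```python
def shift_read(position, strand, shift):
    """Apply the strand shift (hypothetical helper).

    Assumes '+' stands for the 5' strand (shift added) and '-' for the
    3' strand (shift subtracted).
    """
    if strand == "+":
        return position + shift
    if strand == "-":
        return position - shift
    raise ValueError("strand must be '+' or '-'")


print(shift_read(1000, "+", 100), shift_read(1000, "-", 100))  # 1100 900
```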

Fragment Length

Selected Chromosomes

By default all the chromosomes of the project are selected. If you want to change this selection, click on the "modify selection" button and uncheck the undesired chromosomes. Working on fewer chromosomes will save memory and loading time.

Important Note: GenPlay can load a file faster if it is sorted by chromosome. If you answer Yes when GenPlay asks whether the file is sorted but the file is actually not sorted, the file may load incompletely, leading to a loss of valuable information. The chromosomes must be ordered the same way as in the chromosome selection combo-box.
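Before answering Yes, you can check the order of a file yourself. The sketch below verifies that the chromosome column only moves forward through the combo-box order. That also implies each chromosome forms a single contiguous block. The function name, and the way you extract the chromosome column from your file, are assumptions.

```python
def is_sorted_by_chromosome(chromosomes, order):
    """Return True if chromosome names appear in contiguous blocks following `order`.

    Hypothetical helper. chromosomes: the chromosome column of the file, in
    file order; order: chromosome names as listed in the combo-box.
    """
    rank = {name: i for i, name in enumerate(order)}
    last = -1
    for name in chromosomes:
        if name not in rank:
            raise ValueError(f"unknown chromosome: {name}")
        if rank[name] < last:
            return False  # went back to an earlier chromosome
        last = rank[name]
    return True


order = ["chr1", "chr2", "chrX"]
print(is_sorted_by_chromosome(["chr1", "chr1", "chr2", "chrX"], order))  # True
print(is_sorted_by_chromosome(["chr1", "chr2", "chr1"], order))          # False
```

For a tab-delimited file, the chromosome column can be fed in lazily, for example `(line.split("\t", 1)[0] for line in f)`, after skipping any header lines.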

Examples of Score Calculations

For non-binned layers
Example 1

Input file

Chr Start Stop Score
Chr1 1125 1126 1
Chr1 1135 1136 1
Chr1 1135 1136 1
Chr1 1149 1150 1
Chr1 1175 1176 1
Chr1 1210 1211 1
Chr1 1230 1231 1
Chr1 1340 1341 1
Chr1 1345 1346 1


Result

Loading of an alignment file as a variable window layer




Example 2
Chr Start Stop Score
Chr1 1020 1120 30
Chr1 1120 1300 120
Chr1 1010 1350 100


Loading of an interval file as a variable window layer


Result

Chr | Start | Stop | Average | Maximum | Sum
Chr1 | 1010 | 1020 | 100 | 100 | 100
Chr1 | 1020 | 1120 | (100 + 30) / 2 = 65 | Max(100, 30) = 100 | 100 + 30 = 130
Chr1 | 1120 | 1300 | (100 + 120) / 2 = 110 | Max(100, 120) = 120 | 100 + 120 = 220
Chr1 | 1300 | 1350 | 100 | 100 | 100
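The splitting used in Example 2 can be reproduced with a short sketch. It cuts the coordinate axis at every interval boundary, then combines the scores of all intervals covering each piece. This illustration is consistent with the table above but is not GenPlay's source. `split_overlaps` is a hypothetical name, and pieces covered by no interval are simply dropped.

```python
def split_overlaps(intervals, method):
    """Split overlapping (start, stop, score) intervals at every boundary.

    Hypothetical helper. method: "average", "maximum" or "sum", applied to the
    scores of the intervals covering each resulting window.
    """
    combine = {
        "average": lambda s: sum(s) / len(s),
        "maximum": max,
        "sum": sum,
    }[method]
    bounds = sorted({b for start, stop, _ in intervals for b in (start, stop)})
    windows = []
    for left, right in zip(bounds, bounds[1:]):
        covering = [score for start, stop, score in intervals
                    if start <= left and right <= stop]
        if covering:  # skip gaps not covered by any interval
            windows.append((left, right, combine(covering)))
    return windows


intervals = [(1020, 1120, 30), (1120, 1300, 120), (1010, 1350, 100)]
print(split_overlaps(intervals, "average"))
# [(1010, 1020, 100.0), (1020, 1120, 65.0), (1120, 1300, 110.0), (1300, 1350, 100.0)]
```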
For binned layers
Example 1

Loading of an alignment file as a fixed window layer with a window size of 100:

(each line represents one read position, score is always one)

Input file

Chr Start Stop Score
Chr1 1125 1126 1
Chr1 1135 1136 1
Chr1 1135 1136 1
Chr1 1149 1150 1
Chr1 1175 1176 1
Chr1 1210 1211 1
Chr1 1230 1231 1
Chr1 1340 1341 1
Chr1 1345 1346 1


Loading of an alignment file as a fixed window layer with a window size of 100


Result

Chr | Start | Stop | Average | Maximum | Sum
Chr1 | 1100 | 1200 | 1 | 1 | 5
Chr1 | 1200 | 1300 | 1 | 1 | 2
Chr1 | 1300 | 1400 | 1 | 1 | 2




Example 2

Loading of an alignment file as a fixed window layer with a window size of 100:

(each line represents one read position, score varies)

Input file

Chr Start Stop Score
Chr1 1125 1126 1
Chr1 1135 1136 3
Chr1 1145 1146 1
Chr1 1149 1150 1
Chr1 1175 1176 1
Chr1 1210 1211 1
Chr1 1230 1231 1
Chr1 1340 1341 6
Chr1 1345 1346 1


Loading of an alignment file as a fixed window layer with a window size of 100


Result

Chr | Start | Stop | Average | Maximum | Sum
Chr1 | 1100 | 1200 | 7 / 5 = 1.4 | 3 | 7
Chr1 | 1200 | 1300 | 1 | 1 | 2
Chr1 | 1300 | 1400 | 7 / 2 = 3.5 | 6 | 7
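The fixed-window results can be reproduced by assigning each read to the bin that contains its start position. With a window size of 100, the read at 1125 belongs to the 1100-1200 window, and the reads at 1340 and 1345 belong to the 1300-1400 window. The sketch below is a hypothetical helper under that assumption. It handles point reads only, not intervals spanning several bins.

```python
from collections import defaultdict


def bin_reads(reads, bin_size):
    """Group (start, score) reads into fixed windows; report average, maximum, sum.

    Hypothetical helper: each read goes to the bin containing its start.
    """
    bins = defaultdict(list)
    for start, score in reads:
        bins[start // bin_size * bin_size].append(score)  # left edge of the read's bin
    return [(left, left + bin_size, sum(s) / len(s), max(s), sum(s))
            for left, s in sorted(bins.items())]


reads = [(1125, 1), (1135, 3), (1145, 1), (1149, 1), (1175, 1),
         (1210, 1), (1230, 1), (1340, 6), (1345, 1)]
for row in bin_reads(reads, 100):
    print(row)
# (1100, 1200, 1.4, 3, 7)
# (1200, 1300, 1.0, 1, 2)
# (1300, 1400, 3.5, 6, 7)
```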




Example 3

Loading of an interval file as a fixed window layer with a window size of 100:

Input file

Chr Start Stop Score
Chr1 1020 1120 30
Chr1 1120 1300 120
Chr1 1010 1350 100


Loading of an interval file as a fixed window layer with a window size of 100


Result

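Chr Start Stop Average Maximum Sum
Chr1 1000 1100 (26.47+24)/2=25.24 max(26.47,24)=26.47 26.47+24=50.47
Chr1 1100 1200 (29.41+6+53.33)/3=29.58 max(29.41,6,53.33)=53.33 29.41+6+53.33=88.74
Chr1 1200 1300 (29.41+66.67)/2=48.04 max(29.41,66.67)=66.67 29.41+66.67=96.08
Chr1 1300 1400 14.71 14.71 14.71

Each term is the share of an interval's score proportional to its overlap with the window; for example, the interval 1120-1300 (180bp, score 120) overlaps the window 1100-1200 by 80bp and contributes 120 × 80 / 180 = 53.33.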
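The per-window contributions in this example are consistent with a rule in which each interval gives every window it overlaps a share of its score proportional to the number of base pairs they have in common (for instance, 100 × 90 / 340 ≈ 26.47 for the interval 1010-1350 in the window 1000-1100). The Python sketch below applies that rule; it is an assumption reconstructed from the example, and `bin_intervals` is a hypothetical name.

```python
from collections import defaultdict


def bin_intervals(intervals, window_size):
    """Distribute (start, stop, score) intervals over fixed windows.

    Each interval contributes score * overlap / length to every window it
    overlaps (assumed rule). Returns rounded (start, stop, average, max, sum)."""
    parts = defaultdict(list)
    for start, stop, score in intervals:
        length = stop - start
        for index in range(start // window_size, (stop - 1) // window_size + 1):
            w_start, w_stop = index * window_size, (index + 1) * window_size
            overlap = min(stop, w_stop) - max(start, w_start)
            parts[index].append(score * overlap / length)
    rows = []
    for index in sorted(parts):
        values = parts[index]
        rows.append((index * window_size, (index + 1) * window_size,
                     round(sum(values) / len(values), 2),
                     round(max(values), 2), round(sum(values), 2)))
    return rows


example_3 = [(1020, 1120, 30), (1120, 1300, 120), (1010, 1350, 100)]
for row in bin_intervals(example_3, 100):
    print("Chr1", *row)
```

Because the sketch only rounds the final values, a sum can differ in the last decimal place from a hand calculation on rounded contributions.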

Loading a Gene Annotation Layer

A Gene Layer
Score Color

Select the "Gene Layer" option. This opens a file chooser dialog box in which you select the file to load. See the File Formats section for the file types that can be loaded as a gene layer.

Once you have selected a file, wait until loading is complete; the gene layer then appears in the selected track.

Note that genes on the plus strand are shown in red and genes on the minus strand in blue. If the file contains expression values, the exons are color-coded to represent the expression (red = high, blue = low, as shown on the right).

Loading a Repeat Family Layer

Select the "Repeat Layer" option. This opens a file chooser dialog box in which you select the file to load. See the File Formats section for the file types that can be loaded as a repeat layer.

This layer type displays repeats organized by family or class.

Loading a DNA Sequence Layer

Select the "DNA Sequence Layer" option. This opens a file chooser dialog box in which you select the file to load. See the File Formats section for the file types that can be loaded as a sequence layer.

A Sequence Layer

Sequence layers show DNA sequences from .2bit files.

The hg18, hg19, mm8 and mm9 sequence files can be downloaded from the library of GenPlay.

Loading a Mask Layer

Select the "Mask Layer" option. Mask layers are displayed as stripes, which can be useful to highlight regions of interest such as CpG islands or repeat regions.

See the File Formats section for the file types that can be loaded as a mask layer.

Loading a Variant Layer

Add a Variant Layer


Select the "Variant Layer" option (available only in multi-genome projects). A dialog opens in which you select the sample to load and the variation type(s) to display. A variant layer always refers to a single sample. You can change the color of each variation independently by clicking the colored square next to its checkbox.

Multi-Genome Features

Select Coordinate System
Coordinate System chooser

The coordinate system used by GenPlay can be changed by selecting one from the list at the bottom right of the main frame. The default system is that of the Meta Reference Genome; the Reference Genome coordinate system is also available, as is the coordinate system of any of the loaded genomes. Changing the coordinate system does not affect operations; it only changes the red position numbers at the top of the frame and the position search bar at the bottom.

Multi-Genome Project Properties
Properties Dialog Button

In Multi-Genome Projects only, a new button appears on the bottom left of the frame. This button leads to the Multi-Genome Project Properties dialog allowing the user to visualize and handle the project settings. Right-clicking on the button opens a contextual menu offering shortcuts to the different sections of the properties dialog.

General
General Section

The General section is an overview of how the project has been loaded. Projects can be very complex, using many files and samples. This section reminds the user how the project has been set up.

Settings
Settings Section

The Settings section lets the user configure various multi-genome options.

  • Properties Dialog
    • Default section to open: the default section of the Multi-Genome Project Properties dialog to open when clicking the button.
  • VCF Loader
    • Default group text name: Default name for groups.
  • Stripes transparency: Sets the transparency of the stripes representing variations.
  • Global display settings
    • Show legend: Shows the enabled variations and their colors in the track.
  • Variant stripes settings
    • Show filtered variation: Filtered variations can be shown, but their stripes are marked with a cross.
    • Show border of insertion: Insertion stripes get a specific border, which makes them easy to recognize independently of their color when many layers are loaded.
    • Show border of deletion: Deletion stripes get a specific border, which makes them easy to recognize independently of their color when many layers are loaded.
    • Show nucleotides of insertion stripes: Added nucleotides will be retrieved from the VCF files if possible.
    • Show nucleotides of deletion stripes: Deleted nucleotides will be retrieved from the VCF files if possible.
    • Show nucleotides of SNP stripes: SNP nucleotides will be retrieved from the VCF files if possible.
  • Reference stripes settings
    • Show reference stripes: Stripes representing the reference genome can be either shown or hidden.
    • Reference stripes color: Defines a color for reference stripes.
Files

The Files section lists all the VCF files loaded into GenPlay. The information about each file is divided into two categories:

  • Information: shows the name and location of the file. It also splits the header of the VCF file into sections for easier reading and interpretation.
  • Statistics: gives various descriptive statistics for the file and for each sample. All tables can be copied and pasted as tab-delimited text.
Filters

The filters section is covered in the section below.

Loading Data From a DAS Server

The distributed annotation system (DAS) is a client-server system in which a client can retrieve data from one or multiple servers. GenPlay can connect to any server that implements the DAS/1 protocol as specified by BioDAS.

DAS Dialog

The “Add Layer from DAS Server” option from the track handler menu will show the DAS Dialog.

Select the server from which you want to retrieve the data in the "Server" box.

Then select the "Data Source". Most of the time, the Data Source corresponds to the reference genome that you want to work on.

Once that's done you need to select the data that you want to retrieve in the "Data Type" box.

GenPlay can either generate a gene layer or a variable window layer from the retrieved data. You can select what type of output layer you want in the "Generate" option.

Finally, you can also choose to download data on only a part of the genome. This can be useful because retrieving data from a DAS server can be time consuming.

Note: The DAS server section shows how to add new servers to the list of available servers in the DAS dialog.

Main Menu


On GenPlay’s main screen, click on the top left button (shown by a little hammer and wrench) to pop up the main menu.

New Project

This option opens the welcome screen to start a new project. Any unsaved work will be lost.

Load / Save Project

This menu allows you to load or save a whole GenPlay project in a space-efficient, compressed binary format. When you load a GenPlay project, all the tracks and layers of your current project are replaced by those of the loaded project, and any unsaved work is lost. Important note: GenPlay project files may depend on the version of GenPlay you are using. Keep track of the GenPlay version used to save a project and use the same version when you load it again.

Full Screen

Click on this item from the main menu to toggle the full screen mode. When the full screen mode is on, the control panel and the status bar are hidden.

You can also toggle the full screen mode by pressing the F11 key.

Warnings report

This option opens the Warnings Report dialog, where you can consult previous and current alerts.

Option

The option menu item allows you to modify the configuration of GenPlay. Please refer to the section Changing the configuration of GenPlay for further information.

RNA To DNA Reference

This option converts the coordinates of the results of an RNA-Seq experiment aligned to a transcriptome (for instance, all RefSeq genes) into a genomic coordinate system.

You need two files to use this feature:

  1. The result of the RNA-Seq experiment, called "Coverage File" in GenPlay. This file must be in bedGraph format.
  2. An annotation file in BED format.

Two output files can be generated:

  1. A bedGraph file with positions expressed in the coordinates of the reference genome
  2. An annotation file in GdpGene format

Here is an example.

Coverage File:

NM_000016	0	413	0
NM_000016	413	456	1
NM_000016	456	471	2
NM_000016	471	488	3
NM_000016	488	494	2
NM_000016	494	504	3

Annotation File:

chr1	76190042	76229353	NM_000016	0	+	76190472	76228448	0	12	460,88,98,70,101,81,131,109,141,96,249,977,	0,4043,8286,8495,9170,10433,15622,21448,25061,26093,36764,38334,

The result as a bedGraph file is:

chr1	76190455	76190498	43.0
chr1	76190498	76190502	8.0
chr1	76194085	76194096	22.0
chr1	76194096	76194113	51.0
chr1	76194113	76194119	12.0
chr1	76194119	76194129	30.0

And the result as a GdpGene file is:

NM_000016	chr1	+	76190042	76229353	76190042,76194085,76198328,76198537,76199212,76200475,76205664,76211490,76215103,76216135,76226806,76228376	76190502,76194173,76198426,76198607,76199313,76200556,76205795,76211599,76215244,76216231,76227055,76229353	667888.95,1506024.1,0,0,0,0,0,0,0,0,0,0
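The bedGraph part of this conversion can be reproduced with the Python sketch below. It walks through the exons (blocks) of the BED12 annotation line in order and maps each coverage interval onto the genomic pieces it overlaps. Two behaviors are inferred from the example rather than documented: rows with a score of 0 are skipped, and each output score equals the input score multiplied by the length of the genomic piece (43 = 1 × 43, 8 = 2 × 4, and so on). The sketch handles plus-strand transcripts only, and the function names are hypothetical.

```python
def parse_bed12(line):
    """Return (chrom, strand, exons) from a BED12 line.

    exons is a list of (genomic_start, genomic_stop) pairs in genomic order."""
    fields = line.rstrip("\n").split("\t")
    chrom, tx_start, strand = fields[0], int(fields[1]), fields[5]
    sizes = [int(v) for v in fields[10].rstrip(",").split(",")]
    starts = [int(v) for v in fields[11].rstrip(",").split(",")]
    exons = [(tx_start + s, tx_start + s + n) for s, n in zip(starts, sizes)]
    return chrom, strand, exons


def transcript_to_genome(coverage, chrom, exons):
    """Map (start, stop, score) transcript intervals onto genomic exons.

    Plus-strand only. Zero-score rows are skipped, and each output score is
    the input score times the length of the genomic piece (as in the example)."""
    rows = []
    for t_start, t_stop, score in coverage:
        if score == 0:
            continue
        offset = 0  # transcript position at which the current exon begins
        for g_start, g_stop in exons:
            length = g_stop - g_start
            lo, hi = max(t_start, offset), min(t_stop, offset + length)
            if lo < hi:  # the interval overlaps this exon
                start, stop = g_start + lo - offset, g_start + hi - offset
                rows.append((chrom, start, stop, float(score * (stop - start))))
            offset += length
    return rows


bed_line = "\t".join([
    "chr1", "76190042", "76229353", "NM_000016", "0", "+", "76190472",
    "76228448", "0", "12", "460,88,98,70,101,81,131,109,141,96,249,977,",
    "0,4043,8286,8495,9170,10433,15622,21448,25061,26093,36764,38334,",
])
chrom, strand, exons = parse_bed12(bed_line)
assert strand == "+", "this sketch does not handle minus-strand transcripts"
coverage = [(0, 413, 0), (413, 456, 1), (456, 471, 2),
            (471, 488, 3), (488, 494, 2), (494, 504, 3)]
for row in transcript_to_genome(coverage, chrom, exons):
    print(*row, sep="\t")
```

The loop prints the six bedGraph lines of the example above.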

Help and About GenPlay

The Help and About GenPlay options open a web browser showing, respectively, the documentation page and the about page of the GenPlay website.

Exit

This option closes the application after asking for confirmation.

Changing the Configuration of GenPlay

Option Menu

Click on the option item of the main menu to open the configuration screen.

General Options

The following screen lets you set the general options.

The Default Directory option lets the user choose which folder the file choosers of GenPlay open by default.

From this screen, you can also modify the appearance of the software by changing the look & feel.

Track Option

The Number of Tracks text box defines the maximum number of tracks that can be loaded on GenPlay.

The Default Track Height text box defines the height of each of the tracks.

The Undo Count text box defines the number of operations that can be undone. Note that the higher the number of undos selected, the more memory will be required.

The reset option allows the user to reset a layer to the state it was in right after being loaded.

The legend showing the layer names in the upper right corner of a track can also be enabled or disabled.

DAS Server

The DAS server option shows the list of existing DAS servers along with the URL where these servers are located. It also provides the options to add new servers and remove existing servers.

GenPlay can communicate with and retrieve data from the servers implementing the DAS/1 protocol.

Restore Default

The Restore Default configuration restores everything back to the factory settings.

File Formats

The different file formats used in GenPlay are described on this page.

Using Tracks

Track Menu

Handling Tracks

Moving a Track

To move a track up or down in the track list, just click on the track handler (the left part of the track with the track number) and drag the track to the desired position.

Inserting a Track

To insert a track, right-click the track handler of the track just below the position where you want to insert the new track and choose the "Insert" option.

Deleting a Track

To delete, select a track and click on the delete option of the contextual menu or press Delete on the keyboard.

Copying, Cutting and Pasting a Layer

Track Menu

To copy layers, select the track containing them and click the copy option in the contextual menu or press CTRL+C. A window then shows all the layers that can be copied; select the layers you want to copy and click "Ok".

To cut layers, select the desired track where the layers are and click on the cut option in the contextual menu or press CTRL+X.

To paste a track, select the track where you want to paste and click on the paste option in the contextual menu or press CTRL+P.

A track can be pasted into a text file, in which case the data of the active layer are pasted as text (limited to the genomic range currently displayed). It can be pasted into an image editor, in which case an image of the track is pasted. It can also be pasted into a file explorer, in which case the layer is saved as a GPTF (GenPlay Track File), or into another GenPlay track, in which case the copied layers are added to that track.

Taking a Screenshot of the Track

To take a screenshot, select a track and choose the "Save as Image" option in the contextual menu.

Saving an Entire Track

To save an entire track with all its layers in the GenPlay format (GPTF GenPlay Track File), select a track and choose the "Save Track" option in the contextual menu.

Please note that the track can only be loaded in a project with exactly the same assembly (in a Multi-Genome project, this means the same meta reference).

Using the Undo / Redo / Reset Options

The undo, redo and reset options are only available for the Variable and Fixed Window layers. They are accessible from the contextual menu when you right click on the track handler.

The number of available undo and redo operations can be specified as described in the Track Option section. Note that these operations consume memory; reducing the number of available undo / redo operations can save memory.

The reset operation restores the track to the state it was in right after being loaded. A reset operation can also be undone.

Track/Layer Settings

General

Track Settings - General
Basic Options
  • Name: The name of the track.
  • Height: The height of the track.
Axis Options
  • Show horizontal lines: Split the track horizontally.
  • Horizontal line count: Number of horizontal lines, equally separated.
  • Show vertical lines: Split the track vertically.
  • Vertical line count: Number of vertical lines, equally separated.
Score Options
  • Minimum Score: The minimum score to show.
  • Maximum Score: The maximum score to show.
  • Auto-rescaled: Enable the automatic score rescaling.
  • Score Position: Choose where the score is shown (top/bottom).
  • Score Color: Set the font color of the score.

Layers

Track Settings - Layers
  • Name: Click on the name to edit it.
  • Type: The type of layer.
  • Color: Click to edit the color of the layer.
  • Graph Type: Click to change the graph type:
    • Curve
    • Points
    • Bar
    • Dense
  • Visible: Show/hide the layer.
  • Active: Set the layer as "active". The active layer is the one that responds directly to the mouse pointer and clicks.
  • Set For Deletion: If set, the layer(s) will be deleted when clicking "Ok".

Operations

Once a layer is loaded, a right click on the location of the track handler opens a popup menu as shown in the figure below.

Operation Menu

The Operation sub-menu of the popup menu contains all the actions that you can use on the selected layer.

Sequencing/Microarray Layer Operations

Binned and non-binned layers share most operations, but some operations are specific to one of the two types.

Common operations

Show History

Shows the history of the layer: every change made since it was loaded.

Constant Operation
Operation With Constant

These operations use a constant in the following ways:

  • Addition: adds the constant to each window (F(x) = x + constant).
  • Subtraction: subtracts the constant from each window (F(x) = x - constant).
  • Multiplication: multiplies the score by the constant (F(x) = x * constant).
  • Division: divides the score by the constant (F(x) = x / constant).
  • Inversion: inverts the score of each window (F(x) = constant / x).
  • Unique Score: sets all windows to a single score (F(x) = constant).

The function can also be applied to null windows (windows with a score of zero) by checking the box.

Two Layers Operation

This allows operations between two Sequencing/Microarray layers, binned or non-binned.

Three windows appear, in the following order, to set up the operation:

  1. The first window asks you to select the second layer.
  2. The second window asks in which track the resulting layer will be put.
  3. The third and last window offers the algorithms used to combine the layers (x1: score of the first layer; x2: score of the second layer):
    • Addition: adds the scores (x = x1 + x2).
    • Subtraction: subtracts the scores (x = x1 - x2).
    • Multiplication: multiplies the scores (x = x1 * x2).
    • Division: divides the scores (x = x1 / x2).
    • Average: averages the scores (x = (x1 + x2) / 2).
    • Maximum: keeps the highest score.
    • Minimum: keeps the lowest score.

Note: The resulting layer is binned only if the operation is performed between two binned layers with the same bin size. In any other case, the result is a non-binned layer.

Index

Indexing can be useful to compare multiple layers on the same scale. It "re-scales" existing scores to a new range defined by the user.

For example, if scores range from 10 to 600 but need to be observed between 0 and 100, this operation will do the job.

It first asks for the new minimum and the new maximum. The next dialog asks whether to perform the re-scaling for each chromosome independently or genome wide.

Using the previous example, for a new scale of [0; 100], if the first chromosome has a maximum score of 600 and the second one has a maximum score of 800, then 800 becomes the reference value for 100 on both chromosomes if the operation is processed genome wide. If the operation is processed for each chromosome independently, 600 becomes the reference value for 100 on the first chromosome, and 800 on the second chromosome.

Since this operation uses the minimum and maximum scores, it is very important to note that indexing does not work well in the presence of outliers. Indexing works best if outliers are removed first using a filter (see below).
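A minimal Python sketch of the two indexing modes, assuming a plain linear (min-max) mapping of the old score range onto the new one; this page does not state the exact formula GenPlay uses, and the function names are hypothetical. With the numbers from the example above, 800 maps to 100 on both chromosomes in genome-wide mode, while 600 maps to 100 on the first chromosome in per-chromosome mode.

```python
def rescale(scores, old_min, old_max, new_min, new_max):
    """Linearly map scores from [old_min, old_max] onto [new_min, new_max]."""
    if old_max == old_min:  # flat input: avoid a division by zero
        return [new_min for _ in scores]
    factor = (new_max - new_min) / (old_max - old_min)
    return [new_min + (x - old_min) * factor for x in scores]


def index_layer(layer, new_min, new_max, genome_wide=True):
    """Rescale a {chromosome: [scores]} layer genome wide or per chromosome."""
    all_scores = [x for scores in layer.values() for x in scores]
    indexed = {}
    for chrom, scores in layer.items():
        # The reference range is either the whole genome or this chromosome.
        reference = all_scores if genome_wide else scores
        indexed[chrom] = rescale(scores, min(reference), max(reference), new_min, new_max)
    return indexed


layer = {"chr1": [10, 300, 600], "chr2": [10, 400, 800]}
genome_wide = index_layer(layer, 0, 100, genome_wide=True)
per_chromosome = index_layer(layer, 0, 100, genome_wide=False)
print(genome_wide)
print(per_chromosome)
```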

Log
Logarithm Bases

For each window, the log operation applies the function f(x) = log(x), where x is the window score. The base of the logarithm function can be selected between either 2 (binary log), e (natural log) or 10 (common log).

Normalize
Normalization Coefficient

After a normalize operation, the score of each window is divided by the result of the Score Count operation and multiplied by a specified fixed value. By default, after normalization the scores are expressed per 10 million reads.

Standard Score

Calculates the standard score for the selected layer i.e. (x - avg) / stdev; where x is the score, avg is the average score of the layer and stdev is the standard deviation of the scores of the layer.

Show Statistics

Shows the minimum, maximum, average scores per chromosome and genome wide. Also shows the number of windows, the sum of the window lengths with a non zero score and the sum of the scores (normalized by the window lengths).

Filter

GenPlay provides four different filters:

Percentage Filter

This option filters the X% lowest values and the Y% greatest values where X and Y are two decimals and where X + Y <= 100. You can choose between removing the filtered values (remove) or setting the filtered values to the boundary values (saturate).

Threshold Filter

This option filters the values that are lower than X or greater than Y, where X and Y are two specified threshold values. You can choose between removing the filtered values (remove) or setting the filtered values to the boundary values (saturate).
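The difference between the remove and saturate modes can be illustrated with a short Python sketch on a plain list of window scores. It assumes that a removed window becomes a null window (score 0); `threshold_filter` is a hypothetical name, not a GenPlay function.

```python
def threshold_filter(scores, low, high, saturate=False):
    """Filter scores lower than `low` or greater than `high`.

    remove mode (default): filtered windows become null windows (assumption)
    saturate mode: filtered windows are set to the nearest threshold"""
    filtered = []
    for x in scores:
        if low <= x <= high:
            filtered.append(x)  # inside the range: kept as is
        elif saturate:
            filtered.append(low if x < low else high)  # clipped to the boundary
        else:
            filtered.append(0)  # removed
    return filtered


scores = [2, 15, 40, 120]
print(threshold_filter(scores, 10, 100))                 # remove
print(threshold_filter(scores, 10, 100, saturate=True))  # saturate
```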

Band-Stop Filter

This option removes values between two specified thresholds.

Count Filter

This option filters the X lowest values and the Y greatest values, where X and Y are two specified integers. You can choose between removing the filtered values (remove) or setting the filtered values to the boundary values (saturate).

Transfrag

This operation aggregates the windows of the selected layer that are separated by a gap smaller than a specified size (in bp).

The score of the new window can be the sum, the average or the maximum of the scores of the aggregated windows.
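A minimal Python sketch of this aggregation, assuming the windows are sorted by start position and do not overlap, and that the gap is measured from the end of one window to the start of the next; `transfrag` is a hypothetical name.

```python
from statistics import mean


def transfrag(windows, max_gap, aggregate=sum):
    """Merge sorted (start, stop, score) windows separated by less than max_gap bp.

    aggregate is applied to the scores of each merged group (sum, max or mean)."""
    groups = []
    for start, stop, score in windows:
        if groups and start - groups[-1][-1][1] < max_gap:
            groups[-1].append((start, stop, score))  # gap small enough: extend the fragment
        else:
            groups.append([(start, stop, score)])    # otherwise start a new fragment
    return [(group[0][0], group[-1][1], aggregate([s for _, _, s in group]))
            for group in groups]


windows = [(100, 200, 5), (250, 300, 3), (1000, 1100, 4)]
print(transfrag(windows, 100))        # sum of the scores
print(transfrag(windows, 100, max))   # maximum
print(transfrag(windows, 100, mean))  # average
```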

Score Distribution Histogram

This operation generates a graph showing the distribution of the scores of the selected layer. Two types of plot are available: score vs. window count and score vs. base pair count.

The user needs to choose a bin size for the scores. Depending on the selected plot type, the graph shows how many windows or how many base pairs there are in each score bin.
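A minimal Python sketch of the two plot types, assuming each window is given as (start, stop, score) and that score bins are half-open intervals [k × bin size, (k + 1) × bin size); the function name is hypothetical.

```python
import math
from collections import Counter


def score_histogram(windows, bin_size, by_base_pairs=False):
    """Count windows (or base pairs) in each score bin of width bin_size.

    Each bin is labeled by its lower bound: a score x falls in the bin that
    starts at floor(x / bin_size) * bin_size."""
    counts = Counter()
    for start, stop, score in windows:
        bin_start = math.floor(score / bin_size) * bin_size
        counts[bin_start] += (stop - start) if by_base_pairs else 1
    return dict(sorted(counts.items()))


windows = [(0, 100, 3), (100, 200, 7), (200, 250, 12), (250, 300, 14)]
print(score_histogram(windows, 10))                      # score vs. window count
print(score_histogram(windows, 10, by_base_pairs=True))  # score vs. base pair count
```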

Convert Layer

This operation converts the current layer into another layer among the following:

  • Gene Annotation Layer
  • Microarray/Sequencing Layer bin/non-bin
  • Mask Layer

Non-Binned Layers Only

CG Methylation Profile

This operation computes the methylation values on CG sequences by combining the value on the C position and the value on the G position.

The result is a list of windows covering the CG sequences and having the sum of the score on the C and the score on the G base.

This operation relies on data from a sequence layer to find the CG sequences.

Binned Layers Only

Smooth

The smooth operation can be processed according to the 3 following algorithms:

Gauss Smoothing
Sigma Value

This operation applies a Gaussian filter to the layer, depending on the sigma value provided by the user.

$$G(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2 / (2\sigma^2)}$$

where σ is the sigma value provided by the user and x is the distance between the window being smoothed and a neighboring window.

You can choose the extrapolate option to "fill" the windows with a score of zero.
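A minimal Python sketch of Gaussian smoothing on a binned layer stored as a list of window scores. It assumes that σ and the distances are in base pairs, that neighbors farther than 3σ are ignored, and that each smoothed score is the Gaussian-weighted average of the neighboring scores, so the constant factor 1 / (√(2π) σ) cancels out. These choices are illustrative, not taken from GenPlay's code, and `gauss_smooth` is a hypothetical name.

```python
import math


def gauss_smooth(scores, bin_size, sigma, extrapolate=False):
    """Gaussian smoothing of a binned layer given as a list of window scores.

    sigma and the distances between windows are in base pairs (assumption)."""
    half_width = int(3 * sigma // bin_size)  # ignore neighbors beyond 3 sigma (assumption)
    smoothed = []
    for i, score in enumerate(scores):
        if score == 0 and not extrapolate:
            smoothed.append(0.0)  # null windows stay null unless extrapolating
            continue
        total = weight_sum = 0.0
        for j in range(max(0, i - half_width), min(len(scores), i + half_width + 1)):
            distance = (j - i) * bin_size
            weight = math.exp(-distance ** 2 / (2 * sigma ** 2))
            total += weight * scores[j]
            weight_sum += weight
        smoothed.append(total / weight_sum)  # normalizing cancels 1/(sqrt(2 pi) sigma)
    return smoothed


print(gauss_smooth([0, 0, 10, 0, 0], bin_size=50, sigma=50))
print(gauss_smooth([0, 0, 10, 0, 0], bin_size=50, sigma=50, extrapolate=True))
```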

Loess Smoothing

This operation computes the Loess regression of degree 1 on the selected layer.

For each x value where a y value is to be calculated, the Loess technique performs a regression on the points in a moving range around the x value, where the points in the moving range are weighted according to their distance from this x value.

The Loess regression is a smoothing function. You need to specify the half-size of the moving window on which the regression is computed.

The weight function of the Loess regression is computed as follows: W(i) = (1 - X(i)^3)^3, where X(i) is the normalized distance, i.e. the current distance divided by the maximum distance among the points in the moving window.

You can choose the extrapolate option to "fill" the windows with a score of zero.
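A minimal Python sketch of degree-1 Loess on a list of window scores, using the tricube weight W(i) = (1 - X(i)^3)^3 described above. It assumes that the half-size and the distances are counted in windows; `loess_smooth` is a hypothetical name. A useful sanity check is that a local linear fit reproduces a straight line exactly.

```python
def loess_smooth(scores, half_size, extrapolate=False):
    """Degree-1 Loess on a list of window scores.

    The moving window spans half_size windows on each side (assumption) and
    points are weighted with the tricube function W = (1 - d^3)^3."""
    n = len(scores)
    smoothed = []
    for i, score in enumerate(scores):
        if score == 0 and not extrapolate:
            smoothed.append(0.0)  # null windows stay null unless extrapolating
            continue
        xs = list(range(max(0, i - half_size), min(n, i + half_size + 1)))
        max_dist = max(abs(x - i) for x in xs) or 1
        weights = [(1 - (abs(x - i) / max_dist) ** 3) ** 3 for x in xs]
        ys = [scores[x] for x in xs]
        # Weighted least-squares fit of y = a + b * x, evaluated at x = i.
        w_sum = sum(weights)
        mean_x = sum(w * x for w, x in zip(weights, xs)) / w_sum
        mean_y = sum(w * y for w, y in zip(weights, ys)) / w_sum
        sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
        sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        smoothed.append(mean_y + slope * (i - mean_x))
    return smoothed


print(loess_smooth([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], half_size=2))
```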

Moving Average Smoothing

For each window of the layer, this operation computes the average score over a region of a specified size centered on the window and assigns the result to the window. The half-size of the region is prompted prior to the calculation.

You can choose the extrapolate option to "fill" the windows with a score of zero.
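A minimal Python sketch of the moving average, assuming the half-size is counted in windows and that the region is truncated at the chromosome ends; `moving_average` is a hypothetical name.

```python
def moving_average(scores, half_size, extrapolate=False):
    """Replace each window score by the mean of the scores within half_size
    windows on each side (the region is truncated at the chromosome ends)."""
    smoothed = []
    for i, score in enumerate(scores):
        if score == 0 and not extrapolate:
            smoothed.append(0.0)  # null windows stay null unless extrapolating
            continue
        region = scores[max(0, i - half_size): i + half_size + 1]
        smoothed.append(sum(region) / len(region))
    return smoothed


print(moving_average([3, 6, 9, 0, 3], half_size=1))
print(moving_average([3, 6, 9, 0, 3], half_size=1, extrapolate=True))
```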

Find Peaks

The find peak operation offers three different algorithms that can be used to find the peaks:

Standard Deviation Peak Finder

The standard deviation peak finder prompts the user to enter two parameters.

The parameter ‘S’ specifies the number of windows to be considered for each window on either side in order to calculate the standard deviation.

For example, if S = 10, it means that for each window we consider 10 windows to the left and 10 windows to the right to calculate the standard deviation.

For a window to be accepted, its standard deviation needs to be at least ‘T’ times the standard deviation of the chromosome.
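A minimal Python sketch of this test on a list of window scores. It uses the population standard deviation (`statistics.pstdev`) for both the local region and the whole chromosome, which is an assumption, and it returns the indices of the accepted windows rather than a new layer; `sd_peaks` is a hypothetical name.

```python
from statistics import pstdev


def sd_peaks(scores, s, t):
    """Return the indices of windows whose local standard deviation (over s
    windows on each side) is at least t times that of the chromosome."""
    chromosome_sd = pstdev(scores)
    peaks = []
    for i in range(len(scores)):
        region = scores[max(0, i - s): i + s + 1]  # s windows on each side
        if pstdev(region) >= t * chromosome_sd:
            peaks.append(i)
    return peaks


scores = [1] * 10 + [50] + [1] * 10
print(sd_peaks(scores, s=2, t=1.5))
```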

Density Peak Finder

The Density Finder works as follows:

The parameter ‘S’ specifies the number of windows to be considered for each window on either side of the window under consideration.

For the window under consideration to be accepted, at least ‘P’ percentage of values must be above the high threshold ‘H’ or at least ‘P’ percentage of values must be below the low threshold ‘L’.

Island Finder

The Island Finder is based on the algorithm described in the paper Zang, C., Schones, D. E., Zeng, C., Cui, K., Zhao, K., and Peng, W. (2009). A clustering approach for identification of enriched domains from histone modification chip-seq data. Bioinformatics (Oxford, England), 25(15):1952-1958.

The window value and gap parameters of the island finder correspond to the parameters ‘l0’ and ‘g’ of the paper, respectively. The island score parameter allows the user to keep only scores greater than or equal to a particular value. The island length parameter allows the user to select islands encompassing at least a specified number of windows. There are three result types:

  • Start values: Depicts only those islands that are selected and removes the ones that are rejected.
  • Island score: Depicts the islands by considering the score.
  • Island Summit: Depicts the island with the summit of the input island as a score.
Correlation
Correlation Report

The correlation operation computes the Pearson’s correlation between the score values of two layers. The two layers need to have the same bin size. The following formula is used to calculate the correlation:

$$r = \frac{\sum_{i=1}^{n} x_i y_i - n\,\bar{x}\,\bar{y}}{(n - 1)\, s_x\, s_y}$$

Where:

  • r is the Pearson’s correlation
  • xᵢ and yᵢ are the scores of the layers
  • n is the number of values
  • x̄ and ȳ are the means of the scores of the layers
  • s_x and s_y are the standard deviations of the scores of the layers

The figure on the right shows a correlation report.

Note: The correlation is computed only on the windows that are non-zero in both layers. If a window has a zero score in one of the layers, the window at the same coordinates in the other layer is skipped as well.
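The Pearson formula and the zero-skipping rule can be sketched in Python as follows. The inputs are two lists of window scores with the same bin size, and the standard deviations use the n - 1 denominator that the formula implies; `pearson` is a hypothetical name, and the sketch needs at least two retained windows.

```python
import math


def pearson(layer_x, layer_y):
    """Pearson correlation over windows that are non-zero in both layers."""
    pairs = [(x, y) for x, y in zip(layer_x, layer_y) if x != 0 and y != 0]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sample standard deviations (n - 1 denominator).
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x, _ in pairs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for _, y in pairs) / (n - 1))
    sum_xy = sum(x * y for x, y in pairs)
    return (sum_xy - n * mean_x * mean_y) / ((n - 1) * s_x * s_y)


# The third window is skipped because it is zero in the first layer.
print(pearson([1, 2, 0, 3, 4], [2, 4, 5, 6, 8]))
```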

Density

This operation generates a new fixed window layer in which the score of each window represents the density of non-null windows in its neighborhood. You first need to enter the size S of the neighborhood. For each window W, the algorithm counts how many of the S windows before W and the S windows after W have a non-zero score. This value is then divided by 2 * S + 1, and the result is the score of W.

Intervals Scoring

This operation needs two layers:

  • The selected layer that defines the scores
  • A second layer that defines the intervals

This operation generates a new layer containing the intervals of the interval layer. For each interval, the algorithm then looks at the corresponding scores in the score layer and computes either the maximum, the average or the sum of all the scores that fall in the interval. This value is the new score in the result layer.

You can also choose to use only a certain percentage of the greatest scores that fall in the interval.

Concatenate
Select Layers to Concatenate

The concatenate operation allows you to generate a file containing the scores of multiple fixed window layers that have the same bin size. The output file contains the following fields:

  1. chromosome
  2. start position
  3. stop position
  4. score layer 1
  5. score layer 2
  6. score layer 3
  7. ...

Gene Layer Operations

Directly on a gene layer, you can:

  1. Double click on a gene to open a web page describing the gene. Make sure that your input file contains a geneDBURL line as described in the File Formats section in order to enable this option.
  2. Put the mouse over a gene to have some information about the name and the score of the gene. If the exons of the gene have different scores you can put your mouse over an exon to have the exon score.

Score Count

This operation computes the sum of all scores.

A window asks first to select chromosomes to include in the calculation (all by default).

Average

This operation computes the average of all scores.

A window asks first to select chromosomes to include in the calculation (all by default).

Count Genes

This operation counts the total number of genes.

A window asks first to select chromosomes to include in the calculation (all by default).

Count Genes with Non-Null Score

This operation counts the total number of genes, excluding those with a score of 0.

A window asks first to select chromosomes to include in the calculation (all by default).

Count Exons

This operation counts the total number of exons.

A window asks first to select chromosomes to include in the calculation (all by default).

Search Gene


Use this option to search for a gene on the selected layer by typing its name.

Check the Match Case option if you want the search to be case sensitive. Check the Whole Word option if you want to find only genes whose whole name matches the input. Press Next or Previous to go to the next or previous matching gene. You can also open the Find Gene dialog by pressing CTRL+F after selecting a gene layer.

Extract Intervals


This option allows you to extract intervals defined relative to the beginning, the end or the middle of a gene and to generate a new gene layer showing these intervals.

For example, you can define promoters as regions that start 100bp before the beginning of genes and end 150bp after the beginning of genes. This option would allow you to generate a new layer from these parameters.

Extract Exons


This option generates a new gene layer showing only the exons of the genes of the selected layer.

You can choose between the three following options:

  1. Extract the first exon of the genes
  2. Extract the last exon
  3. Extract all the exons

Unique Score


This operation sets the same score for all exons.

Score Exons


To execute this operation you need to have at least one microarray/sequencing layer loaded. For each exon of each gene of the selected gene layer, this operation computes a new score based on the scores of the windows of the selected layer that fall into the exon. There are 3 different ways to compute the new score:

  • Base Coverage Sum
  • Maximum coverage
  • RPKM

Filter

This option provides four different filters for gene layers:

Percentage Filter

This option filters the genes with the X% lowest overall score and the Y% greatest overall scores where X and Y are two decimals and where X + Y <= 100. You can choose between removing the filtered values (remove) or setting the filtered values to the boundary values (saturate).

Threshold Filter

This option filters the genes with an overall score that are lower than X OR greater than Y, where X and Y are two specified threshold values. You can choose between removing the filtered values (remove) or setting the filtered values to the boundary values (saturate).

Band-Stop Filter

This option removes the genes with an overall score between two specified thresholds.

Count Filter

This option filters the X lowest scored genes and the Y greatest scored genes, where X and Y are two specified integers. You can choose between removing the filtered values (remove) or setting the filtered values to the boundary values (saturate).

Filter Strand

You need to select a strand when prompted. At the end of the operation the layer will contain only the genes on the selected strand. All the other genes will have been removed.

Rename Genes

This operation allows you to change the names of the genes. You need to provide a text file in which each line contains the current gene name and the new gene name, separated by a tab. Every time a gene with a name from the first column is found, this name is replaced by the new gene name from the second column.

Distance Calculation

Development in progress, coming soon.

Score Repartition Around Start

You first need to select a Fixed window layer containing the scores. After that, you need to select the chromosomes on which you want to execute the operation. You also need to specify a bin size S, a bin count C and a method for the calculation of the scores.

The operation creates C bins on each side of the start position of each gene. The size S of each bin is given in base pairs. Depending on the calculation method chosen, the operation computes the sum, the maximum or the average of the scores for each corresponding bin across all genes and displays a bar graph of the result. The data can be exported by right-clicking on the graph and using the "save as" function.
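The binning can be sketched as follows. Several assumptions apply, and none of them comes from the documentation:
  • the fixed-window layer is a list of scores in which window i starts at i × window_size;
  • each window is assigned to the bin that contains its start position;
  • the scores falling into bin k are pooled across all genes before aggregation;
  • gene strand is ignored, since the text above does not mention it;
  • bins that receive no window are reported as None.

All names in the sketch are hypothetical.

```python
from statistics import mean

# Calculation methods named in the documentation.
AGGREGATORS = {"sum": sum, "maximum": max, "average": mean}


def score_repartition(scores, window_size, gene_starts, bin_size, bin_count, method="average"):
    """Aggregate window scores into 2*bin_count bins around each gene start.

    Bin k (from -bin_count to bin_count - 1) covers offsets
    [k * bin_size, (k + 1) * bin_size) relative to the gene start.
    """
    pooled = {k: [] for k in range(-bin_count, bin_count)}
    span = bin_count * bin_size
    for gene_start in gene_starts:
        # First window whose start is >= gene_start - span (ceiling division).
        first = max(0, -(-(gene_start - span) // window_size))
        for i in range(first, len(scores)):
            offset = i * window_size - gene_start
            if offset >= span:
                break  # past the last bin on the right-hand side
            pooled[offset // bin_size].append(scores[i])
    aggregate = AGGREGATORS[method]
    return {k: (aggregate(v) if v else None) for k, v in pooled.items()}
```

For example, with eight 10 bp windows scored 1 to 8, one gene starting at 40, S = 20 and C = 1, bin -1 pools the scores 3 and 4 and bin 0 pools the scores 5 and 6.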

A multi-curve graph can be generated using the following procedure. To compare two fixed-window layers:

  1. Perform the analysis described above on the first layer.
  2. Save the result to your hard drive.
  3. Close the graph window.
  4. Perform the same analysis on the second layer.
  5. Right-click on the second graph and choose the load data option.
  6. Load the first analysis.

The colors of the curves, the type of graph (bar, points, curve) and the scale can be adjusted by right-clicking on the graph. The same procedure can be used to load more than two graphs. To produce more complex graphs, we recommend loading the saved data into your favorite spreadsheet software.

Repeat Layer Operations

Convert Into Mask

This operation can be used to convert a repeat layer into a mask layer. The user is prompted to select the families of repeats that should be included in the conversion. The resulting mask layer covers the repeats of all the selected families.
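Here is a minimal sketch of the conversion. It assumes that the repeats of one chromosome are given as (start, end, family) tuples with half-open coordinates, and that overlapping repeats of the selected families are merged into a single mask window. Whether GenPlay merges overlapping repeats this way is not stated. The function name is hypothetical.

```python
def repeats_to_mask(repeats, families):
    """Keep repeats from the selected families and merge overlaps into mask windows."""
    selected = sorted((start, end) for start, end, family in repeats if family in families)
    mask = []
    for start, end in selected:
        if mask and start <= mask[-1][1]:
            # Overlaps (or touches) the previous window: extend it.
            mask[-1] = (mask[-1][0], max(mask[-1][1], end))
        else:
            mask.append((start, end))
    return mask
```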

DNA Sequence Layer Operations

Compare Sequences

This operation takes two DNA sequence layers as input and generates a variable-window layer showing the differences between them. At each position where the sequences differ, the result layer contains a 1 bp window with the following score:

Nucleotide (1st layer)   Nucleotide (2nd layer)   Score
A                        C                        12
A                        G                        13
A                        T                        14
C                        A                        21
C                        G                        23
C                        T                        24
G                        A                        31
G                        C                        32
G                        T                        34
T                        A                        41
T                        C                        42
T                        G                        43
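The scores in this table follow a simple pattern. If A = 1, C = 2, G = 3 and T = 4, the score is 10 × (code of the 1st-layer nucleotide) + (code of the 2nd-layer nucleotide). The sketch below reproduces the table under these assumptions:
  • both sequences are Python strings aligned on the same coordinates;
  • positions are 0-based with half-open windows;
  • positions involving N or other non-ACGT symbols are skipped, since the table does not cover them.

The function name is hypothetical.

```python
# Nucleotide codes inferred from the score table: A=1, C=2, G=3, T=4.
NUCLEOTIDE_CODE = {"A": 1, "C": 2, "G": 3, "T": 4}


def compare_sequences(seq1, seq2):
    """Return (start, end, score) 1 bp windows where seq1 and seq2 differ."""
    windows = []
    for pos, (a, b) in enumerate(zip(seq1.upper(), seq2.upper())):
        # Assumption: identical positions and non-ACGT symbols produce no window.
        if a == b or a not in NUCLEOTIDE_CODE or b not in NUCLEOTIDE_CODE:
            continue
        score = 10 * NUCLEOTIDE_CODE[a] + NUCLEOTIDE_CODE[b]
        windows.append((pos, pos + 1, score))
    return windows
```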

Mask Layer Operations

Apply Mask

Applying a mask filters out the data that lie outside the windows of the mask.

All information overlapping a mask window is kept; everything else is removed.
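Here is a minimal sketch of applying a mask to a layer of scored windows. It assumes half-open (start, end) coordinates and keeps a data window whole if it overlaps any mask window. The text does not say whether partially overlapping windows are trimmed. For brevity, the check is a simple quadratic scan. The function name is hypothetical.

```python
def apply_mask(windows, mask):
    """Keep (start, end, score) windows that overlap at least one mask window."""
    return [
        w for w in windows
        # Half-open intervals [a, b) and [c, d) overlap when a < d and c < b.
        if any(w[0] < m_end and m_start < w[1] for m_start, m_end in mask)
    ]
```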

Invert Mask

This operation simply inverts all windows of the mask: all current windows become empty spaces, and all empty spaces become windows.
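On a single chromosome, inverting a mask amounts to taking the complement of its windows over the chromosome. The sketch below assumes half-open coordinates starting at 0 and a known chromosome length. The mask windows may be unsorted or overlapping. The function name is hypothetical.

```python
def invert_mask(mask, chrom_length):
    """Return the gaps between mask windows over [0, chrom_length)."""
    inverted = []
    cursor = 0  # end of the area already covered by mask windows
    for start, end in sorted(mask):
        if start > cursor:
            inverted.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < chrom_length:
        inverted.append((cursor, chrom_length))
    return inverted
```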

Variant Layer Operations

Edit Variant Layer

Edit Variant Layer Dialog

This feature opens the same window used to load the Variant Layer, making it possible to change which variation types are shown.

Generate track statistics

This operation generates various statistics about the loaded data.

It also compares these statistics before and after any filters are applied, in order to show their effect.

Filters

Filters can be applied to Variant Layers. They act directly on the data in the VCF file in order to select the data of interest. All filters are set in the Filters section of the Multi-Genome Project Properties dialog.

Click "Add" to create a new filter. As shown below, a new window appears in which the filter is defined.

Filter selection dialog
  • Layer(s): The layers affected by the filter.
  • File: A filter is also file-specific. If the data to filter are spread over different files, a separate filter must be created for each file.
  • ID: A filter can be set on any ID defined in the header of the VCF. IDs can be of different types, which affects the options offered in the next steps.
  • Genome(s): For any "FORMAT" ID, you need to specify which genome(s) the filter applies to.
  • Operator: If more than one genome was selected in the previous step, the operator determines how the results from the individual genomes are combined into a single result for the whole VCF line.
    • And: The selected ID value from each genome must pass the filter.
    • Or: At least one selected ID value must pass the filter.
    • Sum: If the selected ID value is an integer, the sum of the values from all genomes is filtered.
    • Mean: If the selected ID value is an integer, the mean of the values from all genomes is filtered.
  • Filter: This filter panel will change according to the selected ID type.
    • String: The user chooses whether the input value must or must not be present in the ID value.
    • Number: The ID value is tested against an input value using one of the available numeric operators. The ID value can also be tested against two input values using the second part of the filter; the user then chooses how the two conditions are combined.
    • Flag: When the ID is a flag, it behaves as a boolean: the flag is either present or absent.
    • Genotype: The genotype ID has a dedicated filter editor that makes it easier to set up; the regular string editor is also available below it. The genotype can be filtered as homozygous, heterozygous, phased or unphased.
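The four operators can be illustrated with the sketch below, which combines per-genome values into one pass/fail result for a VCF line. The function name is hypothetical. The values are assumed to be already parsed from the selected FORMAT ID, one per selected genome, and the filter panel is modeled as a Python predicate on a single value.

```python
from statistics import mean


def line_passes(values, combine, test):
    """Combine a per-genome test into a single result for the whole VCF line.

    values: the selected FORMAT ID value for each selected genome.
    combine: "And", "Or", "Sum" or "Mean" (operator names from the dialog).
    test: predicate standing in for the filter panel, e.g. lambda v: v >= 20.
    """
    if combine == "And":
        return all(test(v) for v in values)  # every genome must pass
    if combine == "Or":
        return any(test(v) for v in values)  # at least one genome must pass
    if combine == "Sum":
        return test(sum(values))  # the summed value is filtered
    if combine == "Mean":
        return test(mean(values))  # the mean value is filtered
    raise ValueError(f"unknown operator: {combine}")
```

For instance, with per-genome depths of 10 and 25 and the condition "value >= 20", the line fails with And, passes with Or and Sum (35), and fails with Mean (17.5).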

Export as VCF

This operation exports all visible variations of the layer into a new VCF file. Active filters are taken into account, so the exported file contains exactly what can be seen on the layer.

Convert into variable window track

This operation converts the Variant Layer into a Microarray/Sequencing Layer. The new windows match the positions of the variation stripes. The score of the new windows can be set to any integer value present in the VCF lines. For haploid genomes, only one layer is generated. For diploid genomes, the maternal and paternal alleles are generated as two different layers.
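The conversion can be sketched for a single sample as shown below. The sketch rests on several assumptions that the documentation does not confirm:
  • a window spans from POS over the length of REF, in 0-based half-open coordinates;
  • the score is an integer read from a chosen INFO key, and lines without that key are skipped;
  • a GT allele other than 0 or "." places the variation on that haplotype;
  • the first GT allele feeds the first layer and the second GT allele feeds the second layer.

How GenPlay maps GT alleles to maternal and paternal layers is not stated here. The function name is hypothetical.

```python
def vcf_to_windows(vcf_lines, info_key, sample_index=0):
    """Build (chrom, start, end, score) windows per allele from VCF text lines.

    Returns two lists; for haploid genotypes only the first one is filled.
    """
    first, second = [], []
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        info = dict(kv.split("=", 1) for kv in fields[7].split(";") if "=" in kv)
        if info_key not in info:
            continue
        score = int(info[info_key])
        start = pos - 1  # VCF POS is 1-based
        end = start + len(ref)
        fmt = fields[8].split(":")
        sample = fields[9 + sample_index].split(":")
        gt = sample[fmt.index("GT")]  # assumes a GT field is present
        alleles = gt.replace("|", "/").split("/")
        for allele, layer in zip(alleles, (first, second)):
            if allele not in ("0", "."):
                layer.append((chrom, start, end, score))
    return first, second
```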

Apply Genotype

Coming soon...