Difference between revisions of "Documentation"

From GenPlay, Einstein Genome Analyzer


Revision as of 18:10, 22 November 2010


Starting GenPlay

To start GenPlay, click the button corresponding to the amount of memory that you wish to allocate to the Java virtual machine. This amount of memory determines how many tracks you can load at the same time. GenPlay is designed to be extremely fast once the data are loaded. To achieve this, the entire genome can be held in memory for multiple tracks at the same time, which gives very good performance at the cost of a large memory requirement. The amount of memory needed per track depends on the genome, the track type, the window size, the data precision, etc.

You should generally allocate as much memory as your system can afford (typically about 70% of the total RAM). For mammalian genomes we recommend allocating at least 4 GB of RAM, although you should be able to load a couple of genome-wide tracks with 1 GB or 1.5 GB. Analyzing only one chromosome at a time drastically reduces the memory requirement and should allow you to load many tracks at very high resolution. Tracks loaded in GenPlay can also be compressed (see below). The amount of RAM available to GenPlay is displayed in the lower right corner of the screen.
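As a rough illustration of the 70% guideline, the sketch below computes a heap size from the total RAM of a machine and formats it as the standard JVM maximum-heap option (-Xmx). The function names are hypothetical and are not part of GenPlay; GenPlay itself only asks you to pick one of its predefined memory buttons.

```python
def recommended_heap_mb(total_ram_mb: int, fraction: float = 0.7) -> int:
    """Return a heap size in MB as a fixed fraction of the total RAM.

    The 0.7 default mirrors the "about 70%" guideline; it is a rule of
    thumb, not a value enforced by GenPlay.
    """
    if total_ram_mb <= 0:
        raise ValueError("total_ram_mb must be positive")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_ram_mb * fraction)


def jvm_heap_option(heap_mb: int) -> str:
    """Format the standard JVM maximum-heap option for a size in MB."""
    return f"-Xmx{heap_mb}m"


if __name__ == "__main__":
    heap = recommended_heap_mb(8192)  # a machine with 8 GB of RAM
    print(heap, jvm_heap_option(heap))
```

The result is only a starting point: pick the GenPlay memory button closest to, and not above, the computed value.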

Changing the configuration of GenPlay

On GenPlay’s main screen, click the top left button (shown by a little hammer and spanner) to open the following screen.

Option Menu

General options

The following screen lets you set the general options:

General Options

The Default Directory lets you specify where the files containing GenPlay tracks will be stored in your file system.

The Log File is a text file that contains a time-stamped history of the files extracted and loaded into GenPlay.

From this screen, you can also modify the appearance of the software by changing the look and feel.

Configuration files

The configuration files screen lets you change the zoom file as well as the genome configuration file. GenPlay must be restarted for changes to these options to take effect.

Configuration Files

Zoom file

The zoom configuration file contains the predefined zoom levels. To change these levels, create a text file with one zoom level (in bp) per line, ordered from the smallest to the largest. Here is an example:

10
100
1000
10000
100000
1000000
10000000
100000000
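A zoom file like the one above can be checked or generated with a few lines of code. The sketch below assumes only what is stated here (one positive integer per line, in increasing order); the function names are hypothetical, and how GenPlay itself reacts to a malformed file is not covered.

```python
from pathlib import Path


def parse_zoom_levels(text: str) -> list[int]:
    """Parse one zoom level (in bp) per line, ignoring blank lines."""
    levels = [int(line) for line in text.splitlines() if line.strip()]
    if any(level <= 0 for level in levels):
        raise ValueError("zoom levels must be positive")
    if any(a >= b for a, b in zip(levels, levels[1:])):
        raise ValueError("zoom levels must be in strictly increasing order")
    return levels


def write_zoom_file(path: Path, levels: list[int]) -> None:
    """Write the levels from smallest to largest, one per line."""
    ordered = sorted(set(levels))  # drop duplicates and enforce the order
    path.write_text("".join(f"{level}\n" for level in ordered))


if __name__ == "__main__":
    # Powers of ten from 10 bp to 100 Mbp, as in the example above.
    write_zoom_file(Path("zoom.txt"), [10 ** k for k in range(1, 9)])
    print(parse_zoom_levels(Path("zoom.txt").read_text()))
```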

Genome file

When GenPlay starts, it uses a configuration file describing the genome that you want to analyze (the default is human hg19). Configuration files are simple text files that specify the name and length of each chromosome of the current genome. Configuration files for recent human and mouse assemblies can be downloaded from the GenPlay library accessible from the GenPlay.net web page (please see below). Configuration files for any genome can easily be created in any plain-text editor using the provided examples as a model.

Track option

Track Option

The Number of Tracks text box defines the maximum number of tracks that can be loaded on GenPlay.

The Default Track Height text box defines the height of each of the tracks.

The Undo Count text box defines the number of operations that can be undone. Note that the higher the undo count, the more memory is required.

DAS server

DAS Server Option

The DAS server option shows the list of existing DAS servers along with the URL where these servers are located. It also provides options to add new servers and remove existing servers.

GenPlay can communicate with and retrieve data from servers that implement the DAS/1 protocol.
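For orientation, a DAS/1 feature request is an ordinary HTTP GET whose URL names a data source and a genomic segment. The sketch below only builds such a URL; it does not contact any server and does not reflect GenPlay's internal code. The server address and data source name are made-up placeholders, and the server URL is assumed to already end in "/das".

```python
from urllib.parse import urlencode


def das_features_url(server_url: str, data_source: str,
                     reference: str, start: int, stop: int) -> str:
    """Build a DAS/1 'features' request URL for one genomic segment.

    server_url is assumed to be the DAS prefix (ending in /das);
    positions are 1-based and inclusive.
    """
    if start < 1 or stop < start:
        raise ValueError("expected 1 <= start <= stop")
    # Keep ':' and ',' unescaped so the segment reads reference:start,stop.
    query = urlencode({"segment": f"{reference}:{start},{stop}"}, safe=":,")
    return f"{server_url.rstrip('/')}/{data_source}/features?{query}"


if __name__ == "__main__":
    # Hypothetical server and data source, for illustration only.
    print(das_features_url("http://das.example.org/das", "hg19",
                           "chr1", 1000, 2000))
```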

Restore default

The Restore Default configuration restores everything back to the factory settings.

Loading a track

To load a track in any row, right click on any of the tracks numbered 1, 2, 3 etc. This opens a menu including options to load the various types of tracks that exist in GenPlay.

Examples of tracks that can be loaded in GenPlay can be downloaded from the GenPlay Library accessible from the GenPlay.net website.

Loading a variable window track

In a fixed window track, the entire genome is divided into bins of equal size and the data are summarized inside each bin (see below). Variable window tracks allow the visualization of bins of variable sizes. Overlapping bins are split into smaller bins using a simple algorithm.
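The exact splitting algorithm is not described here. One simple way to do it, shown below purely as an illustration, is to cut at every window boundary and record which original scores cover each resulting piece. The function name is hypothetical, windows are assumed to be half-open (start, stop, score) tuples, and how the overlapping scores are then combined is left open.

```python
def split_overlapping(windows):
    """Split possibly overlapping windows into non-overlapping pieces.

    windows: list of (start, stop, score) with half-open [start, stop).
    Returns a list of (start, stop, scores), where scores lists the
    score of every input window covering that piece.
    """
    for start, stop, _ in windows:
        if stop <= start:
            raise ValueError("each window needs start < stop")
    # Every window boundary is a potential cut point.
    cuts = sorted({pos for start, stop, _ in windows for pos in (start, stop)})
    pieces = []
    for left, right in zip(cuts, cuts[1:]):
        scores = [score for start, stop, score in windows
                  if start <= left and right <= stop]
        if scores:  # skip gaps that no window covers
            pieces.append((left, right, scores))
    return pieces


if __name__ == "__main__":
    for piece in split_overlapping([(0, 100, 1.0), (50, 150, 2.0)]):
        print(piece)
```

This version compares every piece with every window, which is fine for a sketch but would be replaced by a sweep over sorted boundaries for genome-scale data.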

Select the “Load Variable Window Track” option. This opens a File Chooser Dialog Box as shown in the figure below. Select the track of your choice from the list of available variable window tracks (CD36-H3K36me3_Cui_2009.bgr in this example) and click the Open button.


Loading a fixed window track

Fixed window tracks display bin lists, which are useful to represent the results of many types of experiments including ChIP-seq, RNA-seq, TimEX-seq, etc. Files containing the results of alignment (SAM, bowtie, Eland) and files containing already created bin lists (bed, sgr, etc.) can be loaded using this option. In the case of alignment files, bin lists are created on the fly as described below. Files containing the results of micro-array experiments can also be loaded as long as they are in one of the accepted formats.

Once the menu pops up, select the “Load Fixed Window Track” option. This opens a File Chooser Dialog Box as shown in the figure below. Select the track of your choice from the list of available fixed window tracks (CD36-H3K36me3_Cui_2009.bgr in this example) and click the Open button.

Window Size

This specifies the size of the genomic windows (the bins) that will be created to summarize the results.

Score Calculation

The figure below illustrates how score calculation is done using different methods for a window size of 1000.
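To make the idea of binning concrete, the sketch below groups scored positions into fixed-size windows and summarizes each window. The three methods offered (sum, average, maximum) are assumptions chosen for illustration; the methods actually offered by GenPlay are the ones listed in its dialog. Positions are assumed to be 0-based.

```python
from collections import defaultdict
from statistics import mean

# Illustrative summary methods; not necessarily GenPlay's own list.
METHODS = {"sum": sum, "average": mean, "maximum": max}


def bin_scores(records, window_size, method="sum"):
    """Summarize (position, score) records into fixed-size windows.

    Returns a dict mapping each non-empty window start to its score.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    summarize = METHODS[method]
    by_bin = defaultdict(list)
    for position, score in records:
        bin_start = (position // window_size) * window_size
        by_bin[bin_start].append(score)
    return {start: summarize(scores)
            for start, scores in sorted(by_bin.items())}


if __name__ == "__main__":
    records = [(10, 2.0), (900, 4.0), (1500, 3.0)]
    for name in METHODS:
        print(name, bin_scores(records, 1000, name))
```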

Data precision

Because GenPlay requires a lot of RAM, we provide the option of changing the precision at which the score for each bin is stored.

  • Scores in 64 bits are stored as double-precision floating-point values (which can represent extremely large numbers unlikely to be useful for genomic experiments).
  • Scores in 32 bits are stored as single-precision floating-point values (which can also represent very large numbers).
  • Scores in 16 bits can range between -3276.8 and +3276.7 (with one decimal digit).
  • Scores in 8 bits can range between 0 and 255 (with no decimal).
  • Scores in 1 bit can be equal to 0 or 1 (useful for creating masks, for instance).

We recommend storing scores in 32 or 16 bits.
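The effect of precision on memory can be estimated with simple arithmetic: the number of bins times the bits per score. The sketch below does this and also shows one plausible 16-bit encoding, a signed 16-bit integer holding the score multiplied by ten. Both are assumptions for illustration; the estimate ignores JVM overhead and compression, and the encoding is not taken from GenPlay's source.

```python
def track_size_bytes(genome_length_bp: int, window_size_bp: int,
                     bits_per_score: int) -> int:
    """Estimate the raw size of the scores of one fixed window track."""
    bins = -(-genome_length_bp // window_size_bp)  # ceiling division
    return -(-bins * bits_per_score // 8)          # round up to whole bytes


def encode_16bit(score: float) -> int:
    """Store a score as tenths in a signed 16-bit integer (assumed scheme)."""
    return max(-32768, min(32767, round(score * 10)))


def decode_16bit(raw: int) -> float:
    """Recover the score, with one decimal digit, from its stored form."""
    return raw / 10


if __name__ == "__main__":
    # A genome of about 3 billion bp, summarized in 1000 bp windows.
    for bits in (64, 32, 16, 8, 1):
        size = track_size_bytes(3_000_000_000, 1000, bits)
        print(f"{bits:2d} bits: {size / 1e6:.3f} MB")
    print(decode_16bit(encode_16bit(12.34)))
```

Values outside the representable range are clamped in this sketch, which is one reason to check the range of your scores before choosing a low precision.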

Chromosome selection

Either the whole genome can be loaded or only specific chromosomes (which saves time and memory).

Important Note: When specific chromosomes are selected, GenPlay works accurately only if the loaded files are sorted by chromosome. If they are not sorted, data may be lost when the track is loaded into GenPlay. In addition to being sorted by chromosome, BED and BEDGraph files must also be sorted by start position within each chromosome.
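Before loading a file with specific chromosomes selected, the sort order can be checked with a short script. The sketch below reads BED or BEDGraph lines (chromosome, start, stop, ...) and reports the first ordering problem. It assumes that "sorted by chromosome" means that all lines of a chromosome are contiguous, and the function name is hypothetical.

```python
def check_bed_sorted(lines):
    """Return None if the lines are sorted, else a description of the
    first problem found.

    Checks that each chromosome forms a single contiguous block and
    that start positions never decrease within a block.
    """
    seen = set()
    current = None
    last_start = -1
    for number, line in enumerate(lines, start=1):
        # Skip blank lines, comments, and track/browser header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        chrom, start = fields[0], int(fields[1])
        if chrom != current:
            if chrom in seen:
                return f"line {number}: {chrom} appears in more than one block"
            seen.add(chrom)
            current, last_start = chrom, -1
        elif start < last_start:
            return f"line {number}: start {start} is before {last_start}"
        last_start = start
    return None


if __name__ == "__main__":
    example = ["chr1\t0\t100\t1.0", "chr1\t100\t200\t2.0", "chr2\t0\t50\t1.5"]
    print(check_bed_sorted(example))
```

To check a file on disk, pass the open file object directly, since iterating over it yields its lines.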

When the OK button is clicked, the track is loaded as below in the location desired (in this case track 1).

Loading a gene track

Loading a sequence track

Loading a SNP track

Loading a repeat track

Loading data from a DAS server

File formats

Browsing the genome

Using the operations