Difference between revisions of "Documentation"

From GenPlay, Einstein Genome Analyzer

Revision as of 15:46, 1 December 2010


Starting GenPlay

GenPlay is freely available at http://www.genplay.net/wiki/index.php/Web_Start. To start the software, click the button corresponding to the amount of memory that you wish to allocate to the Java virtual machine.

This amount of memory determines how many tracks you can load at the same time. The programming philosophy behind GenPlay is to provide extremely fast performance once the data are loaded. To achieve that goal, the entire genome can be loaded in memory for multiple tracks at the same time. This gives very good performance, but at the cost of a large memory requirement. The amount of memory needed per track depends on the genome, the track type, the window size, the data precision, etc.

You should generally allocate as much memory as your system can afford (typically about 70% of its total RAM). For mammalian genomes we recommend allocating at least 4 GB of RAM, although you should be able to load a couple of genome-wide tracks with 1 GB or 1.5 GB. Restricting the analysis to one chromosome at a time drastically reduces the memory requirement and should allow you to load many tracks at very high resolution. Tracks loaded in GenPlay can also be compressed (see below).

The amount of RAM available to GenPlay is displayed in the lower right corner of the screen.

GUI Overview

GUI Overview 1.Ruler, 2.Track List, 3.Control Panel, 4.Status Bar

The GenPlay main window is divided into 4 main parts:

  1. Ruler
  2. Track List
  3. Control Panel
  4. Status Bar

Ruler

The ruler shows the currently displayed position.

Ruler 1.Option Button, 2.Absolute Positions, 3.Relative positions





Absolute positions

The numbers written in red on top of the ruler are the absolute positions on the selected chromosome or scaffold.

The number on the left is the position of the first displayed base. This value can be negative.

The number in the middle is the position of the red line. This value can go from 0 to the length of the current chromosome or scaffold as specified in the chromosome configuration file.

The value on the right is the last displayed position. This value ranges from 1 to 2*(chromosome length).

Relative position

The numbers written in black on the second line represent the distance from the middle in base pairs.

General Option Button

The button on the left of the ruler opens the popup-menu with all the general options.

Track List

The track list is the cornerstone of the GUI. It's where you can load your tracks and execute operations.

Each track is divided into two parts. On the left is the track handler, which becomes highlighted when the mouse is over it. A right click on the track handler pops up a contextual menu with all the operations that can be executed on the track.

The right part of the track is where the data can be visualized.

Control Panel

Control Panel 1.Position Bar, 2.Zoom Bar, 3.Chromosome Box, 4.Position Text Field

The control panel is divided into 4 parts:

  1. The position bar: the position bar allows you to change the position of the currently displayed window
  2. The zoom bar: use the zoom bar to modify the level of zoom
  3. The chromosome box: set the selected chromosome with the chromosome box
  4. The position text field: the position text field follows the format of the UCSC genome browser position field so it's easy to copy and paste the position from one browser to the other.
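The position text field format can be illustrated with a small parser. This is a minimal sketch, assuming the usual UCSC-style `chromosome:start-stop` notation with optional thousands separators; the function names `parse_position` and `format_position` are hypothetical and are not part of GenPlay.

```python
import re

# Matches positions such as "chr1:1,000,000-2,000,000" (commas are optional).
_POSITION = re.compile(r"^\s*(\S+?)\s*:\s*([\d,]+)\s*-\s*([\d,]+)\s*$")


def parse_position(text):
    """Split a UCSC-style position string into (chromosome, start, stop)."""
    match = _POSITION.match(text)
    if match is None:
        raise ValueError(f"not a chromosome:start-stop position: {text!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    stop = int(match.group(3).replace(",", ""))
    if start > stop:
        raise ValueError("start must not be greater than stop")
    return chrom, start, stop


def format_position(chrom, start, stop):
    """Build a position string with thousands separators."""
    return f"{chrom}:{start:,}-{stop:,}"
```

A string produced by `format_position` can be parsed back by `parse_position`, which mirrors the copy and paste workflow between browsers described above.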

Status Bar

Status Bar 1.Progress Bar, 2.Stop Button, 3.Operation Description, 4.Memory Bar

The status bar helps you to monitor the progress of the current operation as well as the memory usage. It is divided into 4 sub-components:

  1. Progress bar, shows the level of completion of the current operation
  2. Stop button, allows you to stop the current operation. If the button is not bright red the operation is not stoppable
  3. Operation description, displays a short text describing the current operation as well as the elapsed time from the beginning of the operation
  4. Memory bar, shows the amount of memory used and the amount of memory available. Make sure that you have enough memory before starting a new operation. You can delete tracks to free some memory.

Browsing the genome

Changing the position

You can change the position of the displayed window by:

  1. Dragging any track on the left or on the right with the left button of the mouse
  2. Clicking with the middle button of the mouse inside a track and then moving the cursor to the left or to the right of the middle red line
  3. Changing the position of the position bar of the control panel
  4. Changing the value of the position text field of the control panel
  5. Using the keyboard left and right arrows

Changing the chromosomes

Switching the selected chromosome can be done by:

  1. Changing the selection in the chromosome box of the control panel
  2. Changing the text of the position text field of the control panel

Changing the zoom

The level of the zoom can be modified by:

  1. Wheeling up or down inside a track with the mouse wheel
  2. Using the zoom bar of the control panel
  3. Changing the text of the position text field of the control panel

Loading a track

To load a track in any row, right click on the handler of any empty track (the blue part with a number on the left of the track). This opens a menu including options to load the various types of tracks that exist in GenPlay.

Loading a Track

Example of tracks that can be loaded in GenPlay can be downloaded from the GenPlay Library accessible from the GenPlay.net website.

Loading a variable window track

Variable window tracks allow the visualization of windows of variable sizes with a score associated with each window.

Select the “Load Variable Window Track” option. This opens up a file chooser dialog box. Load the file of your choice from the list of available variable window files and click the open button.

File Chooser

Please refer to the File formats section if you want to know what kind of file can be loaded as a variable window track.

Chromosome Selection

A new window then appears and asks which chromosomes to extract. By default all the chromosomes of the project are selected. If you want to change this selection, click on the "modify selection" button and uncheck the undesired chromosomes. Working on fewer chromosomes will save memory and loading time.

Select Chromosomes

Important Note: When specific chromosomes are selected, GenPlay works accurately only if the files that are loaded are sorted by chromosomes. Unsorted files may load incompletely, leading to loss of valuable information.
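Before loading a subset of chromosomes, it can help to check that a file is grouped by chromosome. The following is a minimal sketch, assuming a BED-like text file whose first column is the chromosome name; the function name `is_grouped_by_chromosome` is hypothetical and this is not the check that GenPlay itself performs.

```python
def is_grouped_by_chromosome(lines):
    """Return True if all lines of each chromosome are contiguous.

    Assumes the chromosome name is the first whitespace-separated field.
    Blank lines and header lines (comments, "track", "browser") are skipped.
    """
    finished = set()   # chromosomes whose block has already been seen
    current = None     # chromosome of the block being read
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom = line.split()[0]
        if chrom != current:
            if chrom in finished:
                # This chromosome already appeared in an earlier block.
                return False
            finished.add(chrom)
            current = chrom
    return True
```

For a file on disk, the function can be called as `is_grouped_by_chromosome(open(path))`, since it only iterates over the lines once.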

Score Calculation

Once that is done, a last window pops up and asks you to name the track. The default name is the file name without its extension. In the same window, but only if your input file contains overlapping windows, you have to tell GenPlay what to do with these overlapping windows. Overlapping windows are split into smaller windows using a simple algorithm.
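The exact splitting algorithm is not described here. As an illustration only, one simple way to split overlapping windows is to cut them at every start and stop position and to combine the scores of the windows covering each piece. The sketch below assumes half-open `(start, stop, score)` windows on a single chromosome; the name `split_overlaps` and the `combine` parameter are hypothetical.

```python
def split_overlaps(windows, combine=max):
    """Cut overlapping windows into non-overlapping pieces.

    windows: iterable of (start, stop, score) tuples, half-open [start, stop).
    combine: function applied to the list of scores covering each piece,
             for example max, sum, or an average.
    """
    windows = list(windows)
    # Every start and stop position is a potential cut point.
    points = sorted({p for start, stop, _ in windows for p in (start, stop)})
    pieces = []
    for left, right in zip(points, points[1:]):
        # Scores of all windows that fully cover the piece [left, right).
        scores = [s for start, stop, s in windows
                  if start <= left and right <= stop]
        if scores:
            pieces.append((left, right, combine(scores)))
    return pieces
```

For example, the windows `(0, 10, 1.0)` and `(5, 15, 3.0)` are cut at positions 5 and 10, giving three pieces whose scores depend on the chosen `combine` function. The nested scan is quadratic in the number of windows, which is acceptable for a sketch but not for genome-wide files.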

Name and Overlapping

Loading Fixed Window Track

Fixed window tracks display bin lists, which are useful to represent the results of many types of experiments including ChIP-seq, RNA-seq, TimEX-seq etc. Files containing the results of alignment (SAM, bowtie, Eland) and files containing already created bin lists (bed, bgr, etc.) can be loaded using this option. In the case of alignment files, bin lists will be created on the fly as described below. Files containing the results of micro-array experiments can also be loaded as long as they are in one of the accepted formats.

Once the track contextual menu pops up, select the “Load Fixed Window Track” option. This opens up a file chooser dialog box as shown in the figure below.

File Chooser

Load the track of your choice from the list of available fixed window tracks and click the open button. Please refer to the File formats section if you want to know what kind of file can be loaded as a fixed window track.

Track Name

Fixed Window Track Options

Name of the track. The default name is the file name without the extension. The name of the track can also be changed later, after the track is loaded.

Window Size

This specifies the size in base pairs (bp) of the genomic windows (the bins) that will be created to summarize the results.

Score Calculation

This option allows you to choose how the scores of the bins are calculated. You may choose between three options: average, maximum or sum. The algorithm of the score calculation is explained below.
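To give an intuition for the three options, here is a simplified sketch of binning. It is not GenPlay's actual algorithm: it assumes half-open `(start, stop, score)` windows on one chromosome and lets every window contribute its full score to each bin it touches, without weighting by the length of the overlap. The function name `bin_scores` is hypothetical.

```python
def bin_scores(windows, bin_size, method="average"):
    """Summarize scored windows into fixed-size bins.

    Returns a dict mapping bin index to score; bin i covers
    positions [i * bin_size, (i + 1) * bin_size).
    """
    reducers = {
        "average": lambda values: sum(values) / len(values),
        "maximum": max,
        "sum": sum,
    }
    reducer = reducers[method]
    collected = {}
    for start, stop, score in windows:
        if stop <= start:
            continue  # skip empty or inverted windows
        first_bin = start // bin_size
        last_bin = (stop - 1) // bin_size
        for index in range(first_bin, last_bin + 1):
            collected.setdefault(index, []).append(score)
    return {index: reducer(values) for index, values in sorted(collected.items())}
```

With a window size of 100 bp, the windows `(0, 50, 2.0)` and `(40, 120, 4.0)` both fall in bin 0, and only the second one reaches bin 1, so the three methods differ only in bin 0.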

Strand Selection

If your input file contains strand information, you can choose to load only the data from the 3' strand, only the data from the 5' strand, or the data from both strands. You can also decide to shift the reads from both strands as shown in the figure below.

Data precision

Because GenPlay requires a lot of RAM, we provide the option of changing the precision at which the score for each bin is stored.

  • Scores in 64 bits are stored as double-precision floating-point values (which can represent extremely large numbers unlikely to be useful for genomic experiments).
  • Scores in 32 bits are stored as single-precision floating-point values (which can also represent very large numbers).
  • Scores stored in 16 bits can range between -3276.8 and +3276.7 (with one decimal digit).
  • Scores stored in 8 bits can range between 0 and 255 (with no decimal).
  • Scores in 1 bit can be equal to 0 or 1 (useful to create masks for instance).

We recommend storing scores in 32 or 16 bits.
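The effect of the window size and of the data precision on memory can be estimated with simple arithmetic. The sketch below is a rough lower bound only: it assumes one score per bin and ignores any overhead of the Java virtual machine or of GenPlay's internal data structures. The function name `estimate_track_bytes` is hypothetical.

```python
import math


def estimate_track_bytes(chromosome_lengths, window_size, bits_per_score):
    """Estimate the bytes needed to store one score per bin.

    chromosome_lengths: dict mapping chromosome name to its length in bp.
    window_size: bin size in bp.
    bits_per_score: 64, 32, 16, 8 or 1.
    """
    bins = sum(math.ceil(length / window_size)
               for length in chromosome_lengths.values())
    return math.ceil(bins * bits_per_score / 8)
```

For instance, human chr1 (249,250,621 bp) with a 1,000 bp window has 249,251 bins, which is about 1 MB of scores at 32 bits and half of that at 16 bits.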

Chromosome selection

Either the whole genome can be loaded or only specific chromosomes (which saves time and memory).

Important Note: When specific chromosomes are selected, GenPlay works accurately only if the files that are loaded are sorted by chromosomes. Unsorted files may load incompletely, leading to loss of valuable information.

When the OK button is clicked, the track is loaded as below in the location desired.

Loading a gene track

A Gene Track
Score Color

Select the “Load Gene Track” option. This opens up a file chooser dialog box that allows you to select the file that you want to load. Please refer to the File formats section if you want to know what kind of file can be loaded as a gene track.

Once it's done, just wait until the loading is complete and the gene track will appear in the track you selected. Note that the genes on the plus strand are in red and the genes on the minus strand are in blue. If the file contains expression values, the exons are color coded to represent the expression (red = high, blue = low as shown on the right).

Loading a sequence track

Select the “Load Sequence Track” option. This opens up a file chooser dialog box that allows you to select the file that you want to load. Please refer to the File formats section if you want to know what kind of file can be loaded as a sequence track.

These kinds of tracks show a DNA sequence from .2bit files. The hg18, hg19, mm8 and mm9 sequence files can be downloaded from the library of GenPlay.

Loading a SNP track

Select the “Load SNP Track” option. This opens up a file chooser dialog box that allows you to select the file that you want to load. Please refer to the File formats section if you want to know what kind of file can be loaded as a SNP track.

A SNP track shows the Single-Nucleotide Polymorphisms.

Loading a repeat track

Select the “Load Repeat Track” option on the track contextual menu. This opens up a file chooser dialog box that allows you to select the file that you want to load. Please refer to the File formats section if you want to know what kind of file can be loaded as a repeat track.

This track type displays repeats organized by family or class.

Loading data from a DAS server

The distributed annotation system (DAS) is a client-server system in which a client can retrieve data from one or multiple servers. GenPlay can connect to any server that follows the DAS/1 protocol as specified by BioDAS.

DAS Dialog

The “Load from DAS Server” option from the track contextual menu will show the DAS Dialog.

Select the server from which you want to retrieve the data in the "Server" box.

Then you need to select the "Data Source". Most of the time the Data Source corresponds to the reference genome that you want to work on.

Once that's done you need to select the data that you want to retrieve in the "Data Type" box.

GenPlay can generate either a gene track or a variable window track from the retrieved data. You can select what type of output track you want in the "Generate" option.

Finally, you can also choose to download data on only a part of the genome. This can be useful because retrieving data from a DAS server can be time consuming.
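The choices made in the DAS dialog correspond to the parts of a DAS/1 `features` request. The following minimal sketch only builds the request URL and does not contact any server. The server address `http://das.example.org/das` is a placeholder, the function name `das_features_url` is hypothetical, and the segment naming convention (for example with or without a `chr` prefix) depends on the server.

```python
from urllib.parse import quote


def das_features_url(server, data_source, chrom, start, stop, data_type=None):
    """Build a DAS/1 "features" request URL for one genomic segment.

    server: base URL ending in "/das" (the "Server" box).
    data_source: data source name (the "Data Source" box).
    data_type: optional feature type (the "Data Type" box).
    """
    base = server.rstrip("/")
    url = f"{base}/{quote(data_source)}/features?segment={quote(str(chrom))}:{start},{stop}"
    if data_type is not None:
        url += f";type={quote(data_type)}"
    return url
```

Restricting `start` and `stop` to a small region is what makes a partial download faster than retrieving a whole chromosome.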

Note: The DAS server section shows how to add new servers to the list of available servers in the DAS dialog.

Loading stripes

This operation loads stripes that span the start and stop positions of the genes. The stripes can be superimposed on a track to highlight the regions between those start and stop positions. As the figure below indicates, the width of a stripe is equal to the difference between the stop position and the start position of the gene.

Main Menu

Main Menu

On GenPlay’s main screen click on the top left button (shown by a little hammer and spanner) to pop up the main menu.

Load / Save Project

This menu allows you to load or save a whole GenPlay project in a compressed binary format that uses little disk space. When you load a GenPlay project, all the tracks of your current project are replaced by those of the loaded project, and any information that has not been saved is lost. Important Note: GenPlay project files may depend on the version of GenPlay you are using. Remember which version of GenPlay you saved a project with, and use the same version the next time you load it.

Full Screen

Click on this item of the main menu to toggle the full screen mode. When the full screen mode is on, the control panel and the status bar are hidden. You can also toggle the full screen mode by pressing the F11 key.

Option

The option menu item allows you to modify the configuration of GenPlay. Please refer to the section Changing the configuration of GenPlay for further information.

RNA To DNA Reference

Help and About GenPlay

The Help and About GenPlay options open a browser showing, respectively, the documentation and the about pages of the GenPlay website.

Exit

This option closes the application after asking for confirmation.

Changing the configuration of GenPlay

Click on the option item of the main menu to open the configuration screen.

Option Menu

General options

The following screen lets you set the general options:

General Options

The Default Directory lets you specify where the files containing GenPlay tracks will be stored in your file system.

The Log File is a text file that contains a time-stamped history of the files extracted and loaded into GenPlay.

From this screen, you can also modify the appearance of the software by changing the look&feel.

Configuration files

The configuration files screen allows the user to change the zoom file as well as the genome configuration file. GenPlay must be restarted after modifying these options for the changes to take effect.

Configuration Files

Zoom file

The zoom configuration file contains the predefined zoom levels. To change these levels, create a text file with one zoom level (in bp) per line, ordered from the smallest to the greatest. Here is an example:

10
100
1000
10000
100000
1000000
10000000
100000000
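A custom zoom file can be checked before it is selected in GenPlay. This is a minimal sketch that enforces the two rules stated above (one integer per line, in increasing order); the function name `read_zoom_levels` is hypothetical and the check is not part of GenPlay.

```python
def read_zoom_levels(text):
    """Parse zoom levels (one per line, in bp) and verify their order."""
    levels = [int(line) for line in text.splitlines() if line.strip()]
    if any(level <= 0 for level in levels):
        raise ValueError("zoom levels must be positive")
    # Each level must be strictly greater than the previous one.
    if any(a >= b for a, b in zip(levels, levels[1:])):
        raise ValueError("zoom levels must be ordered from smallest to greatest")
    return levels
```

For a file on disk, pass its content, for example `read_zoom_levels(open(path).read())`.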

Genome file

Once GenPlay is started, a configuration file describing the genome that you want to analyze is loaded (the default is human hg19). Configuration files are simple text files that specify the name and length of each chromosome or scaffold of the current genome. Configuration files for recent human and mouse assemblies can be downloaded from the GenPlay library accessible from the GenPlay.net web page (please see below). Configuration files for any genome can easily be created in any text editor using the provided examples as a model. Here is an example of a genome file:

chr1	249250621
chr5	180915260
chr13	115169878
chrX	155270560
chrY	59373566
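A genome file in this layout can be read with a few lines of code. The sketch below assumes two whitespace-separated columns per line (name, then length in bp), as in the example above; the function name `read_genome_file` is hypothetical.

```python
def read_genome_file(text):
    """Parse a genome file into a dict of chromosome name to length (bp)."""
    chromosomes = {}
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"line {number}: expected a name and a length")
        name, length = fields[0], int(fields[1])
        if length <= 0:
            raise ValueError(f"line {number}: length must be positive")
        if name in chromosomes:
            raise ValueError(f"line {number}: duplicate chromosome {name!r}")
        chromosomes[name] = length
    return chromosomes
```

The returned dictionary keeps the chromosomes in file order, which is usually the order in which they should appear in the chromosome box.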

Track option

Track Option

The Number of Tracks text box defines the maximum number of tracks that can be loaded on GenPlay.

The Default Track Height text box defines the height of each of the tracks.

The Undo Count text box defines the number of operations that can be undone. Note that the higher the number of undo you select, the more memory will be required.

DAS server

DAS Server Option

The DAS server option shows the list of existing DAS servers along with the URL where these servers are located. It also provides options to add new servers and remove existing servers.

GenPlay can communicate with and retrieve data from servers implementing the DAS/1 protocol.

Restore default

The Restore Default configuration restores everything back to the factory settings.

File formats

The different file formats used in GenPlay are described on this page.

Manipulating tracks

Move a track

To move a track, just click on the track handler (the left part of the track with the track number) and drag the track to the desired position.

Insert a track

Copy, cut and paste a track

Insert or delete a track

Rename a track

Set the height of a track

Change the number of vertical lines displayed

Take a screenshot of the track

Show / hide stripes

Using the operations