Documentation

From GenPlay, Einstein Genome Analyzer

Revision as of 11:45, 11 June 2011 by Bouhassi (talk | contribs) (Score Repartition Around Start)



Starting GenPlay

GenPlay is freely available at http://www.genplay.net/wiki/index.php/Web_Start. To start the software, click the button corresponding to the amount of memory that you wish to allocate to the Java virtual machine.

The amount of memory determines how many tracks you can load simultaneously. The design philosophy behind GenPlay is to provide fast performance once the data is loaded. To achieve that goal, the entire genome needs to be loaded in memory for multiple tracks at the same time. This makes operations fast but requires a lot of memory. The amount of memory needed per track depends on the genome, the track type, the window size, the data precision, etc.

You should generally allocate as much memory as your system can afford (typically about 70% of the total RAM installed on your system). For mammalian genomes we recommend allocating at least 4 GB of RAM, although you should be able to load a couple of genome-wide tracks with 1 GB or 1.5 GB of RAM. Restricting the analysis to one chromosome at a time drastically reduces the memory requirement and should allow you to load many tracks at very high resolutions. Tracks loaded in GenPlay can also be compressed, as explained later in this documentation.

The amount of RAM available to GenPlay is displayed in the lower right corner of the screen.


GUI Overview

GUI Overview 1.Ruler 2.Track List 3.Control Panel 4.Status Bar

The GenPlay main window is divided into 4 main parts:

  1. Ruler
  2. Track List
  3. Control Panel
  4. Status Bar


Ruler

The ruler shows the coordinates of the currently displayed position.

Ruler 1.Option Button 2.Absolute Positions 3.Relative Positions


General Option Button

The button on the left of the ruler opens the pop-up menu with all the general options.

Absolute Positions

The numbers written in red on top of the ruler are the absolute positions on the selected chromosome or scaffold.

The number on the left is the position of the first displayed base. This value can be negative.

The number in the middle is the position of the red line. This value can go from 0 to the length of the current chromosome or scaffold as specified in the chromosome configuration file.

The value on the right is the last displayed position. This value ranges from 1 to 2*(chromosome length).

Relative Position

The numbers written in black on the second line represent the distance from the middle in base pairs.

Track List

The track list is the cornerstone of the GUI. From here you can load tracks and execute operations.

The tracks are divided into two parts.

On the left is the track handler, which becomes highlighted when the mouse is over it. Right-clicking on the track handler opens a contextual menu with all the operations that can be executed on the track.

On the right, the data can be visualized.


Control Panel

Control Panel 1.Position Bar 2.Zoom Bar 3.Chromosome Box 4.Position Text Field

The control panel is divided into 4 parts:

  1. Position Bar: the position bar allows you to change the position of the currently displayed window
  2. Zoom Bar: use the zoom bar to modify the level of zoom
  3. Chromosome Box: set the selected chromosome with the chromosome box
  4. Position Text Field: the position text field follows the format of the UCSC Genome Browser position field so it is easy to copy and paste the position from one browser to the other

Status Bar

Status Bar 1.Progress Bar 2.Stop Button 3.Operation Description 4.Memory Bar

The status bar helps monitor the progress of the current operation as well as memory usage. It is divided into 4 sub-components:

  1. Progress bar, shows the level of completion of the current operation
  2. Stop button, allows users to stop the current operation. If the button is not bright red, the operation cannot be stopped
  3. Operation description, displays a short text describing the current operation as well as the elapsed time from the beginning of the operation
  4. Memory bar, shows the amount of memory used and the amount of memory available. Make sure that you have enough memory before starting a new operation. You can delete or compress tracks to free up memory.




Browsing the Genome




Changing the Position

You can change the position of the displayed window by:

  1. Dragging any track to the left or to the right with the left button of the mouse
  2. Clicking with the middle button of the mouse inside a track and then moving the cursor to the left or to the right of the middle red line
  3. Moving the knob of the position bar on the control panel
  4. Changing the value of the position text field on the control panel
  5. Using the keyboard left and right arrows
  6. Double-clicking on a track where you want to center the view




Changing the Chromosomes

You can switch the selected chromosome by:

  1. Changing the selection in the chromosome box on the control panel
  2. Changing the text of the position text field on the control panel




Changing the Zoom

The level of the zoom can be modified by:

  1. Wheeling up or down inside a track with the mouse wheel
  2. Using the zoom bar on the control panel
  3. Changing the text of the position text field on the control panel




Loading a Track

To load a track in any row, right click on the handler of any empty track (the blue part on the left of the track). This opens a menu including options to load the various types of tracks that exist in GenPlay.

Loading a Track

Examples of tracks that can be loaded in GenPlay are available for download from the GenPlay Library accessible from the GenPlay.net website.

Loading a Variable Window Track

File Chooser

Variable window tracks allow the visualization of windows of variable sizes with a score associated to these windows.

Select the “Load Variable Window Track” option. This opens a file chooser dialog box. Select the file of your choice from the list of available files and click the Open button.

Please refer to the File formats section if you want to know what kind of file can be loaded as a variable window track.

Chromosome Selection

Chromosome Selection

After selecting your file, a new window will appear and ask which chromosome to extract. By default all the chromosomes of the project are selected. If you want to change this selection, click on the "modify selection" button and uncheck the undesired chromosomes. Working on fewer chromosomes will save memory and loading time.

Important Note: GenPlay can accelerate the loading if you know that your file is sorted by chromosome. If you press Yes when GenPlay asks whether the file is sorted and your file is actually not sorted, the file may load incompletely, leading to a loss of valuable information. The chromosomes must be ordered the same way they are ordered in the chromosome selection combo-box.

Score Calculation

Name and Score Calculation

Once the chromosome selection is done, a final window will pop up and ask you to name the track. The default name is the name of the loaded file. If there are overlapping windows in your data file, you will also be prompted to select a method for calculating the score of the windows. Overlapping windows are split into smaller windows using a simple algorithm.

Examples of Score Calculations




Example 1

Input file

Chr Start Stop Score
Chr1 1125 1126 1
Chr1 1135 1136 1
Chr1 1135 1136 1
Chr1 1149 1150 1
Chr1 1175 1176 1
Chr1 1210 1211 1
Chr1 1230 1231 1
Chr1 1340 1341 1
Chr1 1345 1346 1

Result


Loading of an alignment file as a variable window track



Example 2
Chr Start Stop Score
Chr1 1020 1120 30
Chr1 1120 1300 120
Chr1 1010 1350 100


Loading of an interval file as a variable window track


Result

Chr   Start  Stop  Average                 Maximum              Sum
Chr1  1010   1020  100                     100                  100
Chr1  1020   1120  (100 + 30) / 2 = 65     Max(100, 30) = 100   100 + 30 = 130
Chr1  1120   1300  (100 + 120) / 2 = 110   Max(100, 120) = 120  100 + 120 = 220
Chr1  1300   1350  100                     100                  100
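
The splitting of overlapping windows in Example 2 can be sketched in Python. This is only an illustration that reproduces the numbers of the example, not GenPlay's actual code; the function name `split_overlapping` is hypothetical. Every start and stop position becomes a boundary, and each resulting segment collects the scores of all input windows that cover it.

```python
from statistics import mean

def split_overlapping(windows):
    """Split overlapping (start, stop, score) windows at every boundary.

    Returns a list of (start, stop, scores), where `scores` holds the score
    of every input window that covers the segment.
    """
    # Every start and stop position is a potential boundary of an output window.
    bounds = sorted({pos for start, stop, _ in windows for pos in (start, stop)})
    segments = []
    for seg_start, seg_stop in zip(bounds, bounds[1:]):
        covering = [score for start, stop, score in windows
                    if start <= seg_start and seg_stop <= stop]
        if covering:  # skip gaps that no input window covers
            segments.append((seg_start, seg_stop, covering))
    return segments

windows = [(1020, 1120, 30), (1120, 1300, 120), (1010, 1350, 100)]
for start, stop, scores in split_overlapping(windows):
    # Columns: start, stop, average, maximum, sum
    print(start, stop, mean(scores), max(scores), sum(scores))
```

The three printed score columns correspond to the three calculation methods offered in the dialog (average, maximum and sum).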




Loading Fixed Window Tracks

File Chooser

Fixed window tracks display bin lists. They are useful to represent the results of many types of experiments including, but not limited to, ChIP-seq, RNA-seq, and TimEX-seq. Files containing the results of alignments (SAM, bowtie, Eland) and files containing already created bin lists (bed, bgr, etc.) can be loaded using this option. In the case of alignment files, bin lists are created on the fly as described below. Files containing the results of microarray experiments can also be loaded as long as they are in one of the accepted formats.

Right-click on an empty track handler to open the contextual menu, then select the “Load Fixed Window Track” option. This opens a file chooser dialog box as shown in the figure on the left.

Load the track of your choice from the list of files and click the open button. Please refer to the File formats section if you want to know what kind of file can be loaded as a fixed window track.


Track Name

Fixed Window Track Options

The default track name will be the file name. The name of the track can be changed later after the track is loaded.


Window Size

This specifies the size of the genomic windows (bins), in base pairs (bp), for the track that will be created to summarize the results.


Score Calculation

This option allows you to choose how the scores of the bins are calculated. You may choose between three options: average, maximum or sum. The algorithm of the score calculation is explained below.


Strand Selection

Strand Shifting

If your input file contains information regarding the strands, you'll be able to choose to load the data from either both or only one strand.

You can also decide to shift the reads from both strands as shown in the figure on the left. To shift the strands just put a value in the "Shift" input box.

The value you enter is added to the positions of the data on the 5' strand and subtracted from the positions of the data on the 3' strand.



Data Precision

Because GenPlay requires a lot of RAM, we provide the option of changing the precision at which the score of each bin is stored.

  • Scores in 64 bits are stored as double-precision floating-point values (which can represent extremely large numbers unlikely to be useful for genomic experiments).
  • Scores in 32 bits are stored as single-precision floating-point values (which can also represent very large numbers).
  • Scores stored in 16 bits can range between -3276.8 and +3276.7 (with one decimal place).
  • Scores stored in 8 bits can range between 0 and 255 (with no decimal).
  • Scores stored in 1 bit can be equal to 0 or 1 (useful to create masks, for instance).

We recommend storing scores in 32 or 16 bits.
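
To see what one decimal place in 16 bits means in practice, here is a small Python sketch. It assumes that a score is stored as a number of tenths in a signed 16-bit integer, which spans -32768 to 32767 and therefore -3276.8 to +3276.7 in score units. The storage scheme and the function names are assumptions made for illustration; the internal representation used by GenPlay is not described here.

```python
import struct

SCALE = 10  # assumption: scores are stored as tenths in a signed 16-bit integer

def encode_16bit(score):
    """Pack a score into 2 bytes, keeping one decimal place."""
    stored = round(score * SCALE)
    # Clamp to the signed 16-bit range instead of overflowing.
    stored = max(-32768, min(32767, stored))
    return struct.pack("<h", stored)

def decode_16bit(raw):
    """Recover the score from its 2-byte representation."""
    (stored,) = struct.unpack("<h", raw)
    return stored / SCALE

print(decode_16bit(encode_16bit(12.34)))   # the second decimal is lost
print(decode_16bit(encode_16bit(5000.0)))  # out-of-range scores saturate in this sketch
```

Each score takes 2 bytes instead of the 8 bytes of a double-precision value, which is where the memory saving comes from.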


Chromosome Selection

You can load either the whole genome or only specific chromosomes (which saves time and memory).

Important Note: When specific chromosomes are selected, you will be asked whether your file is sorted by chromosome. If you answer that your file is sorted by chromosome when it actually is not, your file may load incompletely, leading to a loss of valuable information. The chromosomes must be ordered the same way they are ordered in the chromosome selection combo-box.

When the OK button is clicked, the track is loaded in the location desired.


Examples of Score Calculations




Example 1

Loading of an alignment file as a fixed window track with a window size of 100:

(each line represents one read position, score is always one)

Input file

Chr Start Stop Score
Chr1 1125 1126 1
Chr1 1135 1136 1
Chr1 1135 1136 1
Chr1 1149 1150 1
Chr1 1175 1176 1
Chr1 1210 1211 1
Chr1 1230 1231 1
Chr1 1340 1341 1
Chr1 1345 1346 1


Loading of an alignment file as a fixed window track with a window size of 100


Result

Chr   Start  Stop  Average  Maximum  Sum
Chr1  1100   1200  1        1        5
Chr1  1200   1300  1        1        2
Chr1  1300   1400  1        1        2






Example 2

Loading of an alignment file as a fixed window track with a window size of 100:

(each line represents one read position, score varies)


Input file

Chr Start Stop Score
Chr1 1125 1126 1
Chr1 1135 1136 3
Chr1 1145 1146 1
Chr1 1149 1150 1
Chr1 1175 1176 1
Chr1 1210 1211 1
Chr1 1230 1231 1
Chr1 1340 1341 6
Chr1 1345 1346 1


Loading of an alignment file as a fixed window track with a window size of 100


Result

Chr   Start  Stop  Average      Maximum  Sum
Chr1  1100   1200  7 / 5 = 1.4  3        7
Chr1  1200   1300  1            1        2
Chr1  1300   1400  7 / 2 = 3.5  6        7
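
The binning of read positions in Example 2 can be sketched in Python as follows. This is an illustration only, with hypothetical function names; it assumes that bins are aligned on multiples of the window size, so that a read at position 1125 falls in the bin starting at 1100.

```python
from collections import defaultdict

def bin_reads(reads, window_size):
    """Group (position, score) reads into bins aligned on multiples of window_size."""
    bins = defaultdict(list)
    for position, score in reads:
        bin_start = (position // window_size) * window_size
        bins[bin_start].append(score)
    return dict(sorted(bins.items()))

def summarize(scores):
    """Return the three score calculations offered when loading a track."""
    return {"average": sum(scores) / len(scores),
            "maximum": max(scores),
            "sum": sum(scores)}

reads = [(1125, 1), (1135, 3), (1145, 1), (1149, 1), (1175, 1),
         (1210, 1), (1230, 1), (1340, 6), (1345, 1)]
for bin_start, scores in bin_reads(reads, 100).items():
    print(bin_start, bin_start + 100, summarize(scores))
```

With a score of one for every read, as in Example 1, the sum is simply the number of reads in the bin.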






Example 3

Loading of an interval file as a fixed window track with a window size of 100:

Input file

Chr Start Stop Score
Chr1 1020 1120 30
Chr1 1120 1300 120
Chr1 1010 1350 100


Loading of an interval file as a fixed window track with a window size of 100


Result

Chr   Start  Stop  Average                        Maximum                  Sum
Chr1  1000   1100  (26.47 + 24) / 2 = 25.23       Max(26.47, 24) = 26.47   26.47 + 24 = 50.47
Chr1  1100   1200  (29.41 + 6 + 60) / 3 = 31.80   Max(29.41, 6, 60) = 60   29.41 + 6 + 60 = 95.41
Chr1  1200   1300  (29.41 + 60) / 2 = 44.70       Max(29.41, 60) = 60      29.41 + 60 = 89.41
Chr1  1300   1400  14.70                          14.70                    14.70
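
Contributions such as 26.47, 24, 29.41 and 6 in the result are consistent with the score of an interval being shared between bins in proportion to the part of the interval that falls in each bin. The Python sketch below illustrates that reading for a single interval. The rule is inferred from the numbers of the example rather than taken from the GenPlay source, and the function name is hypothetical.

```python
def spread_interval(start, stop, score, window_size):
    """Share an interval's score between fixed bins in proportion to the overlap."""
    length = stop - start
    contributions = {}
    bin_start = (start // window_size) * window_size
    while bin_start < stop:
        bin_stop = bin_start + window_size
        overlap = min(stop, bin_stop) - max(start, bin_start)
        contributions[bin_start] = score * overlap / length
        bin_start = bin_stop
    return contributions

# Interval Chr1 1010-1350 with a score of 100, window size of 100
for bin_start, value in spread_interval(1010, 1350, 100, 100).items():
    print(bin_start, bin_start + 100, round(value, 2))
```

Under this rule the contributions of one interval always add up to its original score.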




Loading a Gene Track

A Gene Track
Score Color

After right clicking on the empty track handler, select the “Load Gene Track” option. This opens up a file chooser dialog box that allows you to select the file that you want to load. Please refer to the File formats section if you want to know what kind of file can be loaded as a gene track.

Once the file is selected, wait until the loading is complete; the gene track will then appear in the track you selected.

Note that the genes on the plus strand are in red and the genes on the minus strand are in blue. If the file contains expression values, the exons are color coded to represent the expression (red = high, blue = low, as shown on the right).

Loading a Sequence Track

After right clicking on the empty track handler, select the “Load Sequence Track” option. This opens up a file chooser dialog box that allows you to select the file that you want to load. Please refer to the File formats section if you want to know what kind of file can be loaded as a sequence track.

A Sequence Track

Sequence tracks show DNA sequences from .2bit files.

The hg18, hg19, mm8 and mm9 sequence files can be downloaded from the library of GenPlay.


Loading a SNP Track

First, select the “Load SNP Track” option on the track contextual menu. This opens up a file chooser dialog box that allows you to select the file that you want to load. Please refer to the File formats section if you want to know what kind of file can be loaded as a SNP track.

A SNP track shows the Single-Nucleotide Polymorphisms.


Loading a Repeat Track

Select the “Load Repeat Track” option on the track contextual menu. This opens up a file chooser dialog box that allows you to select the file that you want to load. Please refer to the File formats section if you want to know what kind of file can be loaded as a repeat track.

This track type displays repeats organized by family or class.


Loading Data From a DAS Server

The Distributed Annotation System (DAS) is a client-server system in which a client can retrieve data from one or multiple servers. GenPlay can connect to any server that follows the DAS/1 protocol as specified by BioDAS.

DAS Dialog

The “Load from DAS Server” option from the track contextual menu will show the DAS Dialog.

Select the server from which you want to retrieve the data in the "Server" box.

Then select the "Data Source". Most of the time, the Data Source corresponds to the reference genome that you want to work on.

Once that's done you need to select the data that you want to retrieve in the "Data Type" box.

GenPlay can either generate a gene track or a variable window track from the retrieved data. You can select what type of output track you want in the "Generate" option.

Finally, you can also choose to download data on only a part of the genome. This can be useful because retrieving data from a DAS server can be time consuming.

Note: The DAS server section shows how to add new servers to the list of available servers in the DAS dialog.
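
For readers curious about what a DAS/1 request looks like, the Python sketch below builds the URL of a features request restricted to one segment of the genome. It only follows the general shape of the DAS/1 features command (data source, then segment=reference:start,stop and an optional type); the server address, data source and data type are hypothetical placeholders, and the requests that GenPlay actually sends may differ.

```python
from urllib.parse import quote

def das_features_url(server, data_source, reference, start, stop, data_type=None):
    """Build a DAS/1 'features' request URL for one segment of a data source."""
    url = (f"{server.rstrip('/')}/{quote(data_source)}/features"
           f"?segment={quote(reference)}:{start},{stop}")
    if data_type is not None:
        url += f";type={quote(data_type)}"
    return url

# Hypothetical server, data source and data type, for illustration only.
print(das_features_url("http://das.example.org/das", "hg19", "1", 1, 100000, "refGene"))
```

Requesting a short segment keeps the response small, which is why restricting the download to a part of the genome saves time.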


Generating a Multi Curves Track

A Multi Curves Track

If more than one fixed or variable window tracks are loaded, you can overlay them in a multi curves track. To do so, first select the "Generate Multi Curves Track" in the track contextual menu.

Multi Curves Dialog

Then a dialog will appear asking you which tracks you want to see in the multi curves track.

The available tracks are in the list on the left of the dialog and the selected tracks appear in the list on the right. Select a track by clicking on its name and use the left and right arrows in the middle of the dialog to move the track from one list to the other. Double-clicking on a track produces the same effect.

The order of the tracks in the right list determines the order in which the tracks are drawn. The track at the top of the list is drawn on top of the other tracks. You can change the order of the tracks by clicking on the name of a track in the right list and using the up and down arrows in the middle of the dialog.

Note: in order to change the appearance of a multi curves track, you need to change the appearance of the tracks that it contains.


Loading Stripes

CpG Islands Shown As Stripes On a Refseq Gene Track

By clicking on the "Load Stripes" option of the track contextual menu you can load transparent stripes superimposed on a track. The stripes can be useful to show regions of interest such as CpG Islands or repeat regions.

Please refer to the File Formats section if you want to know what kind of file can be loaded as stripes.


Main Menu

Main Menu

On GenPlay’s main screen, click on the top left button (shown by a little hammer and wrench) to pop up the main menu.


Load / Save Project

This menu allows you to load or save a whole GenPlay project in a space-efficient, compressed binary format. When you load a GenPlay project, all the tracks of your current project are replaced by the ones from the loaded project, and any information that has not been saved is lost.

Important Note: GenPlay project files may depend on the version of GenPlay that you are using. Remember which version of GenPlay you used to save a project, and use the same version the next time you load it.

Important Note 2: In the current GenPlay version, the genome selected in the configuration file is not saved with the project. A project will generally fail to load and produce an error message if the genome kept in memory in the GenPlay temp file differs from the genome used when the project was saved. To change the genome, go to the upper left corner and access the configuration menu through the options menu.




Full Screen

Click on this item from the main menu to toggle the full screen mode. When the full screen mode is on, the control panel and the status bar are hidden.

You can also toggle the full screen mode by pressing the F11 key.

Option

The option menu item allows you to modify the configuration of GenPlay. Please refer to the section Changing the configuration of GenPlay for further information.


RNA To DNA Reference

This option allows you to transform the coordinate system of the result of an RNA-Seq experiment based on alignment to a transcriptome (for instance, all RefSeq genes) into a genomic coordinate system.

You need two files in order to use this functionality.

  1. The result of the RNA-Seq experiment, called "Coverage File" in GenPlay. This file must be in bedGraph file format.
  2. An annotation file in bed format.

Two output files can be generated:

  1. A bedGraph file with the position based on a reference genome
  2. An annotation GdpGene file

Here is an example.

Coverage File:

NM_000016	0	413	0
NM_000016	413	456	1
NM_000016	456	471	2
NM_000016	471	488	3
NM_000016	488	494	2
NM_000016	494	504	3

Annotation File:

chr1	76190042	76229353	NM_000016	0	+	76190472	76228448	0	12	460,88,98,70,101,81,131,109,141,96,249,977,	0,4043,8286,8495,9170,10433,15622,21448,25061,26093,36764,38334,

The result as a bedGraph file is:

chr1	76190455	76190498	43.0
chr1	76190498	76190502	8.0
chr1	76194085	76194096	22.0
chr1	76194096	76194113	51.0
chr1	76194113	76194119	12.0
chr1	76194119	76194129	30.0

And the result as a GdpGene file is:

NM_000016	chr1	+	76190042	76229353	76190042,76194085,76198328,76198537,76199212,76200475,76205664,76211490,76215103,76216135,76226806,76228376	76190502,76194173,76198426,76198607,76199313,76200556,76205795,76211599,76215244,76216231,76227055,76229353	667888.95,1506024.1,0,0,0,0,0,0,0,0,0,0
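
The coordinate conversion of this example can be reproduced with a short Python sketch. It walks through the exons listed in the annotation (block sizes and block starts of the bed line) and maps each transcript interval to one or more genomic intervals, splitting it where it crosses an exon boundary. The sketch handles the plus strand only, and the output score (input score multiplied by the length of the piece) is inferred from the bedGraph result of the example. The function names are hypothetical and this is not GenPlay's implementation.

```python
def transcript_to_genome(tx_start, tx_stop, gene_start, exon_sizes, exon_offsets):
    """Map a half-open transcript interval to genomic intervals (plus strand only)."""
    pieces = []
    exon_tx_start = 0  # transcript coordinate at which the current exon begins
    for size, offset in zip(exon_sizes, exon_offsets):
        exon_tx_stop = exon_tx_start + size
        lo = max(tx_start, exon_tx_start)
        hi = min(tx_stop, exon_tx_stop)
        if lo < hi:  # the interval overlaps this exon
            exon_genome_start = gene_start + offset
            pieces.append((exon_genome_start + lo - exon_tx_start,
                           exon_genome_start + hi - exon_tx_start))
        exon_tx_start = exon_tx_stop
    return pieces

def convert(coverage, chrom, gene_start, exon_sizes, exon_offsets):
    """Convert (tx_start, tx_stop, score) lines to (chrom, start, stop, score) lines."""
    result = []
    for tx_start, tx_stop, score in coverage:
        if score == 0:
            continue  # windows with a null score do not appear in the example output
        for start, stop in transcript_to_genome(tx_start, tx_stop, gene_start,
                                                exon_sizes, exon_offsets):
            # Inferred from the example: output score = input score * piece length
            result.append((chrom, start, stop, float(score * (stop - start))))
    return result

gene_start = 76190042
exon_sizes = [460, 88, 98, 70, 101, 81, 131, 109, 141, 96, 249, 977]
exon_offsets = [0, 4043, 8286, 8495, 9170, 10433, 15622, 21448, 25061, 26093, 36764, 38334]
coverage = [(0, 413, 0), (413, 456, 1), (456, 471, 2),
            (471, 488, 3), (488, 494, 2), (494, 504, 3)]

for line in convert(coverage, "chr1", gene_start, exon_sizes, exon_offsets):
    print(*line, sep="\t")
```

Note how the transcript interval 456-471 spans the end of the first exon (460 bp long) and is therefore split into two genomic intervals.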




Help and About GenPlay

The Help and About GenPlay options open a browser showing, respectively, the documentation page and the about page of the GenPlay website.


Exit

This option closes the application after asking for confirmation.

Changing the Configuration of GenPlay

Click on the option item of the main menu to open the configuration screen.

Option Menu

General Options

The following screen lets you set the general options:

General Options

The Default Directory lets you specify where the files containing GenPlay tracks will be stored in your file system.

The Log File is a text file that contains a time-stamped history of the files extracted and loaded on GenPlay.

From this screen, you can also modify the appearance of the software by changing the look & feel.


Configuration Files

Configuration Files

The configuration files screen allows you to change the zoom file as well as the genome configuration file. It is necessary to restart GenPlay after modifying this option in order for the changes to take effect.


Zoom File

The zoom configuration file contains the predefined levels of zoom. To change the levels of zoom, create a text file with one level of zoom (in bp) per line, ordered from the smallest to the greatest. Here is an example:

10
100
1000
10000
100000
1000000
10000000
100000000
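
A zoom file can also be generated or checked with a few lines of Python. The sketch below is a hypothetical helper, not part of GenPlay; it only checks the two properties stated above, namely one integer per line and increasing order.

```python
def parse_zoom_levels(text):
    """Read one zoom level (in bp) per line and check that the levels are increasing."""
    levels = [int(line) for line in text.splitlines() if line.strip()]
    if any(a >= b for a, b in zip(levels, levels[1:])):
        raise ValueError("zoom levels must be ordered from the smallest to the greatest")
    return levels

# Build the content of the example above: powers of ten from 10 to 100000000
zoom_text = "\n".join(str(10 ** power) for power in range(1, 9))
print(parse_zoom_levels(zoom_text))
```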




Genome File

When GenPlay starts, it loads a configuration file describing the genome that you want to analyze (the default is human hg19). Configuration files are simple text files that specify the name and length of the chromosomes or scaffolds of the current genome. Configuration files for recent human and mouse genome assemblies can be downloaded from the GenPlay library. Genome configuration files for human and mouse come in two versions, full and basic. The basic version only contains the standard chromosomes. The full version of the files also allows the display of chromosome variants.

Configuration files for any genome can easily be created in any plain-text editor using the provided examples as a model. Here is an example of a genome file:

chr1	249250621
chr5	180915260
chr13	115169878
chrX	155270560
chrY	59373566
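
As an illustration of the format, the following Python sketch reads a genome file into a dictionary that maps each chromosome name to its length. It assumes one chromosome per line, with the name and the length separated by whitespace as in the example above; the function name is hypothetical.

```python
def parse_genome_file(text):
    """Return {chromosome name: length in bp} from 'name<TAB>length' lines."""
    chromosomes = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore empty lines
        name, length = line.split()
        chromosomes[name] = int(length)
    return chromosomes

example = ("chr1\t249250621\n"
           "chr5\t180915260\n"
           "chr13\t115169878\n"
           "chrX\t155270560\n"
           "chrY\t59373566\n")
genome = parse_genome_file(example)
print(genome["chrX"], sum(genome.values()))
```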




Track Option

Track Option

The Number of Tracks text box defines the maximum number of tracks that can be loaded on GenPlay.

The Default Track Height text box defines the height of each of the tracks.

The Undo Count text box defines the number of operations that can be undone. Note that the higher the number of undos selected, the more memory will be required.


DAS Server

DAS Server Option

The DAS server option shows the list of existing DAS servers along with the URL where these servers are located. It also provides the options to add new servers and remove existing servers.

GenPlay can communicate with and retrieve data from servers that implement the DAS/1 protocol.


Restore Default

The Restore Default configuration restores everything back to the factory settings.




File Formats

The different file formats used in GenPlay are described on this page.




Manipulating Tracks

Track Menu

Moving a Track

To move a track up or down in the track list, just click on the track handler (the left part of the track with the track number) and drag the track to the desired position.


Inserting a Track

To insert a track, right click on the track handler of the track right under where you want to insert and choose the "Insert" option.


Copying, Cutting and Pasting a Track

To copy a track, select the desired track and click on the copy option in the contextual menu or press CTRL+C.

To cut a track, select the desired track and click on the cut option in the contextual menu or press CTRL+X.

To paste a track, select the empty track where you want to paste and click on the paste option in the contextual menu or press CTRL+P.


Deleting a Track

To delete, select a track and click on the delete option of the contextual menu or press Delete on the keyboard.


Renaming a Track

To rename, select a track and click on the rename option of the contextual menu or press the F2 key.


Setting the Height of a Track

To set the height, select a track and click on the set height option of the contextual menu or click on the bottom of a track handler and drag the mouse up or down.


Changing the Appearance of a Track

Track Appearance

To change the appearance of a variable or fixed window track, click on the appearance option of the contextual menu. For any other type of track you can set the number of vertical lines displayed from the contextual menu.



Taking a Screenshot of the Track

To take a screenshot, select a track and choose the "Save as Image" option in the contextual menu.


Showing / Hiding the Stripes

To show stripes on a track, select a track and choose the "Load Stripes" option in the contextual menu. Choose the "Remove Stripes" option to hide the stripes.


Using the Undo / Redo / Reset Options

The undo, redo and reset options are only available for the Variable and Fixed Window tracks. They are accessible from the contextual menu when you right click on the track handler.

The number of undo and redo operations available can be specified as described in the Track Option section. Note that these operations consume memory, so reducing the number of undo / redo operations available can save memory.

The reset operation restores the track to the way it was right after being loaded. A reset operation can also be undone.


Compressing a Fixed Window Track

The Fixed Window tracks can also be compressed. To compress a Fixed Window track you need to click on the Compression option of the contextual menu. Compressing a track frees memory but it is not possible to use an operation on a compressed track. Therefore, you need to uncompress the track before using any operation.


Operations

Once a track is loaded, a right click on the location of the track handler opens a popup menu as shown in the figure below.

Operation Menu

The Operation sub-menu of the popup menu contains all the actions that you can use on the selected track.


Variable Window Track Operations




Operations With a Constant (Addition, Subtraction, Multiplication, Division, Invert)

Operation With Constant

These operations add a constant value to, subtract it from, multiply by it or divide by it the score of each window. The invert function inverts the score of each window. Clicking on any of these operations opens a dialog box where the user can input the value of the constant in a text field, as shown in the figure (example for addition).



Two Tracks Operation

Two Tracks Operation

This allows basic operations between tracks (fixed and variable window tracks only). It can be useful to subtract background, normalize data with a control track or perform many other track manipulations. The available operations between two tracks are addition, subtraction, multiplication, division, average, minimum, maximum.



Indexation

Indexation can be useful to compare multiple tracks at the same scale. Importantly, indexing does not work well in the presence of outliers; it works best if outliers are first removed using a filter (see below). To index the scores of a track based on the greatest and the smallest values of the whole genome, choose a new minimum and a new maximum value.
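
The following Python sketch shows one common way of indexing scores, a linear rescaling between the new minimum and the new maximum. The exact formula used by GenPlay is not given in this documentation, so the formula and the function name are assumptions; the second call shows why outliers are a problem.

```python
def index_scores(scores, new_min, new_max):
    """Linearly rescale scores so that they span [new_min, new_max]."""
    old_min, old_max = min(scores), max(scores)
    if old_min == old_max:
        raise ValueError("cannot index a track whose scores are all equal")
    scale = (new_max - new_min) / (old_max - old_min)
    return [new_min + (score - old_min) * scale for score in scores]

print(index_scores([2, 4, 10], 0, 100))
# A single outlier squeezes all the other scores close to the new minimum:
print(index_scores([2, 4, 10, 1000], 0, 100))
```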


Indexation Per Chromosome

This operation indexes each chromosome separately. Users enter the new minimum and maximum score values in a text field. When the OK button is clicked, the resulting track is displayed.


Log

Logarithm Bases

For each window, the log operation applies the function f(x) = log(x), where x is the window score. The base of the logarithm can be 2 (binary log), e (natural log) or 10 (common log).

Log With Damper

For each window, this operation applies the function f(x) = log((x + damper) / (avg + damper)), where x is the window score. The base of the logarithm function can be either 2 (binary log), e (natural log) or 10 (common log).

The log with damper operation is useful to normalize some microarray data (NimbleGen for instance); see Desprat et al., Genome Res. 2009 Dec;19(12):2288-99.
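
A minimal Python sketch of the log with damper formula is given below. It assumes that avg is the plain average of the scores of the track, which this section does not state explicitly; the function name and the sample values are made up for the illustration.

```python
import math

def log_with_damper(scores, damper, base=2):
    """Apply f(x) = log((x + damper) / (avg + damper)) to every score."""
    avg = sum(scores) / len(scores)  # assumption: plain average of the track scores
    return [math.log((x + damper) / (avg + damper), base) for x in scores]

# Scores below the average become negative, scores above it become positive.
print(log_with_damper([2, 6, 10], damper=2, base=2))
```

The damper keeps the ratio away from zero for low scores, which limits the very large negative values that a plain logarithm would produce.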

Normalize

Normalization Coefficient

After a normalize operation, the score of each window is divided by the result of the Score Count operation and multiplied by a specified fixed value. By default, after normalization the scores are expressed per 10 million reads.
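
In Python, the normalization described above can be sketched as follows. The function name is hypothetical, and the score count is taken here to be the sum of the scores of the whole list.

```python
def normalize(scores, factor=10_000_000):
    """Divide each score by the score count (sum of the scores), then multiply by factor."""
    score_count = sum(scores)
    return [score * factor / score_count for score in scores]

normalized = normalize([5, 15, 30])
print(normalized, sum(normalized))
```

After normalization the scores of a track add up to the chosen factor, so tracks obtained from different numbers of reads become comparable.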



Standard Score

Calculates the standard score for the selected track i.e. (x - avg) / stdev; where x is the score, avg is the average score of the track and stdev is the standard deviation of the scores of the track.


Minimum, Maximum

Select Chromosomes

The maximum and minimum operations display, respectively, the greatest and the smallest score on the selected chromosomes. A dialog asks you to select the chromosomes.


Score Count

The score count operation computes the sum of the window scores on the selected chromosomes.


Average

This operation computes the average score of the windows of the selected chromosomes. Note that the score of each window is weighted by the length of the window.

Standard Deviation

Standard Deviation

This operation computes the standard deviation of the scores of the windows of the selected chromosomes. Note that the scores of each window are weighted by the length of the window.
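
The length weighting used by the average and the standard deviation can be illustrated with the Python sketch below. Each window counts in proportion to its length in bp. The population form of the standard deviation is an assumption, and the function names are hypothetical.

```python
import math

def weighted_average(windows):
    """Average of (start, stop, score) windows, each weighted by its length."""
    total_length = sum(stop - start for start, stop, _ in windows)
    return sum((stop - start) * score for start, stop, score in windows) / total_length

def weighted_stdev(windows):
    """Length-weighted standard deviation (population form, by assumption)."""
    avg = weighted_average(windows)
    total_length = sum(stop - start for start, stop, _ in windows)
    variance = sum((stop - start) * (score - avg) ** 2
                   for start, stop, score in windows) / total_length
    return math.sqrt(variance)

windows = [(0, 100, 2.0), (100, 400, 6.0)]
print(weighted_average(windows), weighted_stdev(windows))
```

The 300 bp window weighs three times as much as the 100 bp window, so the average is closer to 6 than the unweighted mean of 4 would be.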



Count Non-Null Length

This operation returns the sum of the lengths of the windows with a score different from zero on the selected chromosomes.


Filter

GenPlay provides four different filters:


Percentage Filter
Percentage Filter

This option filters the X% lowest values and the Y% greatest values where X and Y are two decimals and where X + Y <= 100. You can choose between removing the filtered values (remove) or setting the filtered values to the boundary values (saturate).



Threshold Filter
Threshold Filter

This option removes the values that are lower than X OR greater than Y, where X and Y are two specified threshold values. You can choose between removing the filtered values (remove) or setting the filtered values to the boundary values (saturate).
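
The difference between the remove and saturate modes can be sketched in Python. In this illustration a removed value is set to zero, which is an assumption about what removing means; the function name is hypothetical.

```python
def threshold_filter(scores, low, high, saturate=False):
    """Filter the scores that are lower than `low` or greater than `high`."""
    result = []
    for score in scores:
        if low <= score <= high:
            result.append(score)  # inside the thresholds: kept unchanged
        elif saturate:
            result.append(low if score < low else high)  # set to the boundary value
        else:
            result.append(0)  # assumption: a removed value is stored as zero
    return result

print(threshold_filter([1, 5, 9, 20], low=3, high=10))
print(threshold_filter([1, 5, 9, 20], low=3, high=10, saturate=True))
```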



Band-Stop Filter
Band-Stop Filter

This option removes the values between two specified thresholds.



Count Filter
Count Filter

This option filters the X lowest values and the Y greatest values, where X and Y are two specified integers. You can choose between removing the filtered values (remove) or setting the filtered values to the boundary values (saturate).



Transfrag

This operation aggregates the windows of the selected track that are separated by a gap smaller than a specified size (in bp).

The score of the new window can be the sum, the average or the maximum of the scores of the aggregated windows.
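
The aggregation can be sketched in Python as follows. Windows are assumed to be sorted by position and not overlapping, and the function name is hypothetical. The `combine` argument selects how the score of the new window is computed (sum, maximum, or an average function).

```python
def transfrag(windows, max_gap, combine=sum):
    """Merge sorted (start, stop, score) windows separated by a gap smaller than max_gap."""
    merged = []
    group = []  # consecutive windows that will become one aggregated window

    def flush():
        if group:
            scores = [score for _, _, score in group]
            merged.append((group[0][0], group[-1][1], combine(scores)))
            group.clear()

    for window in windows:
        if group and window[0] - group[-1][1] >= max_gap:
            flush()  # the gap is too large: close the current group
        group.append(window)
    flush()
    return merged

windows = [(100, 200, 1), (250, 300, 2), (1000, 1100, 5)]
print(transfrag(windows, max_gap=100))
print(transfrag(windows, max_gap=100, combine=max))
```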


Show Repartition

The show repartition operation generates a graph showing the distribution of the scores of the selected tracks. The available plot types are score vs. window count and score vs. base pair count.

The user needs to choose a size for the bins of scores. Depending on the selection, the graph shows how many windows or how many base pairs there are in each bin of scores.


Generate Fixed Window Track

This operation generates a fixed window track, with the specified bin size and data precision from the selected variable window track.


Fixed Window Track Operations




Operations With a Constant (Addition, Subtraction, Multiplication, Division, Invert)

Please refer to the equivalent operation in the Variable Window Track Operations section for information about this functionality.


Two Tracks Operation

Please refer to the equivalent operation in the Variable Window Track Operations section for information about this functionality.


Gauss

Sigma Value

This operation applies a Gaussian filter to the track, depending on the sigma value provided by the user.

$$G(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-x^2 / (2\sigma^2)}$$

Where, x is the score and σ is the standard deviation of the track.

You can choose the extrapolate option to "fill" the windows with a score of zero.


Moving Average

For each window of the track, this operation computes the average over a region of a specified size centered on the window and assigns the result to the window as its score. You are prompted for the half-size of the region before the calculation.

You can choose the extrapolate option to "fill" the windows with a score of zero.
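
A minimal sketch of the moving average follows. It assumes the half-size is expressed in number of bins, that the region is truncated at the ends of the track, and that windows with a score of zero are left at zero unless the extrapolate option (here the hypothetical fill_zeros argument) is set.

```python
def moving_average(scores, half_width, fill_zeros=False):
    """Replace each score by the average of the region centered on it."""
    n = len(scores)
    result = []
    for i, score in enumerate(scores):
        if score == 0 and not fill_zeros:
            result.append(0.0)  # zero windows stay empty without extrapolation
            continue
        low = max(0, i - half_width)
        high = min(n, i + half_width + 1)
        region = scores[low:high]
        result.append(sum(region) / len(region))
    return result

scores = [0, 3, 6, 0, 9]
print(moving_average(scores, half_width=1))
print(moving_average(scores, half_width=1, fill_zeros=True))
```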


Loess Regression

This operation computes the Loess regression of degree 1 on the selected track.

For each x value where a y value is to be calculated, the Loess technique performs a regression on points in a moving range around the x value, where the values in the moving range are weighted according to their distance from this X value.

The Loess regression is a smoothing function. You need to specify the half-size of the moving window on which the regression is computed.

The weight function of the Loess regression is computed as follows: W(i) = (1 - X(i)^3)^3, where X(i) is the normalized distance: current distance / maximum distance among the points in the moving range.

You can choose the extrapolate option to "fill" the windows with a score of zero.
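
The weighting scheme described above can be sketched as a weighted linear (degree 1) fit around each point. The function names and the fallback to a weighted mean when the fit is degenerate are assumptions of this sketch and may differ from the GenPlay implementation.

```python
def loess_point(xs, ys, x0, half_width):
    """Degree-1 Loess estimate at x0 using the weights W = (1 - X^3)^3."""
    # Work with positions relative to x0 so the fitted intercept is the estimate.
    points = [(x - x0, y) for x, y in zip(xs, ys) if abs(x - x0) <= half_width]
    if not points:
        raise ValueError("no point within the moving range")
    max_dist = max(abs(u) for u, _ in points)
    if max_dist == 0:
        return sum(y for _, y in points) / len(points)
    weights = [(1 - (abs(u) / max_dist) ** 3) ** 3 for u, _ in points]
    sw = sum(weights)
    if sw == 0:
        return sum(y for _, y in points) / len(points)
    swu = sum(w * u for w, (u, _) in zip(weights, points))
    swy = sum(w * y for w, (_, y) in zip(weights, points))
    swuu = sum(w * u * u for w, (u, _) in zip(weights, points))
    swuy = sum(w * u * y for w, (u, y) in zip(weights, points))
    denom = sw * swuu - swu ** 2
    if abs(denom) < 1e-12:
        return swy / sw  # degenerate fit: fall back to the weighted mean
    slope = (sw * swuy - swu * swy) / denom
    return (swy - slope * swu) / sw  # intercept, i.e. fitted value at x0

def loess(xs, ys, half_width):
    return [loess_point(xs, ys, x, half_width) for x in xs]

xs = list(range(10))
ys = [2 * x + 1 for x in xs]
print(loess(xs, ys, half_width=3))
```

On perfectly linear data such as this example the smoothed values reproduce the input, which is a convenient sanity check for the fit.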


Indexation

Please refer to the equivalent operation in the Variable Window Track Operations section for information about this functionality.


Indexation Per Chromosome

Please refer to the equivalent operation in the Variable Window Track Operations section for information about this functionality.


Log

Please refer to the equivalent operation in the Variable Window Track Operations section for information about this functionality.


Log With Damper

Please refer to the equivalent operation in the Variable Window Track Operations section for information about this functionality.


Normalize

Please refer to the equivalent operation in the Variable Window Track Operations section for information about this functionality.


Standard Score

Please refer to the equivalent operation in the Variable Window Track Operations section for information about this functionality.


Minimum, Maximum

Please refer to the equivalent operation in the Variable Window Track Operations section for information about this functionality.


Bin Count

The bin count operation displays the number of windows (bins) with a score different from 0 on the selected chromosomes. It shows a menu asking to select chromosomes.


Score Count

Select Chromosomes

The score count operation returns the sum of the scores of each window of the selected chromosomes of the selected track. It shows a menu to select the chromosomes. If the track was initially loaded by summing the reads in each window, this returns the total number of mapped reads in the experiment.


Average

Computes the average score of the windows of the selected chromosomes.


Standard Deviation

Computes the standard deviation of the scores of the windows of the selected chromosomes.



Correlation

Correlation Report

The correlation operation computes the Pearson’s correlation between the score values of two tracks. The two tracks need to have the same bin size. The following formula is used to calculate the correlation:

$$\rho = \frac{\sum_{i} x_i y_i - n\,\bar{x}\,\bar{y}}{(n - 1)\,\sigma_x \sigma_y}$$

Where:

  • $\rho$ is the Pearson’s correlation
  • $x_i$ and $y_i$ are the scores of the tracks
  • $n$ is the number of values
  • $\bar{x}$ and $\bar{y}$ are the means of the scores of the tracks
  • $\sigma_x$ and $\sigma_y$ are the standard deviations of the scores of the tracks

The figure on the right shows a correlation report.

Note: The correlation is computed only on the windows that are different from zero on both tracks. If one of the tracks has a zero value window, the window of the other track with the same coordinates is skipped as well.
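
The formula and the zero-skipping rule can be sketched together as follows. The sketch assumes that both tracks are given as lists of scores with the same bin size and that the standard deviations are sample standard deviations (divided by n - 1), which is what makes the formula yield a value between -1 and 1.

```python
import math

def pearson_nonzero(track_a, track_b):
    """Pearson correlation over windows that are non-zero on both tracks."""
    pairs = [(x, y) for x, y in zip(track_a, track_b) if x != 0 and y != 0]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two windows that are non-zero on both tracks")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x, _ in pairs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for _, y in pairs) / (n - 1))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    numerator = sum(x * y for x, y in pairs) - n * mean_x * mean_y
    return numerator / ((n - 1) * sd_x * sd_y)

# The third window is zero on the first track, so it is skipped on both.
print(pearson_nonzero([1, 2, 0, 3, 4], [2, 4, 5, 6, 8]))
```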



Filter

Please refer to the equivalent operation in the Variable Window Track Operations section for information about this functionality.


Find Peaks

The find peak operation offers three different algorithms that can be used to find the peaks:


Standard Deviation Peak Finder

The standard deviation peak finder prompts the user to enter two parameters.

The parameter ‘S’ specifies the number of windows to be considered for each window on either side in order to calculate the standard deviation.

For example, if S = 10, it means that for each window we consider 10 windows to the left and 10 windows to the right to calculate the standard deviation.

For a window to be accepted, its standard deviation needs to be at least ‘T’ times greater than the value of the standard deviation of the chromosome.
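
The acceptance rule can be sketched as below. This is an illustration only: it assumes a population standard deviation, a neighborhood truncated at the ends of the chromosome, and rejected windows set to zero, none of which is stated in the documentation.

```python
import statistics

def std_peak_finder(scores, s, t):
    """Keep windows whose local standard deviation is at least t times
    the standard deviation of the whole chromosome."""
    chromosome_sd = statistics.pstdev(scores)
    result = []
    for i, score in enumerate(scores):
        # s windows to the left, the window itself, and s windows to the right.
        region = scores[max(0, i - s):i + s + 1]
        local_sd = statistics.pstdev(region)
        result.append(score if local_sd >= t * chromosome_sd else 0.0)
    return result

# A flat chromosome with a single spike at index 11.
scores = [1.0] * 11 + [20.0] + [1.0] * 11
peaks = std_peak_finder(scores, s=1, t=2)
print([i for i, value in enumerate(peaks) if value != 0])
```

With S = 1, only the spike and its two direct neighbors have a neighborhood that contains the spike, so only those three windows are accepted.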



Density Peak Finder

The Density Finder works as follows:

The parameter ‘S’ specifies the number of windows to be considered for each window on either side of the window under consideration.

For the window under consideration to be accepted, at least ‘P’ percentage of values must be above the high threshold ‘H’ or at least ‘P’ percentage of values must be below the low threshold ‘L’.



Island Finder

The Island Finder is based on the algorithm described in the paper Zang, C., Schones, D. E., Zeng, C., Cui, K., Zhao, K., and Peng, W. (2009). A clustering approach for identification of enriched domains from histone modification chip-seq data. Bioinformatics (Oxford, England), 25(15):1952-1958.

The window value and gap parameters of the island finder correspond to the parameters ‘l0’ and ‘g’ of the paper, respectively. The island score allows the user to select the islands with a score greater than or equal to a particular value. The island length parameter allows the user to select islands encompassing at least a specified number of windows. There are three result types:

  • Start values: Depicts only those islands that are selected and removes the ones that are rejected.
  • Island score: Depicts the islands by considering the score.
  • Island Summit: Depicts the island with the summit of the input island as a score.





Transfrag

Tracks Before and After Transfrag

This operation aggregates the bins of the selected track that are separated by a gap (bins with a score of zero) smaller than a specified size.

The score of the new window can be the sum, the average or the maximum of the scores of the aggregated windows. The result track can either be a fixed window track or a gene track.



Change Bin Size

The change bin size operation changes the size of the bins of the track. It shows a dialog box allowing the user to enter the new bin size.


Change Precision

The change precision operation allows you to change the data precision of the selected track. Refer to the Data Precision section for further information regarding the data precision.


Density

This operation generates a new fixed window track where the score of each window represents the density of non-null windows in the neighborhood of that window. You first need to enter the size S of the neighborhood. For each window W, the algorithm counts how many of the S windows before W and the S windows after W have a score different from zero. This value is then divided by 2 * S + 1 and the result is the score of W.
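
A minimal sketch of the density calculation follows. Because the count is divided by 2 * S + 1, the sketch assumes that the window W itself is part of the neighborhood; it also assumes that the neighborhood is simply truncated at the ends of the track.

```python
def density(scores, s):
    """Fraction of non-zero windows in the 2 * s + 1 windows centered on each window."""
    n = len(scores)
    result = []
    for i in range(n):
        low = max(0, i - s)
        high = min(n, i + s + 1)
        non_zero = sum(1 for value in scores[low:high] if value != 0)
        result.append(non_zero / (2 * s + 1))
    return result

print(density([0, 1, 1, 0, 1], s=1))
```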


Show Repartition

Please refer to the equivalent operation in the Variable Window Track Operations section for information about this functionality.


Concatenate

Select Tracks to Concatenate

The concatenate operation allows you to generate a file containing the scores of multiple fixed window tracks that have the same bin size. The output file contains the following fields:

  1. chromosome
  2. start position
  3. stop position
  4. score track 1
  5. score track 2
  6. score track 3
  7. ...





Interval Summarization

This operation needs two tracks:

  • The selected track that defines the scores
  • A second track that defines the intervals

This operation generates a new track containing the intervals of the "interval track". For each interval, the algorithm looks at the corresponding scores in the score track and computes either the maximum, the average or the sum of all the scores that fall in the interval. This value is the new score value in the result track.

You can also choose to use only a certain percentage of the greatest scores that fall in the interval.
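
The summarization of one interval can be sketched as follows. The sketch assumes a fixed window score track stored as a list, half-open interval coordinates, and that a bin belongs to the interval as soon as it overlaps it. The top_pct argument is a hypothetical name for the percentage of greatest scores to keep.

```python
import math

def summarize_interval(scores, bin_size, start, stop, method="average", top_pct=100):
    """Summarize the scores of the bins overlapping [start, stop)."""
    first = start // bin_size
    last = (stop - 1) // bin_size
    values = sorted(scores[first:last + 1], reverse=True)
    if not values:
        return 0.0
    # Keep only the requested percentage of the greatest scores (at least one).
    keep = max(1, math.ceil(len(values) * top_pct / 100))
    values = values[:keep]
    if method == "max":
        return max(values)
    if method == "sum":
        return sum(values)
    if method == "average":
        return sum(values) / len(values)
    raise ValueError("method must be 'max', 'sum' or 'average'")

scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # one score per 10 bp bin
print(summarize_interval(scores, 10, 10, 40, "average"))
print(summarize_interval(scores, 10, 10, 40, "average", top_pct=50))
```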


Generate Variable Window Track

This operation generates a variable window track from the selected fixed window track.


Gene Track Operations

Directly on a gene track, you can:

  1. Double click on a gene to open a web page describing the gene. Make sure that your input file contains a searchURL line as described in the File Formats section in order to enable this option.
  2. Hover the mouse over a gene to display its name and score. If the exons of the gene have different scores, hover over an exon to display the exon score.




Search Gene

Find Gene

Use this option to search for a gene on the selected track by typing the name of the gene.

Check the Match Case option if you want the search to be case sensitive. Check the Whole Word option if you want to find only the genes whose whole name matches the input. Press Next or Previous to jump to the next or the previous matching gene, respectively. You can also open the Find Gene dialog by pressing CTRL+F after selecting a gene track.



Extract Intervals

Extract Intervals

This option allows you to extract intervals defined relatively to the beginning, the end or the middle of a gene and to generate a new gene track showing these intervals.

You can, for example, define promoters as regions that start 100bp before the beginning of genes and end 150bp after the beginning of genes. This option allows you to generate a new track from these parameters.
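
The promoter example can be sketched as below. Note the assumptions: the function and its arguments are hypothetical, and the sketch treats the beginning of a gene on the minus strand as its highest coordinate, which may or may not match how GenPlay handles strands.

```python
def extract_interval(gene_start, gene_stop, strand, before, after, reference="start"):
    """Interval running from `before` bp upstream to `after` bp downstream
    of the start, the end or the middle of a gene."""
    if reference == "middle":
        anchor = (gene_start + gene_stop) // 2
    else:
        # On the minus strand the gene begins at its highest coordinate (assumption).
        at_low_end = (reference == "start") == (strand == "+")
        anchor = gene_start if at_low_end else gene_stop
    if strand == "+":
        low, high = anchor - before, anchor + after
    else:
        low, high = anchor - after, anchor + before
    return max(0, low), high

# Promoter: 100 bp before to 150 bp after the beginning of the gene.
print(extract_interval(1000, 5000, "+", before=100, after=150))
print(extract_interval(1000, 5000, "-", before=100, after=150))
```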



Extract Exons

Extract Exons

This option generates a new gene track showing only the exons of the genes of the selected track.

You can choose between the three following options:

  1. Extract the first exon of the genes
  2. Extract the last exon
  3. Extract all the exons





Score Exons

To execute this operation, you need at least one fixed or variable window track loaded. For each exon of each gene of the selected gene track, this operation computes either the average, the maximum or the sum of the scores of all the windows of the specified fixed or variable window track that fall in the exon.


Filter

This option provides four different filters for gene tracks:


Percentage Filter

This option filters the genes with the X% lowest overall score and the Y% greatest overall scores where X and Y are two decimals and where X + Y <= 100. You can choose between removing the filtered values (remove) or setting the filtered values to the boundary values (saturate).



Threshold Filter

This option filters the genes with an overall score that is lower than X OR greater than Y, where X and Y are two specified threshold values. You can choose between removing the filtered values (remove) or setting the filtered values to the boundary values (saturate).



Band-Stop Filter

This option removes the genes with an overall score between two specified thresholds.



Count Filter

This option filters the X lowest scored genes and the Y greatest scored genes, where X and Y are two specified integers. You can choose between removing the filtered values (remove) or setting the filtered values to the boundary values (saturate).



Filter Strand

You need to select a strand when prompted. At the end of the operation the track will contain only the genes on the selected strand. All the other genes will have been removed.


Rename Genes

This operation allows you to change the names of the genes. You need to provide a text file where each line contains the current gene name and the new gene name separated by a tab. Every time a gene with a name from the first column is found, this name is replaced by the new gene name from the second column.
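
The expected file layout and its effect can be sketched as follows. The function names are hypothetical, and skipping blank lines is an assumption of the sketch.

```python
def load_rename_map(lines):
    """Build an {old name: new name} mapping from tab-separated lines."""
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        old_name, new_name = line.split("\t", 1)
        mapping[old_name] = new_name
    return mapping

def rename_genes(gene_names, mapping):
    """Replace the names found in the mapping and leave the others untouched."""
    return [mapping.get(name, name) for name in gene_names]

rename_file = ["geneA\tALPHA\n", "geneB\tBETA\n"]
print(rename_genes(["geneA", "geneC", "geneB"], load_rename_map(rename_file)))
```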


Distance Calculation

Development in progress, coming soon.


Score Repartition Around Start

You first need to select a Fixed window track containing the scores. After that, you need to select the chromosomes on which you want to execute the operation. You also need to specify a bin size S, a bin count C and a method for the calculation of the scores.

The operation creates C bins on each side of the start position of each gene. The size S of each bin is in base pairs. Depending on the calculation method chosen, the operation computes the sum, the maximum or the average of the scores for each corresponding bin from each gene and displays a bar graph of the result. The data can be exported by right-clicking on the graph and using the "save as" function.
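
The binning around gene starts can be sketched as below for the average method only. The sketch assumes a fixed window score track stored as a list with a bin size of track_bin, ignores gene strands, and skips the bins that fall outside of the track. All names are hypothetical.

```python
def profile_around_starts(scores, track_bin, starts, bin_size, bin_count):
    """Average score in bin_count bins of bin_size bp on each side of each start."""
    track_length = len(scores) * track_bin
    sums = [0.0] * (2 * bin_count)
    counts = [0] * (2 * bin_count)
    for start in starts:
        for k in range(-bin_count, bin_count):
            low = start + k * bin_size
            high = low + bin_size
            if low < 0 or high > track_length:
                continue  # bin outside of the track
            values = scores[low // track_bin:(high - 1) // track_bin + 1]
            sums[k + bin_count] += sum(values) / len(values)
            counts[k + bin_count] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

scores = [float(i) for i in range(10)]  # one score per 10 bp window
print(profile_around_starts(scores, 10, starts=[50, 30], bin_size=10, bin_count=2))
```

Each value of the returned list corresponds to one bar of the graph, ordered from the bin furthest upstream of the start to the bin furthest downstream.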

A multi-curve graph can be generated to compare several fixed window tracks. To compare two tracks:

  1. Perform the analysis on the first track as described above.
  2. Save the result to your hard drive.
  3. Close the graph window.
  4. Perform the same analysis on the second track.
  5. Right-click on the second graph and choose the load data option.
  6. Load the first analysis.

The colors of the curves, the type of graph (bar, points, curve) and the scale can be adjusted by right-clicking on the graph. The same procedure can be used to load more than two graphs. To produce more complex graphs, we recommend loading the saved data into your favorite spreadsheet software.

Score Repartition Around Start


Sequence Track Operations

There is currently no operation available for the sequence tracks.


SNP Track Operations

Directly on a SNP track, you can put the mouse over a SNP to have some extra information about the name or the base counts ratio of the SNP.


Find Next / Find Previous

This operation sets the position of the screen middle bar (red line) on the position of the next or the previous SNP on the track.


Threshold Filter

Threshold Filter

The threshold filter operation removes all the SNPs with a first base count or a second base count smaller than the specified thresholds.



Ratio Filter

Ratio Filter

The ratio filter operation removes all the SNPs where the ratio (first base count) / (second base count) is smaller or greater than specified values.



Remove SNPs Not In Genes

This operation will ask you to select a gene track in order to remove all the SNPs from the selected track that are not inside the genes of the gene track.


Repeat Track Operations

There is currently no operation available for the repeat track.