TimEX Tutorial

From GenPlay, Einstein Genome Analyzer

Revision as of 15:24, 13 September 2011 by Julien

Goal: This tutorial illustrates how GenPlay can be used to display replication timing profiles. The goal of the tutorial is to compute the correlation coefficient between the replication timing in human embryonic stem (ES) cells and in primary basophilic erythroblasts derived in culture from primary CD34-positive cells.

The TimEX procedure is described in Desprat et al. (Genome Res. 2009 Dec;19(12):2288-99). Briefly, the timing of DNA replication can be estimated by comparing the number of copies of each DNA segment in cells that are undergoing replication (S phase cells) with the number of copies of the same segment in cells that have not yet started to replicate (cells in the G1 phase of the cell cycle). When both alleles of a DNA segment replicate early in S phase, there are four copies during most of S phase, so the average number of copies in a population of S phase cells is close to 4. Hence, the S/G1 ratio is close to 2, since G1 cells have 2 copies of each DNA segment. By contrast, when a DNA segment replicates late in S phase, the average number of copies in a population of S phase cells is close to 2 and the S/G1 ratio is close to 1. In this tutorial we will calculate the S/G1 ratio genome-wide in 5,000 bp windows for both cell types and then compare the results.
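
The reasoning above can be sketched as a small calculation. This is only an illustration, not part of GenPlay or of the published procedure: it assumes that the S phase cells are spread evenly across S phase, and the function name and the 0 to 1 time scale are hypothetical.

```python
def expected_s_g1_ratio(replication_time):
    """Expected S/G1 copy-number ratio of a DNA segment.

    replication_time is the point in S phase at which the segment
    replicates, from 0.0 (very early) to 1.0 (very late).
    Assumption: S phase cells are evenly distributed across S phase,
    so the fraction of cells that already replicated the segment
    is 1 - replication_time.
    """
    if not 0.0 <= replication_time <= 1.0:
        raise ValueError("replication_time must be between 0 and 1")
    fraction_replicated = 1.0 - replication_time
    # Replicated cells carry 4 copies, the others still carry 2.
    mean_copies_s = 4.0 * fraction_replicated + 2.0 * (1.0 - fraction_replicated)
    copies_g1 = 2.0
    return mean_copies_s / copies_g1


print(expected_s_g1_ratio(0.0))  # early-replicating segment
print(expected_s_g1_ratio(1.0))  # late-replicating segment
```

Under this assumption the ratio moves linearly from 2 for the earliest segments to 1 for the latest ones, which is why the S/G1 ratio can be read as a timing profile.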


Note: The following tutorial is based on the hg19 genome assembly, which is the default genome assembly of GenPlay. If you previously changed the genome assembly used by GenPlay in the configuration menu, you will need to restore the hg19 assembly. Please refer to the Documentation section of this website for more information on how to change the reference assembly.


Note: The final result of this tutorial is available as a project that can be loaded from the Web Start page of this website.


Load the Files

The first thing to do is to download the four files used during this tutorial. The files are available here (if your web browser opens the files in a new tab or window, please select the Save As option of the File menu of your browser to retrieve the files).

After downloading the files, you can start GenPlay from the Web Start link located at the top of this page. The 1 GB link is enough for this tutorial, but generally you should allocate as much memory as is available on your computer.

Once GenPlay is started, right click on the track handler of the 1st row, in order to open a contextual menu that will allow you to load tracks (figure 1). Select the Load Fixed Window Track option.

Figure 1: Load Menu

After you select the ES G1 file, an option window prompts you to enter information on how to load the data. You can keep the default name for the track or change it if you prefer. Then, you need to choose a window size and a method of score calculation. The window size that you should choose depends on the number of reads that are available. For timing of replication studies, we can choose a window size of 5,000 bp. The options for the score calculation are discussed in the Documentation. For the type of files used in this example, you should choose sum as the method for the score calculation. You can keep the default data precision.

Figure 2: Load Fixed Window Track Dialog

Once your track is loaded, you need to repeat the same operation for the 3 other files. Apart from the name of the track, all the other options are the same.
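
To make the idea of a fixed window track with the sum score method more concrete, here is a minimal sketch. It is not GenPlay's loader: it assumes that every read counts for 1 and is assigned to a single window by its start position, and the function name and the input format (chromosome, start) are hypothetical.

```python
from collections import defaultdict

WINDOW_SIZE = 5000  # window size in bp, as chosen in the dialog


def bin_reads(read_starts, window_size=WINDOW_SIZE):
    """Count reads per fixed-size window.

    read_starts is an iterable of (chromosome, start_position) pairs.
    Assumption: each read contributes a score of 1 to the window that
    contains its start position, and scores are summed per window.
    Returns a dict mapping (chromosome, window_index) to the read count.
    """
    counts = defaultdict(int)
    for chromosome, start in read_starts:
        counts[(chromosome, start // window_size)] += 1
    return dict(counts)


reads = [("chr1", 100), ("chr1", 4999), ("chr1", 5000), ("chr2", 12000)]
print(bin_reads(reads))
```

With a 5,000 bp window, the first two reads fall in the first window of chr1 and the third read falls in the second one.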

Filtering the G1 tracks

Now that our tracks are loaded, we need to remove the windows with fewer than 8 reads from the G1 tracks, because their scores are not statistically reliable.

This step is necessary because we found that windows with a low number of reads decreased the signal to noise ratio because of excessive sampling errors.

To filter the windows with fewer than 8 reads, right-click on the hES-G1 track handler, select the Operation menu and then select the Filter sub-menu (figure 3).

Then select the Threshold filter option and set the values as shown on figure 4. Click on OK in order to remove all the windows with a score smaller than 8. Repeat the same filter on the erythroid G1 track.

Figure 3: Filter Menu
Figure 4: Threshold Filter
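
The effect of the threshold filter can be sketched as follows. This is only an illustration under stated assumptions: a track is modeled as a plain list of window scores, and a removed window is represented by None, which may differ from how GenPlay stores empty windows. The function name is hypothetical.

```python
def threshold_filter(scores, minimum=8):
    """Remove windows whose score is smaller than the minimum.

    scores is a list of window scores (read counts).
    Assumption: a removed (null) window is represented by None.
    """
    return [
        score if score is not None and score >= minimum else None
        for score in scores
    ]


print(threshold_filter([3, 8, 12, 7, 0]))
```

Windows with a score of exactly 8 are kept, since only scores smaller than 8 are removed.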


Normalizing the tracks

To be comparable, all the tracks need to be normalized.

The normalize operation of GenPlay divides each score by the sum of all the scores of the track and multiplies the result by a large constant specified by the user. The only purpose of the constant is to make the scores more readable. Let's start by normalizing the hES-G1 track. First, right click on the track handler. Then select the Normalize option of the Operation sub-menu as shown on figure 5. You can keep the default constant and just click OK. Because each window score is the sum of the reads that mapped to that window, the total of all the scores is the total number of reads. After this operation the scores are therefore expressed as number of reads per window per 10 million reads (if you kept the default constant): each score is equal to (# of reads per window * 10,000,000) / total # of reads.

Once normalization is done for the first track, process the three other tracks in the same manner.

Figure 5: Normalize
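
The normalization formula given above can be written as a short sketch. It is not GenPlay's code: the track is modeled as a list of window scores, None stands for a removed window (an assumption), and the function name is hypothetical. The default factor of 10,000,000 matches the default constant mentioned in this tutorial.

```python
def normalize(scores, factor=10_000_000):
    """Divide each score by the track total and multiply by a constant.

    Assumption: None marks a removed window; it is ignored in the
    total and stays None in the result.
    """
    total = sum(score for score in scores if score is not None)
    if total == 0:
        raise ValueError("cannot normalize a track whose scores sum to 0")
    return [
        None if score is None else score * factor / total
        for score in scores
    ]


print(normalize([10, 30, None, 60]))
```

After this operation two tracks with very different sequencing depths are expressed on the same scale, which is what makes the S / G1 division meaningful.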


Computing the ratio S / G1

The next step is to generate new tracks representing the ratio S / G1 for each cell type. To do that, right click on the hES-S track handler and select the "Two Tracks Operation" option of the Operation menu. Select the hES-G1 track when prompted. Then select the track where you want to generate the result, select the division operation and keep the default data precision.

After repeating the same operation with the erythroid tracks, the result should be similar to the one on figure 6.

Figure 6: S / G1
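
The division of two tracks can be sketched window by window. This is an illustration only: both tracks are assumed to use the same windows, None stands for a removed window, and a window is left empty in the result when the G1 score is missing or equal to 0. That handling and the function name are assumptions, not a description of GenPlay's behavior.

```python
def divide_tracks(s_track, g1_track):
    """Compute the S / G1 ratio for each window.

    Assumption: the result is None wherever either score is missing
    or the G1 score is 0, since no ratio can be computed there.
    """
    if len(s_track) != len(g1_track):
        raise ValueError("both tracks must have the same number of windows")
    ratios = []
    for s_score, g1_score in zip(s_track, g1_track):
        if s_score is None or g1_score is None or g1_score == 0:
            ratios.append(None)
        else:
            ratios.append(s_score / g1_score)
    return ratios


print(divide_tracks([4.0, 2.0, 3.0, None], [2.0, 2.0, None, 1.0]))
```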


Gaussing the tracks

If you zoom out, you can already see that the S / G1 ratio varies and that there seem to be replication timing domains that can be several megabases in size. We now want to smooth the curves and remove the noise in order to obtain more usable profiles.

So far, two smoothing algorithms have been incorporated into GenPlay: Gaussian smoothing and Loess smoothing. In practice these two operations produce very similar results.

To smooth the tracks, first right click on one of the two S / G1 track handlers and select the Gauss option of the Operation menu. We need to set the Gaussian smoothing parameter Sigma. We set Sigma to 200,000 bp (200 kb) for this experiment. Don't extrapolate the result to null windows when prompted.

Repeat the operation for the second track. Figure 7 shows the result of the smoothing.

Figure 7: Gaussian Smoothing
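
For readers who want to see what a Gaussian smoothing of a window track does, here is a minimal sketch. It is not GenPlay's implementation. It assumes that Sigma is expressed in base pairs, that the kernel is truncated at 4 Sigma, that removed windows (None) are ignored when averaging, and that they stay empty in the result (no extrapolation to null windows). The function and parameter names are hypothetical.

```python
import math


def gaussian_smooth(scores, sigma_bp, window_size=5000, truncate=4.0):
    """Smooth a window track with a Gaussian-weighted moving average.

    Assumptions: sigma_bp is in base pairs, the kernel is cut off at
    truncate * sigma, and None windows are skipped and stay None.
    """
    if sigma_bp <= 0:
        raise ValueError("sigma_bp must be positive")
    sigma_windows = sigma_bp / window_size
    radius = int(truncate * sigma_windows)
    # weights[d] is the weight of a window located d windows away.
    weights = [
        math.exp(-(d * d) / (2.0 * sigma_windows ** 2))
        for d in range(radius + 1)
    ]
    smoothed = []
    for i, score in enumerate(scores):
        if score is None:
            smoothed.append(None)  # do not extrapolate to null windows
            continue
        weighted_sum = 0.0
        weight_total = 0.0
        for j in range(max(0, i - radius), min(len(scores), i + radius + 1)):
            if scores[j] is not None:
                weight = weights[abs(i - j)]
                weighted_sum += weight * scores[j]
                weight_total += weight
        smoothed.append(weighted_sum / weight_total)
    return smoothed


noisy = [1.0, 1.8, 0.9, None, 1.7, 1.1, 1.9]
print(gaussian_smooth(noisy, sigma_bp=10000))
```

Dividing by the sum of the weights actually used keeps the smoothed values on the same scale as the input, even near chromosome ends or next to removed windows.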


Saturating and Indexing the tracks

To make the tracks easier to compare, we need to index (rescale) them between 0 and 100.

Before indexing the data, we want to saturate the 1% greatest and the 1% smallest values in order to reduce the effect of possible outliers. To saturate a track, right click on the track handler and select the Operation > Filter menu. In the filter dialog, select the percentage filter and set the parameters as shown on figure 8. You need to do this operation for both the ES and ERY tracks.

Now we need to index the two tracks. Select one of the tracks, right click on the track handler to show the contextual menu and click on the Operation > Index option. Set the new minimum to 0 and the new maximum to 100. Repeat the operation for the second track. The result should be similar to what is shown on figure 9.
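
Saturation can be sketched as clipping the scores to a lower and an upper bound taken from the sorted values. This is only an illustration: the exact way GenPlay picks the 1% cut-offs is not described here, so the rank-based rule below, the use of None for removed windows and the function name are all assumptions.

```python
def saturate(scores, fraction=0.01):
    """Clip the lowest and highest fraction of the scores.

    Assumption: the bounds are the values found at rank k and
    rank n - 1 - k of the sorted scores, with k = int(n * fraction).
    None windows are ignored and stay None.
    """
    if not 0.0 <= fraction < 0.5:
        raise ValueError("fraction must be in [0, 0.5)")
    valid = sorted(score for score in scores if score is not None)
    if not valid:
        return list(scores)
    k = int(len(valid) * fraction)
    low, high = valid[k], valid[len(valid) - 1 - k]
    return [
        None if score is None else min(max(score, low), high)
        for score in scores
    ]


track = [float(v) for v in range(200)]
clipped = saturate(track)
print(min(clipped), max(clipped))
```

Saturated values are replaced by the bound rather than removed, so the number of windows available for the comparison does not change.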


Figure 8: Saturate Menu
Figure 9: Index Result
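
Indexing is a linear rescaling of the scores between a new minimum and a new maximum. The sketch below illustrates the arithmetic and is not GenPlay's code: None stands for a removed window (an assumption) and the function name is hypothetical.

```python
def index_track(scores, new_min=0.0, new_max=100.0):
    """Linearly rescale the scores so they span new_min to new_max.

    Assumption: None windows are ignored when looking for the current
    minimum and maximum, and stay None in the result.
    """
    valid = [score for score in scores if score is not None]
    if not valid:
        raise ValueError("the track has no scores to index")
    old_min, old_max = min(valid), max(valid)
    if old_max == old_min:
        raise ValueError("cannot index a track with a single distinct score")
    scale = (new_max - new_min) / (old_max - old_min)
    return [
        None if score is None else new_min + (score - old_min) * scale
        for score in scores
    ]


print(index_track([1.0, 1.5, None, 2.0]))
```

Because the rescaling depends on the smallest and greatest scores of the track, saturating the extreme values first prevents a few outliers from compressing the rest of the profile.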


Computing the Correlation Coefficient

The last step is to compute the correlation coefficient between these two tracks. To do so, right click on the handler of one of the two tracks and select the Operation > Correlation option of the contextual menu. When asked, choose the second track for the correlation. A window showing the correlation coefficient for each chromosome as well as the genome-wide correlation should pop up (figure 10).

Figure 10: Correlation Coefficient
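
As an illustration of the last step, here is a minimal sketch of a correlation coefficient between two window tracks. It assumes a Pearson correlation computed over the windows that have a score in both tracks; whether GenPlay uses exactly this definition is an assumption, and the function name is hypothetical.

```python
import math


def pearson_correlation(track_a, track_b):
    """Pearson correlation over the windows scored in both tracks.

    Assumption: None marks a removed window, and a window is used
    only if it has a score in both tracks.
    """
    pairs = [
        (a, b) for a, b in zip(track_a, track_b)
        if a is not None and b is not None
    ]
    n = len(pairs)
    if n < 2:
        raise ValueError("at least two shared windows are required")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    variance_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    variance_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if variance_a == 0 or variance_b == 0:
        raise ValueError("correlation is undefined for a constant track")
    return covariance / math.sqrt(variance_a * variance_b)


es_track = [10.0, 40.0, None, 80.0, 95.0]
ery_track = [20.0, 35.0, 50.0, 70.0, 90.0]
print(pearson_correlation(es_track, ery_track))
```

Applying the same function to the windows of a single chromosome would give a per-chromosome coefficient, and applying it to all windows would give the genome-wide value.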