Goal: This tutorial illustrates how GenPlay can be used to display replication timing profiles. The goal is to compute the correlation coefficient between the replication timing in human embryonic stem (ES) cells and in primary basophilic erythroblasts derived in culture from primary CD34-positive cells.
The TimEX procedure is described in Desprat et al. (Genome Res. 2009 Dec;19(12):2288-99). Briefly, the timing of DNA replication can be estimated by measuring the number of copies of each DNA segment in cells that are undergoing replication (S phase cells) as compared to the number of copies of the same DNA segment in cells that have not yet started to replicate (cells in the G1 phase of the cell cycle). When both alleles of a DNA segment replicate early in S phase, there are four copies during most of S phase and the average number of copies in a population of cells in S phase is close to 4. Hence, the S/G1 ratio is close to 2 (since G1 cells have 2 copies of each DNA segment). By contrast, when a DNA segment replicates late in S phase, the average number of copies in a population of cells in S phase is close to 2 and the S/G1 ratio is close to 1. In this tutorial we will calculate the S/G1 ratio genome-wide in 5,000bp windows for both cell types and then compare the results.
Prerequisite: GenPlay needs to be installed on your computer. If you haven't installed GenPlay yet, please visit the Downloads page and follow the instructions to download and install GenPlay.
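The relationship between replication time and S/G1 ratio can be illustrated with a small Python sketch. It assumes that cells are spread uniformly across S phase, which is a simplification made for this example and not a claim from the TimEX paper; the function name is hypothetical.

```python
def expected_s_g1_ratio(replication_time):
    """Expected S/G1 copy-number ratio for one DNA segment.

    replication_time is the fraction of S phase (0.0 = start, 1.0 = end)
    at which the segment replicates. Assumption of this sketch: cells are
    uniformly distributed across S phase.
    """
    if not 0.0 <= replication_time <= 1.0:
        raise ValueError("replication_time must be between 0 and 1")
    # Cells that have not reached the replication point hold 2 copies,
    # cells that have passed it hold 4 copies.
    mean_copies_in_s = 2 * replication_time + 4 * (1 - replication_time)
    copies_in_g1 = 2
    return mean_copies_in_s / copies_in_g1


for t in (0.0, 0.5, 1.0):
    print(t, expected_s_g1_ratio(t))
```

Under this assumption an early-replicating segment gives a ratio of 2, a late-replicating one gives 1, and intermediate timings fall in between.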
Note: The final result of this tutorial is available as a project that can be loaded from the Projects page of this website.
Start a New Project
After starting GenPlay you will be prompted to select a name, a clade, a genome and an assembly for your project. You can enter "TimEX Tutorial" for the name, select the mammal clade and human genome (figure 1).
Then, click on the tool box button on the assembly line. A new window will appear allowing you to select chromosomes. For this tutorial we will only keep the basic chromosomes. Click on the Basic button to exclude the random chromosomes and alternative loci (figure 2).
Load the Files
The first thing to do is to download the four files used during this tutorial. The files are available here (if your web browser opens the files in a new tab or window, please select the Save As option of the File menu of your browser to retrieve the files).
Once GenPlay is started, right-click on the track handler of the first row to open a contextual menu that will allow you to load layers (figure 3). Select Add Layer(s) (figure 4), then select the hES-G1.bed file and choose Variable or Fixed Window Layer.
After selecting the ES G1 file and the layer type, an option window will prompt you to enter information on how to load the data. You can keep the default name for the layer or change it if you prefer. Then, you need to check the Bin Data option and choose a window size and a method of score calculation. The window size that you should choose depends on the number of reads that are available. For timing of replication studies, we can choose a window size of 5,000bp. The options for the score calculation are discussed in the Documentation. For the type of files used in this example, you should choose sum as the method for the score calculation.
Once your layer is loaded, you need to repeat the same operation for the 3 other files. Apart from the names of the layers, all the other options are the same.
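To make the binning step concrete, here is a minimal Python sketch of counting reads in fixed 5,000bp windows. It is not GenPlay's code: the function name, the tuple layout, and the rule that a read belongs to the window containing its start position are all assumptions made for this example.

```python
from collections import Counter

WINDOW_SIZE = 5000  # window size in bp, as chosen in this tutorial


def bin_reads(reads, window_size=WINDOW_SIZE):
    """Count reads per fixed-size window.

    reads: iterable of (chromosome, start, stop) tuples with 0-based starts,
    as in a BED file. Assumption of this sketch: each read is assigned to
    the window that contains its start position and counts as 1.
    Returns a Counter keyed by (chromosome, window_index).
    """
    counts = Counter()
    for chromosome, start, _stop in reads:
        counts[(chromosome, start // window_size)] += 1
    return counts


reads = [("chr1", 100, 136), ("chr1", 4990, 5026), ("chr1", 5200, 5236)]
print(bin_reads(reads))
```

With the sum method, the score of each window is the number of reads that fell into it, which is what the later filtering and normalization steps operate on.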
Filtering the G1 layers
Now that our tracks are loaded, we need to filter out the windows with fewer than 8 reads, because their scores are not statistically significant.
This step is necessary because we found that windows with a low number of reads decreased the signal-to-noise ratio because of excessive sampling errors.
To filter the windows with fewer than 8 reads, right-click on the hES-G1 track handler, select the layer operation menu at the bottom of the contextual menu and then select the filter sub-menu (figure 6).
Then select the Threshold filter option and set the values as shown on figure 7. Click on OK in order to remove all the windows with a score smaller than 8. The result is shown on figure 8.
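The effect of the threshold filter can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, and representing a removed window by a score of 0 is an assumption of this sketch rather than a documented GenPlay behavior.

```python
def threshold_filter(scores, minimum=8):
    """Return a copy of scores where values below minimum are set to 0.

    Assumption of this sketch: a removed window is represented by 0.
    """
    return [score if score >= minimum else 0 for score in scores]


print(threshold_filter([3, 8, 12, 7]))
```

Windows with exactly 8 reads are kept, since only scores smaller than 8 are removed.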
Normalizing the layers
In order to be comparable, all the layers need to be normalized.
The normalize operation of GenPlay divides each score by the sum of all the scores of the layer and multiplies the result by a large constant specified by the user. The only purpose of the constant is to make the scores more readable. Let's start by normalizing the hES-G1 layer. First, right-click on the track handler. Then select the normalize option of the layer sub-menu as shown on figure 9. You can keep the default constant and just click OK. Because each window score represents the number of reads that mapped to that window, the total of all the scores is the total number of reads. After normalization, the scores are therefore expressed as number of reads per window per 100 billion reads (if you kept the default constant): each score is equal to (# of reads per window * 100,000,000,000) / total # of reads.
Once normalization is done for the first layer, process the three other layers in the same manner.
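The normalization formula above can be written as a short Python sketch. The function name is hypothetical; the default constant is the 100 billion mentioned in the text.

```python
def normalize(scores, constant=100_000_000_000):
    """Divide each score by the sum of all scores, then multiply by constant."""
    total = sum(scores)
    if total == 0:
        raise ValueError("cannot normalize a layer whose scores sum to 0")
    return [score * constant / total for score in scores]


# A small constant makes the arithmetic easy to follow by eye.
print(normalize([10, 30, 60], constant=100))
```

After this step, two layers built from different numbers of reads have the same total score, so their windows can be compared directly.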
Computing the ratio S / G1
The next step is to generate new layers representing the ratio S / G1 for each cell type. To do that you need to right click on the hES-S track handler and then to select the "Two Tracks Operation" option of the Layer menu (Figure 10). After that, you need to select the hES-G1 layer when prompted (Figure 11). Select a track where you want to generate the result and select the division operation.
After repeating the same operation with the erythroid tracks, the result should be similar to the one on figure 12.
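As an illustration of the division step, the following Python sketch divides an S track by a G1 track window by window. The function name is hypothetical, and returning 0 where the G1 window is empty (for example after filtering) is an assumption of this sketch, not a documented GenPlay behavior.

```python
def divide_tracks(s_scores, g1_scores):
    """Window-by-window S / G1 ratio of two tracks of equal length.

    Assumption of this sketch: windows where G1 is 0 yield 0.
    """
    if len(s_scores) != len(g1_scores):
        raise ValueError("both tracks must have the same number of windows")
    return [s / g1 if g1 != 0 else 0.0 for s, g1 in zip(s_scores, g1_scores)]


print(divide_tracks([30.0, 10.0, 5.0], [15.0, 10.0, 0.0]))
```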
Gaussing the layers
If you zoom out, you can already see that the S / G1 ratio varies and that there seem to be replication timing domains that can be several megabases in size. We now want to smooth the curves and remove the noise to obtain more usable curves.
So far, three smoothing algorithms have been incorporated into GenPlay: moving average smoothing, Gaussian smoothing and Loess smoothing. In practice these three operations produce fairly similar results.
To apply Gaussian smoothing to the layers, first right-click on one of the two S / G1 track handlers and select the Smoothing option of the layer menu (Figure 13). Then, select the Gaussian algorithm. We need to set the Gaussian smoothing parameter Sigma. We set Sigma to 100kb for this experiment (which corresponds to a moving window of 400kb, Figure 14). Don't extrapolate the result to null windows when prompted.
Repeat the operation for the second layer. Figure 15 shows the result of the smoothing.
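For readers who want to see what Gaussian smoothing does to a track, here is a minimal Python sketch. It is not GenPlay's implementation. It assumes a kernel truncated at plus or minus 2 Sigma (which gives the 400kb moving window for Sigma = 100kb), treats windows with a score of 0 as null, leaves them untouched, and excludes them from the weighted average. The function and parameter names are hypothetical.

```python
import math


def gaussian_smooth(scores, sigma_bp=100_000, window_bp=5_000):
    """Gaussian-weighted moving average over fixed-size windows.

    Assumptions of this sketch: the kernel spans +/- 2 sigma, and windows
    with a score of 0 are null (kept at 0 and ignored when averaging).
    """
    sigma = sigma_bp / window_bp      # sigma expressed in windows
    half_width = int(2 * sigma)       # kernel truncated at +/- 2 sigma
    offsets = range(-half_width, half_width + 1)
    weights = [math.exp(-(d * d) / (2 * sigma * sigma)) for d in offsets]
    smoothed = []
    for i, score in enumerate(scores):
        if score == 0:
            smoothed.append(0.0)      # do not extrapolate to null windows
            continue
        weighted_sum = 0.0
        weight_total = 0.0
        for d, weight in zip(offsets, weights):
            j = i + d
            if 0 <= j < len(scores) and scores[j] != 0:
                weighted_sum += weight * scores[j]
                weight_total += weight
        smoothed.append(weighted_sum / weight_total)
    return smoothed


track = [1.0] * 50 + [3.0] + [1.0] * 50
print(gaussian_smooth(track)[50])  # the isolated spike is pulled toward 1.0
```

With 5,000bp windows, Sigma = 100kb corresponds to 20 windows, so each smoothed value is a weighted average of up to 81 neighboring windows.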
Saturating and Indexing the layers
To make the tracks easier to compare we need to index (rescale) them between 0 and 100.
Before indexing the data we want to saturate the 1% greatest and 1% smallest values in order to reduce the effect of possible outliers. To saturate a track, right-click on the track handler and select the Layer > Filter menu (Figure 16). In the filter options, select the percentage filter and set the parameters as shown on figure 17. You need to do this operation for both the ES and ERY tracks.
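Saturation can be sketched in Python as clipping the extreme values to the 1st and 99th percentile. The function name and the exact percentile convention (rank-based, computed over all windows) are assumptions of this sketch, not GenPlay's documented behavior.

```python
def saturate(scores, percent=1):
    """Clip the lowest and highest `percent` of values to the nearest kept value.

    Assumption of this sketch: percentiles are taken by rank over all windows.
    """
    if not scores:
        return []
    ordered = sorted(scores)
    cut = len(ordered) * percent // 100   # number of values to clip per side
    low = ordered[cut]
    high = ordered[len(ordered) - 1 - cut]
    return [min(max(score, low), high) for score in scores]


values = list(range(200))
clipped = saturate(values)
print(min(clipped), max(clipped))
```

Saturating instead of deleting keeps every window in the track while preventing a few extreme values from compressing the rest of the data when it is rescaled.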
Now we need to index the two tracks. Select one of the tracks, right-click on the track handler to show the contextual menu and click on the Layer > Index option (Figure 18). Set the new minimum to 0 and the new maximum to 100. Select No when asked to index the chromosomes independently. Repeat the operation for the second track. The result should be similar to what is shown on figure 19.
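Indexing is a linear rescaling, which the following Python sketch illustrates. The function name is hypothetical, and the sketch rescales over all windows at once, matching the choice of not indexing the chromosomes independently.

```python
def index_track(scores, new_min=0.0, new_max=100.0):
    """Linearly rescale scores so that they span new_min to new_max."""
    old_min, old_max = min(scores), max(scores)
    if old_min == old_max:
        raise ValueError("cannot index a track whose scores are all equal")
    scale = (new_max - new_min) / (old_max - old_min)
    return [new_min + (score - old_min) * scale for score in scores]


print(index_track([1.0, 1.5, 2.0]))
```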
Computing the Correlation Coefficient
The last step is to compute the correlation coefficient between these two layers. To do so, select the Layer > Correlation option of the contextual menu. When asked, choose the second layer for the correlation. A window showing the correlation coefficient for each chromosome as well as the genome-wide correlation should pop up (figure 20).
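The text does not say which correlation coefficient GenPlay reports. Assuming it is the Pearson coefficient, the computation for two indexed tracks can be sketched in Python as follows; the function name is hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant track")
    return covariance / math.sqrt(var_x * var_y)


es_track = [0.0, 50.0, 100.0, 75.0]
ery_track = [10.0, 40.0, 90.0, 80.0]
print(pearson(es_track, ery_track))
```

A value close to 1 means that the two cell types replicate the same regions early and late, while a value close to 0 means that their timing profiles are unrelated.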