|
|
| (134 intermediate revisions by 4 users not shown) |
| Line 2: |
Line 2: |
| | | | |
| | == ChIP-Seq Analysis == | | == ChIP-Seq Analysis == |
| − | '''Goal:''' The objective is first to isolate the peaks from the data generated by a ChIP-Seq experiment. Then, we want to generate a list of genes that have a peak in their promoter and associate each promoter with the score of its peak summit.
| + | The objective of the [[ChIP-Seq Tutorial]] is to illustrate how GenPlay can be used to isolate peaks from the data generated from a ChIP-Seq experiment. |
| | | | |
| − | === Load the File === | + | == TimEX Analysis == |
| − | The first thing to do is to download the ChIP-Seq file and the RefSeq gene annotation file from the tutorial directory [http://www.genplay.net/tutorials/CHiP-Seq/ here]. | + | The [[TimEX Tutorial]] illustrates how GenPlay can be used to show timing of replication profiles. |
| − | After that, you can start GenPlay from the Web Start link that is located on top of this page. The 1 GB link is enough for this tutorial. For this experiment we're going to work only on the first chromosome so the loading is shorter and the amount of memory needed is smaller.
| |
| | | | |
| − | We first want to know whether the positions on the two strands are shifted. This shifting can happen during sequencing because the sequencer reads only the first bases of the DNA fragments. To determine how much the strands are shifted, we need to load the file twice, once for the 3' strand and once for the 5' strand.
| + | == Multi-Genome Analysis == |
| | + | The [[GRCh37/hg19 GRCh38/hg38 Multi-Genome Tutorial]] explains how to use the multi-genome functionality of GenPlay. |
| | | | |
| − | You need to right click on the track handler of the 1st row, in order to open the menu that will allow you to load the track (figure 1). Select the Load Fixed Window Track option.
| + | It shows how data aligned on hg38 and hg19 can be displayed simultaneously and compared using GenPlay Multi-Genome |
| − | [[image:tutorial1_empty_track_menu.png|center|frame|Figure 1: Load Menu]]
| |
| | | | |
| − | After selecting the file, an option window will prompt you to enter information on how to load the data. You can keep the default name for the track. You need to choose a window size of 100 bp and the sum option for the score calculation. Please refer to the documentation page for more information about these options. You can keep the default data precision, but you need to select a strand. Let's start with the 5' strand. You will also need to select the 1st chromosome. To do so, click on the "Modify Selection" button in the bottom right corner of the screen and then uncheck all the chromosomes but the first one. Figure 2 shows how the screen should look before you click on the OK button.
| + | '''Note:''' Here is a version for the comparison of hg18 and hg19: [[Multi-Genome Tutorial]]. |
| − | [[image:tutorial1_Load_FWT_menu.png|center|frame|Figure 2: Load Fixed Window Track Menu]] | |
| | | | |
| − | The operation needs to be repeated for the 3' strand. Once the tracks are loaded you can modify the Y axis by right clicking on the track handlers and selecting the "Set Y Axis" option. Set the maximum to 100. Now that the two tracks are loaded we can graphically determine how much the strands need to be shifted. Select a peak, zoom in on it with the mouse wheel and check how far apart the summits of the same peak are on the 5' and the 3' strands. Verify that this value is the same on other peaks. When you're sure about the value, divide it by two and note the result. We notice that the summits are 300 bp apart (figure 3), so the shifting value is 150 bp (meaning that the 5' strand is shifted 150 bp forward and the 3' strand is shifted 150 bp backward).
| + | == How to Create a VCF File From a Chain File == |
| − | [[image:tutorial1_strand_shifting.png|center|frame|Figure 3: Find Strand Shifting]]
| + | The goal of [[How to Create a VCF File From a Chain File|this tutorial]] is to show how to generate a VCF file such as the one used in the [[GRCh37/hg19 GRCh38/hg38 Multi-Genome Tutorial]] from a Chain file that can be downloaded from the UCSC genome browser website. |
| − | | |
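The arithmetic of the strand shift described above can be sketched in a few lines of Python. This is only an illustration, not GenPlay code: the function names and the summit coordinates are hypothetical, and mapping the 5' strand to "+" and the 3' strand to "-" is an assumption made for the example.

```python
def strand_shift(summit_5p: int, summit_3p: int) -> int:
    """Return half the distance between the two strand summits of one peak."""
    return abs(summit_3p - summit_5p) // 2


def shift_read(position: int, strand: str, shift: int) -> int:
    """Shift a read position forward on '+' and backward on '-' (assumed mapping)."""
    if strand == "+":
        return position + shift
    if strand == "-":
        return max(0, position - shift)  # never move before the chromosome start
    raise ValueError(f"unknown strand: {strand!r}")


# Hypothetical summit positions that are 300 bp apart, as in the tutorial.
shift = strand_shift(10_000, 10_300)
print(shift)
```

With summits 300 bp apart the sketch gives the 150 bp shift used in the tutorial.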
| − | We need to load the file again, but this time we're going to load both strands with the appropriate strand shifting. The loading screen should now look like figure 4.
| |
| − | [[image:tutorial1_Load_FWT_menu2.png|center|frame|Figure 4: Load Fixed Window Track Menu, both strands]]
| |
| − | <br style="clear: both" />
| |
| − | | |
| − | === Remove Outliers ===
| |
| − | For this analysis we decide to remove the tallest peaks. The reason is that these peaks might be artifacts resulting from errors in the reference genome used for the alignment.
| |
| − | To get rid of the 0.05% of windows with the greatest scores, right click on the track handler of the last loaded track. A menu will pop up. Select the "Operation" sub-menu and then select the "Filter" option (figure 5).
| |
| − | [[image:tutorial1_filter1.png|center|frame|Figure 5: Filter Menu]] | |
| − | | |
| − | Set the parameters of the filter as shown in figure 6 and confirm by clicking on OK.
| |
| − | [[image:tutorial1_filter2.png|center|frame|Figure 6: Filter Dialog]]
| |
| − | | |
| − | Figure 7 shows the result of the operation. Track 4 is the one with the outliers removed.
| |
| − | |
| − | Note that the color of the tracks was modified by right clicking on the track handler and selecting the "Appearance" option.
| |
| − | [[image:tutorial1_filter3.png|center|frame|Figure 7: Filter Result]]
| |
| − | | |
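To make the outlier filter concrete, here is a minimal Python sketch of removing the 0.05% of windows with the greatest scores. It is not GenPlay's implementation: the function name is hypothetical, and setting the removed windows to 0 is an assumption, since the tutorial only shows the filter parameters in figure 6.

```python
def remove_top_fraction(scores: list[float], fraction: float = 0.0005) -> list[float]:
    """Zero out the given fraction of windows with the greatest scores.

    Assumption: removed windows are set to 0; the real filter may behave differently.
    """
    n_remove = round(len(scores) * fraction)
    if n_remove <= 0:
        return list(scores)
    # Indices of the windows sorted from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    to_remove = set(ranked[:n_remove])
    return [0.0 if i in to_remove else s for i, s in enumerate(scores)]


# 2000 hypothetical window scores: 0.05% of 2000 is a single window.
windows = [float(v) for v in range(1, 2001)]
filtered = remove_top_fraction(windows)
print(max(filtered))
```

On a real chromosome the number of 100 bp windows is far larger, so the same fraction removes many more windows.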
| − | Now we need to remove the background noise and to keep only the islands.
| |
| − | | |
| − | === Isolate Peaks ===
| |
| − | The goal of this step is to remove the background noise from the track so that only the peaks remain.
| |
| − | | |
| − | To do so, right click on the track handler, choose the "Operation" sub-menu and click on the "Find Peaks" option (figure 8).
| |
| − | [[image:Tutorial1 find peaks1.png|center|frame|Figure 8: Find Peaks Operation]]
| |
| − | | |
| − | After the find peaks dialog opens, choose the "Island Finder" option on the right panel and set the parameters as shown in figure 9.
| |
| − | [[image:Tutorial1 find peaks2.png|center|frame|Figure 9: Find Peaks Menu]]
| |
| − | | |
| − | The island finder is described in the documentation section of this website.
| |
| − | You'll notice that the selected output is "Peak Summits". This means that for each island, the score of the windows on the output track will be the greatest score of the windows of the input track.
| |
| − | | |
| − | The result should be similar to what is shown in figure 10.
| |
| − | [[image:Tutorial1 find peaks3.png|center|frame|Figure 10: Find Peaks Result]]
| |
| − | | |
| − | === Extract Gene Promoters ===
| |
| − | First, we need to load the gene track. Right click on an empty track handler and select "Load Gene Track". Select the RefSeq file that we've already downloaded when prompted.
| |
| − | | |
| − | When it's done, right click on the track handler of the gene track and select "Extract Intervals" in the Operation sub-menu (Figure 11).
| |
| − | [[image:Tutorial1 extract promoters1.png|center|frame|Figure 11: Extract Intervals Menu]] | |
| − | | |
| − | A dialog box will pop up. We decide to define a promoter as a region that starts 100 bp before a gene start position and ends 50 bp after it. To do so, fill in the parameters as shown in figure 12.
| |
| − | [[image:Tutorial1 extract promoters2.png|center|frame|Figure 12: Extract Intervals Dialog]]
| |
| − | | |
| − | You'll finally be asked to select the result track position in the track list. The result track represents only the promoters of the genes of the input track (figure 13).
| |
| − | [[image:Tutorial1 extract promoters3.png|center|frame|Figure 13: Gene Promoters]]
| |
| − | | |
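The promoter definition used above (100 bp before the gene start to 50 bp after it) can be written down as a small Python sketch. The function name is hypothetical, and the handling of genes on the reverse strand (measuring from the gene end) is an assumption; the tutorial does not say how the "Extract Intervals" operation treats strands.

```python
def promoter_interval(gene_start: int, gene_end: int, strand: str,
                      upstream: int = 100, downstream: int = 50) -> tuple[int, int]:
    """Return (start, end) of the promoter around the gene start position."""
    if strand == "+":
        tss = gene_start
        return max(0, tss - upstream), tss + downstream
    if strand == "-":
        # Assumption: on the reverse strand the gene starts at its highest coordinate.
        tss = gene_end
        return max(0, tss - downstream), tss + upstream
    raise ValueError(f"unknown strand: {strand!r}")


# Hypothetical gene spanning 1000..5000 on the forward strand.
print(promoter_interval(1000, 5000, "+"))
```

Each promoter is 150 bp long with these parameters, regardless of the gene length.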
| − | === Score Promoters ===
| |
| − | [[image:Tutorial1 score exons1.png|right|thumb|100px|Figure 14: Score Exons Menu]]
| |
| − | Now that we have a track with the peaks and a track with the promoters, we can score the promoters using the scores of the peaks and export the result as a BED file.
| |
| − | |
| − | To score the promoters, right click on the handler of the track with the promoters and select the "Score Exons" option of the "Operation" sub-menu (figure 14).
| |
| − | | |
| − | You'll be prompted to choose the track containing the scores. Select the track with the peaks extracted.
| |
| − | | |
| − | Then select average for the method of calculation and select a track where the result should appear.
| |
| − | | |
| − | The last thing we need to do is to export the result of our analysis. Right click on the newly created track handler. Select "Save As". Choose where you want to save the track and make sure that the file type is set to Bed Files. You can open the file that you created with a text editor such as Notepad. You'll notice that the result file contains the positions of the promoters (fields 1 to 3), the names of the genes (field 4), the scores of the promoters (field 5) and the strands of the genes (field 6). For more details about the result file you can refer to the File Type section of the documentation.
| |
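As a rough illustration of the exported file layout described above, the following Python sketch parses one line of a six-field BED file. The type and function names are hypothetical, and the sample line (gene name, coordinates and score) is made up for the example rather than taken from the tutorial data.

```python
from typing import NamedTuple


class PromoterScore(NamedTuple):
    chrom: str    # field 1
    start: int    # field 2
    end: int      # field 3
    name: str     # field 4, gene name
    score: float  # field 5, promoter score
    strand: str   # field 6


def parse_bed_line(line: str) -> PromoterScore:
    """Parse one whitespace-separated BED line with at least six fields."""
    fields = line.split()
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 fields, got {len(fields)}")
    chrom, start, end, name, score, strand = fields[:6]
    return PromoterScore(chrom, int(start), int(end), name, float(score), strand)


# Hypothetical line from an exported result file.
record = parse_bed_line("chr1\t900\t1050\tGENE1\t42.5\t+\n")
print(record.name, record.score)
```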
The following tutorials introduce the basic concepts of track manipulation.