How to Create a VCF File From a Chain File

From GenPlay, Einstein Genome Analyzer

Revision as of 00:37, 11 August 2014

Goal: This tutorial illustrates how to generate, from a Chain file, a VCF file describing the differences between two reference genomes. In this tutorial we will create an hg19-to-hg38 VCF file. This means that the reference genome of the VCF file is hg38.

The tutorial is divided into two steps. The first step consists of generating a VCF file containing the insertions and the deletions, using a Chain file and a Scala program called ChainToVCF.

In the second step we will use GenPlay and a Perl script to add the SNPs to our VCF file.


Prerequisites: You will need a Linux or Mac computer.

Scala needs to be installed on your computer. Scala is available for download from http://www.scala-lang.org/download/

GenPlay needs to be installed on your computer. If you haven't installed GenPlay yet, please visit the Downloads page and follow the instructions to download and install GenPlay.

Getting started

First, let's download the files needed from the UCSC genome browser. All the files are available from the download page of the UCSC genome browser. We will need the following:

1. hg19 to hg38 chain file (this file can be found in the LiftOver section): http://hgdownload.cse.ucsc.edu/goldenPath/hg19/liftOver/hg19ToHg38.over.chain.gz

2. hg19 reference file in 2bit format (from the full dataset section): http://hgdownload.cse.ucsc.edu/goldenPath/hg19/bigZips/hg19.2bit

3. hg38 reference file in 2bit format: http://hgdownload.cse.ucsc.edu/goldenPath/hg38/bigZips/hg38.2bit

Note that the chain file needs to be uncompressed.
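A minimal shell sketch of the uncompress step, assuming the archive was saved in the current directory under its UCSC name. The helper name uncompress_chain is illustrative only, and the output name hg19ToHg38.chain is chosen simply to match the command used later in this tutorial.

```shell
# Uncompress a gzipped chain file while keeping the original archive.
# uncompress_chain is a hypothetical helper name, not part of any tool.
uncompress_chain() {
    archive="$1"
    output="$2"
    if [ ! -f "$archive" ]; then
        echo "Missing $archive: download it first" >&2
        return 1
    fi
    gunzip -c "$archive" > "$output"
}

# Assumption: the archive sits in the current directory.
if uncompress_chain hg19ToHg38.over.chain.gz hg19ToHg38.chain; then
    # A chain file normally begins with a "chain" header line.
    head -n 1 hg19ToHg38.chain
fi
```

Writing to an explicit output name avoids having to rename the file afterwards.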

Generate a VCF with the Insertions and the Deletions

In order to generate our VCF file we will need the Scala program ChainToVCF available at https://github.com/JulienLajugie/ChainToVCF
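To see why a Chain file carries enough information for this step, it can help to look at the gaps between its alignment blocks. The sketch below is only an illustration of the UCSC chain format, not the algorithm used by ChainToVCF. It assumes the standard layout: a header line "chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id" followed by lines "size dt dq", where dt and dq are the gap lengths in the target (hg19 here) and the query (hg38 here). The helper name list_chain_gaps and the sample data are made up for the example.

```shell
# List every gap between alignment blocks of a chain file as:
#   tName tPos dt qName qPos dq
# list_chain_gaps is a hypothetical helper, not part of ChainToVCF.
# Note: when qStrand is "-", query positions are reverse-strand coordinates.
list_chain_gaps() {
    awk '
        $1 == "chain" { tname = $3; tpos = $6; qname = $8; qpos = $11; next }
        NF == 3 {
            tpos += $1; qpos += $1                  # skip the aligned block
            print tname, tpos, $2, qname, qpos, $3  # report the gap that follows
            tpos += $2; qpos += $3                  # skip the gap itself
        }
    ' "$1"
}

# Tiny made-up chain, used only to demonstrate the helper.
sample=$(mktemp)
cat > "$sample" <<'EOF'
chain 1000 chr1 100 + 10 60 chr1 105 + 20 75 1
20 5 0
10 0 10
15
EOF

list_chain_gaps "$sample"
# prints:
# chr1 30 5 chr1 40 0
# chr1 45 0 chr1 50 10
rm -f "$sample"
```

In the sample, the first gap is 5 bases present only in the target and the second is 10 bases present only in the query. Gaps of this kind are what end up as insertions and deletions in the VCF file.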

First, make sure that Scala is properly installed on your system.
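One possible way to check this from a terminal is sketched below. The helper name have_cmd is made up for the example, and the check only confirms that a scala launcher is on the PATH, not that it is a version compatible with ChainToVCF.

```shell
# have_cmd is a hypothetical helper: it succeeds if the command is on the PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd scala; then
    scala -version   # print the installed Scala version
else
    echo "scala was not found on the PATH" >&2
fi
```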

Then, from a terminal, run the following command:

scala -classpath ./ChainToVCF.jar edu.yu.einstein.chainToVCF.ChainToVCF --chain ./hg19ToHg38.chain --source hg19 --target hg38 > hg19ToHg38.vcf

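Modify the paths to ChainToVCF.jar and hg19ToHg38.chain if needed. Note that the file downloaded from UCSC uncompresses to hg19ToHg38.over.chain by default, so either rename it or adjust the --chain argument accordingly.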
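Once the command has finished, a quick sanity check of the result can be sketched as follows. This assumes that the output follows the usual VCF convention in which header lines start with "#". The helper name count_vcf_records is illustrative only.

```shell
# Count the data lines of a VCF file (non-empty lines not starting with "#").
# count_vcf_records is a hypothetical helper, not part of ChainToVCF.
count_vcf_records() {
    awk '!/^#/ && NF { n++ } END { print n + 0 }' "$1"
}

# Assumption: the VCF was written to the current directory as in the command above.
if [ -f hg19ToHg38.vcf ]; then
    head -n 5 hg19ToHg38.vcf            # inspect the first few lines
    count_vcf_records hg19ToHg38.vcf    # number of variant records
fi
```

A count of zero, or an empty file, usually means that the command failed and that the error message went to the terminal rather than to the file.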

Add SNPs to the VCF File