Updated GATK workflow to HaplotypeCaller and gVCF

I’ve updated my GATK workflow to use GATK’s joint genotyping genomic VCF (gVCF) workflow, as implemented in GATK 3.4. The entire workflow is given below, but only the HaplotypeCaller step has changed, from:

java -jar ~/bin/GATK3.3/GenomeAnalysisTK.jar -T HaplotypeCaller -R reference.fasta -I realigned.bam -ploidy 1 -stand_call_conf 30 -stand_emit_conf 10 -o raw.vcf

To:

#Call variants (HaplotypeCaller) and prepare for genotyping

java -jar ~/bin/GATK3.4/GenomeAnalysisTK.jar -T HaplotypeCaller -R reference.fasta -I realigned.bam -ploidy 1 --emitRefConfidence GVCF -o raw_gVCF.vcf
and then:

#Genotype GVCFs

java -jar ~/bin/GATK3.4/GenomeAnalysisTK.jar -T GenotypeGVCFs -R reference.fasta --variant raw_gVCF.vcf -o raw.vcf
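The point of going via gVCFs is that several samples can be genotyped together: GenotypeGVCFs accepts one --variant argument per gVCF. Below is a minimal sketch of that, assuming each sample has already been run through HaplotypeCaller as above. The helper name build_genotype_cmd, the GATK_JAR variable and the sample file names are hypothetical, not part of my workflow. The helper only prints the command so it can be checked before running.

```shell
# Sketch: build a joint GenotypeGVCFs command for several per-sample gVCFs.
# GATK_JAR, build_genotype_cmd and the sample file names are illustrative assumptions.
GATK_JAR="${GATK_JAR:-$HOME/bin/GATK3.4/GenomeAnalysisTK.jar}"

build_genotype_cmd() {
    # $1 = reference FASTA, $2 = output VCF, remaining arguments = per-sample gVCFs
    reference="$1"
    output="$2"
    shift 2
    cmd="java -jar $GATK_JAR -T GenotypeGVCFs -R $reference"
    for gvcf in "$@"; do
        cmd="$cmd --variant $gvcf"   # one --variant per sample gVCF
    done
    echo "$cmd -o $output"
}

# Print the command for two hypothetical samples
build_genotype_cmd reference.fasta joint_raw.vcf sample1_gVCF.vcf sample2_gVCF.vcf
```

The command is assembled as a plain string, so this sketch assumes file names without spaces.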

  • Full workflow

Filtering is advisable once variants have been called.

#Index reference
bwa index reference.fasta

#Index reference for GATK (creates reference.fasta.fai)
samtools faidx reference.fasta

#Create sequence dictionary
java -jar ~/bin/picard-tools-1.8.5/CreateSequenceDictionary.jar REFERENCE=reference.fasta OUTPUT=reference.dict

#Align reads and assign read group
bwa mem -R "@RG\tID:FLOWCELL1.LANE1\tPL:ILLUMINA\tLB:test\tSM:PA01" reference.fasta R1.fastq.gz R2.fastq.gz > aln.sam

#Sort sam file
java -jar ~/bin/picard-tools-1.8.5/SortSam.jar I=aln.sam O=sorted.bam SORT_ORDER=coordinate

#Mark duplicates
java -jar ~/bin/picard-tools-version/MarkDuplicates.jar I=sorted.bam O=dedup.bam METRICS_FILE=metrics.txt

#Index bam file
java -jar ~/bin/picard-tools-version/BuildBamIndex.jar INPUT=dedup.bam

#Create realignment targets
java -jar ~/bin/GATK3.4/GenomeAnalysisTK.jar -T RealignerTargetCreator -R reference.fasta -I dedup.bam -o targetintervals.list

#Indel realignment
java -jar ~/bin/GATK3.4/GenomeAnalysisTK.jar -T IndelRealigner -R reference.fasta -I dedup.bam -targetIntervals targetintervals.list -o realigned.bam

#Call variants (HaplotypeCaller) and prepare for genotyping
java -jar ~/bin/GATK3.4/GenomeAnalysisTK.jar -T HaplotypeCaller -R reference.fasta -I realigned.bam -ploidy 1 --emitRefConfidence GVCF -o raw_gVCF.vcf

#Genotype GVCFs
java -jar ~/bin/GATK3.4/GenomeAnalysisTK.jar -T GenotypeGVCFs -R reference.fasta --variant raw_gVCF.vcf -o raw.vcf
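The workflow above stops at raw.vcf, so here is a rough sketch of what a hard-filtering step could look like with GATK3's VariantFiltration tool. The thresholds in FILTER_EXPR are placeholders based on GATK's generic hard-filter suggestions for SNPs, not values I have tuned for haploid data. The filter_variants helper, the DRY_RUN switch, the filter name and the output file name are all hypothetical. By default it only prints the command.

```shell
# Sketch: hard-filter raw.vcf with GATK3 VariantFiltration.
# Thresholds, filter name and file names are placeholder assumptions.
GATK_JAR="${GATK_JAR:-$HOME/bin/GATK3.4/GenomeAnalysisTK.jar}"
FILTER_EXPR="QD < 2.0 || FS > 60.0 || MQ < 40.0"
FILTER_NAME="hard_filter"

filter_variants() {
    # $1 = input VCF, $2 = output VCF
    set -- java -jar "$GATK_JAR" -T VariantFiltration -R reference.fasta \
        -V "$1" --filterExpression "$FILTER_EXPR" --filterName "$FILTER_NAME" -o "$2"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"   # dry run: show the command without running it
    else
        "$@"                 # run GATK with each argument passed intact
    fi
}

filter_variants raw.vcf filtered.vcf
```

Run with DRY_RUN=0 to execute the command. Records that fail the expression stay in the output with FILTER_NAME in the FILTER column rather than being dropped, so a further selection step is needed to remove them.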
