Updated GATK workflow to HaplotypeCaller and gVCF

I’ve updated my GATK workflow to GATK’s joint genotyping genomic VCF (gVCF) workflow, implemented in GATK3.4. I’ll provide the entire workflow here but it’s only the HaplotypeCaller step that is changed from:

java -jar ~/bin/GATK3.3/GenomeAnalysisTK.jar -T HaplotypeCaller -R reference.fasta -I realigned.bam -ploidy 1 -stand_call_conf 30 -stand_emit_conf 10 -o raw.vcf

To:

#Call variants (HaplotypeCaller) and prepare for genotyping

java -jar ~/bin/GATK3.4/GenomeAnalysisTK.jar -T HaplotypeCaller -R reference.fasta -I realigned.bam -ploidy 1 –emitRefConfidence GVCF -o raw_gVCF.vcf
and then:

#Genotype GVCFs

java -jar ~/bin/GATK3.4/GenomeAnalysisTK.jar -T GenotypeGVCFs -R reference.fasta –variant raw_gVCF.vcf -o raw.vcf

  • Full workflow

Filtering is advisable once variants are called

#Index reference
bwa index reference.fasta

#Sort reference
samtools faidx reference.fasta

#Create sequence dictionary
java -jar ~/bin/picard-tools-1.8.5/CreateSequenceDictionary.jar REFERENCE=reference.fasta OUTPUT=reference.dict

#Align reads and assign read group
bwa mem -R “@RG\tID:FLOWCELL1.LANE1\tPL:ILLUMINA\tLB:test\tSM:someID” reference.fasta R1.fastq.gz R2.fastq.gz > aln.sam

#Sort sam file
java -jar ~/bin/picard-tools-1.8.5/SortSam.jar I=aln.sam O=sorted.bam SORT_ORDER=coordinate

#Mark duplicates
java -jar ~/bin/picard-tools-version/MarkDuplicates.jar I=sorted.bam O=dedup.bam METRICS_FILE=metrics.txt

#Sort bam file
java -jar ~/bin/picard-tools-version/BuildBamIndex.jar INPUT=dedup.bam

#Create realignment targets
java -jar ~/bin/GATK3.4/GenomeAnalysisTK.jar -T RealignerTargetCreator -R reference.fasta -I dedup.bam -o targetintervals.list

#Indel realignment
java -jar ~/bin/GATK3.4/GenomeAnalysisTK.jar -T IndelRealigner -R reference.fasta -I dedup.bam -targetIntervals targetintervals.list -o realigned.bam

#Call variants (HaplotypeCaller) and prepare for genotyping
java -jar ~/bin/GATK3.4/GenomeAnalysisTK.jar -T HaplotypeCaller -R reference.fasta -I realigned.bam -ploidy 1 –emitRefConfidence GVCF -o raw_gVCF.vcf

#Genotype GVCFs
java -jar ~/bin/GATK3.4/GenomeAnalysisTK.jar -T GenotypeGVCFs -R reference.fasta –variant raw_gVCF.vcf -o raw.vcf

Advertisements

5 thoughts on “Updated GATK workflow to HaplotypeCaller and gVCF

  1. Pingback: Variant calling with GATK | approachedinthelimit

    1. mattmoorebioinf Post author

      Hi Arup,

      I can look into this further but I suspect it’s due to the picard tools version (1.8.5). In the most up to date versions the commands are totally different! I may update the post to reflect this

      Reply
  2. Samantha

    Thank you for your detailed pipeline.

    Can you explain why you are using HaplotypeCaller instead of UnifiedGenotyper? It seems like UnifiedGenotyper is better suited for haploid organisms.

    Reply
    1. mattmoorebioinf Post author

      Hi Samantha,

      So initially I had to use UnifiedGenotyper because HaplotypeCaller didn’t deal with anything other than diploid organisms. They updated it (I’m not sure from which version) to then include non-diploid organisms including haploid and pooled samples of multiple-ploidy.

      I’ve previously generated simulated reads (with putative SNPs and indels) and compared the major variant callers, the results were basically the same for UnifiedGenotyper and HaplotypeCaller. HaplotypeCaller though should work better for indels due to its de novo assembly around variants (https://software.broadinstitute.org/gatk/documentation/tooldocs/current/org_broadinstitute_gatk_tools_walkers_haplotypecaller_HaplotypeCaller.php).

      Also it seems that GATK will capture large insertions but not necessarily large deletions, it’s always worth checking genes of interest by assembly or from the sam or bam files that a huge chunk is missing anyway.

      Let me know if this answers your question!

      Reply

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s