
Updated GATK workflow to HaplotypeCaller and gVCF

I’ve updated my GATK workflow to follow GATK’s joint genotyping genomic VCF (gVCF) workflow, using GATK 3.4. The entire workflow is given below, but only the HaplotypeCaller step has changed, from:

java -jar ~/bin/GATK3.3/GenomeAnalysisTK.jar -T HaplotypeCaller -R reference.fasta -I realigned.bam -ploidy 1 -stand_call_conf 30 -stand_emit_conf 10 -o raw.vcf

To:

#Call variants (HaplotypeCaller) and prepare for genotyping

java -jar ~/bin/GATK3.4/GenomeAnalysisTK.jar -T HaplotypeCaller -R reference.fasta -I realigned.bam -ploidy 1 --emitRefConfidence GVCF -o raw_gVCF.vcf
and then:

#Genotype GVCFs

java -jar ~/bin/GATK3.4/GenomeAnalysisTK.jar -T GenotypeGVCFs -R reference.fasta --variant raw_gVCF.vcf -o raw.vcf
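
The point of the gVCF route is that several per-sample gVCFs can be genotyped together, since GenotypeGVCFs accepts one --variant argument per input file. The sketch below shows one way to assemble that command. The helper name build_genotype_cmd and the sample file names are hypothetical, and the function only prints the command rather than running it.

```shell
# Hypothetical helper: print a GenotypeGVCFs command for one or more gVCFs.
# Arguments: reference FASTA, output VCF, then any number of per-sample gVCFs.
# This is a dry run; it echoes the command so it can be inspected first.
build_genotype_cmd() {
    reference="$1"
    output="$2"
    shift 2
    cmd="java -jar ~/bin/GATK3.4/GenomeAnalysisTK.jar -T GenotypeGVCFs -R $reference"
    # Add one --variant argument per gVCF file
    for gvcf in "$@"; do
        cmd="$cmd --variant $gvcf"
    done
    cmd="$cmd -o $output"
    printf '%s\n' "$cmd"
}

# Example with two assumed sample files
build_genotype_cmd reference.fasta joint_raw.vcf sampleA_gVCF.vcf sampleB_gVCF.vcf
```

Once the printed command looks right, it can be pasted into the terminal or piped to the shell.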

  • Full workflow

Filtering is advisable once the variants have been called.

#Index reference for bwa
bwa index reference.fasta

#Index reference (creates reference.fasta.fai)
samtools faidx reference.fasta

#Create sequence dictionary
java -jar ~/bin/picard-tools-1.8.5/CreateSequenceDictionary.jar REFERENCE=reference.fasta OUTPUT=reference.dict

#Align reads and assign read group
bwa mem -R "@RG\tID:FLOWCELL1.LANE1\tPL:ILLUMINA\tLB:test\tSM:someID" reference.fasta R1.fastq.gz R2.fastq.gz > aln.sam

#Sort sam file
java -jar ~/bin/picard-tools-1.8.5/SortSam.jar I=aln.sam O=sorted.bam SORT_ORDER=coordinate

#Mark duplicates
java -jar ~/bin/picard-tools-version/MarkDuplicates.jar I=sorted.bam O=dedup.bam METRICS_FILE=metrics.txt

#Index bam file
java -jar ~/bin/picard-tools-version/BuildBamIndex.jar INPUT=dedup.bam

#Create realignment targets
java -jar ~/bin/GATK3.4/GenomeAnalysisTK.jar -T RealignerTargetCreator -R reference.fasta -I dedup.bam -o targetintervals.list

#Indel realignment
java -jar ~/bin/GATK3.4/GenomeAnalysisTK.jar -T IndelRealigner -R reference.fasta -I dedup.bam -targetIntervals targetintervals.list -o realigned.bam

#Call variants (HaplotypeCaller) and prepare for genotyping
java -jar ~/bin/GATK3.4/GenomeAnalysisTK.jar -T HaplotypeCaller -R reference.fasta -I realigned.bam -ploidy 1 --emitRefConfidence GVCF -o raw_gVCF.vcf

#Genotype GVCFs
java -jar ~/bin/GATK3.4/GenomeAnalysisTK.jar -T GenotypeGVCFs -R reference.fasta --variant raw_gVCF.vcf -o raw.vcf
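
As noted above, filtering is advisable once variants are called. The sketch below shows one possible hard-filtering step with GATK's VariantFiltration tool, plus a small check of how many records pass. The thresholds are an assumption on my part (GATK's generic hard-filter suggestions for SNPs), not values tested for this workflow, and the function names, filter name and file names are hypothetical.

```shell
# Assumed thresholds: GATK's generic hard-filter suggestions for SNPs.
# They are not from this workflow; tune them for your organism and data.
FILTER_EXPR='QD < 2.0 || FS > 60.0 || MQ < 40.0'

# Flag (rather than remove) variants matching FILTER_EXPR.
# Arguments: input VCF, output VCF.
run_filter() {
    java -jar ~/bin/GATK3.4/GenomeAnalysisTK.jar -T VariantFiltration \
        -R reference.fasta -V "$1" \
        --filterExpression "$FILTER_EXPR" --filterName "hard_filter" \
        -o "$2"
}

# Count records whose FILTER column (7th field) is PASS, skipping headers.
count_pass() {
    awk -F'\t' '!/^#/ && $7 == "PASS" { n++ } END { print n + 0 }' "$1"
}

# Usage (hypothetical file names):
#   run_filter raw.vcf filtered.vcf
#   count_pass filtered.vcf
```

VariantFiltration only annotates the FILTER column, so failing records stay in the file and the thresholds can be revisited later without recalling variants.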
