First, edit the filtered VCF files to remove the sample name, because the program is intended to find differences between samples of the same species.
vi varX.flt.vcf
vi varY.flt.vcf
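If you would rather not edit the header by hand, the change can be scripted. The following is only a sketch, and it rests on assumptions that are not in the steps above: that each file holds a single sample (so the sample name is the 10th column of the #CHROM header line) and that the aim is for both files to end up with the same column label. The toy file, its contents, and the placeholder label SAMPLE are all made up for illustration.

```shell
# Toy single-sample VCF standing in for varX.flt.vcf (contents are invented).
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tX.sorted.bam\nchr1\t100\t.\tA\tG\t50\t.\t.\tGT\t1/1\n' > toy_varX.vcf

# Rewrite only the #CHROM header line: replace the 10th column (the sample
# name) with a shared placeholder. Every other line passes through unchanged.
awk 'BEGIN { FS = OFS = "\t" } /^#CHROM/ { $10 = "SAMPLE" } { print }' toy_varX.vcf > toy_varX.renamed.vcf

# Show the edited header line.
grep '^#CHROM' toy_varX.renamed.vcf
```

Running the same awk command on the second file gives both files an identical header line, which is the part that matters for the comparison steps.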
Next, use bgzip and tabix (both by Heng Li) to compress and index each file. Note that bgzip replaces each .vcf file with a .vcf.gz file.
bgzip varX.flt.vcf
bgzip varY.flt.vcf
tabix -p vcf varX.flt.vcf.gz
tabix -p vcf varY.flt.vcf.gz
Next, use the vcftools script vcf-isec with the -c option to find the complement of each dataset, that is, the positions present in the first file but missing from the second. These are the variants unique to each species.
vcf-isec -c varX.flt.vcf.gz varY.flt.vcf.gz | bgzip -c > unique_varX.vcf.gz
vcf-isec -c varY.flt.vcf.gz varX.flt.vcf.gz | bgzip -c > unique_varY.vcf.gz
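A quick sanity check at this point is to count how many variant records ended up in each output file. The sketch below shows one way to do that; the toy file and its contents are invented, and plain gzip is used to create it only so the example runs without bgzip installed (bgzip output can be read with gzip -dc as well).

```shell
# Toy stand-in for unique_varX.vcf.gz (contents are invented).
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchr1\t100\t.\tA\tG\t50\t.\t.\nchr1\t250\t.\tC\tT\t60\t.\t.\n' | gzip -c > toy_unique.vcf.gz

# Decompress, drop header lines (those starting with '#'), count what is left.
n_records=$(gzip -dc toy_unique.vcf.gz | grep -vc '^#')
echo "$n_records"
```

On the real data, replace toy_unique.vcf.gz with unique_varX.vcf.gz or unique_varY.vcf.gz.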
You can also generate the counts for a Venn diagram of the overlap of variants between the two species. vcf-compare writes the numbers as text rather than drawing the diagram.
vcf-compare varX.flt.vcf.gz varY.flt.vcf.gz > venn.out
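You can also extract the variants that the two files share. Here -n +2 keeps positions present in at least two files, and -o prints only the entries from the left-most file.
vcf-isec -o -n +2 varY.flt.vcf.gz varX.flt.vcf.gz | bgzip -c > overlap_varY.vcf.gz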