A Simple Pipeline to Assess Genetic Diversity Between Bacterial Genomes
This pipeline calculates pairwise SNPs for every possible combination of the genomes provided as input.
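To make "every possible combination" concrete, here is a minimal Python sketch that enumerates genome pairs. The file names are hypothetical, and whether the pipeline treats A-vs-B and B-vs-A as distinct comparisons is an assumption not confirmed by this README, so both counts are shown.

```python
from itertools import combinations, permutations

# Hypothetical genome file names, for illustration only.
genomes = ["genomeA.fasta", "genomeB.fasta", "genomeC.fasta"]

# Unordered pairs: each pair of genomes is counted once.
unordered = list(combinations(genomes, 2))

# Ordered (reads, reference) pairs: A mapped on B differs from B mapped on A.
ordered = list(permutations(genomes, 2))

print(len(unordered), "unordered pairs")  # n * (n - 1) / 2
print(len(ordered), "ordered pairs")      # n * (n - 1)
```

The number of comparisons grows quadratically with the number of genomes, which is worth keeping in mind when building large input lists.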
- I. queryNCBI.pl downloads genomes from the NCBI Genomes FTP site. The Tab/CSV-delimited input lists can be generated from NCBI's Genome Assembly and Annotation reports (e.g. http://www.ncbi.nlm.nih.gov/genome/genomes/186?#).
- II. SSRG.pl generates FASTQ datasets from FASTA files at user-specified read lengths, with coverages of 50X or 100X. Note that this tool is valid only for haploid genomes. It is especially useful for comparing genomes from databases for which sequencing reads are unavailable.
- III. get_SNPs.pl maps the provided FASTQ files against reference genomes using BWA, Bowtie2 or HISAT2, as specified by the user. SNPs and, optionally, indels are then called with Samtools + VarScan2, BCFtools, or FreeBayes. Synthetic reads generated by SSRG.pl and/or regular FASTQ files using the Sanger Q33 scoring scheme can be used as input.
- IV. SNPs reported in the VCF files can be summarized with count_SNPs.pl.
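The idea behind step II can be sketched in Python as follows. This is not the SSRG.pl implementation: the function name, the evenly spaced tiling strategy, and the fixed quality score are all assumptions made for illustration. It only shows how a read count follows from coverage, sequence length, and read length, and how Sanger Q33 quality strings are encoded. Like the original tool, it assumes a haploid sequence, and it simulates neither sequencing errors nor reverse-strand reads.

```python
def synthetic_reads(sequence, read_length=100, coverage=50, quality=40):
    """Yield FASTQ records tiled along `sequence` at roughly `coverage`X."""
    if read_length > len(sequence):
        raise ValueError("read_length exceeds sequence length")
    # Number of reads needed: coverage * sequence_length / read_length.
    n_reads = max(1, round(coverage * len(sequence) / read_length))
    last_start = len(sequence) - read_length
    # Sanger Q33 encoding: Phred score + 33 gives the ASCII code.
    qual = chr(33 + quality) * read_length
    for i in range(n_reads):
        # Spread start positions evenly from 0 to last_start.
        start = (i * last_start) // max(1, n_reads - 1)
        read = sequence[start:start + read_length]
        yield f"@read_{i + 1}\n{read}\n+\n{qual}\n"


# Toy 1 kb sequence (hypothetical data, not a real genome).
toy = "ACGT" * 250
records = list(synthetic_reads(toy, read_length=100, coverage=50))
print(len(records), "reads generated")
print(records[0], end="")
```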
With the exception of the SSRG.pl script, the pipeline should also work with eukaryotic genomes.
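The summarizing step (IV) amounts to tallying variant records in a VCF file. The Python sketch below is a simplified stand-in, not the logic of count_SNPs.pl: the function name and example records are hypothetical, and it assumes a tab-delimited VCF with REF in column 4 and ALT in column 5, as in the VCF specification. Anything that is not a single-base substitution (indels, multi-nucleotide variants) is lumped into a second counter.

```python
def count_variants(vcf_lines):
    """Return (snps, non_snps) counted from the lines of a VCF file."""
    snps = non_snps = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        for alt in alts:
            if alt in (".", "*"):
                continue  # no alternate allele called at this site
            if len(ref) == 1 and len(alt) == 1:
                snps += 1      # single-base substitution
            else:
                non_snps += 1  # indel or other multi-base variant
    return snps, non_snps


# Hypothetical records, for illustration only.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t10\t.\tA\tG\t50\tPASS\t.",
    "chr1\t25\t.\tC\tT\t60\tPASS\t.",
    "chr1\t40\t.\tGA\tG\t45\tPASS\t.",
]
snps, non_snps = count_variants(example_vcf)
print("SNPs:", snps, "other variants:", non_snps)
```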
Requirements:
- Unix/Linux or MacOS X
- Perl 5
- Samtools version 1.3.1+ - http://www.htslib.org/ (Li et al. 2009. DOI: 10.1093/bioinformatics/btp352)
Compatible read aligners:
- BWA version 0.7.12 - http://bio-bwa.sourceforge.net/ (Li et al. 2010. DOI: 10.1093/bioinformatics/btp698)
- Bowtie2 version 2.2.9 or above - http://bowtie-bio.sourceforge.net/bowtie2/index.shtml (Langmead et al. 2012. DOI: 10.1038/nmeth.1923)
- HISAT2 version 2.0 or above - https://ccb.jhu.edu/software/hisat2/index.shtml (Kim et al. 2015. DOI: 10.1038/nmeth.3317)
Compatible variant callers:
- VarScan2 version 2.4.2 or above - http://dkoboldt.github.io/varscan/ (Koboldt et al. 2012. DOI: 10.1101/gr.129684.111)
- BCFtools version 1.3.1 or above - http://samtools.github.io/bcftools/
- FreeBayes - https://github.com/ekg/freebayes (Garrison et al. 2012. arXiv preprint arXiv:1207.3907 [q-bio.GN])
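Before running the pipeline, it can help to confirm which of the tools above are on the `PATH`. The Python sketch below is one way to do that. The executable names are the ones these tools are commonly installed under, which is an assumption about your setup, and the helper name `missing_tools` is hypothetical. VarScan2 is distributed as a Java `.jar` file and is not checked here. Only one aligner and one variant caller are needed for a given run, so a missing entry is not necessarily a problem.

```python
import shutil

# Commonly used executable names (assumed; adjust to your installation).
TOOLS = ["perl", "samtools", "bwa", "bowtie2", "hisat2", "bcftools", "freebayes"]


def missing_tools(tools=TOOLS, which=shutil.which):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools()
print("Missing:", ", ".join(missing) if missing else "none")
```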
2016-08-28 - Added support for Mash - https://github.com/marbl/Mash (Ondov et al. 2016. DOI: 10.1186/s13059-016-0997-x)
- Mash genetic distances can be computed using get_SNPs.pl or with the standalone run_Mash script
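For readers unfamiliar with Mash, its distance is derived from the Jaccard index j of the k-mer sets of two genomes as D = -(1/k) ln(2j / (1 + j)) (Ondov et al. 2016). The Python sketch below applies that formula to exact k-mer sets. It is a simplification: Mash itself estimates j from MinHash sketches and uses canonical k-mers, neither of which is reproduced here. The function names are hypothetical, and clamping the result to the range 0 to 1 is a choice made for this sketch.

```python
import math


def kmers(seq, k):
    """Return the set of all k-mers in `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def mash_distance(seq_a, seq_b, k=21):
    """Mash-style distance computed from the exact Jaccard index of k-mer sets."""
    a, b = kmers(seq_a, k), kmers(seq_b, k)
    if not a or not b:
        raise ValueError("sequences must be at least k bases long")
    j = len(a & b) / len(a | b)  # Jaccard index
    if j == 0:
        return 1.0  # no shared k-mers
    d = -math.log(2 * j / (1 + j)) / k
    return min(1.0, max(0.0, d))


# Toy sequences with a small k, for illustration only.
print(mash_distance("ACGTACGT", "ACGTACGT", k=4))
print(mash_distance("ACGTACGT", "ACGTTTTT", k=4))
print(mash_distance("ACGTACGT", "AAAAAAAA", k=4))
```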
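On Fedora and other dnf-based distributions, some dependencies can be installed from the package manager:
dnf install bwa; dnf install ncurses-devel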