https://doi.org/10.1038/s41564-023-01558-w
Much of the code in this repo was not used in the final analyses. The most important scripts are listed below.
run_clean_assemble_bin.sh is the master script for read QC, metagenome assembly, contig binning, bin QC, and taxonomic assignment. Parts of this script are hard-coded to work with the Cornell BioHPC SGE scheduler and the Brito Lab server structure.
Genes were called in metagenomic bins using run_prokka.sh. The GTF annotations output by Prokka can be converted to R objects using gtf2tibble.R.
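
For readers who want to inspect annotation files outside R, the following is a minimal Python sketch of turning a nine-column annotation file into a list of per-feature records. It is not a port of gtf2tibble.R. The names `read_gtf`, `parse_attributes`, and `GTF_COLUMNS` are hypothetical, and the sketch assumes tab-separated input with either GTF-style (`key "value";`) or GFF3-style (`key=value;`) attributes and an optional trailing `##FASTA` section.

```python
import re

# Standard nine-column layout shared by GTF and GFF files.
GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attribute"]


def parse_attributes(field):
    """Split an attribute string into a dict.

    Handles GTF-style (key "value";) and GFF3-style (key=value;) pairs.
    Assumption: values do not themselves contain semicolons.
    """
    attrs = {}
    for part in field.strip().split(";"):
        part = part.strip()
        if not part:
            continue
        # The key ends at the first whitespace or '=' character.
        match = re.match(r"([^\s=]+)[\s=]+(.*)", part)
        if match:
            key, value = match.groups()
            attrs[key] = value.strip().strip('"')
    return attrs


def read_gtf(handle):
    """Return one dict per feature line, with attributes expanded into keys."""
    rows = []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # anything after this marker is sequence, not annotation
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # skip malformed lines
        row = dict(zip(GTF_COLUMNS, fields))
        row["start"] = int(row["start"])
        row["end"] = int(row["end"])
        row.update(parse_attributes(row.pop("attribute")))
        rows.append(row)
    return rows
```

Calling `read_gtf(open(path))` on an annotation file (where `path` is a placeholder) yields records that can be passed to a data-frame library or filtered directly, for example by `row["feature"] == "CDS"`.
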
The main scripts for cleaning up transcript reads and mapping those reads to references can be found in the Danko Lab proseq2.0 repo. Once you have BAM files, per-base coverage reports can be generated with get_pileup_correct.sh.
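
To illustrate what a per-base coverage report contains, here is a minimal Python sketch that computes depth at every position of a reference from a list of aligned intervals. It is not the method used by get_pileup_correct.sh. The function name `per_base_coverage` is hypothetical, and the sketch assumes 0-based, half-open `(start, end)` intervals that have already been extracted from the alignments.

```python
def per_base_coverage(intervals, length):
    """Return the read depth at each position of a reference of `length` bases.

    `intervals` holds 0-based, half-open (start, end) alignment coordinates.
    A difference array records +1 where a read starts and -1 where it ends;
    a running sum then gives the depth at each base.
    """
    diff = [0] * (length + 1)
    for start, end in intervals:
        # Clip intervals to the reference bounds.
        start = max(start, 0)
        end = min(end, length)
        if start < end:
            diff[start] += 1
            diff[end] -= 1

    coverage = []
    depth = 0
    for delta in diff[:length]:
        depth += delta
        coverage.append(depth)
    return coverage
```

For example, `per_base_coverage([(0, 3), (2, 5)], 6)` returns `[1, 1, 2, 1, 1, 0]`: the two reads overlap only at position 2, and position 5 is uncovered.
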
EC_peaks.rmd and Stool_PRO-seq.Rmd contain the R code used for the E. coli and human microbiome analyses, respectively. Each R Markdown document is organized into main sections (#) and subsections (##/###).
Analyses performed after peer review were conducted in separate notebooks, which can be found in data_processing_and_figures.
E. coli sequencing reads: https://www.ncbi.nlm.nih.gov/sra/PRJNA800038
Microbiome sequencing reads: https://www.ncbi.nlm.nih.gov/sra/PRJNA800070