Materials for the Fall 2015 Harvard Extreme Computing course.
- Sean Davis, @seandavis12
- Eric Stahlberg
Course email list (membership requires approval, but just ask): https://groups.google.com/forum/#!forum/harvardextremecomputing2015
This lecture reviews the core concepts of normal biology, with a focus on the current understanding of the human genome.
This lab will be run by the Research Computing group at Harvard and will cover:
- Introduction to the R programming language
- Accessing and using the Odyssey High Performance Computing (HPC) cluster
- R statistical programming environment
- Introduction to Harvard's Odyssey cluster
- Odyssey module list
I have created some optional introductory materials here that you may use to augment your work during the lab. The "lecture slides" are meant to be used interactively, and the additional material gives a sense of some of R's basic capabilities.
- Install the R software.
- Consider installing RStudio as a convenient environment for working with R.
We will continue our discussion of biological principles and begin to delve more deeply into technologies that allow us to examine normal and disease biology at the molecular level. We will also begin to discuss approaches to analyzing high-throughput biological data.
In this lab, we will introduce a particularly important biological application: gene expression quantification and analysis. Because the final project includes RNA-seq data, I will introduce the technology, some details of the data formats, and primary data analysis.
Then we will work with the Bioconductor project as an environment for biological data analysis. Finally, I will introduce a tutorial RNA-seq dataset that you will work through on your own as homework. Note that you DO NOT need to complete this homework in its entirety.
- See the RNA-seq exercises section.
This will be a combined lecture/lab. Topics covered include:
- Basics of parallel computing (1 hour)
- Introduction to final project (45 minutes)
- Begin work on final project (lab portion)
In this lab, we will finish the computational portion of the final project and begin to interpret and visualize the results.
- Wrap-up of the final project, including unanswered and open questions
- Additional topics in Extreme Biocomputing
- Parting questions and comments
- Genetics and gene regulation
- Public resources
- Genomic assays
- Manuscripts of interest and classic papers
- The cancer genome
- Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring
- Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks
- Comprehensive genomic characterization defines human glioblastoma genes and core pathways
- Gene expression profiling predicts clinical outcome of breast cancer
- Genome-scale hypothesis scanning
- CloudForest
- FastQTL manuscript and software
- Bioconductor