This repository is a collection of implementations of algorithms introduced in the book
Bioinformatics Algorithms: An Active Learning Approach by Phillip Compeau and Pavel Pevzner, and in the Coursera specialization Bioinformatics.
The implementations are written in Python. Each algorithm is implemented in its own Python notebook when it is first introduced, and is tested on a variety of datasets from Rosalind, the book's companion website. My Rosalind username is ngthu003.
Chapter 1: Where in the Genome Does DNA Replication Begin?, Algorithmic Warmup
- 1A. Pattern Count Problem ✔️
- 1B. Frequent Words Problem ✔️
- 1C. Reverse Complement Problem ✔️
- 1D. Pattern Matching Problem ✔️
- 1E. Clump Finding Problem ✔️
- 1F. Minimum Skew Problem ✔️
- 1G. Hamming Distance Problem ✔️
- 1H. Approximate Pattern Matching Problem ✔️
- 1I. Frequent Words with Mismatches Problem ✔️
- 1J. Frequent Words with Mismatches and Reverse Complements Problem ✔️
- 1K. Computing a Frequency Array ✔️
- 1L. Implement PatternToNumber ✔️
- 1M. Implement NumberToPattern ✔️
- 1N. Generate the d-Neighborhood of a String ✔️
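Most of the Chapter 1 problems reduce to a handful of string operations. The following is a minimal sketch of several of them in plain Python, for quick reference. It is not the code from this repository's notebooks. The snake_case function names are hypothetical, and the encoding order A=0, C=1, G=2, T=3 used by PatternToNumber and NumberToPattern is an assumption. Clump Finding (1E) and the mismatch variants of Frequent Words (1I, 1J) are omitted.

```python
from collections import Counter

NUCLEOTIDES = "ACGT"  # assumed encoding order: A=0, C=1, G=2, T=3


def pattern_count(text, pattern):
    """1A: occurrences of pattern in text, overlapping ones included."""
    k = len(pattern)
    return sum(text[i:i + k] == pattern for i in range(len(text) - k + 1))


def frequent_words(text, k):
    """1B: all k-mers that reach the maximum count, in sorted order."""
    counts = Counter(text[i:i + k] for i in range(len(text) - k + 1))
    top = max(counts.values())
    return sorted(kmer for kmer, c in counts.items() if c == top)


def reverse_complement(pattern):
    """1C: complement every base, then reverse the string."""
    return pattern.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def hamming_distance(p, q):
    """1G: number of mismatched positions between two equal-length strings."""
    return sum(a != b for a, b in zip(p, q))


def approximate_pattern_matching(pattern, text, d):
    """1H: start positions where pattern occurs with at most d mismatches.
    With d = 0 this is exact Pattern Matching (1D)."""
    k = len(pattern)
    return [i for i in range(len(text) - k + 1)
            if hamming_distance(pattern, text[i:i + k]) <= d]


def minimum_skew(genome):
    """1F: every i minimizing Skew_i = #G - #C over the prefix genome[:i]."""
    skew, values = 0, [0]  # values[i] is the skew of the first i bases
    for base in genome:
        if base == "G":
            skew += 1
        elif base == "C":
            skew -= 1
        values.append(skew)
    low = min(values)
    return [i for i, v in enumerate(values) if v == low]


def pattern_to_number(pattern):
    """1L: read pattern as a base-4 number."""
    number = 0
    for base in pattern:
        number = 4 * number + NUCLEOTIDES.index(base)
    return number


def number_to_pattern(index, k):
    """1M: inverse of pattern_to_number for a fixed length k."""
    bases = []
    for _ in range(k):
        index, remainder = divmod(index, 4)
        bases.append(NUCLEOTIDES[remainder])
    return "".join(reversed(bases))


def computing_frequencies(text, k):
    """1K: frequency array of length 4**k, indexed by pattern_to_number."""
    freq = [0] * (4 ** k)
    for i in range(len(text) - k + 1):
        freq[pattern_to_number(text[i:i + k])] += 1
    return freq


def neighbors(pattern, d):
    """1N: all strings within Hamming distance d of pattern (recursive)."""
    if d == 0:
        return {pattern}
    if len(pattern) == 1:
        return set(NUCLEOTIDES)
    result = set()
    for suffix in neighbors(pattern[1:], d):
        if hamming_distance(pattern[1:], suffix) < d:
            # Mismatch budget left: the first base may be any nucleotide.
            result.update(base + suffix for base in NUCLEOTIDES)
        else:
            # Budget used up by the suffix: keep the original first base.
            result.add(pattern[0] + suffix)
    return result
```

For example, `reverse_complement("AAAACCCGGT")` returns `"ACCGGGTTTT"`, and `pattern_to_number("AGT")` returns `11` under the assumed encoding.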
Chapter 2: Which DNA Patterns Play the Role of Molecular Clocks?, Randomized Algorithms
- 2A. Implement MotifEnumeration
- 2B. Find a Median String
- 2C. Find a Profile-most Probable k-mer in a String
- 2D. Implement GreedyMotifSearch
- 2E. Implement GreedyMotifSearch with Pseudocounts
- 2F. Implement RandomizedMotifSearch
- 2G. Implement GibbsSampler
- 2H. Implement DistanceBetweenPatternAndStrings
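As a rough reference for the Chapter 2 problems, here is a minimal sketch of several of the deterministic ones (2B, 2C, 2D, 2E, 2H) in plain Python. It is not the code from this repository's notebooks. The function names are hypothetical, and so are the representation of a profile as a dict mapping each nucleotide to its list of column probabilities and the tie-breaking rules (leftmost k-mer in 2C, lexicographically first k-mer in 2B). MotifEnumeration (2A) and the randomized algorithms (2F, 2G) are left out.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def hamming_distance(p, q):
    """Number of mismatched positions between two equal-length strings."""
    return sum(a != b for a, b in zip(p, q))


def distance_between_pattern_and_strings(pattern, dna):
    """2H: sum over strings of the smallest Hamming distance to any k-mer."""
    k = len(pattern)
    return sum(
        min(hamming_distance(pattern, s[i:i + k]) for i in range(len(s) - k + 1))
        for s in dna
    )


def median_string(dna, k):
    """2B: brute force over all 4**k k-mers; ties go to the lexicographically first."""
    candidates = ("".join(p) for p in product(NUCLEOTIDES, repeat=k))
    return min(candidates, key=lambda pat: distance_between_pattern_and_strings(pat, dna))


def profile_most_probable(text, k, profile):
    """2C: k-mer of text with the highest probability under profile.
    profile maps each base to its k column probabilities; ties go to the leftmost k-mer."""
    best, best_prob = text[:k], -1.0
    for i in range(len(text) - k + 1):
        kmer = text[i:i + k]
        prob = 1.0
        for j, base in enumerate(kmer):
            prob *= profile[base][j]
        if prob > best_prob:
            best, best_prob = kmer, prob
    return best


def build_profile(motifs, pseudocount=0):
    """Column-wise base frequencies; pseudocount=1 applies Laplace's rule (2E)."""
    k, n = len(motifs[0]), len(motifs) + 4 * pseudocount
    return {
        base: [(sum(m[j] == base for m in motifs) + pseudocount) / n for j in range(k)]
        for base in NUCLEOTIDES
    }


def score(motifs):
    """Mismatches against the most common base in each column."""
    total = 0
    for column in zip(*motifs):
        total += len(column) - max(column.count(b) for b in NUCLEOTIDES)
    return total


def greedy_motif_search(dna, k, pseudocount=0):
    """2D (pseudocount=0) or 2E (pseudocount=1): seed with each k-mer of dna[0]."""
    best_motifs = [s[:k] for s in dna]
    first = dna[0]
    for i in range(len(first) - k + 1):
        motifs = [first[i:i + k]]
        for string in dna[1:]:
            # Rebuild the profile from the motifs chosen so far.
            profile = build_profile(motifs, pseudocount)
            motifs.append(profile_most_probable(string, k, profile))
        if score(motifs) < score(best_motifs):
            best_motifs = motifs
    return best_motifs
```

Passing `pseudocount=1` to `greedy_motif_search` means a profile built from only a few motifs never assigns probability zero to a base it has not yet seen, which is the motivation for 2E.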