A collection of my Machine Learning and Data Science projects,
- including both Theoretical and Practical (Applied) ML,
- and references (papers, ebooks, repos, tools, etc.), ranging from beginner to advanced.
Given the training data set, the two model families differ as follows:

| Aspect \ Model | Generative | Discriminative |
|---|---|---|
| Learning objective | Joint probability $P(X, Y)$ | Conditional probability $P(Y \mid X)$ |
| Formulation | Class prior and class-conditional likelihood | Likelihood directly |
| Result | Indirect classification (via Bayes' rule) | Direct classification |
| Examples | Naive Bayes, HMM | Logistic Regression, SVM, DNN |

Reference: Generative and Discriminative Models, Prof. Andrew Ng
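
To make the contrast concrete, here is a minimal sketch (assuming scikit-learn is installed; the toy dataset and split are illustrative) fitting one model from each family on the same data:

```python
# A generative vs. a discriminative classifier on the same toy data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Generative: models P(X | Y) and the class prior P(Y); classifies via Bayes' rule.
gen = GaussianNB().fit(X_tr, y_tr)
# Discriminative: models the conditional P(Y | X) directly.
disc = LogisticRegression().fit(X_tr, y_tr)

print("Naive Bayes accuracy:        ", gen.score(X_te, y_te))
print("Logistic Regression accuracy:", disc.score(X_te, y_te))
```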
- Learner: $L(X_I) = h \in \mathcal{H}$
- Input training data: $X_I$, where $x_i \in \mathbb{R}$
- Hypothesis: $h_{\omega}: X \in \mathbb{R}^n \rightarrow Y$, with weights $\omega$, mapping attribute vectors $X$ to labels/outputs $Y = \{y_1, \dots, y_n\}$
- For NN: $h(x) = f(\omega; x)$, explicitly parameterized by $\omega$
- For generative models: $f: Z \rightarrow X$, where $Z$ is the latent variable
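
As a toy illustration of a hypothesis explicitly parameterized by weights $\omega$ (a sketch; the linear form and the data are assumptions made purely for illustration):

```python
import numpy as np

# A weight-parameterized hypothesis h_w: R^n -> R, here simply the linear map w . x.
def h(omega, x):
    return omega @ x

rng = np.random.default_rng(0)
omega = rng.normal(size=3)            # weights omega; in practice, learned from X_I
x_i = np.array([1.0, 2.0, 3.0])       # one attribute vector in R^3
print(h(omega, x_i))
```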
| Output \ Type | Unsupervised | Supervised |
|---|---|---|
| Continuous | Clustering & Dim Reduction<br>○ SVD<br>○ PCA<br>○ K-means<br>○ GAN ○ VAE ○ Diffusion | Regression<br>○ Linear / Polynomial<br>○ Non-Linear Regression<br>○ Decision Trees<br>○ Random Forest |
| Discrete | Association / Feature Analysis<br>○ Apriori<br>○ FP-Growth<br>○ HMM | Classification<br>○ Bayesian ○ SVM<br>○ Logistic Regression ○ Perceptron<br>○ kNN / Trees |
And more:

| Aspect \ Type | Semi-Supervised | Reinforcement |
|---|---|---|
| Learn from | Partially labeled data | Rewards |
| Methods | Pseudo-labeling, applied iteratively | ○ Q-learning<br>○ Markov Decision Process |
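
A minimal sketch of one pseudo-labeling round (assuming scikit-learn; the data, the labeled/unlabeled split, and the 0.95 confidence threshold are illustrative):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, random_state=0)
X_lab, y_lab = X[:50], y[:50]               # small labeled set
X_unl = X[50:]                              # the rest is treated as unlabeled

model = LogisticRegression(max_iter=1000).fit(X_lab, y_lab)
proba = model.predict_proba(X_unl)
keep = proba.max(axis=1) > 0.95             # keep only confident predictions
X_aug = np.vstack([X_lab, X_unl[keep]])     # augment with pseudo-labeled points
y_aug = np.concatenate([y_lab, proba.argmax(axis=1)[keep]])
model = LogisticRegression(max_iter=1000).fit(X_aug, y_aug)  # retrain; repeat
print("pseudo-labels added:", int(keep.sum()))
```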
Reinforcement Learning
- The agent is in a state at each timestep
- When an action is performed, the agent moves to a new state and receives a reward
- There is no knowledge in advance of how actions affect either the new state or the reward

Goal
- Value-based: V(s)
  - the agent expects the long-term return of the current state under policy π
- Policy-based
  - the action performed in every state should gain the maximum reward in the future
  - Deterministic: for any state, the policy π produces the same action
  - Stochastic: every action has a certain probability
- Model-based
  - create a virtual model of each environment
  - the agent learns to perform in that specific environment
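
As a minimal value-based sketch, tabular Q-learning on a made-up 5-state chain (the environment, rewards, and hyperparameters are all illustrative assumptions):

```python
import numpy as np

n_states, n_actions = 5, 2           # toy chain; actions: 0 = left, 1 = right
alpha, gamma, eps = 0.1, 0.9, 0.1    # learning rate, discount, exploration rate
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    while s != n_states - 1:                       # terminal state at the right end
        # epsilon-greedy: explore with probability eps, else act greedily
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0  # reward only at the goal
        # Q-learning update: bootstrap from the best action in the next state
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q)   # moving right (action 1) should dominate in every state
```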
The bigger picture of learning with invariances and symmetries:
| Domain | Structure | Symmetry / Bias | Example |
|---|---|---|---|
| Images | 2D grid | Translation equivariant | CNNs |
| Sequences | 1D sequence | Order-aware | RNNs, Transformers |
| Sets / Point Clouds | Unordered set | Permutation invariant | Deep Sets, PointNet |
| Graphs | Nodes + edges | Permutation equivariant | GNNs, Graph Isomorphism Networks |
| Manifolds / Spheres | 2D surface embedded in 3D | Rotation equivariant | Spherical CNNs |
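
A sketch of the permutation-invariance row (Deep Sets style): embed each element independently, pool with a symmetric function, then map the pooled vector. Layer sizes and data here are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
W_phi = rng.normal(size=(3, 8))        # per-element embedding phi (arbitrary sizes)
W_rho = rng.normal(size=(8, 1))        # map rho applied after pooling

def deep_set(points):
    h = np.tanh(points @ W_phi)        # phi applied to each element independently
    pooled = h.sum(axis=0)             # sum pooling is symmetric: order cannot matter
    return np.tanh(pooled @ W_rho)     # rho on the aggregated representation

pts = rng.normal(size=(10, 3))         # a set of 10 points in R^3
perm = rng.permutation(10)
print(deep_set(pts), deep_set(pts[perm]))   # identical: permutation invariant
```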
- Feature Selection
  - After fitting, plot Residuals vs any Predictor Variable
  - Check for linearly-dependent feature vectors
- Imputation
- Handling Outliers
  - Removal, Replacing values, Capping, Discretization
- Encoding
  - Integer Encoding
  - One-Hot Encoding (enum -> binary)
- Scaling
  - Normalization (min-max, to [0, 1])
  - Standardization (zero mean, unit variance)
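
A sketch of the encoding and scaling steps above with scikit-learn (the toy values are made up):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler

x = np.array([[1.0], [5.0], [10.0]])               # one numeric feature column
print(MinMaxScaler().fit_transform(x).ravel())     # normalization to [0, 1]
print(StandardScaler().fit_transform(x).ravel())   # zero mean, unit variance

colors = np.array([["red"], ["green"], ["red"]])   # enum-like categorical feature
print(OneHotEncoder().fit_transform(colors).toarray())  # one-hot -> binary columns
```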
| Aspect | Bayesianism | Frequentism |
|---|---|---|
| Interpretation of Probability | A measure of belief or uncertainty | The limit of relative frequencies in repeated experiments |
| Methods | Prior knowledge, updated via Bayes' rule to obtain posterior distributions | Hypothesis testing, MLE, confidence intervals |
| Treatment of Uncertainty | Parameters are the random variables | The data set is the random variable; parameters are fixed |
| Handling of Data | Useful when prior information is available or when the focus is on prediction intervals | Often requires larger sample sizes |
| Flexibility | Flexible models; beliefs can be updated with new data | More rigid; relies on specific statistical methods |
| Computational Complexity | Can be computationally intensive, especially for models with high-dimensional parameter spaces | Simpler computation; often more straightforward in practice |
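
To make the Bayesian column concrete, a minimal conjugate (Beta-Binomial) update of a coin's bias, alongside the frequentist MLE on the same data (the prior and the counts are illustrative):

```python
# Parameters as random variables: prior Beta(a, b) over the coin bias theta;
# after h heads and t tails, the posterior is Beta(a + h, b + t).
a, b = 2.0, 2.0          # prior belief: roughly a fair coin (illustrative)
h, t = 7, 3              # observed data: 7 heads, 3 tails

a_post, b_post = a + h, b + t
posterior_mean = a_post / (a_post + b_post)
mle = h / (h + t)        # frequentist point estimate from the same data

print(f"posterior mean: {posterior_mean:.3f}, MLE: {mle:.3f}")
```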
Applied ML Best Practice
- Initial test set + a single metric to improve
- Target performance
- Human-level performance, published results, previous baselines, etc.
- Results can be sensitive to small changes in hyperparameters and dataset makeup.

```
            Tune hyperparameters
                     |
Start simple -> Implement & Debug -> Evaluate -> (target met?)
                     |
            Improve model & data
```
- Start simple: the simplest model & data possible (e.g., LeNet on a subset of the data)
- Implement & Debug: once the model runs, overfit a single batch & reproduce a known result (see the sketch after this list)
- Evaluate: apply the bias-variance decomposition
- Tuning: coarse-to-fine random search
- Improve model & data
  - Make the model bigger if it underfits
  - Add data or regularize if it overfits
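
A sketch of the overfit-a-single-batch check with Keras (the model, data, and step count are placeholders): if the loss does not drive toward zero on one fixed batch, there is a bug in the model, loss, or training loop.

```python
import numpy as np
import tensorflow as tf

tf.random.set_seed(0)
x = np.random.randn(16, 10).astype("float32")   # one fixed batch of made-up data
y = np.random.randint(0, 2, size=(16,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(2),                    # logits for 2 classes
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-2),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

hist = model.fit(x, y, epochs=200, batch_size=16, verbose=0)
print(hist.history["loss"][-1])   # should approach zero; otherwise, debug first
```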
Machine Learning and Real-World Data, University of Cambridge IA
- Text Classification; Naive Bayes; Cross-Validation
- HMM; Social Network
Theoretical Machine Learning with Problem Sets, Stanford CS229
- Linear classifiers (Logistic Regression, GDA), SVM, etc.
- Stochastic Gradient Descent; L1/L2 Regularization
Deep Learning for Computer Vision with Problem Sets, Stanford CS231n
- Image Classification + Localization $(x, y, w, h)$ [Supervised Learning: discrete label + regression]
- kNN; Softmax classifier; SVM classifier; CNN
- Object Detection
- Semantic / Instance Segmentation
- Image Captioning
- RNN, Attention, Transformer
- Positional Encoding
- Video understanding
- Generative model (GAN, VAE)
- Self-Supervised Learning
- Data Science | Uni. of Cambridge, Undergraduate course.
- AI | Uni. of Cambridge, IB
  - Search, Game, CSPs, Knowledge representation and Reasoning, Planning, NN.
- Machine Learning and Bayesian Inference | Uni. of Cambridge, Undergraduate course.
  - Linear classifiers (SVM), Unsupervised learning (K-means, EM), Bayesian networks
- Geometric Deep Learning | Cambridge, Oxford Master's courses.
Generative Pre-trained Transformer (GPT) from Scratch (Andrej Karpathy)
Tools
- NumPy, matplotlib, pandas, TensorFlow
- Caffe, Keras
- XGBoost, gensim