This README would normally document whatever steps are necessary to get your application up and running.
1. Main directory R_SCRIPTS for R ANALYTICS.
2. Main directory EMSEMBLE for PYTHON CLASSIFIER ANALYTICS.
3. Main directory LINEAR_REGRESSION for PYTHON REGRESSION ANALYTICS.
The R analytics contain machine learning and predictive analytics in R
for various general areas. These have not yet been optimized
and are thus limited by the runtime limitations of your R environment.
Some of these scripts are useful for educational purposes, as they
illustrate with detailed graphics the inner workings of some of these
algorithms, i.e., how their decision making progresses over time. Most
provide detailed error metrics to facilitate automated
decision-making.
Implements agglomerative clustering (for demonstration purposes only; please
see disclaimers). Perhaps useful for those in bioinformatics, phylogeny, etc.,
seeking to understand the evolution of the clustering arrangements,
and for those whose interest is primarily in the interdependencies and
arrangement of the branches, as it automatically focuses on those (i.e.,
less overplotting, as leaves are not plotted). Nevertheless, detailed
cluster assignment mappings are displayed during the agglomerative process.
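To illustrate what a cluster assignment mapping during the agglomerative process looks like, the following Python sketch runs a naive single-linkage agglomeration and records the label of every point after each merge. Single linkage, Euclidean distance, and the function name `agglomerate` are assumptions for illustration; the R script may use a different linkage or output format.

```python
import math
from itertools import combinations

def agglomerate(points):
    """Naive single-linkage agglomerative clustering (illustrative sketch).

    Returns one (merged_pair, labels) tuple per merge step, where labels[i]
    is the cluster id of points[i] right after that merge.
    """
    # Start with every point in its own cluster, keyed by its index.
    clusters = {i: [i] for i in range(len(points))}
    history = []
    while len(clusters) > 1:
        # Single linkage: cluster distance = distance of the closest members.
        best = None
        for a, b in combinations(sorted(clusters), 2):
            d = min(math.dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        # Merge cluster b into cluster a.
        clusters[a].extend(clusters.pop(b))
        # Record the full assignment mapping after this merge.
        labels = [0] * len(points)
        for cid, members in clusters.items():
            for i in members:
                labels[i] = cid
        history.append(((a, b), labels))
    return history

pts = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (20.0, 20.0)]
for pair, labels in agglomerate(pts):
    print("merged", pair, "->", labels)
```

Printing the whole label vector at each step, rather than drawing leaves, mirrors the focus on how the branch arrangement evolves.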
Under development. Basic decision trees.
Basic multivariate linear regression with assumptions checking.
Basic recursive partitioning decision trees. Under development.
To be developed.
Implements a version of DBSCAN that instead uses transitive closure, which
allows heuristics over the search space and adaptations to it.
For example, it can discover a value of EPSILON that fits
the data. Experimental, as is everything in here, having been written in the past 24 hours.
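One way to read "DBSCAN via transitive closure" is: build the EPSILON-neighborhood relation between points, take its transitive closure (here with Warshall's algorithm), and treat each resulting equivalence class as a cluster. The Python sketch below does exactly that. It omits DBSCAN's minimum-points core condition, so it corresponds to DBSCAN with a minimum of one point (a single-linkage cut at EPSILON); the function name `eps_clusters` and the simple sweep over candidate EPSILON values are illustrative assumptions, not the script's actual heuristic.

```python
import math

def eps_clusters(points, eps):
    """Label points by connected components of the eps-neighborhood graph,
    computed as the transitive closure of the neighbor relation (Warshall)."""
    n = len(points)
    # reach[i][j] starts True when j is within eps of i (reflexive, symmetric).
    reach = [[math.dist(points[i], points[j]) <= eps for j in range(n)]
             for i in range(n)]
    # Warshall's algorithm: allow paths through intermediate point k.
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    # Row i of the closure is point i's whole cluster; label each point
    # by the smallest index in its cluster.
    return [min(j for j in range(n) if reach[i][j]) for i in range(n)]

pts = [(0, 0), (0, 1), (0, 2), (10, 10), (10, 11), (30, 30)]
# Sweeping EPSILON shows how the clustering reacts to the neighborhood size.
for eps in (0.5, 1.0, 15.0, 50.0):
    labels = eps_clusters(pts, eps)
    print(eps, labels, "clusters:", len(set(labels)))
```

Because the closure is computed once per EPSILON, the number of clusters as a function of EPSILON is cheap to tabulate, which is the kind of signal a heuristic for choosing EPSILON could use.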
Implements a diagnostic version of k-means that provides feedback about
the convergence of k-means within iterations for a given value of K.
Experimental, as is everything in here, having been written in the past 24 hours.
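A minimal Python sketch of the kind of per-iteration convergence feedback described above: Lloyd's k-means for a fixed K that records the within-cluster sum of squares (SSE) and the largest centroid shift at every iteration. The function name, the initialization from the first K points, and the tolerance are assumptions made for a deterministic example.

```python
import math

def kmeans_diagnostics(points, k, max_iter=100, tol=1e-9):
    """Lloyd's k-means for a fixed K; returns (centroids, labels, trace),
    where trace holds (iteration, sse, max_centroid_shift) per iteration."""
    # Deterministic initialization from the first k points (assumption).
    centroids = [list(p) for p in points[:k]]
    trace = []
    for it in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # SSE of this assignment against the current centroids.
        sse = sum(math.dist(p, centroids[l]) ** 2 for p, l in zip(points, labels))
        # Update step: move each centroid to the mean of its members.
        shift = 0.0
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if not members:
                continue  # leave an empty cluster's centroid in place
            new = [sum(coord) / len(members) for coord in zip(*members)]
            shift = max(shift, math.dist(new, centroids[c]))
            centroids[c] = new
        trace.append((it, sse, shift))
        if shift <= tol:
            break  # converged: no centroid moved
    return centroids, labels, trace

pts = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
cents, labels, trace = kmeans_diagnostics(pts, k=2)
for it, sse, shift in trace:
    print(f"iter {it}: SSE={sse:.3f} max shift={shift:.3f}")
```

A non-increasing SSE together with a shift that drops to zero is the usual signal that the run has converged for that K.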
Implements plotting of the adjacency matrices produced by the recommender system
using igraph. Currently based on a dense (non-sparse) adjacency matrix representation,
which does NOT scale well with dataset size for this application as
Implements basic diagnostic plots for the recommender system.
Implements an iterative-convergence collaborative filtering and
recommendation system, tailored for the MovieLens dataset.
1. Collaborative filtering is done via iterative convergence between
the Theta parameters and the X feature parameters.
2. Recommendations are made using Euclidean distances (at this time)
with respect to shortest-path neighbors at one and two degrees of separation.
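The alternating scheme in item 1 can be sketched in Python with NumPy: given a ratings matrix Y (movies by users) and an indicator R of which entries are rated, X (movie features) and Theta (user parameters) are updated in turn by gradient steps on the regularized squared error over rated entries only. The learning rate, regularization strength, number of features, toy ratings, and function name are assumptions; the neighbor-based recommendation step of item 2 is not shown.

```python
import numpy as np

def cofi_alternating(Y, R, n_features=2, lam=0.1, alpha=0.01, iters=2000, seed=0):
    """Alternate gradient steps on X (movie features) and Theta (user params) to
    minimize 0.5*sum_{rated} (X @ Theta.T - Y)^2 + 0.5*lam*(||X||^2 + ||Theta||^2).
    Returns X, Theta, and the cost after each iteration."""
    rng = np.random.default_rng(seed)
    n_movies, n_users = Y.shape
    X = rng.normal(scale=0.1, size=(n_movies, n_features))
    Theta = rng.normal(scale=0.1, size=(n_users, n_features))
    costs = []
    for _ in range(iters):
        # Step on X with Theta held fixed (errors only on rated entries).
        err = (X @ Theta.T - Y) * R
        X -= alpha * (err @ Theta + lam * X)
        # Step on Theta with the freshly updated X held fixed.
        err = (X @ Theta.T - Y) * R
        Theta -= alpha * (err.T @ X + lam * Theta)
        # Track the regularized cost to monitor convergence.
        err = (X @ Theta.T - Y) * R
        costs.append(0.5 * np.sum(err ** 2)
                     + 0.5 * lam * (np.sum(X ** 2) + np.sum(Theta ** 2)))
    return X, Theta, costs

# Tiny toy ratings matrix (0 = unrated); R marks the rated entries.
Y = np.array([[5, 4, 0, 1],
              [4, 0, 1, 1],
              [1, 1, 5, 0],
              [0, 1, 4, 5]], dtype=float)
R = (Y > 0).astype(float)
X, Theta, costs = cofi_alternating(Y, R)
print("final cost:", round(costs[-1], 3))
print("predicted ratings:\n", np.round(X @ Theta.T, 1))
```

The unrated cells of `X @ Theta.T` are the predicted ratings that a recommendation step would then rank.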
Performs gradient descent, stochastic gradient descent, fminunc, and
normal-equation fitting, with or without regularization, over numerical datasets.
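A compact Python/NumPy sketch contrasting two of the listed solvers for L2-regularized linear regression: batch gradient descent on J(θ) = (‖Xθ − y‖² + λ‖θ₁:‖²) / (2m), and the closed-form normal equation θ = (XᵀX + λL)⁻¹Xᵀy, where L is the identity with its intercept entry zeroed so the bias is not penalized. Stochastic gradient descent and fminunc (an Octave/MATLAB optimizer) are not shown; the synthetic data, learning rate, and λ values are assumptions. For the same λ, both methods should agree closely.

```python
import numpy as np

def add_intercept(X):
    """Prepend a column of ones for the bias term."""
    return np.column_stack([np.ones(len(X)), X])

def normal_equation(X, y, lam=0.0):
    """Closed-form ridge solution; the intercept (column 0) is not regularized."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

def gradient_descent(X, y, lam=0.0, alpha=0.1, iters=5000):
    """Batch gradient descent on J = (||X@theta - y||^2 + lam*||theta[1:]||^2) / (2m)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m
        grad[1:] += lam * theta[1:] / m  # no penalty on the intercept
        theta -= alpha * grad
    return theta

rng = np.random.default_rng(0)
features = rng.normal(size=(100, 3))
y = 2.0 + features @ np.array([1.5, -3.0, 0.5]) + rng.normal(scale=0.1, size=100)
X = add_intercept(features)
for lam in (0.0, 1.0):
    print(lam, np.round(normal_equation(X, y, lam), 3),
          np.round(gradient_descent(X, y, lam), 3))
```

The normal equation is exact but costs a linear solve in the number of features, whereas gradient descent only needs matrix-vector products, which is why both are worth having.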
Implements wrappers for distance computations after various
transformations (PCA, probability, and scaling transforms) for
numerical and/or categorical datasets.
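To show why transforming before computing distances matters, the Python/NumPy sketch below compares raw Euclidean distances with distances after z-score scaling and after projection onto the top principal component. Only the scaling and PCA transforms are shown (not the probability transform or categorical handling), and the helper names and toy data are hypothetical.

```python
import numpy as np

def zscore(X):
    """Scale each column to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def pca_project(X, k):
    """Project centered data onto its top-k principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

def pairwise_euclidean(X):
    """Dense matrix of Euclidean distances between all pairs of rows."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Column 0 is in the thousands, column 1 in single digits.
X = np.array([[1000.0, 1.0],
              [1010.0, 9.0],
              [1500.0, 1.5]])
print(np.round(pairwise_euclidean(X), 2))          # dominated by column 0
print(np.round(pairwise_euclidean(zscore(X)), 2))  # both columns contribute
print(np.round(pairwise_euclidean(pca_project(zscore(X), 1)), 2))
```

In this toy example, the nearest neighbor of the first row changes from row 1 to row 2 once the columns are scaled, because the raw distances ignore the second column almost entirely.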
Implements anomaly detection over a numerical dataset using a
1. univariate Gaussian model (independent features)
2. multivariate Gaussian model (otherwise)
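Both density models can be sketched in Python/NumPy. The univariate version multiplies per-feature normal densities, p(x) = ∏ⱼ N(xⱼ; μⱼ, σⱼ²), which assumes independent features. The multivariate version uses the full covariance, p(x) = (2π)^(−n/2) |Σ|^(−1/2) exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ)), so it can flag points that are unusual only in combination. The threshold epsilon and the correlated toy data are assumptions; in practice epsilon is usually chosen on a labeled validation set.

```python
import numpy as np

def fit_gaussian(X):
    """Maximum-likelihood mean and covariance."""
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False, bias=True)
    return mu, Sigma

def p_univariate(X, mu, Sigma):
    """Product of independent per-feature normal densities (uses only diag(Sigma))."""
    var = np.diag(Sigma)
    dens = np.exp(-(X - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
    return dens.prod(axis=1)

def p_multivariate(X, mu, Sigma):
    """Full multivariate normal density."""
    n = X.shape[1]
    diff = X - mu
    inv = np.linalg.inv(Sigma)
    mahal = np.einsum("ij,jk,ik->i", diff, inv, diff)  # squared Mahalanobis distance
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * mahal) / norm

rng = np.random.default_rng(1)
# Strongly correlated training data: feature 1 tracks feature 0.
a = rng.normal(size=500)
train = np.column_stack([a, a + rng.normal(scale=0.1, size=500)])
mu, Sigma = fit_gaussian(train)

# Each coordinate of this point is ordinary, but together they break the correlation.
odd = np.array([[1.0, -1.0]])
epsilon = 1e-3
print("univariate p:", p_univariate(odd, mu, Sigma))
print("multivariate p:", p_multivariate(odd, mu, Sigma))
print("flagged (multivariate):", p_multivariate(odd, mu, Sigma) < epsilon)
```

The univariate model rates this point as unremarkable, while the multivariate model assigns it a vanishingly small density, which is the reason to fall back to the multivariate case when features are not independent.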
Implements selected, simplified t-test procedures
1. with iterative or non-iterative wrappers
2. over full or subsampled datasets.
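A minimal Python sketch of the "full or subsampled" idea: compute Welch's two-sample t statistic once on the full samples, then repeatedly on random subsamples, to see how the statistic behaves at smaller sample sizes. Only the standard library is used, so p-values are not computed; the subsample size, number of repetitions, toy data, and function names are assumptions.

```python
import random
import statistics

def welch_t(x, y):
    """Welch's two-sample t statistic (unequal variances)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    vx, vy = statistics.variance(x), statistics.variance(y)
    return (mx - my) / ((vx / len(x) + vy / len(y)) ** 0.5)

def subsampled_t(x, y, size, reps, seed=0):
    """Welch t statistics over `reps` random subsamples (without replacement) of `size`."""
    rng = random.Random(seed)
    return [welch_t(rng.sample(x, size), rng.sample(y, size)) for _ in range(reps)]

rng = random.Random(42)
x = [rng.gauss(10.0, 2.0) for _ in range(1000)]
y = [rng.gauss(10.5, 2.0) for _ in range(1000)]
print("full-sample t:", round(welch_t(x, y), 2))
ts = subsampled_t(x, y, size=50, reps=200)
print("subsampled t: mean", round(statistics.fmean(ts), 2),
      "stdev", round(statistics.stdev(ts), 2))
```

The same mean difference yields a much smaller t on subsamples, since the standard error grows as the sample size shrinks; iterating over subsamples makes that effect visible.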
Performs heuristic optimization via grid search for Market Basket Analysis
to identify the highest confidence/support RHS for the specified LHS.
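The search can be sketched in Python over a list of transactions: for a fixed LHS itemset, compute the support (fraction of all baskets containing LHS and RHS) and confidence (fraction of LHS baskets that also contain RHS) of every single-item RHS, then sweep a small grid of minimum-support and minimum-confidence thresholds and keep the best surviving rule at each grid point. Restricting the RHS to one item, ranking by confidence then support, and the toy baskets are all assumptions.

```python
from itertools import product

def rule_stats(transactions, lhs):
    """(support, confidence) of lhs -> {item} for every item not in lhs."""
    lhs = frozenset(lhs)
    n = len(transactions)
    with_lhs = [t for t in transactions if lhs <= t]
    stats = {}
    if not with_lhs:
        return stats  # LHS never occurs: no rules to score
    for item in set().union(*transactions) - lhs:
        both = sum(1 for t in with_lhs if item in t)
        stats[item] = (both / n, both / len(with_lhs))
    return stats

def grid_search_rhs(transactions, lhs, supports, confidences):
    """For each (min_support, min_confidence) grid point, return the
    highest-confidence RHS (ties broken by support) passing both thresholds."""
    stats = rule_stats(transactions, lhs)
    results = {}
    for s_min, c_min in product(supports, confidences):
        passing = [(c, s, item) for item, (s, c) in stats.items()
                   if s >= s_min and c >= c_min]
        results[(s_min, c_min)] = max(passing)[2] if passing else None
    return results

baskets = [frozenset(b) for b in (
    {"bread", "milk"}, {"bread", "butter", "milk"}, {"bread", "butter"},
    {"bread", "jam"}, {"milk", "eggs"}, {"bread", "butter", "jam"})]
for grid_point, rhs in grid_search_rhs(baskets, {"bread"}, [0.1, 0.6], [0.3, 0.7]).items():
    print(grid_point, "->", rhs)
```

Grid points where no rule survives return `None`, which makes it easy to see which threshold combinations are too strict for the data.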
Generates and loads datasets into the format expected by the analytics.
Wraps selected fSelect.R feature selection algorithms for
numerical and categorical datasets on classification and/or regression
problems.
Wraps up common utilities used by several of these modules.
Being developed.
Being developed.
Not yet developed. Will be a database wrapper for analyzing
datasets with or without database aid.
Wraps up visualization scripts, some reusing and/or adapting
plotting code available on the web, each with a URL reference/citation
to the original site.
Not yet developed. For learning curves, see
stochastic_gradient_descent.R instead.
Deprecated.
GNU license
Not yet developed. Will provide a wrapper for exception processing.
The goal is to provide access to some code samples I quickly put together over a few days, to facilitate discussion.
Quick summary, TO CASUAL VISITORS:

Please do not branch from this codebase YET, as the code is CURRENTLY far too preliminary; it is only a few days old (Sep/26/2014), i.e., version 0.00b.

However, you are welcome to BROWSE the codebase at this time. If you find it, or a part of it, useful and decide to recycle it, please follow the provided GNU license and include a URL reference to the original [codebase](https://bitbucket.org/nelsonmanohar/machinelearning).

Version: again, just to be clear, version 0.001b.
[Learn Markdown](https://bitbucket.org/tutorials/markdowndemo)
- Summary of set up
- Configuration
- Dependencies
- Database configuration
- How to run tests
- Deployment instructions
- Writing tests
- Code review
- Other guidelines
- Repo owner or admin
- Other community or team contact