= Machine Learning =

== K-means clustering ==
 * https://en.wikipedia.org/wiki/K-means_clustering
It aims to partition n observations into k clusters, with each observation assigned to the cluster with the nearest mean (centroid). It is an unsupervised learning algorithm.

 * PSPP contains k-means; the QUICK CLUSTER command performs k-means clustering on the dataset.
 * Weka contains k-means and x-means.
 * Octave contains k-means.
 * OpenCV contains a k-means implementation.
 * Spark MLlib implements a distributed k-means algorithm.
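
The partitioning idea can be illustrated with a minimal pure-Python sketch of the standard (Lloyd's) iteration. It is not taken from any of the implementations listed above; the function name `kmeans`, its parameters and the sample points are assumptions chosen for illustration.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Illustrative Lloyd's iteration on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Start from k distinct observations picked at random.
    centroids = rng.sample(points, k)
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to the nearest centroid
        # (squared Euclidean distance).
        new_assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # no point changed cluster, so the result is stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment

# Hypothetical sample data: two visibly separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The first three points end up in one cluster and the last three in the other; which cluster gets index 0 depends on the random initialisation.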

== K-NN classifier ==
 * https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
The k-nearest neighbors (k-NN) algorithm can be used for both classification and regression.

A confusion matrix or "matching matrix" is often used as a tool to validate the accuracy of k-NN classification.
 
 * https://en.wikipedia.org/wiki/Confusion_matrix
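
As a rough illustration of k-NN classification and of how a confusion matrix summarises its accuracy, here is a small pure-Python sketch. The helper names `knn_predict` and `confusion_matrix`, the value k=3 and the toy data are assumptions made for this example, not part of any library API.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train is a list of (features, label); majority vote among the k nearest."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

# Hypothetical training and test data with two classes.
train = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
         ((6, 6), "b"), ((7, 6), "b"), ((6, 7), "b")]
test = [((1.5, 1.5), "a"), ((2, 2), "a"), ((6.5, 6.5), "b"), ((3, 3), "b")]

actual = [label for _, label in test]
predicted = [knn_predict(train, features) for features, _ in test]
matrix = confusion_matrix(actual, predicted, ["a", "b"])
# Correct predictions lie on the diagonal of the matrix.
accuracy = sum(matrix[i][i] for i in range(2)) / len(test)
print(matrix, accuracy)  # [[2, 0], [1, 1]] 0.75
```

The off-diagonal entry shows the one test point of class "b" that lies closer to the "a" training points and is therefore misclassified.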

== Decision trees ==
 * https://en.wikipedia.org/wiki/Decision_tree_learning
Create a model that predicts the value of a target variable based on several input variables.
In a classification tree, the outcome is the (discrete) class to which the data belongs.
In a regression tree, the outcome is a real number.

Notable decision tree algorithms include:
 * ID3 (Iterative Dichotomiser 3)
 * C4.5 (successor of ID3)
 * CART (Classification And Regression Tree)
 * Chi-square automatic interaction detection (CHAID)
 * MARS (multivariate adaptive regression splines)

=== ID3 ===
 * https://en.wikipedia.org/wiki/ID3_algorithm
ID3 is an algorithm invented by Ross Quinlan[1] that generates a decision tree from a dataset.
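
ID3 chooses, at each node, the attribute with the highest information gain (the reduction in entropy of the target). The following is a simplified sketch of that idea for categorical attributes only, without pruning; the function names and the toy weather-style rows are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

def id3(rows, attributes, target):
    """Return a nested dict {attribute: {value: subtree}} or a class label at a leaf."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:
        return labels[0]  # pure node
    if not attributes:
        return Counter(labels).most_common(1)[0][0]  # fall back to majority class
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    return {best: {value: id3([r for r in rows if r[best] == value], remaining, target)
                   for value in sorted({r[best] for r in rows})}}

# Hypothetical dataset: decide whether to play based on two attributes.
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "yes"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
tree = id3(rows, ["outlook", "windy"], "play")
print(tree)
```

On this data "outlook" gives the larger gain and becomes the root; the "rainy" branch is then split on "windy".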

== Naive Bayes classifier ==
 * https://en.wikipedia.org/wiki/Naive_Bayes_classifier

Document classification: naive Bayes classification is commonly applied to the problem of classifying documents by their content, for example into spam and non-spam e-mails.
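
A minimal sketch of the spam / non-spam case, assuming a multinomial model over whitespace-separated words with add-one (Laplace) smoothing. The function names and the four training messages are hypothetical and only meant to show the mechanics.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(documents):
    """documents is a list of (text, label); returns the counts used for prediction."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in documents:
        words = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary

def classify(text, model):
    """Pick the class with the highest log prior plus summed log word likelihoods."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen (word, class) pairs from zeroing the score.
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training messages.
model = train_naive_bayes([
    ("win money now", "spam"),
    ("cheap money offer", "spam"),
    ("meeting schedule today", "non-spam"),
    ("project meeting notes", "non-spam"),
])
print(classify("cheap money now", model))   # spam
print(classify("project schedule", model))  # non-spam
```

Working in log space avoids numeric underflow when many small probabilities are multiplied.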

== Apriori algorithm ==
 * https://en.wikipedia.org/wiki/Apriori_algorithm
An algorithm for frequent itemset mining and association rule learning, used for example in market basket analysis.
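
The level-wise search can be sketched as follows: frequent itemsets of size k-1 are joined into candidates of size k, candidates with an infrequent subset are pruned, and the rest are counted against the transactions. This is a simplified illustration; the function name `apriori`, the absolute `min_support` count and the baskets are assumptions.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Keep only candidates contained in at least min_support transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {item for t in transactions for item in t}
    frequent = count({frozenset([item]) for item in items})
    result = dict(frequent)
    k = 2
    while frequent:
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result

# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]
result = apriori(baskets, min_support=3)
# Confidence of the rule bread -> milk: support(bread, milk) / support(bread).
confidence = result[frozenset({"bread", "milk"})] / result[frozenset({"bread"})]
print(len(result), confidence)  # 6 0.75
```

With a minimum support of 3, all single items and all pairs are frequent, while the three-item set appears in only two baskets and is dropped.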

== Libraries/frameworks ==
 * scikit-learn
 * R (an open-source software environment for statistical computing, which includes several CART implementations such as the rpart, party and randomForest packages)
 * Weka (a free and open-source data-mining suite that contains many decision tree algorithms)
 * Orange
 * KNIME
 * OpenCV

=== w3schools python ML ===
 * https://www.w3schools.com/python/python_ml_getting_started.asp

  * matplotlib.pyplot.scatter
  * matplotlib.pyplot.hist 
  * numpy.mean
  * numpy.median
  * numpy.std
  * numpy.var
  * numpy.percentile
  * numpy.random.uniform
  * numpy.random.normal
  * numpy.poly1d 
  * numpy.polyfit
  * pandas.read_csv
  * scipy.stats.mode
  * scipy.stats.linregress
  * scipy.cluster.hierarchy.dendrogram
  * scipy.cluster.hierarchy.linkage
  * sklearn.metrics.r2_score
  * sklearn.linear_model
  * sklearn.preprocessing.StandardScaler
  * sklearn.tree
  * sklearn.tree.DecisionTreeClassifier
  * sklearn.metrics.confusion_matrix 
  * sklearn.metrics.accuracy_score
  * sklearn.metrics.precision_score
  * sklearn.metrics.recall_score
  * sklearn.metrics.f1_score
  * sklearn.cluster.AgglomerativeClustering  
  * sklearn.linear_model.LogisticRegression
  * sklearn.cluster.KMeans
  * sklearn.neighbors.KNeighborsClassifier