WhatsApp Chat Sentiment Analysis in R

Machine Learning Algorithms

Machine learning algorithms are defined as the sets of rules that help a system learn. Here, learning does not mean learning in the literal human sense; rather, it is a matter of finding statistical regularities or other patterns in data. Learning algorithms can also give insight into the relative difficulty of learning in different environments.

There are three main techniques in machine learning, and they are:

1) Supervised Learning

2) Unsupervised Learning

3) Semi-supervised Learning

These three techniques are applied mainly through six types of machine learning algorithms, and they are:

1) Linear Regression

2) Decision tree

3) SVM

4) Naïve Bayes

5) KNN

6) K-Means

1) Linear Regression:

In Linear Regression, a relationship is established between the dependent and independent variables by fitting them to a line; this line is called the regression line.

It is represented by the linear equation Y = a * X + b.

Here, Y – dependent variable, a – slope, X – independent variable, b – intercept.

The coefficients a & b are derived by minimizing the sum of the squared difference of distance between data points and the regression line.

Linear Regression can be classified into two types and they are:

a) Simple Linear Regression

b) Multiple Linear Regression

Simple Linear Regression: Characterized by one independent variable

Multiple Linear Regression: Characterized by multiple independent variables
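As a sketch of simple linear regression in R, the built-in lm() function can fit a regression line to the bundled cars dataset (the dataset and variable names here are illustrative, not part of the original text):

```r
# Simple linear regression on the built-in 'cars' data:
# dependent Y = stopping distance, independent X = speed.
model <- lm(dist ~ speed, data = cars)

# Intercept (b) and slope (a) of the fitted line Y = a * X + b,
# found by minimizing the sum of squared residuals
coef(model)

# Predict the stopping distance for a hypothetical speed of 15 mph
predict(model, newdata = data.frame(speed = 15))
```

Adding further predictors on the right-hand side of the formula (e.g. `dist ~ speed + other_var`) would turn this into multiple linear regression.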

2) Decision Tree:

Decision tree learning is a method for approximating discrete-valued target functions, in which the learned function is represented by a decision tree. The tree is constructed in a top-down, recursive, divide-and-conquer manner. Instances are classified by sorting them down the tree from the root to some leaf node, which provides the classification of the instance. Each node in the tree specifies a test of some attribute of the instance, and each branch descending from that node corresponds to one of the possible values of that attribute. An instance is classified by starting at the root node of the tree, testing the attribute specified by that node, then moving down the tree branch corresponding to the value of the attribute. This process repeats for the subtree rooted at the new node.
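A minimal sketch of this in R, assuming the rpart package (shipped with standard R distributions) and using the built-in iris data as an illustrative example:

```r
library(rpart)  # recursive partitioning; included with standard R distributions

# Grow a classification tree top-down: each split tests one attribute
tree <- rpart(Species ~ ., data = iris, method = "class")

# Print the attribute tests at each node and the leaf classifications
print(tree)

# Classify an instance by sorting it down the tree from the root to a leaf
predict(tree, newdata = iris[1, ], type = "class")
```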

3) SVM:

A Support Vector Machine (SVM) is an algorithm that uses nonlinear mapping to transform the original data into a higher dimension; the data considered can be either linear or nonlinear. Classifying data plays a prominent role in machine learning. The central idea of the support vector machine is to construct a hyperplane between data sets that indicates which class each data object belongs to, as per its class label. SVMs can also be used to learn a variety of representations, such as neural nets, splines, and polynomial estimators. Developing an SVM is entirely different from the usual algorithms used for learning.
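As a hedged sketch, an SVM can be fitted in R via the e1071 package (an assumption here: e1071 must be installed separately with install.packages("e1071")); the radial kernel below performs the nonlinear mapping into a higher-dimensional space:

```r
library(e1071)  # assumed installed; provides an interface to libsvm

# Fit an SVM with a radial (nonlinear) kernel: the kernel implicitly maps
# the data into a higher dimension where a separating hyperplane can be found
fit <- svm(Species ~ ., data = iris, kernel = "radial")

# Each observation is assigned the class label of the side of the
# hyperplane it falls on; compare predictions against true labels
table(predict(fit, iris), iris$Species)
```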

4) Naïve Bayes:

Naive Bayes classifiers are a collection of classification algorithms based on Bayes’ Theorem. It is a family of algorithms that share a common principle: every pair of features being classified is independent of each other.

Basic representation of the Bayes theorem:

P (A | B) = ( P (B | A) * P (A) ) / P (B)

Here, A and B are two events.

P(A|B): the conditional probability that event A occurs, given that B has occurred; called as the posterior probability.

P(A) and P(B): the marginal probabilities of A and B; P(A) is called the prior probability.

P(B|A): the conditional probability that event B occurs, given that A has occurred.
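A short worked example of the formula above in R, with illustrative probabilities chosen purely for the arithmetic (these numbers are not from the original text):

```r
# Assumed example values: P(A) = 0.01, P(B | A) = 0.9, P(B | not A) = 0.05
p_A      <- 0.01
p_B_A    <- 0.9
p_B_notA <- 0.05

# P(B) via the law of total probability
p_B <- p_B_A * p_A + p_B_notA * (1 - p_A)

# Posterior: P(A | B) = P(B | A) * P(A) / P(B)
p_A_B <- p_B_A * p_A / p_B
p_A_B  # approximately 0.154
```

Even though P(B | A) is high, the small prior P(A) keeps the posterior modest, which is exactly the kind of correction Bayes’ theorem provides.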

5) KNN:

K Nearest Neighbours (KNN) is one of the most popular supervised learning algorithms.

It aims to categorize points whose class is unknown, given their respective distances to points in a learning set. The nearest-neighbour (NN) method is the special case of KNN where k equals one.
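A minimal sketch in R using the class package (shipped with standard R distributions); the split of the iris data below is an illustrative assumption:

```r
library(class)  # ships with standard R distributions; provides knn()

# Hold out one point whose class we pretend is unknown
train  <- iris[-1, 1:4]                 # learning set (features)
test   <- iris[1, 1:4, drop = FALSE]    # the point to classify
labels <- iris$Species[-1]              # known classes of the learning set

# Classify the held-out point by majority vote among its k = 5 nearest
# neighbours; setting k = 1 gives the plain nearest-neighbour (NN) method
knn(train, test, cl = labels, k = 5)
```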

6) K-Means:

K-Means is a clustering method. Each cluster is associated with a centroid, and there are ‘K’ initial centroids, one per cluster. Each point is assigned to the cluster with the closest centroid: a data point belongs to the cluster whose centre is at the minimum distance from it among all the cluster centres.
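This can be sketched in R with the built-in kmeans() function; clustering the iris measurements into K = 3 groups is an illustrative choice, not part of the original text:

```r
set.seed(42)  # k-means starts from random initial centroids

# Partition the four iris measurements into K = 3 clusters
km <- kmeans(iris[, 1:4], centers = 3)

# Cluster assignment: each point belongs to its closest centroid
head(km$cluster)

# The final centroid of each cluster
km$centers
```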