Machine Learning: K Nearest Neighbour Using R programming


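K nearest neighbour (kNN) is a machine learning algorithm used for classification and regression. To predict the class of a new data point, it looks at the k most similar points in the data it has already seen and assigns the class that is most common among them. It is also called a lazy algorithm, because it stores the training data and does all of its work at prediction time.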

For instance, let's say there is a group of cats and dogs in a room and we need to identify which one is a dog and which one is a cat. kNN will take all the parameters and classify the output into two classes, i.e. Dog and Cat. With the same parameters used for the classification, we can then identify any new cat or dog using the same model.
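To make the cat and dog example concrete, here is a minimal sketch of the idea in base R. The function knn_predict, the animals data frame and its weight and height columns are hypothetical and the values are made up; the sketch assumes Euclidean distance and a simple majority vote.

```r
# Minimal kNN classifier: majority vote among the k nearest training points.
# knn_predict and the toy data below are hypothetical, for illustration only.
knn_predict <- function(train_x, train_y, new_x, k = 3) {
  # Euclidean distance from new_x to every training row
  diffs <- sweep(as.matrix(train_x), 2, new_x)
  dists <- sqrt(rowSums(diffs^2))
  # Labels of the k closest rows
  nearest <- train_y[order(dists)[1:k]]
  # The most frequent label among the neighbours wins
  names(which.max(table(nearest)))
}

animals <- data.frame(
  weight_kg = c(4, 5, 3.5, 20, 30, 25),
  height_cm = c(25, 23, 24, 50, 60, 55)
)
labels <- c("Cat", "Cat", "Cat", "Dog", "Dog", "Dog")

# Classify a new animal weighing 4.5 kg and standing 26 cm tall
knn_predict(animals, labels, new_x = c(4.5, 26), k = 3)  # "Cat"
```

The three closest animals to the new point are all cats, so the vote returns "Cat". The knn() function from the class package does the same job in optimised code.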

Applications of kNN algorithm
1. Credit rating: Banks can use kNN to estimate the credit score of an individual by comparing them with similar past customers.
2. Segmentation: Marketers use kNN for the purpose of segmentation and targeting.
3. Disease diagnosis: Doctors and hospitals can use kNN to identify various diseases in a person based on past data.

On Planet Analytics we will build a model that predicts a breast cancer diagnosis using the kNN algorithm.

Download the Dataset

Dataset
The available dataset has 32 variables and will be used to identify breast cancer. The outcome is represented by the variable named diagnosis, which has two categories, B and M, where B stands for benign and M stands for malignant.


Follow the instructions and build your model using the same dataset. 
#Set working directory
>setwd("/Users/Planet Analytics/Documents/RDirectory")

# Read File
>Data <- read.csv("cancer.csv")

#Explore Variables
>names(Data)

#See the structure of the data 
>str(Data)

#Check the summary
>summary(Data)


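#Normalising all the numeric variables to the 0-1 range (min-max scaling)
>normal <- function(x){
  return((x-min(x))/(max(x)-min(x)))
} #creating a function

#Columns 2 to 31 are taken to be the numeric measurements; check names(Data) and adjust the range if your file is laid out differently
>DataNormal <- as.data.frame(lapply(Data[2:31], normal))
>DataNormal$diagnosis <- Data$diagnosis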

>summary(DataNormal) #check the outcome
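Normalising matters for kNN because the algorithm works on distances, so a variable with a large range would otherwise dominate the result. As a small worked example of the function above, here it is applied to a vector of made-up values (radius is a hypothetical variable, not taken from the dataset):

```r
# Min-max scaling: the smallest value maps to 0, the largest to 1
normal <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

radius <- c(10, 15, 20, 30)  # made-up values, for illustration only
normal(radius)               # 0.00 0.25 0.50 1.00
```

Note that a column in which every value is the same would give 0/0, i.e. NaN, so such columns should be dropped before normalising.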

# Creating Test and Train dataset
>library(caTools)
>set.seed(123)

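#sample.split() expects the vector of class labels, not the whole data frame
>spliter <- sample.split(DataNormal$diagnosis, SplitRatio = 0.9)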

>Train <- subset(DataNormal, spliter == TRUE)
>Test <- subset(DataNormal, spliter == FALSE)

>classytrain <- Train$diagnosis
>classyTest <- Test$diagnosis
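sample.split() from caTools takes the vector of labels and keeps the share of each class roughly the same in both parts. The sketch below shows the same idea in base R on made-up labels (60 B and 40 M, not the cancer data), so you can see what such a stratified 90/10 split looks like; the names train_idx and in_train are hypothetical.

```r
# Sketch of a stratified 90/10 split in base R (toy labels, not the cancer data)
set.seed(123)
diagnosis <- factor(rep(c("B", "M"), times = c(60, 40)))

# For each class, draw 90% of its row indices for the training part
train_idx <- unlist(lapply(
  split(seq_along(diagnosis), diagnosis),
  function(idx) sample(idx, size = round(0.9 * length(idx)))
))
in_train <- seq_along(diagnosis) %in% train_idx

prop.table(table(diagnosis[in_train]))   # class shares in the training part
prop.table(table(diagnosis[!in_train]))  # class shares in the test part
```

Both tables show 60% B and 40% M, the same mix as the full set of labels. Running prop.table(table(...)) on Train$diagnosis and Test$diagnosis is a quick way to check your own split.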

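#kNN has no separate training step: knn() from the class package compares each test row with the training rows directly
>install.packages("class") #only needed once
>library(class)

#Predicting the diagnosis of the test set (column 31, diagnosis, is left out of the predictors)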
>TestPred <- knn(train = Train[,-31], test = Test[,-31], cl = Train$diagnosis, k = 21)
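The value k = 21 is a choice, and it is worth trying a few values to see how sensitive the result is. The sketch below does this on R's built-in iris data as a stand-in for the cancer data, so it runs without any download; the split sizes and the list of k values are arbitrary assumptions.

```r
library(class)  # provides knn()

set.seed(123)
idx <- sample(nrow(iris), 120)  # 120 rows for training, the remaining 30 for testing
train_x <- iris[idx, 1:4]
train_y <- iris$Species[idx]
test_x  <- iris[-idx, 1:4]
test_y  <- iris$Species[-idx]

# Share of correctly predicted test rows for a handful of odd k values
ks <- c(1, 5, 11, 21)
acc <- sapply(ks, function(k) {
  pred <- knn(train = train_x, test = test_x, cl = train_y, k = k)
  mean(pred == test_y)
})
data.frame(k = ks, accuracy = acc)
```

Odd values of k are used here to reduce the chance of tied votes. Keep in mind that picking k by its score on the test set makes that score look better than it would on fresh data.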

#Comparing predicted and actual diagnoses in a crosstab (confusion matrix)
>table(TestPred, Test$diagnosis)
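The crosstab shows the counts, and the accuracy can be read off it as the share of cases on the diagonal. Here is a small sketch with made-up predictions and labels (pred and actual are hypothetical, not the output of the model above):

```r
# Made-up predictions and true labels, for illustration only
pred   <- factor(c("B", "B", "M", "M", "B", "M"), levels = c("B", "M"))
actual <- factor(c("B", "B", "M", "B", "B", "M"), levels = c("B", "M"))

confusion <- table(Predicted = pred, Actual = actual)
confusion

# Accuracy: correctly classified cases (the diagonal) divided by all cases
accuracy <- sum(diag(confusion)) / sum(confusion)
accuracy  # 5 of 6 correct, about 0.83
```

With your own results, replace pred and actual with TestPred and Test$diagnosis.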

Subscribe to Planet Analytics or write to us at info@planetanalytics.in