Machine Learning: The K-means Clustering Algorithm | R Programming

K-means clustering is an unsupervised machine learning algorithm that partitions a set of data into two or more groups (clusters) on the basis of the available features. Each data point is assigned to the cluster whose center (the mean of its members) is nearest.
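To make the idea concrete, here is a minimal sketch of the usual assign-and-update loop behind K-means. The helper `simple_kmeans` and the toy data are made up for illustration and are not part of any package; for real work, R's built-in `kmeans()` is the better choice.

```r
# Minimal sketch of the K-means loop (illustrative only).
# simple_kmeans is a hypothetical helper, not a library function.
simple_kmeans <- function(x, k, max_iter = 100) {
  x <- as.matrix(x)
  # Start from k rows chosen at random as the initial centers
  centers <- x[sample(nrow(x), k), , drop = FALSE]
  cluster <- rep(0L, nrow(x))
  for (i in seq_len(max_iter)) {
    # Assignment step: squared distance from every point to every center
    d <- sapply(seq_len(k), function(j) rowSums(sweep(x, 2, centers[j, ])^2))
    new_cluster <- max.col(-d, ties.method = "first")
    if (all(new_cluster == cluster)) break  # assignments stable: stop
    cluster <- new_cluster
    # Update step: move each center to the mean of its assigned points
    for (j in seq_len(k)) {
      if (any(cluster == j)) {
        centers[j, ] <- colMeans(x[cluster == j, , drop = FALSE])
      }
    }
  }
  list(cluster = cluster, centers = centers)
}

set.seed(1)
# Two well-separated toy groups of 20 points each in 2D
toy <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 10), ncol = 2))
fit <- simple_kmeans(toy, k = 2)
table(fit$cluster)
```

The loop stops as soon as no point changes cluster, which is the usual convergence check for this procedure.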

Applications of the K-means Clustering Algorithm
1. Marketing Analytics: Marketing departments use K-means clustering to segment customers into groups for purposes such as re-targeting and up-selling.
2. Inventory Categorisation: The algorithm is used to categorise inventory on the basis of multiple metrics.
3. Image Grouping: The same algorithm is used on various platforms to sort images into groups.

The algorithm is versatile and is used in many other ways. Here on Planet Analytics, we will apply it to the famous Iris dataset and group the flowers into three clusters, one for each of the three species.

Download the Dataset

Download the Code

#Set working directory

# Read File
>Data <- read.csv("iris.csv")
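If the CSV file is not at hand, a file with a similar layout can be produced from R's built-in `iris` data. This is only a sketch. It assumes the tutorial's iris.csv was exported with row names, which would explain why the measurements are later selected as columns 2 to 5. The `csv_path` variable is introduced here for illustration.

```r
# Assumption: iris.csv mirrors R's built-in iris data, exported with row names,
# so column 1 is a row index and columns 2-5 hold the four measurements.
csv_path <- file.path(tempdir(), "iris.csv")  # use "iris.csv" to write into the working directory
write.csv(iris, csv_path)                     # row.names = TRUE by default
Data <- read.csv(csv_path, stringsAsFactors = TRUE)
names(Data)                                   # column 1 is the row index added by write.csv
```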

#Explore Variables
>names(Data)

#See the structure of the data
>str(Data)

#Check the summary
>summary(Data)

#Check the top data points
>head(Data)

#Visualize the species (requires the ggplot2 package)
>library(ggplot2)
>ggplot(Data, aes(Petal.Length, Petal.Width, color = Species)) + geom_point()
>ggplot(Data, aes(Sepal.Length, Sepal.Width, color = Species)) + geom_point()
>ggplot(Data, aes(Petal.Length, Sepal.Length, color = Species)) + geom_point()
>ggplot(Data, aes(Sepal.Width, Petal.Width, color = Species)) + geom_point()


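#Train the model
#Columns 2 to 5 are assumed to hold the four numeric measurements (column 1 being a row index)
#3 is the number of clusters; nstart = 20 tries 20 random starts and keeps the best result
>irisclusters <- kmeans(Data[ ,2:5], 3, nstart = 20)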
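Before comparing against the species labels, it can help to look at what the fitted object contains. The sketch below uses R's built-in `iris` data in place of iris.csv (an assumption that the two hold the same values) and fixes an arbitrary seed so the random starts are repeatable.

```r
# Sketch using R's built-in iris data (assumed to match iris.csv)
set.seed(42)                  # arbitrary seed so the random starts repeat
fit <- kmeans(iris[, 1:4], centers = 3, nstart = 20)
fit$size                      # number of flowers in each cluster
fit$centers                   # mean of each measurement per cluster
fit$tot.withinss              # total within-cluster sum of squares
```

Cluster numbers are arbitrary labels, so cluster 1 in one run may come out as cluster 2 or 3 in another.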

#Check the accuracy
>table(irisclusters$cluster, Data$Species)

The algorithm placed all 50 Setosa in cluster 1, 48 Versicolor in cluster 3 and 36 Virginica in cluster 2. However, it wrongly placed 2 Versicolor in cluster 2 and 14 Virginica in cluster 3, which is the cluster that corresponds to the Versicolor species.
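The counts above can be turned into a single agreement figure by labelling each cluster with its most common species and counting the matches, a measure often called purity. For the table described here that is (50 + 48 + 36) / 150, or roughly 0.89. The sketch below uses R's built-in `iris` data rather than iris.csv, which is an assumption, and the variable names are illustrative.

```r
# Sketch: agreement between clusters and species (built-in iris data assumed)
set.seed(42)                           # arbitrary seed for repeatability
fit <- kmeans(iris[, 1:4], centers = 3, nstart = 20)
tab <- table(fit$cluster, iris$Species)
# For each cluster (row), take the count of its most common species
majority <- apply(tab, 1, max)
# Share of flowers that fall in their cluster's majority species
agreement <- sum(majority) / sum(tab)
agreement
```

Because K-means never sees the species labels, this figure describes how well the clusters line up with the species; it is not a prediction accuracy in the supervised sense.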
