WhatsApp Chat Sentiment Analysis in R




The world is moving towards a fully digital economy at an incredible pace. As a result, an enormous amount of data is produced every day by the internet, social media, smartphones, tech equipment and many other sources, which has driven the evolution of Big Data management and analytics. Sentiment analysis is one such tool and one of the most popular branches of textual analytics. With the help of statistics and natural language processing, it examines unstructured text and classifies it into various sentiments. It is also known as opinion mining, as it focuses largely on people’s opinions and attitudes as expressed in their texts.




Branches of Textual Analysis


More than 34 billion texts are exchanged over WhatsApp every day. Just imagine if we could analyze this data and draw valuable insights from it: we could not only make better real-time decisions but also add value for stakeholders at much lower cost and in less time, and so align operational efficiency with organizational strategy. In this article, we’ll use sentiment analysis to investigate a WhatsApp chat in R, and visualize and interpret the results along the way.
Applications of WhatsApp Chat Analysis


WhatsApp is the most popular chat app, with more than 700 million monthly active users. Its popularity has made it a must-have app among smartphone users, and even businesses and organizations use WhatsApp for daily communication within groups and across departments. Corporations therefore accumulate a huge amount of textual data from WhatsApp, and they can use chat sentiment analysis to gain better insight into their employees and to head off conflicts that arise from redundancies and inefficiencies in business processes.


How it’s done


First, we need to select a chat and export it from WhatsApp to our system, which is easy and can be done either from the phone or from WhatsApp for desktop. After that, the process is fairly simple, and all the code needed to analyze the texts is given below. I am going to analyze my chat with my girlfriend; for privacy reasons, we’ll keep her name anonymous and refer to her as “e18682”.
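
Before loading the file, it can help to see what an exported chat looks like. The sketch below is not part of the analysis that follows. It uses only base R and assumes each message line has the layout “date, time - sender: message”, which varies by phone, locale and WhatsApp version. The sample lines and the names `line_pattern` and `parsed` are made up for illustration, so adjust the pattern to match your own export.

```r
# Hypothetical sample lines in an ASSUMED export layout:
#   <date>, <time> - <sender>: <message>
chat_lines <- c(
  "12/08/17, 9:15 PM - manish: Hey, are you free to call?",
  "12/08/17, 9:16 PM - e18682: hmm okay",
  "still at work though",
  "12/08/17, 9:20 PM - manish: okay, call me later"
)

# Regular expression for the assumed layout (adjust to your own export)
line_pattern <- paste0(
  "^([0-9]{1,2}/[0-9]{1,2}/[0-9]{2,4}), ",  # date
  "([0-9]{1,2}:[0-9]{2} [AP]M) - ",         # time
  "([^:]+): ",                              # sender
  "(.*)$"                                   # message
)

# A line that matches the pattern starts a new message;
# any other line continues the previous message.
is_new_message <- grepl(line_pattern, chat_lines)
message_id <- cumsum(is_new_message)

# Glue continuation lines onto their message
# (lines before the first match are dropped)
keep <- message_id > 0
merged <- as.vector(
  tapply(chat_lines[keep], message_id[keep], paste, collapse = " ")
)

# Split each merged line into its four parts
parsed <- data.frame(
  date   = sub(line_pattern, "\\1", merged),
  time   = sub(line_pattern, "\\2", merged),
  sender = sub(line_pattern, "\\3", merged),
  text   = sub(line_pattern, "\\4", merged),
  stringsAsFactors = FALSE
)

print(parsed)
```

With the messages in a data frame like this, we could, for example, keep only one sender’s rows before running the rest of the analysis.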










# Read and load the text file in R

>library(readtext) #Load required package

>setwd("/Users/Desktop/RDirectory") #Set the working directory

>TextData <- readtext("chat.txt")


>TextData <- as.data.frame(TextData)

# Remove punctuation, numbers, special characters and other unwanted tokens, and stem all the words.

>library(tm)

>library(qdap) #Provides stemmer() and freq_terms(); must be loaded before stemmer() is called

>mystopwords <- c("manish", "e18682", "pm", "am", "<", ">", stopwords("en")) #Define all the words which are not required

>CleanData <- tolower(TextData$text) #Turn the data into lower case

>CleanData <- removeWords(CleanData, mystopwords) #Drop the unwanted words

>CleanData <- removePunctuation(CleanData)

>CleanData <- removeNumbers(CleanData)

>CleanData <- stemmer(CleanData, rm.bracket = TRUE) #Stem the words

# Make a word cloud based on how frequently each word is used

>library(wordcloud)

>TextFrequency <- freq_terms(CleanData, at.least = 1)


>wordcloud(TextFrequency$WORD, TextFrequency$FREQ, colors = TextFrequency$FREQ, max.words = 200)


If we look at the word cloud, it’s clearly visible that “hmm” and “okay” are the most frequently used words, while “hehe”, “babe”, “know” and “call” also appear repeatedly, followed by other words according to their frequency of use.
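
To make the idea behind the word cloud concrete, here is a minimal base R sketch of counting word frequencies. It is not how `freq_terms()` is implemented. The sample text and the tiny stop-word list are invented for illustration, and only the `WORD` and `FREQ` column names mirror those used above.

```r
# Hypothetical sample text standing in for the chat
sample_text <- "Hmm okay. Call me, babe! Okay? hmm... okay 123"

# Small illustrative stop-word list (an assumption, not tm's full list)
stop_words <- c("me", "the", "a")

clean <- tolower(sample_text)
clean <- gsub("[[:punct:]]", " ", clean)  # replace punctuation with spaces
clean <- gsub("[[:digit:]]", " ", clean)  # replace digits with spaces

# Split on whitespace, then drop empty strings and stop words
words <- strsplit(clean, "[[:space:]]+")[[1]]
words <- words[nzchar(words) & !(words %in% stop_words)]

# Count each word and sort from most to least frequent
counts <- sort(table(words), decreasing = TRUE)
word_freq <- data.frame(
  WORD = names(counts),
  FREQ = as.integer(counts),
  stringsAsFactors = FALSE
)

print(word_freq)
```

The most frequent words sit at the top of `word_freq`, and these are the ones a word cloud draws largest.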

# Sentiment Analysis 

>library(syuzhet)


>Sentiments <- get_nrc_sentiment(TextFrequency$WORD)


>Sentiments <- cbind("Words" = TextFrequency$WORD, Sentiments)


>SentimentsScore <- data.frame("Score" = colSums(Sentiments[2:11]))


>TotalSentiments <- cbind("Sentiments" = rownames(SentimentsScore), SentimentsScore)

>rownames(TotalSentiments) <- NULL

# Visualisation of the sentiments extracted from the texts

>library(ggplot2)


>ggplot(data = TotalSentiments, aes(x = Sentiments, y = Score)) + geom_bar(stat = "identity", aes(fill = Sentiments))





We can easily infer from the bar plot that positive sentiment scored highest in the chat, followed by negative in second place and anticipation in third.
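
Note that `get_nrc_sentiment()` is given the list of distinct words above, so a word that appears a hundred times counts the same as a word that appears once. One possible refinement, sketched here under stated assumptions, is to weight each word’s scores by its frequency. The words, frequencies and scores below are invented stand-ins for `TextFrequency` and the sentiment output, not real lexicon values.

```r
# Invented stand-in for TextFrequency (same WORD/FREQ column names)
word_freq <- data.frame(
  WORD = c("love", "call", "hate"),
  FREQ = c(5, 3, 1),
  stringsAsFactors = FALSE
)

# Invented stand-in for per-word sentiment scores (NOT real NRC values):
# one row per word in word_freq, one column per sentiment
word_scores <- data.frame(
  joy      = c(1, 0, 0),
  anger    = c(0, 0, 1),
  negative = c(0, 0, 1),
  positive = c(1, 0, 0)
)

# Unweighted: every distinct word counts once
unweighted <- colSums(word_scores)

# Weighted: multiply each word's row by how often the word occurs
weighted <- colSums(as.matrix(word_scores) * word_freq$FREQ)

print(rbind(unweighted, weighted))
```

In this toy example the unweighted totals are tied, while the weighted totals show that the positive word was used far more often than the negative one.
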
Issues with Sentiments and Analytics


Although sentiment analysis is one of the most popular textual analysis tools among businesses, scholars and analysts, both for decision-making and for research, it has its limitations: language is very complex, and the meaning of a word changes over time and from person to person. Moreover, the accuracy of the analysis is hard to measure and to compare with how human beings interpret emotions.
The problem can be classified into three main factors:


1. Sarcasm: Sarcasm is a popular form of mockery used to ridicule or insult. Analytics fails to recognize this kind of emotion and may prove ineffective in such cases. Efforts are being made to address the problem through extensive use of machine learning and artificial intelligence, though, and we might see improved sentiment analysis in the near future.

“I am so proud of your stupidity, you make me feel good about myself.”

2. Multiple Meanings: A word can have many meanings, and it may convey different emotions as we move from one geography to another or even from one person to another. Many words mean one thing in British English and another in American English. For example: “I think you’ve been playing horribly dope.”

3. Dependency: Sentiment analysis depends largely on a predefined list of words and their individual scores, which leads to problems such as ignoring the context of the sentence. A sentence that includes ‘good’ might not carry any emotion at all but will still be shown as positive by the analysis.
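
The three problems above are easy to reproduce with a toy word-score lookup. The sketch below uses a made-up four-word lexicon and a hypothetical helper, `score_sentence()`. It is far simpler than the lexicon used earlier, but the same kind of word-by-word lookup is what makes these failures possible. The third sentence is an invented example of a sentence with no emotion attached.

```r
# Made-up lexicon: word -> score (NOT taken from any real lexicon)
toy_lexicon <- c(good = 1, proud = 1, stupidity = -1, horribly = -1)

# Hypothetical helper: sum the scores of the words found in the lexicon
score_sentence <- function(sentence, lexicon) {
  clean <- tolower(gsub("[[:punct:]]", " ", sentence))
  words <- strsplit(clean, "[[:space:]]+")[[1]]
  matched <- words[words %in% names(lexicon)]
  sum(lexicon[matched])
}

sentences <- c(
  sarcasm    = "I am so proud of your stupidity, you make me feel good about myself.",
  slang      = "I think you've been playing horribly dope.",
  no_emotion = "Good morning, the meeting is at ten."
)

# Score each sentence; a positive total reads as "positive" sentiment
scores <- vapply(sentences, score_sentence, numeric(1), lexicon = toy_lexicon)
print(scores)
```

The sarcastic insult and the neutral greeting both come out positive, while the slang compliment comes out negative.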

Despite its limitations, sentiment analysis is an extremely popular and widely used analytical tool in business intelligence, applied to social media monitoring, brand health examination, measuring the effects of ad campaigns or new product launches, and various research purposes. Marketers and customer service teams frequently apply it to Twitter data and customer reviews to identify how consumers feel. Sentiment analysis has also started to gain popularity in areas such as psychology, political science and similar fields, where textual data is obtained and explored from books, transcripts and reports.


For any query write to us at info@planetanalytics.in
