
Kernel SVM in Python







Problem Statement: Predict whether or not a passenger survived the sinking of the Titanic



Variables: PassengerID, Survived, Pclass, Name, Sex, Age, Fare

We are going to use two variables, Pclass and Sex of the Titanic passengers, to predict whether they survived or not.


Independent Variables : Pclass, Sex

Dependent Variable : Survived


# Importing the libraries

import numpy as np

import matplotlib.pyplot as plt

import pandas as pd



# Importing the dataset

dataset = pd.read_csv('titanic.csv')
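
Before slicing columns by position, it is worth confirming the column order in your copy of titanic.csv, since the iloc indices used below assume that Survived is column 1, Pclass is column 2 and Sex is column 4:

# Quick sanity check of the column order and the first few rows
print(dataset.columns.tolist())

print(dataset.head())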







# Separating the independent and dependent variables

X = dataset.iloc[:, [2, 4]].values

y = dataset.iloc[:, 1].values















# Encoding categorical data

from sklearn.compose import ColumnTransformer

from sklearn.preprocessing import OneHotEncoder

# One-hot encode the "Sex" column (index 1 of X); "Pclass" is passed through unchanged
ct = ColumnTransformer([('encoder', OneHotEncoder(), [1])], remainder = 'passthrough')

X = np.array(ct.fit_transform(X), dtype = float)




We have encoded the variable "Sex" in X, which has two categories, male and female. Hence, we get two separate one-hot columns for it, and the third column of X holds the variable "Pclass".
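
If you want to see the result of the encoding (a quick check; the exact values depend on your copy of the dataset), print a few encoded rows:

# Columns 0-1: one-hot dummies for "Sex", column 2: "Pclass"
print(X[:5])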






# Splitting the dataset into the Training set and Test set

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state = 0)
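
To confirm the 70/30 split (a quick check; the exact counts depend on the number of rows in your file), print the shapes of the two sets:

# The test set should hold roughly 30% of the rows
print(X_train.shape, X_test.shape)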













# Feature Scaling

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()

X_train = sc.fit_transform(X_train)

X_test = sc.transform(X_test)




We applied feature scaling because the RBF kernel works with distances between data points; bringing all features onto a comparable scale keeps any single feature from dominating those distances and helps the model make accurate predictions of whether a passenger survived the sinking of the Titanic.
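
A quick way to see what StandardScaler did (a sketch; the exact numbers depend on your split) is to check that every column of the scaled training set now has roughly zero mean and unit standard deviation:

# Each column should now have mean ~0 and standard deviation ~1
print(X_train.mean(axis = 0))

print(X_train.std(axis = 0))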






# Fitting Kernel SVM to the Training set

from sklearn.svm import SVC

classifier = SVC(kernel = 'rbf', random_state = 0)

classifier.fit(X_train, y_train)




We created an object 'classifier' of class 'SVC' and fitted it to our training set. We used kernel = 'rbf', which is the Gaussian (RBF) kernel.
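
For intuition, the RBF kernel scores the similarity of two points x and x' as exp(-gamma * ||x - x'||^2), which is close to 1 for nearby points and falls towards 0 as they move apart. A minimal sketch of that computation with NumPy (the gamma value here is purely illustrative; SVC's default is 'scale'):

# Illustrative RBF kernel value between the first two scaled training points
gamma = 0.5  # illustrative value, not the gamma SVC actually uses by default

diff = X_train[0] - X_train[1]

print(np.exp(-gamma * np.dot(diff, diff)))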



# Predicting the Test set results

y_pred = classifier.predict(X_test)
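
You can also score a hypothetical individual passenger, as long as you encode and scale it exactly like the training data. A minimal sketch (the one-hot column order is an assumption here; verify it on your machine with ct.get_feature_names_out()):

# Hypothetical passenger: columns are [Sex one-hot dummies, Pclass]
# Assuming the dummies are ordered [female, male], this is a female travelling 1st class
new_passenger = sc.transform([[1.0, 0.0, 1]])

print(classifier.predict(new_passenger))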








# Making the Confusion Matrix

from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)



The confusion matrix tells us how well our model is predicting. In other words, it lets us assess how well our Kernel SVM model has learned the correlations in the training set and turned them into accurate predictions on the test set.






Here, the main diagonal (140 and 71) shows the correct predictions, while the off-diagonal entries (29 and 28) show the incorrect ones.

So, 140 + 71 = 211 is the total number of correct predictions out of the 268 instances in y_test.

Hence, our model achieved an accuracy of about 78.7%.
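
The same figure can also be computed directly, either from the confusion matrix or with accuracy_score (a quick sketch):

# Accuracy = correct predictions / total predictions
from sklearn.metrics import accuracy_score

print(cm.trace() / cm.sum())             # (140 + 71) / 268

print(accuracy_score(y_test, y_pred))    # same value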