Naïve Bayes Classifier
What is Naïve Bayes Classifier?
The Naïve Bayes classifier (NBC) works on the principle of conditional probability as given by Bayes' theorem, together with the "naïve" assumption that features are independent of one another given the class.
Source: https://www.simplilearn.com/tutorials/machine-learning-tutorial/naive-bayes-classifier
Where is the Naive Bayes classifier used?
Face Recognition- This classifier is used to identify facial features like eyes, nose & mouth.
Weather Prediction- It can predict whether the weather will be good or bad.
Medical Diagnosis- The classifier provides some information to the doctors that can be used to
diagnose the patients. Healthcare staff can use the Naive Bayes classifier to indicate whether the
patient is at high risk for certain diseases and conditions.
News Classification- After training on labelled articles, the naive Bayes algorithm is applied to the
test dataset; the probability of every word is computed and combined into an overall score that
indicates whether a specific piece of information is likely fake or real.
Algorithm:
The Naive Bayes classifier calculates the probability of each class for a given input in the
following steps:
Step 1: Determine the prior probability for each of the supplied class labels.
Step 2: Using each attribute, compute the Likelihood Probability for each class.
Step 3: Enter these values into the Bayes Formula and compute the posterior probability.
Step 4: Assign the input to the class with the highest posterior probability.
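The four steps above can be sketched as follows. This is a minimal toy example, not the actual implementation or dataset from this report; the documents and labels are invented for illustration.

```python
from collections import Counter, defaultdict

# Toy training data: (document, label) pairs, invented for illustration.
train = [
    ("good weather today", "weather"),
    ("bad weather tomorrow", "weather"),
    ("patient shows symptoms", "medical"),
    ("doctor treats patient", "medical"),
]

# Step 1: prior probability for each class label.
labels = [label for _, label in train]
priors = {c: labels.count(c) / len(labels) for c in set(labels)}

# Step 2: likelihood P(word | class) from per-class word counts.
word_counts = defaultdict(Counter)
for doc, label in train:
    word_counts[label].update(doc.split())

def likelihood(word, c):
    total = sum(word_counts[c].values())
    return word_counts[c][word] / total

# Steps 3 and 4: posterior (up to a constant factor) and argmax over classes.
def predict(doc):
    scores = {}
    for c in priors:
        score = priors[c]
        for word in doc.split():
            score *= likelihood(word, c)
        scores[c] = score
    return max(scores, key=scores.get)

print(predict("bad weather"))  # -> weather
```

Note that a word never seen in a class drives that class's score to zero, which is exactly the problem the smoothing step below addresses.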
Smoothing: Smoothing aims to capture relatively modest, general changes in value with less
emphasis on matching individual data points exactly, whereas curve fitting focuses on attaining
as near a match as feasible. In the Naïve Bayes setting, smoothing the word counts prevents a
word unseen in a class from driving that class's probability to zero.
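As a concrete sketch of additive smoothing in this setting (the word counts and vocabulary below are hypothetical; 0.0001 is the smoothing value used later in this report):

```python
from collections import Counter

# Hypothetical word counts for one class; "storm" was never observed.
counts = Counter({"good": 3, "weather": 5})
vocab = {"good", "weather", "storm"}
alpha = 0.0001  # the smoothing value used in this report

def smoothed_prob(word):
    # Additive smoothing: add alpha to every count so unseen words
    # receive a small non-zero probability instead of exactly zero.
    total = sum(counts.values()) + alpha * len(vocab)
    return (counts[word] + alpha) / total

print(smoothed_prob("storm"))  # small but non-zero
```

The smoothed probabilities still sum to 1 over the vocabulary, so the distribution remains valid.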
Getting to the code:
Pre-processing the data
Splitting the data into three sets
Counting the words and returning a tuple of word and letter counts
Generating vocabulary
Conditional Probability based on the category
Evaluating the test dataset without smoothing
Displaying the vocabulary
Word probability in documents
Output of word probability in documents
Before Smoothing
Output of test data before smoothing
Top 10 words of each category after Smoothing
Conditional probability with smoothing 0.0001
Output of the test data after smoothing
Final accuracy increased by 2% with a smoothing value of 0.0001.
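The code walkthrough itself is not reproduced here, but its main pieces (stop-word removal, per-category word counts, the vocabulary, and smoothed conditional probabilities) can be sketched roughly as follows. The stop-word list and labelled sentences are invented stand-ins, not the Ford sentence dataset.

```python
from collections import Counter, defaultdict

STOP_WORDS = {"the", "is", "a", "of", "and", "to", "in", "are"}  # illustrative subset
ALPHA = 0.0001  # smoothing value from this report

def preprocess(text):
    # Lower-case, tokenise on whitespace, and drop stop words.
    return [w for w in text.lower().split() if w not in STOP_WORDS]

# Hypothetical labelled sentences standing in for the real dataset.
data = [
    ("the engine is reliable and powerful", "performance"),
    ("a powerful engine in a quiet cabin", "performance"),
    ("the seats are comfortable", "interior"),
    ("a quiet cabin and comfortable seats", "interior"),
]

# Per-category word counts and the overall vocabulary.
counts = defaultdict(Counter)
for text, cat in data:
    counts[cat].update(preprocess(text))
vocab = set().union(*counts.values())

def cond_prob(word, cat):
    # P(word | category) with additive smoothing to avoid zero probabilities.
    total = sum(counts[cat].values()) + ALPHA * len(vocab)
    return (counts[cat][word] + ALPHA) / total

# Top words per category, as in the "Top 10 words" step (top 3 shown here).
for cat in counts:
    print(cat, counts[cat].most_common(3))
```

With built-in dictionaries and Counters doing the bookkeeping, the per-category probability tables stay compact and easy to inspect.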
Contributions:
1. We removed all stop words and applied several other text pre-processing techniques.
2. We used a smoothing parameter, which helped tackle the zero-probability problem in the
Naïve Bayes machine learning algorithm.
Challenges:
1. We limited the dataset to 1000 records per category.
2. Naïve Bayes is somewhat difficult to implement from scratch without using any libraries, so we
learnt on the go while implementing it.
3. Calculating the conditional probabilities was challenging, but Python's built-in dictionaries
helped overcome this problem.
References:
https://www.kaggle.com/datasets/gaveshjain/ford-sentence-classifiaction-dataset?resource=download
https://www.simplilearn.com/tutorials/machine-learning-tutorial/naive-bayes-classifier
https://en.wikipedia.org/wiki/Smoothing