Naïve Bayes Classifier
What is Naïve Bayes Classifier?
The Naïve Bayes classifier (NBC) works on the principle of conditional probability as given by Bayes' theorem, together with the "naïve" assumption that features are independent of one another given the class.
Source: https://www.simplilearn.com/tutorials/machine-learning-tutorial/naive-bayes-classifier
Where is the Naive Bayes classifier used?
Face Recognition- This classifier is used to identify facial features like eyes, nose & mouth.
Weather Prediction- It can predict whether the weather will be good or bad.
Medical Diagnosis- The classifier provides some information to the doctors that can be used to
diagnose the patients. Healthcare staff can use the Naive Bayes classifier to indicate whether the
patient is at high risk for certain diseases and conditions.
News Classification- After training on labelled articles, the naive Bayes algorithm is applied to the
test dataset; the probability of every word is computed and combined into an overall score that
indicates whether a specific piece of information is likely fake or real.
Algorithm:
The Naive Bayes classifier calculates the probability of each class for a given input in the
following steps:
Step 1: Determine the prior probability for each of the supplied class labels.
Step 2: Using each attribute, compute the Likelihood Probability for each class.
Step 3: Enter these values into the Bayes Formula and compute the posterior probability.
Step 4: Assign the input to the class with the highest posterior probability.
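The four steps above can be sketched as follows. This is a minimal toy example, not the actual implementation or dataset from this report; the documents and labels are invented for illustration.

```python
from collections import Counter, defaultdict

# Toy training data: (document, label) pairs, invented for illustration.
train = [
    ("good weather today", "weather"),
    ("bad weather tomorrow", "weather"),
    ("patient shows symptoms", "medical"),
    ("doctor treats patient", "medical"),
]

# Step 1: prior probability for each class label.
labels = [label for _, label in train]
priors = {c: labels.count(c) / len(labels) for c in set(labels)}

# Step 2: likelihood P(word | class) from per-class word counts.
word_counts = defaultdict(Counter)
for doc, label in train:
    word_counts[label].update(doc.split())

def likelihood(word, c):
    total = sum(word_counts[c].values())
    return word_counts[c][word] / total

# Steps 3 and 4: posterior (up to a constant factor) and argmax over classes.
def predict(doc):
    scores = {}
    for c in priors:
        score = priors[c]
        for word in doc.split():
            score *= likelihood(word, c)
        scores[c] = score
    return max(scores, key=scores.get)

print(predict("bad weather"))  # -> weather
```

Note that a word never seen in a class drives that class's score to zero, which is exactly the problem the smoothing step below addresses.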
Smoothing: Smoothing aims to capture relatively modest, general changes in value with less
emphasis on matching individual data points exactly, whereas curve fitting focuses on attaining
as near a match as feasible. In the Naïve Bayes setting, smoothing the word counts prevents a
word unseen in a class from driving that class's probability to zero.
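As a concrete sketch of additive smoothing in this setting (the word counts and vocabulary below are hypothetical; 0.0001 is the smoothing value used later in this report):

```python
from collections import Counter

# Hypothetical word counts for one class; "storm" was never observed.
counts = Counter({"good": 3, "weather": 5})
vocab = {"good", "weather", "storm"}
alpha = 0.0001  # the smoothing value used in this report

def smoothed_prob(word):
    # Additive smoothing: add alpha to every count so unseen words
    # receive a small non-zero probability instead of exactly zero.
    total = sum(counts.values()) + alpha * len(vocab)
    return (counts[word] + alpha) / total

print(smoothed_prob("storm"))  # small but non-zero
```

The smoothed probabilities still sum to 1 over the vocabulary, so the distribution remains valid.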
Getting to the code:
Pre-processing the data
Splitting the data into three sets
Counting the words and returning a tuple of word and letter counts
Generating vocabulary
Conditional Probability based on the category
Evaluating the test dataset without smoothing
Displaying the vocabulary
Word probability in documents
Output of word probability in documents
Before Smoothing
Output of test data before smoothing
Top 10 words of each category after Smoothing
Conditional probability with smoothing 0.0001
Output of the test data after smoothing
Final accuracy increased by 2% with a smoothing value of 0.0001.
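The code walkthrough itself is not reproduced here, but its main pieces (stop-word removal, per-category word counts, the vocabulary, and smoothed conditional probabilities) can be sketched roughly as follows. The stop-word list and labelled sentences are invented stand-ins, not the Ford sentence dataset.

```python
from collections import Counter, defaultdict

STOP_WORDS = {"the", "is", "a", "of", "and", "to", "in", "are"}  # illustrative subset
ALPHA = 0.0001  # smoothing value from this report

def preprocess(text):
    # Lower-case, tokenise on whitespace, and drop stop words.
    return [w for w in text.lower().split() if w not in STOP_WORDS]

# Hypothetical labelled sentences standing in for the real dataset.
data = [
    ("the engine is reliable and powerful", "performance"),
    ("a powerful engine in a quiet cabin", "performance"),
    ("the seats are comfortable", "interior"),
    ("a quiet cabin and comfortable seats", "interior"),
]

# Per-category word counts and the overall vocabulary.
counts = defaultdict(Counter)
for text, cat in data:
    counts[cat].update(preprocess(text))
vocab = set().union(*counts.values())

def cond_prob(word, cat):
    # P(word | category) with additive smoothing to avoid zero probabilities.
    total = sum(counts[cat].values()) + ALPHA * len(vocab)
    return (counts[cat][word] + ALPHA) / total

# Top words per category, as in the "Top 10 words" step (top 3 shown here).
for cat in counts:
    print(cat, counts[cat].most_common(3))
```

With built-in dictionaries and Counters doing the bookkeeping, the per-category probability tables stay compact and easy to inspect.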
Contributions:
1. We removed all stop words and applied several other text pre-processing techniques.
2. We used a smoothing parameter, which helped tackle the zero-probability problem in the
Naïve Bayes machine learning algorithm.
Challenges:
1. We limited the dataset to 1000 records per category.
2. Naïve Bayes is somewhat difficult to implement from scratch without using any libraries, so we
learnt on the go while implementing it.
3. Calculating the conditional probabilities was challenging, but Python's built-in dictionaries
helped overcome this problem.
References:
https://www.kaggle.com/datasets/gaveshjain/ford-sentence-classifiaction-dataset?resource=download
https://www.simplilearn.com/tutorials/machine-learning-tutorial/naive-bayes-classifier
https://en.wikipedia.org/wiki/Smoothing