Bernoulli-Document-Model_Based-Naive-Bayes-SMS-Spam-Classification

This code performs Naive Bayes spam classification on the SMS Spam Collection Data Set from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection).

This version of Naive Bayes is based on the Bernoulli document model. The word probabilities were computed by maximum a posteriori (MAP) parameter estimation, with a Beta(2,1) distribution as the prior.
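
As a rough illustration of the estimation step, the sketch below computes MAP word probabilities in Python. It assumes each training document is a list of tokens and that the estimate is the mode of the Beta posterior, which for a Beta(2,1) prior works out to (n_w + 1) / (N + 1), where n_w is the number of documents of a class that contain the word and N is the number of documents in that class. The function name `map_word_probabilities`, its parameters, and the toy data are hypothetical and not taken from this repository.

```python
from collections import Counter


def map_word_probabilities(documents, vocabulary, alpha=2.0, beta=1.0):
    """MAP estimate of P(word present | class) under a Beta(alpha, beta) prior.

    `documents` is a list of token lists, all belonging to one class.
    Assumption: the estimate is the posterior mode,
    (n_w + alpha - 1) / (N + alpha + beta - 2).
    """
    vocabulary = set(vocabulary)
    n_docs = len(documents)

    # Bernoulli model: count each word at most once per document.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc) & vocabulary)

    denom = n_docs + alpha + beta - 2
    return {w: (doc_freq[w] + alpha - 1) / denom for w in vocabulary}


# Toy data, for illustration only.
spam_docs = [["free", "prize"], ["free", "call"], ["call", "now"]]
vocab = {"free", "prize", "call", "now", "hello"}
probs = map_word_probabilities(spam_docs, vocab)
# probs["free"] is (2 + 1) / (3 + 1) = 0.75
# probs["hello"] is (0 + 1) / (3 + 1) = 0.25
```

Under this estimate an unseen word never gets probability 0, but a word that appears in every document of a class gets probability 1, so a classifier that takes log(1 - p) would need to guard against that case.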

The preprocessing involved the following steps:

1) Removal of trailing spaces

2) Removal of non-words

3) Removal of stop words

4) Lemmatization
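
The four steps above could be sketched in Python roughly as follows. This is an assumption-heavy illustration, not the repository's code: "non-words" is taken to mean anything other than alphabetic runs, lowercasing is added so that stop-word matching works, the stop-word list is a tiny placeholder, and the lemmatizer is passed in as a function (for example NLTK's `WordNetLemmatizer().lemmatize`) so that the sketch runs without extra downloads. The names `preprocess` and `STOP_WORDS` are hypothetical.

```python
import re

# Tiny placeholder stop-word list (an assumption); a real list is much longer.
STOP_WORDS = {"a", "an", "the", "is", "to", "you", "for", "and", "of", "in"}


def preprocess(message, lemmatize=lambda word: word):
    """Turn a raw SMS string into a list of cleaned tokens.

    `lemmatize` maps a word to its lemma; the default leaves words unchanged.
    """
    # Step 1: remove trailing spaces (lowercasing is an added assumption).
    text = message.rstrip().lower()

    # Step 2: drop non-words by keeping only alphabetic runs.
    tokens = re.findall(r"[a-z]+", text)

    # Step 3: remove stop words.
    tokens = [t for t in tokens if t not in STOP_WORDS]

    # Step 4: lemmatize each remaining token.
    return [lemmatize(t) for t in tokens]


tokens = preprocess("  You have WON a prize!!! Call 0800 now  ")
# tokens is ['have', 'won', 'prize', 'call', 'now']
```

With NLTK installed and its WordNet data downloaded, `preprocess(text, lemmatize=WordNetLemmatizer().lemmatize)` would apply real lemmatization in step 4.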