Bernoulli-Document-Model_Based-Naive-Bayes-SMS-Spam-Classification
This code implements Naive Bayes spam classification on the SMS Spam Collection Data Set from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection).
This version of Naive Bayes is based on the Bernoulli document model, in which each message is represented by the presence or absence of each vocabulary word. The word probabilities were computed by maximum a posteriori (MAP) parameter estimation, with a Beta(2,1) distribution as the prior.
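As a rough illustration of the estimation step, here is a minimal sketch and not the repository's actual code. It assumes each message is already a list of tokens, and the function names (`map_word_probs`, `log_likelihood`, `classify`) are hypothetical. With a Beta(a, b) prior, the MAP estimate for a word seen in `df` of `n` messages of a class is the posterior mode, (df + a - 1) / (n + a + b - 2), which for Beta(2,1) reduces to (df + 1) / (n + 1). Because this estimate can reach exactly 1, the sketch clips probabilities before taking logs; that clipping is an added assumption.

```python
import math
from collections import Counter

def map_word_probs(docs, vocab, a=2.0, b=1.0):
    """MAP estimate of P(word present | class) under a Beta(a, b) prior.

    docs is a list of token lists belonging to one class.
    """
    n_docs = len(docs)
    # Document frequency: each word counts at most once per message.
    doc_freq = Counter(w for doc in docs for w in set(doc))
    return {w: (doc_freq[w] + a - 1) / (n_docs + a + b - 2) for w in vocab}

def log_likelihood(doc, theta, eps=1e-9):
    """Bernoulli log-likelihood: absent words contribute log(1 - p) too."""
    present = set(doc)
    total = 0.0
    for word, p in theta.items():
        # Assumption: clip to avoid log(0), since Beta(2,1) MAP can give p == 1.
        p = min(max(p, eps), 1.0 - eps)
        total += math.log(p) if word in present else math.log(1.0 - p)
    return total

def classify(doc, thetas, priors):
    """Return the class with the highest log posterior score."""
    scores = {c: math.log(priors[c]) + log_likelihood(doc, thetas[c])
              for c in thetas}
    return max(scores, key=scores.get)

# Toy data, invented purely for illustration.
spam = [["free", "prize"], ["free", "call"]]
ham = [["call", "later"], ["see", "you", "later"]]
vocab = {w for doc in spam + ham for w in doc}

thetas = {"spam": map_word_probs(spam, vocab), "ham": map_word_probs(ham, vocab)}
priors = {"spam": 0.5, "ham": 0.5}
label = classify(["free", "prize"], thetas, priors)
```

Unlike the multinomial model, the score sums over the whole vocabulary, so words missing from a message also count as evidence.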
The preprocessing part involved the following steps:
1) Removal of trailing spaces
2) Removal of non-words
3) Removal of stop words
4) Lemmatization
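The four steps above can be sketched roughly as follows. This is a standard-library-only illustration and not the repository's pipeline: the stop-word set and the lemma lookup table are tiny hypothetical stand-ins (a real run would more likely use a full stop-word list and a proper lemmatizer such as NLTK's `WordNetLemmatizer`), and lowercasing is an added assumption.

```python
import re

# Hypothetical, deliberately tiny resources used only for illustration.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "you", "your", "for", "and"}
LEMMAS = {"prizes": "prize", "calls": "call", "winning": "win", "won": "win"}

def preprocess(message):
    """Turn a raw SMS string into a list of cleaned tokens."""
    text = message.rstrip().lower()        # 1) drop trailing spaces (lowercasing is an assumption)
    tokens = re.findall(r"[a-z]+", text)   # 2) keep alphabetic words, drop digits and punctuation
    tokens = [t for t in tokens if t not in STOP_WORDS]  # 3) remove stop words
    return [LEMMAS.get(t, t) for t in tokens]            # 4) dictionary lookup standing in for lemmatization

tokens = preprocess("You won 2 FREE prizes!!   ")
```

The resulting token lists are what a Bernoulli model then reduces to word presence or absence per message.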