This code implements Naive Bayes spam classification on the SMS Spam Collection Data Set from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection).
This version of Naive Bayes is based on the Bernoulli document model. The word probabilities were computed by maximum a posteriori (MAP) parameter estimation, with a Beta(2,1) distribution as the prior.
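As a rough illustration of the estimation step, the sketch below computes MAP word probabilities for one class under a Beta prior. It is not taken from this repository: the function name `map_word_probabilities` and its parameters are assumptions made for the example. It relies on the standard result that the mode of a Beta(a, b) posterior after seeing a word in k of n documents is (k + a - 1) / (n + a + b - 2), which for Beta(2,1) reduces to (k + 1) / (n + 1).

```python
from collections import Counter


def map_word_probabilities(documents, alpha=2.0, beta=1.0):
    """MAP estimate of P(word present | class) under a Beta(alpha, beta) prior.

    `documents` is a list of token lists that all belong to the same class.
    The name and signature are hypothetical, chosen for this sketch.
    """
    n_docs = len(documents)
    # Bernoulli document model: a word counts at most once per document.
    doc_freq = Counter(word for doc in documents for word in set(doc))
    # Mode of the Beta posterior: (k + alpha - 1) / (n + alpha + beta - 2).
    denom = n_docs + alpha + beta - 2
    return {word: (k + alpha - 1) / denom for word, k in doc_freq.items()}


# Toy data, invented for illustration only.
spam_docs = [["free", "prize"], ["free", "call"], ["call", "now"]]
spam_probs = map_word_probabilities(spam_docs)
```

With this prior, a word that appears in every document of a class gets probability exactly 1, so a classifier that takes log(1 - p) would need to guard against that case.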
The preprocessing involved the following steps:
1) Removal of trailing spaces
2) Removal of non-words
3) Removal of stop words
4) Lemmatization
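The four steps above can be sketched as a single function. This is a minimal illustration under several assumptions, not the repository's actual code: the name `preprocess`, the tiny `STOP_WORDS` set, the lowercasing, and the reading of "non-words" as anything that is not a run of letters are all choices made for the example. The lemmatizer is passed in as a callable so the sketch runs without extra dependencies; one option would be the `lemmatize` method of NLTK's `WordNetLemmatizer`.

```python
import re

# Hypothetical, deliberately small stop-word list for illustration.
STOP_WORDS = {"a", "an", "the", "is", "to", "and", "of", "in", "you", "for"}


def preprocess(message, lemmatize=lambda word: word):
    """Turn a raw SMS string into a list of cleaned tokens.

    `lemmatize` is any callable mapping a word to its lemma; the default
    leaves words unchanged.
    """
    # 1) Strip surrounding whitespace (lowercasing is an added assumption).
    text = message.strip().lower()
    # 2) Keep only alphabetic runs, which drops digits and punctuation.
    tokens = re.findall(r"[a-z]+", text)
    # 3) Drop stop words.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # 4) Lemmatize what remains.
    return [lemmatize(t) for t in tokens]


# Example message, invented for illustration only.
tokens = preprocess("  You WON a FREE prize!!! Call 0800 now   ")
```

Keeping the lemmatizer pluggable means the same function can be tested with a trivial mapping and later run with a real lemmatizer.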