Spam Filtering with Bayes


Title: Spam Filtering with Bayes


Reputability statement: This explanation of Bayesian filtering was written by Ernie Croot, a professor in the School of Mathematics at the Georgia Institute of Technology. Professor Croot has proven highly knowledgeable and reputable in the field of mathematics (see

Comments: Bayesian spam filtering uses Bayes’ Theorem to determine the probability that a given email is spam. Given an email that has not yet been classified as spam or non-spam and a list of words that appear frequently in spam emails, the filter calculates, for each suspicious word the email contains, the probability that the email is spam given that the word appears in it. To calculate this probability, the program must know the probability that a spam email contains the suspicious word, the overall probability that an email is spam, and the probability that a non-spam email contains the suspicious word, all of which can be estimated from a database of emails already labeled as spam or non-spam. The per-word probabilities are then combined into an overall probability that the email is spam.
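The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method from the source: the function names and the sample numbers are hypothetical, the prior probability of spam defaults to 0.5, and the per-word probabilities are combined with the common naive Bayes formula, which treats the words as independent of one another.

```python
from math import prod


def word_spam_probability(p_word_given_spam, p_word_given_ham, p_spam=0.5):
    """Bayes' Theorem for one word: P(spam | email contains the word).

    The three inputs are the quantities named in the text; in practice they
    would be estimated from a database of labeled emails.
    """
    numerator = p_word_given_spam * p_spam
    denominator = numerator + p_word_given_ham * (1.0 - p_spam)
    return numerator / denominator


def combined_spam_probability(word_probs):
    """Combine per-word P(spam | word) values into one overall probability.

    Assumption: words are independent and the prior P(spam) is 0.5, which
    gives p1*...*pn / (p1*...*pn + (1-p1)*...*(1-pn)).
    """
    spam_score = prod(word_probs)
    ham_score = prod(1.0 - p for p in word_probs)
    return spam_score / (spam_score + ham_score)


# Hypothetical frequencies: "free" is common in spam, "meeting" in non-spam.
p_free = word_spam_probability(p_word_given_spam=0.40, p_word_given_ham=0.05)
p_meeting = word_spam_probability(p_word_given_spam=0.02, p_word_given_ham=0.20)
overall = combined_spam_probability([p_free, p_meeting])
print(f"free: {p_free:.3f}, meeting: {p_meeting:.3f}, overall: {overall:.3f}")
```

With these made-up numbers, the word "free" alone points strongly toward spam, but the presence of "meeting" pulls the overall probability back below one half, so a single suspicious word does not decide the classification by itself.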
    Bayesian spam filtering is a clever and practical application of Bayes’ Theorem. Older spam filters often misclassified non-spam emails as spam simply because some suspicious word appeared in them. A Bayesian filter is more effective because it weighs each word by how often that word appears in spam and in non-spam emails, and it combines the evidence from many words rather than relying on any single one. As Nate Silver describes in The Signal and the Noise, the beauty of Bayes’ Theorem comes from the large amount of context and evidence taken into account when determining probabilities, and in the world of email and technology, there is plenty of data to be used.