Naive Bayes classifiers are built on Bayesian classification methods. These rely on Bayes's theorem, which is an equation describing the relationship of conditional probabilities of statistical quantities. In Bayesian classification, we're interested in finding the probability of a label given some observed features, which we can write as $P(L~|~features)$. Bayes's theorem tells us how to express this in terms of quantities we can compute more directly:
"P(L~|~features)=\\dfrac{P(features~|~L)~P(L)}{P(features)}"
If we are trying to decide between two labels (call them $L_1$ and $L_2$), then one way to make this decision is to compute the ratio of the posterior probabilities for each label:
"\\dfrac{P(L_1~|~features)}{P(L_2~|~features)}=\\dfrac{P(features~|~L_1)P(L_1)}{P(features~|~L_2)P(L_2)}"
All we need now is some model by which we can compute $P(features~|~L_i)$ for each label. Such a model is called a generative model because it specifies the hypothetical random process that generates the data. Specifying this generative model for each label is the main piece of training such a Bayesian classifier. The general version of this training step is a very difficult task, but we can make it tractable through some simplifying assumptions about the form of the model.
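To make this concrete, here is a minimal sketch of one such generative model, assuming (purely for illustration) Gaussian class-conditional distributions with independent features; the toy data, the equal priors, and the helper functions are all hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Toy training data: two labels, two features (all values hypothetical)
X1 = rng.normal(loc=[0, 0], scale=1.0, size=(100, 2))   # samples with label L_1
X2 = rng.normal(loc=[3, 3], scale=1.0, size=(100, 2))   # samples with label L_2

# Simplifying assumption: model P(features | L_i) as independent Gaussians,
# fit from the per-label mean and standard deviation of each feature
def fit_gaussian_model(X):
    return X.mean(axis=0), X.std(axis=0)

def log_likelihood(x, params):
    mu, sigma = params
    return stats.norm(mu, sigma).logpdf(x).sum()

model_1 = fit_gaussian_model(X1)
model_2 = fit_gaussian_model(X2)

# Classify a new point by comparing log P(features | L_i) + log P(L_i);
# equal priors are assumed here, so the prior terms cancel
x_new = [2.5, 2.8]
print(log_likelihood(x_new, model_1), log_likelihood(x_new, model_2))
```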