Q: Suppose a dataset contains a collection of 8500 emails. Of these, 4000 are not spam and the remaining 4500 are spam. The word “dating” is used as a feature; its frequency (count) is 310 in spam emails and 106 in not-spam emails. Using Bayes' theorem, compute two probabilities for an email, knowing only that it contains the word “dating”.
First: the probability that the email is spam. Second: the probability that the email is not spam.
Let "S" denote the event "the email is spam". Let "D" denote the event "the email contains the word dating".
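Given (treating each count as the number of emails that contain the word, and writing S^C for "not spam"):
\[P(S)=\dfrac{4500}{8500}=\dfrac{9}{17},\quad P(S^C)=\dfrac{4000}{8500}=\dfrac{8}{17}\]
\[P(D|S)=\dfrac{310}{4500}=\dfrac{31}{450},\quad P(D|S^C)=\dfrac{106}{4000}=\dfrac{53}{2000}\]
a)
\[P(S|D)=\dfrac{P(D|S)P(S)}{P(D|S)P(S)+P(D|S^C)P(S^C)}\]
\[=\dfrac{\dfrac{31}{450}(\dfrac{9}{17})}{\dfrac{31}{450}(\dfrac{9}{17})+\dfrac{53}{2000}(\dfrac{8}{17})}=\dfrac{155}{208}=0.7452\]
b)
\[P(S^C|D)=\dfrac{P(D|S^C)P(S^C)}{P(D|S)P(S)+P(D|S^C)P(S^C)}\]
\[=\dfrac{\dfrac{53}{2000}(\dfrac{8}{17})}{\dfrac{31}{450}(\dfrac{9}{17})+\dfrac{53}{2000}(\dfrac{8}{17})}=\dfrac{53}{208}=0.2548\]
The two probabilities sum to 1, as they must.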
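As a cross-check, here is a minimal Python sketch of the Bayes computation. It assumes each count is the number of emails in that class containing the word "dating". The function name `posterior` and its parameter names are made up for illustration, not part of the question.

```python
from fractions import Fraction

def posterior(n_class, n_other, word_class, word_other):
    """P(class | word) via Bayes' theorem, computed from raw email counts."""
    total = n_class + n_other
    prior_class = Fraction(n_class, total)       # P(class)
    prior_other = Fraction(n_other, total)       # P(other class)
    like_class = Fraction(word_class, n_class)   # P(word | class)
    like_other = Fraction(word_other, n_other)   # P(word | other class)
    # Total probability of seeing the word: P(word)
    evidence = like_class * prior_class + like_other * prior_other
    return like_class * prior_class / evidence

# 4500 spam emails (310 with the word), 4000 not-spam emails (106 with the word)
p_spam = posterior(4500, 4000, 310, 106)
p_not_spam = posterior(4000, 4500, 106, 310)

print(p_spam, round(float(p_spam), 4))          # 155/208 0.7452
print(p_not_spam, round(float(p_not_spam), 4))  # 53/208 0.2548
```

Exact fractions are used so that rounding happens only when printing. Because the class sizes cancel, each posterior reduces to the class's share of the 416 emails containing the word.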