Answer to Question #345892 in Statistics and Probability for Toilet Gamer

Question #345892

Q: Suppose a dataset contains a collection of 8500 emails. Of these, 4000 are not spam and the remaining 4500 are spam. The word “dating” is used as a feature; its frequency/count is 310 in spam emails and 106 in not-spam emails. Using Bayes' theorem, compute two probabilities for an email, knowing only that it contains the word “dating”.

First: the probability that the email is spam. Second: the probability that the email is not spam.

Expert's answer

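Let $S$ denote the event "the email is spam" and $D$ the event "the email contains the word 'dating'". The counts are read here as numbers of emails: 310 of the 4500 spam emails and 106 of the 4000 not-spam emails contain the word.

Given

$$P(S^C)=\dfrac{4000}{8500}=\dfrac{8}{17}, \quad P(S)=\dfrac{4500}{8500}=\dfrac{9}{17}.$$

$$P(D|S)=\dfrac{310}{4500}=\dfrac{31}{450}, \quad P(D|S^C)=\dfrac{106}{4000}=\dfrac{53}{2000}.$$

By Bayes' theorem,

$$P(S|D)=\dfrac{P(D|S)P(S)}{P(D|S)P(S)+P(D|S^C)P(S^C)}$$

$$=\dfrac{\dfrac{31}{450}\left(\dfrac{9}{17}\right)}{\dfrac{31}{450}\left(\dfrac{9}{17}\right)+\dfrac{53}{2000}\left(\dfrac{8}{17}\right)}=\dfrac{310}{310+106}=\dfrac{155}{208}\approx 0.7452$$

$$P(S^C|D)=\dfrac{P(D|S^C)P(S^C)}{P(D|S)P(S)+P(D|S^C)P(S^C)}$$

$$=\dfrac{\dfrac{53}{2000}\left(\dfrac{8}{17}\right)}{\dfrac{31}{450}\left(\dfrac{9}{17}\right)+\dfrac{53}{2000}\left(\dfrac{8}{17}\right)}=\dfrac{106}{310+106}=\dfrac{53}{208}\approx 0.2548$$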

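As a numerical check, here is a minimal Python sketch of the Bayes computation. It assumes the counts in the question are numbers of emails containing the word, so that P(D|S) = 310/4500 and P(D|S^C) = 106/4000. The function name `posterior_given_word` and its parameter names are illustrative and not part of the original answer.

```python
from fractions import Fraction


def posterior_given_word(n_total, n_not_spam, word_in_spam, word_in_not_spam):
    """Return (P(spam | word), P(not spam | word)) as exact fractions.

    Assumption: word_in_spam and word_in_not_spam count the emails
    in each class that contain the word.
    """
    n_spam = n_total - n_not_spam

    # Priors: class proportions in the dataset.
    p_spam = Fraction(n_spam, n_total)
    p_not_spam = Fraction(n_not_spam, n_total)

    # Likelihoods: fraction of each class that contains the word.
    p_word_given_spam = Fraction(word_in_spam, n_spam)
    p_word_given_not_spam = Fraction(word_in_not_spam, n_not_spam)

    # Evidence P(word), by the law of total probability.
    p_word = p_word_given_spam * p_spam + p_word_given_not_spam * p_not_spam

    # Bayes' theorem.
    p_spam_given_word = p_word_given_spam * p_spam / p_word
    p_not_spam_given_word = p_word_given_not_spam * p_not_spam / p_word
    return p_spam_given_word, p_not_spam_given_word


p_spam, p_not_spam = posterior_given_word(8500, 4000, 310, 106)
print(p_spam, round(float(p_spam), 4))          # 155/208 0.7452
print(p_not_spam, round(float(p_not_spam), 4))  # 53/208 0.2548
```

The class sizes cancel in the ratio, so under this reading the posterior reduces to 310/(310+106) for spam and 106/(310+106) for not spam, and the two values sum to 1.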