Answer to Question #345892 in Statistics and Probability for Toilet Gamer

Question #345892

Q: Suppose a dataset contains 8500 emails. Of these, 4000 are not spam and the remaining 4500 are spam. The word “dating” is used as a feature; its count is 310 in spam emails and 106 in not-spam emails. Using Bayes' theorem, compute two probabilities for an email about which you know only that it contains the word “dating”.

 

First: Probability of an email being spam?

Second: Probability of an email being not spam?


Expert's answer
2022-05-31T12:34:06-0400

Let S denote the event "the email is spam" and let D denote the event "the email contains the word 'dating'".

Given

P(S^C)=\dfrac{4000}{8500}=\dfrac{8}{17}, \quad P(S)=\dfrac{4500}{8500}=\dfrac{9}{17}.

Taking each count as the number of emails in that class that contain the word,

P(D|S)=\dfrac{310}{4500}=\dfrac{31}{450}, \quad P(D|S^C)=\dfrac{106}{4000}=\dfrac{53}{2000}.

a)

P(S|D)=\dfrac{P(D|S)P(S)}{P(D|S)P(S)+P(D|S^C)P(S^C)}

=\dfrac{\dfrac{31}{450}\left(\dfrac{9}{17}\right)}{\dfrac{31}{450}\left(\dfrac{9}{17}\right)+\dfrac{53}{2000}\left(\dfrac{8}{17}\right)}=\dfrac{310}{310+106}=\dfrac{155}{208}\approx 0.7452

b)

P(S^C|D)=\dfrac{P(D|S^C)P(S^C)}{P(D|S)P(S)+P(D|S^C)P(S^C)}

=\dfrac{\dfrac{53}{2000}\left(\dfrac{8}{17}\right)}{\dfrac{31}{450}\left(\dfrac{9}{17}\right)+\dfrac{53}{2000}\left(\dfrac{8}{17}\right)}=\dfrac{106}{310+106}=\dfrac{53}{208}\approx 0.2548

The two posteriors sum to 1, as they must, and the email is more likely spam because the word appears far more often in spam emails.
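As a cross-check, the two posteriors can be recomputed directly from the raw counts in the question. The sketch below is only illustrative: the function name `posterior_from_counts` and its parameter names are hypothetical, and it assumes that 310 and 106 are the numbers of spam and not-spam emails that contain the word, out of 4500 and 4000 respectively. Exact fractions are used so that no rounding enters before the final step.

```python
from fractions import Fraction


def posterior_from_counts(n_spam, n_not_spam, word_in_spam, word_in_not_spam):
    """Return (P(S|D), P(S^C|D)) via Bayes' theorem, as exact fractions.

    Assumes each word count is the number of emails in that class
    that contain the word (an interpretation of the question).
    """
    total = n_spam + n_not_spam
    # Priors from the class sizes.
    p_s = Fraction(n_spam, total)
    p_not_s = Fraction(n_not_spam, total)
    # Likelihoods of seeing the word within each class.
    p_d_given_s = Fraction(word_in_spam, n_spam)
    p_d_given_not_s = Fraction(word_in_not_spam, n_not_spam)
    # Law of total probability gives the evidence P(D).
    p_d = p_d_given_s * p_s + p_d_given_not_s * p_not_s
    return p_d_given_s * p_s / p_d, p_d_given_not_s * p_not_s / p_d


p_spam, p_not_spam = posterior_from_counts(4500, 4000, 310, 106)
print(p_spam, round(float(p_spam), 4))          # 155/208 0.7452
print(p_not_spam, round(float(p_not_spam), 4))  # 53/208 0.2548
```

Because the class sizes cancel, each posterior reduces to that class's share of the 416 emails that contain the word.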



