The City of South Lake Tahoe, CA has an Asian population of 1419 people, out of a total population of 23,609 (Source: U.S. Census Bureau, Census 2000). Suppose that a survey of 1419 self-reported Asians in the Manhattan, NY, area yielded the data in the table below.
Conduct a goodness of fit test to determine if the self-reported sub-groups of Asians in the Manhattan area fit that of the Lake Tahoe area.
| Race | Lake Tahoe Frequency | Manhattan Frequency |
|---|---|---|
| Asian Indian | 131 | 174 |
| Chinese | 181 | 557 |
| Filipino | 1045 | 518 |
| Japanese | 80 | 54 |
| Korean | 12 | 29 |
| Vietnamese | 9 | 21 |
| Other | 24 | 66 |
In this question, we perform a chi-square goodness of fit test in the form of a test of homogeneity. The reason is that if the self-reported sub-groups of Asians in the Manhattan area fit those of the Lake Tahoe area, then the two areas have the same distribution across sub-groups.
The hypotheses tested are:
$H_0$: the self-reported sub-groups of Asians in the Manhattan area fit those of the Lake Tahoe area
against
$H_1$: the self-reported sub-groups of Asians in the Manhattan area do not fit those of the Lake Tahoe area
We first determine the expected count for each cell using the formula below,
$E_{ij}=\dfrac{r_i\,c_j}{n},\quad i=1,2,\dots,7,\quad j=1,2$, where $r_i$ is the total of row $i$ (sub-group) and $c_j$ is the total of column $j$ (area). $n=2901$ is the sample size (the total number of Asians in both regions).
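The expected counts are as follows, with row totals $r_1,\dots,r_7 = 305, 738, 1563, 134, 41, 30, 90$ and column totals $c_1=1482$, $c_2=1419$:
$E_{11}=\dfrac{r_1 c_1}{n}=\dfrac{305\times 1482}{2901}=155.81$
$E_{12}=\dfrac{r_1 c_2}{n}=\dfrac{305\times 1419}{2901}=149.19$
$E_{21}=\dfrac{r_2 c_1}{n}=\dfrac{738\times 1482}{2901}=377.01$
$E_{22}=\dfrac{r_2 c_2}{n}=\dfrac{738\times 1419}{2901}=360.99$
$E_{31}=\dfrac{r_3 c_1}{n}=\dfrac{1563\times 1482}{2901}=798.47$
$E_{32}=\dfrac{r_3 c_2}{n}=\dfrac{1563\times 1419}{2901}=764.53$
$E_{41}=\dfrac{r_4 c_1}{n}=\dfrac{134\times 1482}{2901}=68.46$
$E_{42}=\dfrac{r_4 c_2}{n}=\dfrac{134\times 1419}{2901}=65.54$
$E_{51}=\dfrac{r_5 c_1}{n}=\dfrac{41\times 1482}{2901}=20.95$
$E_{52}=\dfrac{r_5 c_2}{n}=\dfrac{41\times 1419}{2901}=20.05$
$E_{61}=\dfrac{r_6 c_1}{n}=\dfrac{30\times 1482}{2901}=15.33$
$E_{62}=\dfrac{r_6 c_2}{n}=\dfrac{30\times 1419}{2901}=14.67$
$E_{71}=\dfrac{r_7 c_1}{n}=\dfrac{90\times 1482}{2901}=45.98$
$E_{72}=\dfrac{r_7 c_2}{n}=\dfrac{90\times 1419}{2901}=44.02$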
Next, we determine the test statistic, given as
$$\chi^2_c=\sum_{i=1}^{7}\sum_{j=1}^{2}\frac{(O_{ij}-E_{ij})^2}{E_{ij}}$$
Now,
$$\begin{aligned}
\chi^2_c &= \frac{(131-155.81)^2}{155.81}+\frac{(174-149.19)^2}{149.19}+\frac{(181-377.01)^2}{377.01}+\frac{(557-360.99)^2}{360.99}\\
&\quad+\frac{(1045-798.47)^2}{798.47}+\frac{(518-764.53)^2}{764.53}+\frac{(80-68.46)^2}{68.46}+\frac{(54-65.54)^2}{65.54}\\
&\quad+\frac{(12-20.95)^2}{20.95}+\frac{(29-20.05)^2}{20.05}+\frac{(9-15.33)^2}{15.33}+\frac{(21-14.67)^2}{14.67}\\
&\quad+\frac{(24-45.98)^2}{45.98}+\frac{(66-44.02)^2}{44.02}\\
&= 410.6408 \text{ (4 dp)}
\end{aligned}$$
"\\chi^2_c" is compared with the table value at "\\alpha" level of significance with "(r-1)*(c-1)=(7-1)*(2-1)=6*1=6" degrees of freedom.
The table value is "\\chi^2_{\\alpha=0.05,6}=12.5916" and the null hypothesis is rejected if, "\\chi^2_c\\gt\\chi^2_{0.05,6}."
Since "\\chi^2_c=410.6408\\gt\\chi^2_{0.05,6}=12.5916," we reject the null hypothesis and conclude that there is no sufficient evidence to show that the self-reported sub-groups of Asians in the Manhattan area fit that of the Lake Tahoe area at 5% level of significance.
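As a hedged cross-check of the decision rule, the sketch below evaluates the upper-tail probability of the chi-square distribution without any external library. It relies on the closed form that holds when the degrees of freedom are even, $P(X>x)=e^{-x/2}\sum_{i=0}^{k-1}\frac{(x/2)^i}{i!}$ with $k=\text{df}/2$. The helper name `chi2_sf_even_df` is hypothetical and is not taken from any library.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable.

    Hypothetical helper: valid only for even, positive df, where the
    tail has the closed form exp(-x/2) * sum_{i<df/2} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires even, positive df")
    half = x / 2.0
    term = 1.0   # current term (x/2)^i / i!, starting at i = 0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


chi_sq = 410.6408           # test statistic from the text
critical = 12.5916          # table value at alpha = 0.05
df = (7 - 1) * (2 - 1)      # (rows - 1) * (columns - 1)

tail_at_critical = chi2_sf_even_df(critical, df)  # should be close to 0.05
p_value = chi2_sf_even_df(chi_sq, df)             # extremely small
reject = chi_sq > critical

print(f"tail at critical value: {tail_at_critical:.4f}")
print(f"p-value: {p_value:.3e}")
print(f"reject H0: {reject}")
```

The tail probability at the table value comes out at about 0.05, which confirms the critical value, and the p-value for the observed statistic is far below any conventional significance level. In practice, a statistics library such as SciPy (`scipy.stats.chi2_contingency`) would normally be used for the whole test.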