Question #115944
A computer manager needs to know how the efficiency of her new computer program depends
on the size of incoming data. Efficiency will be measured by the number of processed
requests per hour. Applying the program to data sets of different sizes, she obtains the
following results:
Data size (gigabytes) 6 7 7 8 10 10 15
Processed requests 40 55 50 41 17 26 16
i. Draw the scatterplot for the data. Be sure to label your axes.
ii. Is there any correlation between the processing request and the size of incoming data?
What is the correlation coefficient?
iii. By what percentage is the processing time dependent on the size of incoming data?
Expert's answer
2020-05-19T19:09:14-0400
$$\begin{array}{c|c}
\text{Data size (gigabytes)},\ x & \text{Processed requests},\ y \\ \hline
6 & 40 \\
7 & 55 \\
7 & 50 \\
8 & 41 \\
10 & 17 \\
10 & 26 \\
15 & 16
\end{array}$$

The response variable here is the number of processed requests $(y)$, and we attempt to predict it from the size of a data set $(x)$.


$$\def\arraystretch{1.5}
\begin{array}{c|ccccc}
 & x & y & x^2 & xy & y^2 \\ \hline
 & 6 & 40 & 36 & 240 & 1600 \\
 & 7 & 55 & 49 & 385 & 3025 \\
 & 7 & 50 & 49 & 350 & 2500 \\
 & 8 & 41 & 64 & 328 & 1681 \\
 & 10 & 17 & 100 & 170 & 289 \\
 & 10 & 26 & 100 & 260 & 676 \\
 & 15 & 16 & 225 & 240 & 256 \\ \hline
\text{Sum} = & 63 & 245 & 623 & 1973 & 10027
\end{array}$$


$$\bar{x}={\sum_i x_i\over n}={63\over 7}=9,\ \bar{y}={\sum_i y_i\over n}={245\over 7}=35$$

$$S_{xx}=\sum_i x_i^2-n\cdot\bar{x}^2=623-7\cdot(9)^2=56$$

$$S_{xy}=\sum_i x_iy_i-n\cdot\bar{x}\bar{y}=1973-7\cdot(9)(35)=-232$$

$$S_{yy}=\sum_i y_i^2-n\cdot\bar{y}^2=10027-7\cdot(35)^2=1452$$
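The sums above can be checked with a few lines of plain Python. This is only a sketch for verification; the variable names (`n_obs`, `s_xx`, and so on) are chosen here for illustration and are not part of the original answer.

```python
# Data from the question: data size (GB) and processed requests per hour.
x = [6, 7, 7, 8, 10, 10, 15]
y = [40, 55, 50, 41, 17, 26, 16]

n_obs = len(x)              # sample size (7 observations)
x_bar = sum(x) / n_obs      # mean data size
y_bar = sum(y) / n_obs      # mean number of processed requests

# Corrected sums of squares and cross-products, using the same
# "raw sum minus n * mean product" shortcut as the formulas above.
s_xx = sum(v * v for v in x) - n_obs * x_bar ** 2
s_xy = sum(a * b for a, b in zip(x, y)) - n_obs * x_bar * y_bar
s_yy = sum(v * v for v in y) - n_obs * y_bar ** 2

print(x_bar, y_bar)        # 9.0 35.0
print(s_xx, s_xy, s_yy)    # 56.0 -232.0 1452.0
```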

Therefore, based on the above calculations, the regression coefficients (the slope $m$ and the $y$-intercept $n$, where $n$ now denotes the intercept rather than the sample size used above) are obtained as follows:


$$m={S_{xy}\over S_{xx}}={-232\over 56}=-{29\over 7}\approx-4.142857$$

$$n=\bar{y}-m\bar{x}=35-\left(-{29\over 7}\right)(9)={506\over 7}\approx72.285714$$

Therefore, we find that the regression equation is:


$$Y=72.285714-4.142857X$$
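As a rough illustration, the least-squares fit can be reproduced in Python and then used for a prediction. The helper `fit_line` and the 12 GB input are hypothetical, introduced here only to show how the equation is applied; 12 GB is not one of the measured data sizes.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n_obs = len(x)
    x_bar = sum(x) / n_obs
    y_bar = sum(y) / n_obs
    s_xx = sum((v - x_bar) ** 2 for v in x)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


x = [6, 7, 7, 8, 10, 10, 15]          # data size, GB
y = [40, 55, 50, 41, 17, 26, 16]      # processed requests per hour

slope, intercept = fit_line(x, y)
print(f"{slope:.6f} {intercept:.6f}")  # -4.142857 72.285714

# Hypothetical input: predicted requests per hour for a 12 GB data set.
predicted = intercept + slope * 12
print(f"{predicted:.2f}")              # 22.57
```

The negative slope means each additional gigabyte is associated with roughly 4.14 fewer processed requests per hour within the observed range of 6 to 15 GB.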


ii. Is there any correlation between the processing request and the size of incoming data?

What is the correlation coefficient?

Correlation coefficient


$$r={S_{xy}\over \sqrt{S_{xx}}\sqrt{S_{yy}}}={-232\over \sqrt{56}\sqrt{1452}}\approx-0.8136$$

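Yes. Since $r\approx-0.8136$, there is a strong negative linear correlation: the number of processed requests tends to decrease as the size of the incoming data grows.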
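The correlation coefficient can be confirmed numerically. The sketch below uses only the standard library; the function name `pearson_r` is an illustrative choice, not something from the original answer.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient r = S_xy / (sqrt(S_xx) * sqrt(S_yy))."""
    n_obs = len(x)
    x_bar = sum(x) / n_obs
    y_bar = sum(y) / n_obs
    s_xx = sum((v - x_bar) ** 2 for v in x)
    s_yy = sum((v - y_bar) ** 2 for v in y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    return s_xy / (math.sqrt(s_xx) * math.sqrt(s_yy))


x = [6, 7, 7, 8, 10, 10, 15]          # data size, GB
y = [40, 55, 50, 41, 17, 26, 16]      # processed requests per hour

r = pearson_r(x, y)
print(f"{r:.4f}")  # -0.8136
```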


iii. By what percentage is the processing time dependent on the size of incoming data?

The coefficient of determination


$$r^2=(-0.8136)^2=0.6619$$

That is, $r^2\approx66.19\ \%$: about $66.19\ \%$ of the variance in $Y$ (the number of processed requests) is explained by its linear relationship with $X$ (the size of the incoming data).
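To see why $r^2$ can be read as "variance explained", the sketch below computes it two ways: as the squared correlation, and as one minus the ratio of the residual sum of squares to the total sum of squares for the fitted line. The names `sse`, `sst`, and the two `r2_*` variables are illustrative; for simple linear regression the two values should agree up to rounding.

```python
x = [6, 7, 7, 8, 10, 10, 15]          # data size, GB
y = [40, 55, 50, 41, 17, 26, 16]      # processed requests per hour

n_obs = len(x)
x_bar = sum(x) / n_obs
y_bar = sum(y) / n_obs
s_xx = sum((v - x_bar) ** 2 for v in x)
s_yy = sum((v - y_bar) ** 2 for v in y)
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

# Way 1: square of the correlation coefficient.
r2_from_r = s_xy ** 2 / (s_xx * s_yy)

# Way 2: share of total variation removed by the fitted line.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))  # unexplained
sst = s_yy                                                           # total
r2_from_residuals = 1 - sse / sst

print(f"{r2_from_r:.4f} {r2_from_residuals:.4f}")  # 0.6619 0.6619
```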



