Question

How do you write the equation of the regression line for the following set of data and find the correlation coefficient?
The table shows the number of turtles hatched at a zoo each year since 2002.

| Year | 2003 | 2004 | 2005 | 2006 | 2007 |
|---|---|---|---|---|---|
| Turtles hatched | 21 | 17 | 16 | 16 | 14 |

Explanation

Solution

Here we find the equation of the regression line using the regression formulas: first the slope, then the intercept, which we substitute into the equation of the line. We then find the correlation coefficient of the given data using the correlation coefficient formula.

Formula used:
For finding the regression line: $m = \dfrac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}$ and the regression equation $\hat y = mx + b$, where
$b = \bar y - m\bar x$, $\bar y = \dfrac{\sum y}{n}$ and $\bar x = \dfrac{\sum x}{n}$.
For finding the correlation coefficient: $r = \dfrac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$
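
The formulas above can be sketched in Python as a quick check. This is only an illustrative sketch: the function name `fit_line_and_r` and the variable names are assumptions made here, not part of the original solution.

```python
from math import sqrt

def fit_line_and_r(xs, ys):
    """Return (m, b, r) from the sum-based formulas quoted above."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    num = n * sum_xy - sum_x * sum_y   # numerator shared by m and r
    den_x = n * sum_x2 - sum_x ** 2    # denominator of m
    den_y = n * sum_y2 - sum_y ** 2    # second factor under the root in r

    m = num / den_x                    # slope
    b = sum_y / n - m * sum_x / n      # intercept: y-bar minus m times x-bar
    r = num / (sqrt(den_x) * sqrt(den_y))
    return m, b, r

# Data from the question.
years = [2003, 2004, 2005, 2006, 2007]
turtles = [21, 17, 16, 16, 14]

m, b, r = fit_line_and_r(years, turtles)
print(f"m = {m:.1f}, b = {b:.1f}, r = {r:.6f}")
```

The numerator is computed once because the slope and the correlation coefficient share it; only the denominators differ.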

Complete step by step answer:

| $x$ | $y$ | $xy$ | $x^2$ | $y^2$ |
|---|---|---|---|---|
| 2003 | 21 | 42063 | 4012009 | 441 |
| 2004 | 17 | 34068 | 4016016 | 289 |
| 2005 | 16 | 32080 | 4020025 | 256 |
| 2006 | 16 | 32096 | 4024036 | 256 |
| 2007 | 14 | 28098 | 4028049 | 196 |
| $\sum x_i = 10025$ | $\sum y_i = 84$ | $\sum x_i y_i = 168405$ | $\sum x_i^2 = 20100135$ | $\sum y_i^2 = 1438$ |
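
The columns and totals of this table can be reproduced with a few lines of Python. This is a minimal sketch using exact integer arithmetic; the names `rows` and `totals` are chosen here for illustration.

```python
# Data from the question.
years = [2003, 2004, 2005, 2006, 2007]
turtles = [21, 17, 16, 16, 14]

# One row per data point: x, y, xy, x^2, y^2.
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(years, turtles)]

# Column totals: sum x, sum y, sum xy, sum x^2, sum y^2.
totals = tuple(sum(column) for column in zip(*rows))

for row in rows:
    print(*row, sep="\t")
print(*totals, sep="\t")
```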

For finding the slope of the regression line: $m = \dfrac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}$
Substituting the values from the table into the formula, we get
$m = \dfrac{5(168405) - (10025)(84)}{5(20100135) - (10025)^2}$
$m = \dfrac{842025 - 842100}{100500675 - 100500625} = \dfrac{-75}{50} = -1.5$
The slope of the regression line is $m = -1.5$.
Now we find the intercept and the regression line. The regression equation is $\hat y = mx + b$, where
$b = \bar y - m\bar x$, $\bar y = \dfrac{\sum y}{n}$ and $\bar x = \dfrac{\sum x}{n}$, and the slope is $m = -1.5$.
$b = \bar y - m\bar x$
$\Rightarrow b = \dfrac{\sum y}{n} - m\dfrac{\sum x}{n}$
$\Rightarrow b = \dfrac{84}{5} - (-1.5)\dfrac{10025}{5} = 16.8 + 3007.5 = 3024.3$
Now we have $m = -1.5$ and $b = 3024.3$. Substituting these values into $\hat y = mx + b$ gives the line.
Therefore the regression equation is $\hat y = -1.5x + 3024.3$.
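
As a sanity check on the fitted line, the sketch below evaluates $\hat y = -1.5x + 3024.3$ at each year and lists the residuals $y - \hat y$. For a least-squares line the residuals sum to zero (up to floating-point rounding). The variable names are illustrative assumptions.

```python
# Data from the question and the coefficients derived above.
years = [2003, 2004, 2005, 2006, 2007]
turtles = [21, 17, 16, 16, 14]
m, b = -1.5, 3024.3

# Fitted values and residuals (observed minus fitted).
predicted = [m * x + b for x in years]
residuals = [y - y_hat for y, y_hat in zip(turtles, predicted)]

for x, y, y_hat, e in zip(years, turtles, predicted, residuals):
    print(f"{x}: observed {y}, fitted {y_hat:.1f}, residual {e:+.1f}")

# Should be zero apart from rounding error.
print(f"sum of residuals = {sum(residuals):.6f}")
```

The fitted value at the mean year 2005 is 16.8, which equals $\bar y$, as expected since the regression line passes through $(\bar x, \bar y)$.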
Now we find the correlation coefficient.
The formula is $r = \dfrac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}}$
From the table we have the values of $\sum xy$, $\sum x$, $\sum y$, $\sum x^2$ and $\sum y^2$, so
$r = \dfrac{5(168405) - (10025)(84)}{\sqrt{5(20100135) - (10025)^2}\,\sqrt{5(1438) - (84)^2}}$
$r = \dfrac{-75}{\sqrt{50}\,\sqrt{134}}$
$r \approx -0.916271$
Hence the correlation coefficient is $r \approx -0.916271$.
There is a very strong negative (downhill) linear relation between Year $(x)$ and Turtles hatched $(y)$.
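
The value of $r$ can be cross-checked with the equivalent deviation form $r = \dfrac{\sum (x - \bar x)(y - \bar y)}{\sqrt{\sum (x - \bar x)^2 \sum (y - \bar y)^2}}$. This form is not the one used in the solution; it is shown here only as an independent check, and the names `s_xy`, `s_xx`, `s_yy` are assumptions made for this sketch. Working with deviations from the mean also avoids subtracting two very large, nearly equal numbers such as 100500675 and 100500625.

```python
from math import sqrt

# Data from the question.
years = [2003, 2004, 2005, 2006, 2007]
turtles = [21, 17, 16, 16, 14]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(turtles) / n

# Sums of products and squares of deviations from the means.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, turtles))
s_xx = sum((x - x_bar) ** 2 for x in years)
s_yy = sum((y - y_bar) ** 2 for y in turtles)

r = s_xy / sqrt(s_xx * s_yy)
print(f"r = {r:.6f}, r^2 = {r * r:.4f}")
```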

Note: The purpose of regression is to estimate, explain, predict and evaluate the relation between variables.
The correlation coefficient is a measure of the strength and the direction of a linear relationship between two variables.
The symbol $r$ represents the sample correlation coefficient.
The range of the correlation coefficient is $-1$ to $1$.
If $x$ and $y$ have a strong positive linear correlation, $r$ is close to $1$. If $x$ and $y$ have a strong negative linear correlation, $r$ is close to $-1$. If there is no correlation or a weak linear correlation, $r$ is close to $0$.