
Question

Prove that $b_{yx}\cdot b_{xy}=\left\{ \rho \left( X,Y \right) \right\}^{2}$.

Solution

We recall the definitions and formulas of the regression coefficients $b_{xy}$ and $b_{yx}$ in the regression analysis of bivariate data $X, Y$ as the slopes of the regression lines. We recall the formula for the correlation coefficient $\rho \left( X,Y \right)$, which is the ratio of the covariance $\text{COV}\left( X,Y \right)$ of the bivariate population to the product of the standard deviations $\sigma_{x}, \sigma_{y}$ of $X$ and $Y$. We proceed from the left-hand side to prove the statement.

**Complete step by step answer:**
We know that the mean of a population with $n$ data points $X = x_{1}, x_{2}, \ldots, x_{n}$ is given by
$$\overline{X}=\dfrac{1}{n}\sum\limits_{i=1}^{n}{x_{i}}$$
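As a quick numerical illustration (not part of the proof), here is a minimal Python sketch of this mean formula; the data values are hypothetical.

```python
# Minimal sketch: sample mean of n data points (hypothetical values).
xs = [2.0, 4.0, 6.0, 8.0]   # x_1, ..., x_n
n = len(xs)
x_bar = sum(xs) / n         # X-bar = (1/n) * sum of x_i
print(x_bar)                # 5.0
```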
We know in regression analysis that in bivariate data two variables vary together: if there are two variables $X, Y$, then $X$ may depend on $Y$ and also $Y$ may depend on $X$. Let us take a set of $n$ data points $\left( x_{1}, y_{1} \right), \left( x_{2}, y_{2} \right), \ldots, \left( x_{n}, y_{n} \right)$. We use the least squares method to find the regression lines that fit the data. Let us assume that when $X = x_{1}, x_{2}, \ldots, x_{n}$ depends on $Y = y_{1}, y_{2}, \ldots, y_{n}$ we obtain the equation of the regression line
$$X = c + dY$$
Here $c$ is the average value of $X$ when $Y$ is zero. The slope $d$ of the above line is called the regression coefficient $b_{xy}$, which is given by
$$b_{xy}=\dfrac{\sum\limits_{i=1}^{n}{x_{i}y_{i}}-n\overline{X}\,\overline{Y}}{\sum\limits_{i=1}^{n}{y_{i}^{2}}-n{\left( \overline{Y} \right)}^{2}}$$
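To make the formula concrete, here is a minimal Python sketch that computes $b_{xy}$ from hypothetical paired data; the data values and variable names are illustrative, not part of the problem.

```python
# Minimal sketch: b_xy = (sum x_i*y_i - n*Xbar*Ybar) / (sum y_i^2 - n*Ybar^2)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical x_1, ..., x_n
ys = [2.0, 4.0, 5.0, 4.0, 5.0]   # hypothetical y_1, ..., y_n
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

num = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar   # sum x_i*y_i - n*Xbar*Ybar
den = sum(y * y for y in ys) - n * y_bar * y_bar               # sum y_i^2 - n*Ybar^2
b_xy = num / den
print(b_xy)   # 1.0 for this data
```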
We assume the equation of the regression line when $Y$ depends on $X$ as
$$Y = aX + b$$
Here $b$ is the average value of $Y$ when $X$ is zero, and the slope $a$ of the line is the regression coefficient $b_{yx}$, which is given by
$$b_{yx}=\dfrac{\sum\limits_{i=1}^{n}{x_{i}y_{i}}-n\overline{X}\,\overline{Y}}{\sum\limits_{i=1}^{n}{x_{i}^{2}}-n{\left( \overline{X} \right)}^{2}}$$
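The companion coefficient follows the same pattern with the denominator taken over the other variable; here is a minimal sketch on the same hypothetical data.

```python
# Minimal sketch: b_yx = (sum x_i*y_i - n*Xbar*Ybar) / (sum x_i^2 - n*Xbar^2)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # same hypothetical data as the previous sketch
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

num = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar   # sum x_i*y_i - n*Xbar*Ybar
den = sum(x * x for x in xs) - n * x_bar * x_bar               # sum x_i^2 - n*Xbar^2
b_yx = num / den
print(b_yx)   # 0.6 for this data
```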
The correlation coefficient $\rho \left( X,Y \right)$ of the population measures the degree of linear association between $X$ and $Y$. We know that it is the ratio of the covariance $\text{COV}\left( X,Y \right)$ of the bivariate population to the product of the standard deviations of $X$ and $Y$. So it is given by
$$\rho \left( X,Y \right)=\dfrac{\text{COV}\left( X,Y \right)}{\sigma_{x}\sigma_{y}}=\dfrac{\sum{x_{i}y_{i}}-n\overline{X}\,\overline{Y}}{\sqrt{\sum{x_{i}^{2}}-n\overline{X}^{2}}\sqrt{\sum{y_{i}^{2}}-n\overline{Y}^{2}}}$$
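Here is a minimal Python sketch of this ratio on the same hypothetical data; `sxy`, `sxx`, and `syy` are illustrative names for the centered sums.

```python
# Minimal sketch: rho = COV(X,Y) / (sigma_x * sigma_y), via centered sums.
from math import sqrt

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # same hypothetical data as the previous sketches
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar   # n * COV(X, Y)
sxx = sum(x * x for x in xs) - n * x_bar * x_bar               # n * sigma_x^2
syy = sum(y * y for y in ys) - n * y_bar * y_bar               # n * sigma_y^2
rho = sxy / (sqrt(sxx) * sqrt(syy))                            # the n factors cancel
print(rho)   # about 0.7746 for this data
```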
We proceed from the left-hand side of the statement $b_{yx}\cdot b_{xy}=\left\{ \rho \left( X,Y \right) \right\}^{2}$:
$$\begin{aligned}
b_{yx}\cdot b_{xy} &= \dfrac{\sum\limits_{i=1}^{n}{x_{i}y_{i}}-n\overline{X}\,\overline{Y}}{\sum\limits_{i=1}^{n}{x_{i}^{2}}-n{\left( \overline{X} \right)}^{2}}\times \dfrac{\sum\limits_{i=1}^{n}{x_{i}y_{i}}-n\overline{X}\,\overline{Y}}{\sum\limits_{i=1}^{n}{y_{i}^{2}}-n{\left( \overline{Y} \right)}^{2}} \\
&= {\left( \dfrac{\sum\limits_{i=1}^{n}{x_{i}y_{i}}-n\overline{X}\,\overline{Y}}{\sqrt{\left( \sum\limits_{i=1}^{n}{x_{i}^{2}}-n{\left( \overline{X} \right)}^{2} \right)\left( \sum\limits_{i=1}^{n}{y_{i}^{2}}-n{\left( \overline{Y} \right)}^{2} \right)}} \right)}^{2}={\left\{ \rho \left( X,Y \right) \right\}}^{2}
\end{aligned}$$
This is equal to the right-hand side, and hence the statement is proved.

**Note:** We can alternatively prove this if we know the relation between the regression coefficients $b_{xy}, b_{yx}$ and the standard deviations $\sigma_{x}, \sigma_{y}$, namely $b_{xy}=\rho \dfrac{\sigma_{x}}{\sigma_{y}}$ and $b_{yx}=\rho \dfrac{\sigma_{y}}{\sigma_{x}}$, and then proceed from the left-hand side. The statement can be restated as: in bivariate data, the correlation coefficient is (up to sign) the geometric mean of the regression coefficients.
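As a final sanity check (outside the formal proof), this minimal Python sketch verifies the identity and the geometric-mean remark numerically on the same hypothetical data.

```python
# Minimal sketch: check b_yx * b_xy == rho^2 and sqrt(b_yx * b_xy) == |rho|.
from math import sqrt, isclose

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical data
ys = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

sxy = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
sxx = sum(x * x for x in xs) - n * x_bar * x_bar
syy = sum(y * y for y in ys) - n * y_bar * y_bar

b_xy = sxy / syy                       # slope of X on Y
b_yx = sxy / sxx                       # slope of Y on X
rho = sxy / (sqrt(sxx) * sqrt(syy))    # correlation coefficient

assert isclose(b_yx * b_xy, rho ** 2)        # the identity proved above
assert isclose(sqrt(b_yx * b_xy), abs(rho))  # |rho| is the geometric mean
print(b_yx * b_xy, rho ** 2)                 # both 0.6 for this data
```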