Relationship between OLS estimates of the slope coefficients in simple linear regressions of Y on X and of X on Y
Assume a model $y = \beta_0 + \beta_1 x + u$. Given a sample $(x_i, y_i)_{i=1}^n$, we can compute the OLS estimate $\hat{\beta}_1$ of $\beta_1$. Now suppose we instead posit the model $x = \gamma_0 + \gamma_1 y + \varepsilon$, for which we can likewise compute the OLS estimate $\hat{\gamma}_1$ of $\gamma_1$. My question is whether $\frac{1}{\hat{\gamma}_1}$ is an unbiased estimator of $\beta_1$, assuming both models satisfy the Gauss–Markov assumptions, i.e.
1) $\mathbb{E}(u \mid x) = 0,\ \mathbb{E}(\varepsilon \mid y) = 0$
2) $(y_i, x_{i})_{i=1}^n$ are i.i.d.
3) $\mathbb{E}(y^4) < \infty,\ \mathbb{E}(x^4) < \infty$
4) $u, \varepsilon$ are homoskedastic
5) $u \sim \mathscr{N}(0, \sigma_u^2),\ \varepsilon \sim \mathscr{N}(0, \sigma_{\varepsilon}^2)$
What I have done so far:
$\hat{\gamma}_1 = \frac{\sum(y_i - \bar{y})(x_i - \bar{x})}{\sum (y_i - \bar{y})^2} = \frac{s^2_{xy}}{s^2_{yy}}$ (where $s^2$ denotes a sample covariance)
$\frac{1}{\hat{\gamma}_1} = \frac{s^2_{yy}}{s^2_{xy}} = \frac{\sum(y_i - \bar{y})(\beta_1 x_i + u_i - \beta_1\bar{x} - \bar{u})}{s^2_{xy}} = \frac{\beta_1 s^2_{xy} + s^2_{yu}}{s^2_{xy}} = \beta_1 + \frac{s^2_{yu}}{s^2_{xy}}$
And here I got stuck. I have no idea how to compute the expectation of the second term (or how to prove that it has zero, or nonzero, expectation). Could you please give me any hints?
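For what it is worth, a quick Monte Carlo check (a Python sketch; the parameter values below are arbitrary choices of mine, not part of the question) suggests that the expectation of the second term is not zero:

```python
import numpy as np

rng = np.random.default_rng(42)

# Arbitrary illustrative parameters.
beta0, beta1 = 1.0, 2.0
sigma_u, sigma_x = 1.0, 1.0
n, n_reps = 50, 20_000

second_term = np.empty(n_reps)
for r in range(n_reps):
    x = rng.normal(0.0, sigma_x, size=n)
    u = rng.normal(0.0, sigma_u, size=n)
    y = beta0 + beta1 * x + u
    s_xy = np.cov(x, y, ddof=1)[0, 1]   # sample covariance of x and y
    s_yu = np.cov(y, u, ddof=1)[0, 1]   # sample covariance of y and u
    second_term[r] = s_yu / s_xy

print("Monte Carlo mean of s_yu / s_xy:", second_term.mean())
# The mean is clearly nonzero (close to sigma_u^2 / (beta1 * sigma_x^2) here),
# which suggests 1/gamma1_hat is not an unbiased estimator of beta1.
```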
Thanks a lot in advance for any help!
linear-regression
asked Jan 24 at 16:36 by D F, edited Jan 24 at 20:02
2 Answers
This is a biased estimator. If $\beta_1 = g(\gamma) = 1/\gamma$ and $\mathbb{E}[\hat{\gamma}] = \gamma = 1/\beta_1$, then $\beta_1 = 1/\mathbb{E}[\hat{\gamma}]$. Since $1/x$ is convex for $x > 0$ and concave for $x < 0$, $1/\hat{\gamma}$ is biased upward in the first case and downward in the second. This follows from Jensen's inequality: for $\hat{\gamma} > 0$ you have
$$
\mathbb{E}\, g(\hat{\gamma}) \ge g\!\left( \mathbb{E}\hat{\gamma} \right),
$$
and the reversed inequality for $\hat{\gamma} < 0$.
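A small Monte Carlo sketch (Python; the parameter values are arbitrary illustrative choices, not taken from the question) makes the bias visible: the average of $1/\hat{\gamma}_1$ lands well above $\beta_1$, and with enough replications the Jensen gap above $1/\gamma_1$ shows up as well.

```python
import numpy as np

rng = np.random.default_rng(0)

# Arbitrary illustrative parameters.
beta0, beta1 = 1.0, 2.0
sigma_u, sigma_x = 1.0, 1.0
n, n_reps = 50, 50_000

# Population slope of the reverse regression of x on y.
gamma1 = beta1 * sigma_x**2 / (beta1**2 * sigma_x**2 + sigma_u**2)

inv_gamma_hat = np.empty(n_reps)
for r in range(n_reps):
    x = rng.normal(0.0, sigma_x, size=n)
    y = beta0 + beta1 * x + rng.normal(0.0, sigma_u, size=n)
    gamma1_hat = np.cov(x, y, ddof=1)[0, 1] / np.var(y, ddof=1)
    inv_gamma_hat[r] = 1.0 / gamma1_hat

print("beta1:                 ", beta1)
print("1/gamma1 (population): ", 1.0 / gamma1)
print("mean of 1/gamma1_hat:  ", inv_gamma_hat.mean())
# mean(1/gamma1_hat) comes out well above beta1 (illustrating the bias), and
# slightly above 1/gamma1 -- the Jensen-inequality effect described above.
```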
answered Jan 24 at 20:50 by V. Vancak
There is some ambiguity in your notation: I am assuming that you estimated the betas by regressing $y$ on $x$ and the gammas by regressing $x$ on $y$.
That means that:
- in the first case, you are assuming the "error" to be in the $y$ data (while the $x$ values are taken as exact);
- in the second case, on the contrary, the "error" is in the $x$ data.
In either case the regression line passes through $(\bar{x},\bar{y})$, so we are concerned only with determining the slope.
So $\beta_1$ corresponds to the assumption that all of the error lies in $y$, and $1/\gamma_{1}$ to the assumption that all of it lies in $x$.
The intermediate case, in which we allow error in both the $x$ and the $y$ data, is handled by Total Least Squares regression, which yields a different slope depending on the presumed ratio of precision (or noise) between the two sets of data.
So no relation between the two slopes can be established unless you state such an assumption.
-- addendum in reply to your comment --
The "physical" or "engineering" view of regression is the one described above.
Suppose, for example, that you have a set of (volume, weight) measurements of a certain substance, made with instruments of known precision and free of systematic error, and you want to determine the substance's density. You then rely on the statistical property that using many samples levels out the errors, i.e. on the method of least squares.
The standard least-squares method minimizes $(\Delta y)^2=\left(y_n-(\beta_0+ \beta_1 x_n)\right)^2$, which amounts to saying that the "exact" value of $y$ is $\beta_0+ \beta_1 x_n$; this in turn means that $x_n$ is treated as exact and you are minimizing the vertical distances from the points to the line.
That is acceptable if the expected r.m.s. error (the $\sigma$) on $x$ is much smaller than that on $y$.
If it is the other way around, you should minimize the horizontal deviations from the line (the regression of $x$ on $y$).
If the expected r.m.s. errors are comparable, you should minimize the deviations from the line taken along a direction whose slope equals the ratio of the $\sigma$'s (a compromise between the two extreme cases above).
So, physically, it is well known that the vertical and the horizontal approach may give two different slopes, and mathematically you cannot reconcile them without an assumption about that ratio.
-- addendum after the Gauss-Markov conditions were made explicit --
Translated into the engineering picture above, they read as follows:
- the systematic error is null;
- error is present on both $x$ and $y$, independently;
- it is distributed as a product of two independent Normals, one in $x$ and one in $y$;
- the expected r.m.s. errors are given and equal $\sigma_x$ and $\sigma_y$, respectively.
So the explanation above applies in full, and the slope should be computed by minimizing the sum of squared deviations taken along the direction $\Delta y / \Delta x = \sigma_y / \sigma_x$.
Or, in other words, transform $x$ and $y$ into
$$
\xi_{n} = \frac{x_{n} - \bar x}{\sigma_{x}},\quad \eta_{n} = \frac{y_{n} - \bar y}{\sigma_{y}}
$$
and measure the errors orthogonally to the line: Orthogonal Regression.
With this formulation, swapping the axes does invert the slope, as you would expect.
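For concreteness, here is a Python sketch (with made-up data and error levels of my own choosing, and assuming the error $\sigma$'s are known) that computes the two OLS slopes and the orthogonal-regression slope on $\sigma$-standardized coordinates, and checks that the latter is simply inverted when the axes are swapped:

```python
import numpy as np

def ols_slope(a, b):
    """OLS slope from regressing b on a."""
    return np.cov(a, b, ddof=1)[0, 1] / np.var(a, ddof=1)

def orthogonal_slope(x, y, sigma_x, sigma_y):
    """Slope (in original units) from orthogonal regression on the
    error-standardized coordinates xi=(x-xbar)/sigma_x, eta=(y-ybar)/sigma_y."""
    xi = (x - x.mean()) / sigma_x
    eta = (y - y.mean()) / sigma_y
    cov = np.cov(np.vstack([xi, eta]), ddof=1)
    _, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    v = eigvecs[:, -1]                          # direction of largest variance
    return (v[1] / v[0]) * (sigma_y / sigma_x)  # convert back to original units

# Hypothetical setup with error in both coordinates (not from the question).
rng = np.random.default_rng(1)
n, true_slope = 200, 2.0
sigma_x_err, sigma_y_err = 0.5, 1.0
t = rng.normal(0.0, 1.0, n)                     # latent "true" x values
x = t + rng.normal(0.0, sigma_x_err, n)
y = 1.0 + true_slope * t + rng.normal(0.0, sigma_y_err, n)

print("y-on-x OLS slope:        ", ols_slope(x, y))        # attenuated
print("1 / (x-on-y OLS slope):  ", 1.0 / ols_slope(y, x))  # overshoots
print("orthogonal (y vs x):     ", orthogonal_slope(x, y, sigma_x_err, sigma_y_err))
print("1 / orthogonal (x vs y): ", 1.0 / orthogonal_slope(y, x, sigma_y_err, sigma_x_err))
# Typically the two OLS slopes bracket the true slope, the orthogonal-regression
# slope lies in between, and it is exactly reciprocal under an axis swap.
```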
answered Jan 24 at 17:08 by G Cab, edited Jan 24 at 22:54
Could you please clarify what you mean by errors in the $y$ or $x$ data and why it matters? We have a sample $(y_i, x_i)_{i=1}^n$, i.e. a set of pairs of points $x_i$ and $y_i$. We do not observe any errors; the two cases differ only in our assumption about which variable is the independent one. And yes, I did estimate the betas by regressing $y$ on $x$ and the gammas by regressing $x$ on $y$.
– D F
Jan 24 at 17:21
@DF: I added some more considerations to the answer; I hope the concept is clearer now.
– G Cab
Jan 24 at 18:28
Well, I understand what you are talking about, but here, as I said, the Gauss-Markov conditions are satisfied, i.e. the expected errors on both $x$ and $y$ are assumed to be $0$.
– D F
Jan 24 at 18:33
@DF: sorry, I was talking about the expected non-systematic error of the measurements, which really means the expected r.m.s. of the error (the $\sigma$); I have amended the answer accordingly. I suppose that when you say "expected error = 0" you mean the "systematic" error, i.e. the arithmetic mean, is that right? (By the way, what do you mean by the Gauss-Markov conditions?)
– G Cab
Jan 24 at 19:09
I have modified the question and added the Gauss-Markov conditions.
– D F
Jan 24 at 19:45