Least-squares failure as a classifier
I was reading Pattern Recognition and Machine Learning by Christopher Bishop. In section 4.1.3 (page 186), on the failure of least-squares classification, I stumbled on this phrase:

"The failure of least squares should not surprise us when we recall
that it corresponds to maximum likelihood under the assumption of a
Gaussian conditional distribution."

However, I cannot understand this. What does least squares have to do with a conditional distribution? Why are we talking about a conditional distribution at all, and how does it relate to a Gaussian?

I would be grateful if you could help me.
statistics statistical-inference conditional-expectation machine-learning maximum-likelihood
asked Jan 6 at 19:30
Hoda Fakharzadeh
1 Answer
Suppose the relationship between the feature vectors $\mathbf x_i$ and the target variables $y_i$ is modelled as
$$y_i = f(\mathbf x_i) + \epsilon,$$
where the function $f$ represents the "true model", and $\epsilon \sim \mathcal N(0, \sigma^2)$ is Gaussian noise.
Then the log-likelihood for the dataset is
$$ \log P(y_1, \dots, y_N \mid \mathbf x_1, \dots, \mathbf x_N) = - \frac{1}{2\sigma^2} \sum_{i=1}^N (y_i - f(\mathbf x_i))^2 - \frac{N}{2} \log (2\pi \sigma^2).$$
Treating $\sigma^2$ as a constant, and ignoring constant terms, we see that this log-likelihood is, up to a negative scale factor, the least-squares loss function
$$ L(y_1, \dots, y_N \mid \mathbf x_1, \dots, \mathbf x_N) = \sum_{i=1}^N (y_i - f(\mathbf x_i))^2.$$
So maximising the log-likelihood (under the assumption that the noise is Gaussian) is equivalent to minimising the least-squares loss function.
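This equivalence is easy to check numerically. Below is a minimal sketch with made-up data (a linear model $f(x) = wx$ with a chosen noise level — all numbers are illustrative, not from the book): scanning over candidate values of $w$, the Gaussian negative log-likelihood and the sum-of-squares loss are minimised at the same point, since they differ only by a positive scale and an additive constant.

```python
import numpy as np

# Illustrative data: true model y = 2*x plus Gaussian noise (sigma = 0.1).
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2.0 * x + rng.normal(0.0, 0.1, size=x.shape)

sigma2 = 0.1 ** 2
ws = np.linspace(0.0, 4.0, 401)  # grid of candidate slopes w

def sse(w):
    # Sum-of-squares (least-squares) loss for the model f(x) = w*x.
    return np.sum((y - w * x) ** 2)

def neg_log_lik(w):
    # Gaussian negative log-likelihood: SSE / (2*sigma^2) plus a constant.
    n = len(y)
    return sse(w) / (2 * sigma2) + (n / 2) * np.log(2 * np.pi * sigma2)

sse_vals = np.array([sse(w) for w in ws])
nll_vals = np.array([neg_log_lik(w) for w in ws])

# Affine increasing transforms preserve the ordering, so the minimisers agree.
assert np.argmin(sse_vals) == np.argmin(nll_vals)
print("best w:", ws[np.argmin(nll_vals)])
```

The assertion holds because the NLL is an affine, strictly increasing function of the SSE once $\sigma^2$ is fixed.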
The point that Bishop is making here is that, for classification problems, this Gaussian noise model is not very sensible. For one thing, $y_i$ should always be $0$ or $1$ for classification! But the Gaussian noise model can give you fractional values for $y_i$, and even negative values or values greater than one!
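To see this failure concretely, here is a small sketch with invented one-dimensional data (the points, including the far-out one, are chosen purely for illustration): fitting 0/1 class labels by ordinary least squares produces "predictions" that escape the $[0, 1]$ interval.

```python
import numpy as np

# Invented 1-D data: binary labels, with one point far from the boundary.
x = np.array([-3.0, -2.0, -1.0, 1.0, 2.0, 10.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])  # class labels are 0 or 1

# Ordinary least squares for y ~ a*x + b.
A = np.column_stack([x, np.ones_like(x)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
preds = A @ coef

print(preds)
# The fitted values are not constrained to [0, 1]: the point at x = 10
# gets a "probability" above 1.
assert preds.min() < 0 or preds.max() > 1
```

Points far from the decision boundary drag the regression line around, which is exactly the sensitivity-to-outliers behaviour Bishop illustrates in that section.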
Thank you for your answer, I think I understand it now. But the book also talks about a conditional Gaussian; how is it conditional?
– Hoda Fakharzadeh
Jan 6 at 22:21
@HodaFakharzadeh I suppose you can say $P(y_i \mid \mathbf x_i) = \mathcal N(y_i \mid f(\mathbf x_i), \sigma^2)$.
– Kenny Wong
Jan 6 at 22:22
answered Jan 6 at 21:39
Kenny Wong