Least squares failure as a classifier

I was reading Pattern Recognition and Machine Learning by Christopher Bishop, and in chapter 4.1.3 (page 186), on the failure of least-squares classification, I stumbled on this phrase:

"The failure of least squares should not surprise us when we recall that it corresponds to maximum likelihood under the assumption of a Gaussian conditional distribution."

However, I cannot understand this. What does least squares have to do with a conditional distribution? Why are we talking about a conditional distribution at all, and how can it relate to a Gaussian?

I would be so grateful if you could help me.
Tags: statistics, statistical-inference, conditional-expectation, machine-learning, maximum-likelihood






asked Jan 6 at 19:30 by Hoda Fakharzadeh

1 Answer






Suppose the relationship between the feature vectors $\mathbf x_i$ and the target variables $y_i$ is modelled as

$$y_i = f(\mathbf x_i) + \epsilon,$$

where the function $f$ represents the "true model" and $\epsilon \sim \mathcal N(0, \sigma^2)$ is Gaussian noise.

Then the log-likelihood for the dataset is
$$ \log P(y_1, \dots, y_N \mid \mathbf x_1, \dots, \mathbf x_N) = -\frac{1}{2\sigma^2} \sum_{i=1}^N \bigl(y_i - f(\mathbf x_i)\bigr)^2 - \frac{N}{2} \log(2\pi\sigma^2).$$

Treating $\sigma^2$ as a constant, this log-likelihood is, up to an additive constant, just $-\frac{1}{2\sigma^2}$ times the least-squares loss function

$$ L(y_1, \dots, y_N \mid \mathbf x_1, \dots, \mathbf x_N) = \sum_{i=1}^N \bigl(y_i - f(\mathbf x_i)\bigr)^2.$$

So maximising the log-likelihood (under the assumption that the noise is Gaussian) is equivalent to minimising the least-squares loss function.
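
To make this equivalence concrete, here is a minimal numerical sketch (my own illustrative code, not from the book; the synthetic data, the fixed `sigma`, and the `neg_log_likelihood` helper are all assumptions made for the demonstration). It fits the same linear model once with ordinary least squares and once by directly maximising the Gaussian log-likelihood above, and the two estimates agree up to optimiser tolerance.

```python
# Minimal sketch (illustrative, not from Bishop): maximising the Gaussian
# log-likelihood gives the same parameters as minimising squared error.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N, sigma = 200, 0.5

X = np.column_stack([np.ones(N), rng.normal(size=N)])   # bias + one feature
w_true = np.array([1.0, -2.0])
y = X @ w_true + rng.normal(scale=sigma, size=N)         # y_i = f(x_i) + eps

# 1) Ordinary least squares: minimises sum_i (y_i - f(x_i))^2.
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# 2) Maximum likelihood under y_i | x_i ~ N(f(x_i), sigma^2), sigma held fixed.
def neg_log_likelihood(w):
    resid = y - X @ w
    return resid @ resid / (2 * sigma**2) + N / 2 * np.log(2 * np.pi * sigma**2)

w_ml = minimize(neg_log_likelihood, x0=np.zeros(2)).x

print("least squares     :", w_ls)
print("maximum likelihood:", w_ml)   # same values up to optimiser tolerance
```

The constant term $-\frac{N}{2}\log(2\pi\sigma^2)$ does not depend on the model parameters, which is exactly why the two fits coincide.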



The point that Bishop is making here is that, for classification problems, this Gaussian noise model is not very sensible. For one thing, the targets $y_i$ are always $0$ or $1$ in classification, but the Gaussian noise model can give you fractional fitted values, and even values that are negative or greater than one. A small numerical sketch of this is given below.
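
Here is a similarly hedged sketch of the failure mode, using synthetic one-dimensional two-class data and plain NumPy (the data, the `lstsq_fit` helper and the particular numbers are illustrative, not Bishop's example). It shows two symptoms: the fitted values stray outside $[0, 1]$, and class-$1$ points far from the boundary, although already correctly classified, drag the least-squares decision boundary towards the class-$1$ cluster.

```python
# Sketch of the failure mode (illustrative data, not Bishop's example):
# regressing binary 0/1 targets directly with least squares.
import numpy as np

rng = np.random.default_rng(1)

# Two 1-D Gaussian classes, targets coded as 0 and 1.
x = np.concatenate([rng.normal(-1.0, 1.0, 50), rng.normal(+1.0, 1.0, 50)])
t = np.concatenate([np.zeros(50), np.ones(50)])

def lstsq_fit(x, t):
    """Least-squares fit of t ~ w0 + w1 * x."""
    X = np.column_stack([np.ones_like(x), x])
    w, *_ = np.linalg.lstsq(X, t, rcond=None)
    return w

w = lstsq_fit(x, t)
pred = w[0] + w[1] * x
print("fitted value range:", pred.min(), pred.max())  # typically < 0 and > 1

# Decision boundary = point where the fitted line crosses 0.5.
print("boundary, clean data:", (0.5 - w[0]) / w[1])   # near 0, as expected

# Add class-1 points far to the right.  They are already on the correct side
# of the boundary, yet they dominate the squared error and tilt the line,
# moving the boundary towards the class-1 cluster and misclassifying
# class-1 points near the overlap region.
x2 = np.concatenate([x, rng.normal(8.0, 1.0, 30)])
t2 = np.concatenate([t, np.ones(30)])
w2 = lstsq_fit(x2, t2)
print("boundary, with far-away class-1 points:", (0.5 - w2[0]) / w2[1])
```

A logistic model, which assumes the appropriate Bernoulli conditional distribution instead of a Gaussian, keeps its outputs in $(0, 1)$ and is far less affected by such well-classified but distant points.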






answered Jan 6 at 21:39 by Kenny Wong

• Thank you for your answer. I think I understand it now, but the book also talks about a conditional Gaussian. How is it conditional?
  – Hoda Fakharzadeh
  Jan 6 at 22:21

• @HodaFakharzadeh I suppose you can say $P(y_i \mid \mathbf x_i) = \mathcal N(y_i \mid f(\mathbf x_i), \sigma^2)$.
  – Kenny Wong
  Jan 6 at 22:22