Minimum of $\|\mathbf{x}-\mathbf{y}\|^2+\|\mathbf{Ax}-\mathbf{b}\|^2$
Given a matrix $\mathbf{A}\in\mathbb{R}^{m\times n}$ and vectors $\mathbf{b}\in\mathbb{R}^m$ and $\mathbf{y}\in\mathbb{R}^n$, I need to find the vector $\mathbf{x}_*\in\mathbb{R}^n$ such that
$$\mathbf{x}_* = \text{arg}\min_{\mathbf{x}\in\mathbb{R}^n}\left\{\|\mathbf{x}-\mathbf{y}\|^2+\|\mathbf{Ax}-\mathbf{b}\|^2\right\}$$
If $\mathbf{y}$ satisfies $\mathbf{Ay}=\mathbf{b}$, then $\mathbf{x}_*=\mathbf{y}$; this is the trivial case, so from here on we assume that $\mathbf{Ay}\neq\mathbf{b}$.
It's easy to check that $\|\mathbf{x}-\mathbf{y}\|^2+\|\mathbf{Ax}-\mathbf{b}\|^2 = \left\Vert\begin{bmatrix}\mathbf{A}\\\mathbf{I}\end{bmatrix}\mathbf{x}-\begin{bmatrix}\mathbf{b}\\\mathbf{y}\end{bmatrix} \right\Vert^2$. So, by Lemmas 2.1 and 2.2 in [1], we have $\mathbf{x}_* = \begin{bmatrix}\mathbf{A}\\\mathbf{I}\end{bmatrix}^\dagger\begin{bmatrix}\mathbf{b}\\\mathbf{y}\end{bmatrix}$, where $^\dagger$ denotes the pseudo-inverse of a matrix.
I found on the internet that an alternative approach is to find the vector $\mathbf{x}$ at which the gradient of $\|\mathbf{x}-\mathbf{y}\|^2+\|\mathbf{Ax}-\mathbf{b}\|^2$ vanishes. This yields the following system:
$$\mathbf{(I+A^TA)x=y+A^Tb}$$
My question is: why does the following equality hold?
$$\mathbf{x}_* = \begin{bmatrix}\mathbf{A}\\\mathbf{I}\end{bmatrix}^\dagger\begin{bmatrix}\mathbf{b}\\\mathbf{y}\end{bmatrix} = \mathbf{(I+A^TA)^{-1}(y+A^Tb)} $$
[1] Piziak, R., and P. L. Odell. "Affine projections." Computers & Mathematics with Applications 48.1-2 (2004): 177-190.
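For anyone who wants a quick numerical sanity check of this identity, here is a short NumPy sketch; the dimensions and random data below are arbitrary choices, purely for illustration.

```python
# Check numerically that pinv([A; I]) [b; y] equals (I + A^T A)^{-1} (y + A^T b).
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3                                   # arbitrary sizes
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
y = rng.standard_normal(n)

C = np.vstack([A, np.eye(n)])                 # stacked matrix [A; I]
z = np.concatenate([b, y])                    # stacked vector [b; y]

x_pinv = np.linalg.pinv(C) @ z                # x* via the pseudo-inverse
x_grad = np.linalg.solve(np.eye(n) + A.T @ A, y + A.T @ b)  # x* via the zero-gradient system

print(np.allclose(x_pinv, x_grad))            # True
```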
linear-algebra optimization convex-optimization
My favorite definition of the pseudoinverse $M^\dagger$ is that it is the linear transformation that takes a vector $z$ as input and returns the least norm solution of $Mx = \hat z$, where $\hat z$ is the projection of $z$ onto the column space of $M$. The pseudoinverse exists in order to solve least squares problems. The various formulas for the pseudoinverse, which hold in various situations, are all consequences of this conceptual definition of the pseudoinverse. Of course, you can also minimize $\|Mx - z\|^2$ by setting the derivative (gradient) equal to zero, like we do in calculus.
– littleO, Jan 15 at 19:21
3 Answers
Here is a shorter answer. Wikipedia mentions that the generalized inverse of a matrix $\begin{bmatrix}\mathbf{A} \\ \mathbf{I}\end{bmatrix}$ can be expressed as $\left(\begin{bmatrix}\mathbf{A} \\ \mathbf{I}\end{bmatrix}^T\begin{bmatrix}\mathbf{A} \\ \mathbf{I}\end{bmatrix}\right)^{-1}\begin{bmatrix}\mathbf{A} \\ \mathbf{I}\end{bmatrix}^T$, if the inverse exists. This expression simplifies to $\left(\mathbf{A^TA + I}\right)^{-1}\begin{bmatrix}\mathbf{A} \\ \mathbf{I}\end{bmatrix}^T$. Therefore,
$$\mathbf{x}_* = \begin{bmatrix}\mathbf{A}\\\mathbf{I}\end{bmatrix}^\dagger\begin{bmatrix}\mathbf{b}\\\mathbf{y}\end{bmatrix} = \left(\mathbf{A^TA + I}\right)^{-1}\begin{bmatrix}\mathbf{A} \\ \mathbf{I}\end{bmatrix}^T \begin{bmatrix}\mathbf{b}\\\mathbf{y}\end{bmatrix} = \mathbf{(I+A^TA)^{-1}(y+A^Tb)}. $$
You are right: when a matrix $\mathbf{B}$ has full column rank, the pseudo-inverse $\mathbf{B}^\dagger$ can be calculated as $\mathbf{B^\dagger = (B^TB)^{-1}B^T}$, and it's easy to check that $\begin{bmatrix}\mathbf{A}\\ \mathbf{I}\end{bmatrix}$ has full column rank.
– Yesid Fonseca V., Jan 15 at 19:54
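A small check of this full-column-rank formula on random data; the sizes here are arbitrary assumptions, and the identity block guarantees full column rank even when $\mathbf{A}$ itself is wide.

```python
# Verify pinv(B) == (B^T B)^{-1} B^T for B = [A; I], which always has full column rank.
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 6                            # A is "wide" here, yet B = [A; I] still has rank n
A = rng.standard_normal((m, n))
B = np.vstack([A, np.eye(n)])

lhs = np.linalg.pinv(B)
rhs = np.linalg.solve(B.T @ B, B.T)    # (B^T B)^{-1} B^T, computed without an explicit inverse
print(np.allclose(lhs, rhs))           # True
```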
Differentiation gets you to the result quite directly. It's easier to see if you write it with indices. In general, when in doubt, write everything in index notation; it solves everything (I'm not using the Einstein convention here, to be more explicit, although skipping all the $\sum$ signs would make everything more readable).
$$L=\sum_i (x_i-y_i)^2 + \sum_{i}\Big(\sum_j A_{ij}x_j-b_i\Big)^2$$
For each $k$, the partial derivative must be zero (you also know that $\partial x_i/\partial x_k=\delta_{ik}$):
$$0=\frac{\partial L}{\partial x_k}=\sum_i 2(x_i-y_i)\delta_{ik}+\sum_{i}2\Big(\sum_j A_{ij}x_j-b_i\Big)\sum_{j}A_{ij}\delta_{jk}$$
The parts with the Kronecker deltas ($\delta$) can be resolved, and everything can be divided by $2$:
$$0=(x_k-y_k)+\sum_{i}A_{ik}\Big(\sum_j A_{ij}x_j-b_i\Big)$$
This can be rewritten back into matrix form:
$$0={\bf x}-{\bf y}+{\bf A}^T ({\bf A x}-{\bf b})$$
Expand and collect the ${\bf x}$ terms:
$$({\bf I}+{\bf A}^T{\bf A}){\bf x}={\bf y}+{\bf A}^T{\bf b}$$
The last step requires an inverse; here ${\bf I}+{\bf A}^T{\bf A}$ is symmetric positive definite, hence always invertible (for a general normal system you would fall back on a pseudoinverse if the matrix were singular).
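As a minimal sketch of this route (the random data and sizes are arbitrary), one can solve the system with NumPy and confirm that the gradient really does vanish at the solution:

```python
# Solve (I + A^T A) x = y + A^T b and check that the gradient of the objective is ~0 there.
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
y = rng.standard_normal(n)

x = np.linalg.solve(np.eye(n) + A.T @ A, y + A.T @ b)

grad = 2 * (x - y) + 2 * A.T @ (A @ x - b)   # gradient of ||x - y||^2 + ||A x - b||^2 at x
print(np.allclose(grad, 0))                  # True: x is the unique minimizer
```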
The complications with the extended system are not strictly necessary, but once you have the extended representation, the procedure is the same.
This time I can outline a less index-heavy way, using $\|x\|^2=x^Tx$:
$${\bf C}=\begin{bmatrix}{\bf A}\\{\bf I}\end{bmatrix}$$
$${\bf z}=\begin{bmatrix}{\bf b}\\{\bf y}\end{bmatrix}$$
Now it's
$$L=\|{\bf C}{\bf x}-{\bf z}\|^2={\bf x}^T {\bf C}^T {\bf C x} - {\bf x}^T{\bf C}^T{\bf z}-{\bf z}^T{\bf C x}+{\bf z}^T{\bf z}$$
which you differentiate like before; just leave a "dangling index" where $x$ used to be. The derivative with respect to a vector is a vector: one equation for each component you differentiate over. These are matrices, so the order must be preserved; just leave a box where $x$ was (we differentiate the first term as a product, and the last term is a constant):
$$\frac{\partial L}{\partial \bf x}={\Box}^T {\bf C}^T {\bf C x}+{\bf x}^T {\bf C}^T {\bf C } \Box- {\Box}^T{\bf C}^T{\bf z}-{\bf z}^T{\bf C }\Box$$
You can imagine setting $\Box=(1,0,0,\ldots)$ to get $\partial L/\partial x_1$, and so on ($\Box$ is the direction of differentiation, in ${\bf x}$ space).
The result of this expression is still technically a scalar, so you can transpose the terms that have the box in the wrong place ($a^Tb=b^Ta$):
$$\frac{\partial L}{\partial \bf x}=2{\Box}^T {\bf C}^T {\bf C x}-2 {\Box}^T{\bf C}^T{\bf z}=2\Box^T({\bf C}^T {\bf C x}-{\bf C}^T{\bf z})$$
For this to be zero, the expression in parentheses must be zero, and you get the usual expression for the least-squares solution (solving the normal system, as we usually do in linear fitting).
If you simply require ${\bf C}{\bf x}={\bf z}$, trying to force the expression to equal zero exactly, you of course get an overdetermined system without a proper inverse, and the pseudoinverse gives you the least-squares solution, leading to the simple, naive expression
$${\bf x}={\bf C}^\dagger{\bf z}$$
which "behind the scenes" solves the normal system with ${\bf C}^T{\bf C}$ (or uses an SVD decomposition, which is again the same thing). This duality between solving the normal system and the extended system appears in every course on linear minimization.
Defining $\mathbf{A}^{\dagger} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T$ and performing the indicated algebraic operations, we have
\begin{equation}
\begin{split}
\begin{bmatrix} \mathbf{A} \\ \mathbf{I} \end{bmatrix}^{\dagger}
\begin{bmatrix} \mathbf{b} \\ \mathbf{y} \end{bmatrix}
&= \left( \begin{bmatrix} \mathbf{A}^T & \mathbf{I} \end{bmatrix}
\begin{bmatrix} \mathbf{A} \\ \mathbf{I} \end{bmatrix} \right)^{-1}
\left( \begin{bmatrix} \mathbf{A}^T & \mathbf{I} \end{bmatrix}
\begin{bmatrix} \mathbf{b} \\ \mathbf{y} \end{bmatrix} \right) \\
&= \left( \mathbf{A}^T \mathbf{A} + \mathbf{I} \right)^{-1}
\left( \mathbf{A}^T \mathbf{b} + \mathbf{y} \right)
\end{split}
\end{equation}
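A short numerical check of this block-matrix algebra, on arbitrary random data:

```python
# Verify [A; I]^T [A; I] = A^T A + I and [A; I]^T [b; y] = A^T b + y.
import numpy as np

rng = np.random.default_rng(4)
m, n = 5, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
y = rng.standard_normal(n)

C = np.vstack([A, np.eye(n)])      # [A; I]
z = np.concatenate([b, y])         # [b; y]

print(np.allclose(C.T @ C, A.T @ A + np.eye(n)))   # True
print(np.allclose(C.T @ z, A.T @ b + y))           # True
```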