Minimum of $\|\mathbf{x-y}\|^2+\|\mathbf{Ax}-\mathbf{b}\|^2$


























Given a matrix $\mathbf{A}\in\mathbb{R}^{m\times n}$ and vectors $\mathbf{b}\in\mathbb{R}^m$ and $\mathbf{y}\in\mathbb{R}^n$, I need to find the vector $\mathbf{x_*}\in\mathbb{R}^n$ such that



$$\mathbf{x_*} = \text{arg}\min_{\mathbf{x}\in\mathbb{R}^n}\left\{\|\mathbf{x-y}\|^2+\|\mathbf{Ax}-\mathbf{b}\|^2\right\}$$



If $\mathbf{y}$ satisfies $\mathbf{Ay=b}$, then $\mathbf{x}_*=\mathbf{y}$. This is the trivial case, so from here on we assume that $\mathbf{Ay\neq b}$.



It's easy to check that $\|\mathbf{x-y}\|^2+\|\mathbf{Ax}-\mathbf{b}\|^2 = \left\Vert\begin{bmatrix}\mathbf{A}\\\mathbf{I}\end{bmatrix}\mathbf{x}-\begin{bmatrix}\mathbf{b}\\\mathbf{y}\end{bmatrix}\right\Vert^2$. So, by Lemmas 2.1 and 2.2 in [1], we have that $\mathbf{x}_* = \begin{bmatrix}\mathbf{A}\\\mathbf{I}\end{bmatrix}^\dagger\begin{bmatrix}\mathbf{b}\\\mathbf{y}\end{bmatrix}$, where $^\dagger$ denotes the pseudo-inverse of a matrix.



I found on the internet that an alternative solution is to find the vector $\mathbf{x}$ at which the gradient of $\|\mathbf{x-y}\|^2+\|\mathbf{Ax}-\mathbf{b}\|^2$ vanishes. This yields the following system:



$$\mathbf{(I+A^TA)x=y+A^Tb}$$



My question is: why does the following equality hold?



$$\mathbf{x}_* = \begin{bmatrix}\mathbf{A}\\\mathbf{I}\end{bmatrix}^\dagger\begin{bmatrix}\mathbf{b}\\\mathbf{y}\end{bmatrix} = \mathbf{(I+A^TA)^{-1}(y+A^Tb)}$$
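For what it's worth, the two expressions do agree numerically; below is a minimal NumPy sketch checking the equality on random placeholder data (the matrices and vectors are arbitrary, just for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
y = rng.standard_normal(n)

# Left-hand side: pseudo-inverse of the stacked matrix applied to the stacked vector.
C = np.vstack([A, np.eye(n)])          # [A; I]
z = np.concatenate([b, y])             # [b; y]
x_pinv = np.linalg.pinv(C) @ z

# Right-hand side: solve (I + A^T A) x = y + A^T b.
x_normal = np.linalg.solve(np.eye(n) + A.T @ A, y + A.T @ b)

print(np.allclose(x_pinv, x_normal))   # True
```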





[1] Piziak, R., and P. L. Odell. "Affine projections." Computers & Mathematics with Applications 48.1-2 (2004): 177-190.










linear-algebra optimization convex-optimization
















asked Jan 15 at 15:53









Yesid Fonseca V.













  • My favorite definition of the pseudoinverse $M^\dagger$ is that it is the linear transformation that takes a vector $z$ as input and returns the least-norm solution of $Mx = \hat z$, where $\hat z$ is the projection of $z$ onto the column space of $M$. The pseudoinverse exists in order to solve least squares problems. The various formulas for the pseudoinverse, which hold in various situations, are all consequences of this conceptual definition of the pseudoinverse. Of course, you can also minimize $\|Mx - z\|^2$ by setting the derivative (gradient) equal to zero, like we do in calculus.
    – littleO, Jan 15 at 19:21
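(As an aside, that least-norm characterization is easy to see numerically: in the sketch below, a deliberately rank-deficient $M$ is used, and `np.linalg.pinv` returns the minimizer of $\|Mx-z\|^2$ with no component in the null space of $M$, i.e. the one of least norm.)

```python
import numpy as np

rng = np.random.default_rng(4)
M = np.column_stack([rng.standard_normal(5), np.zeros(5)])  # rank-deficient: second column is zero
z = rng.standard_normal(5)

x_pinv = np.linalg.pinv(M) @ z                   # pseudoinverse solution
x_lstsq, *_ = np.linalg.lstsq(M, z, rcond=None)  # SVD-based least squares (minimum-norm solution)

print(np.allclose(x_pinv, x_lstsq))              # True
print(x_pinv[1])                                 # 0.0: no component along the null-space direction
```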


















































3 Answers





































Here is a shorter answer. Wikipedia mentions that the generalized inverse of a matrix $\begin{bmatrix}\mathbf{A} \\ \mathbf{I}\end{bmatrix}$ can be expressed as $\left(\begin{bmatrix}\mathbf{A} \\ \mathbf{I}\end{bmatrix}^T\begin{bmatrix}\mathbf{A} \\ \mathbf{I}\end{bmatrix}\right)^{-1}\begin{bmatrix}\mathbf{A} \\ \mathbf{I}\end{bmatrix}^T$, if the inverse exists. This expression simplifies to $\left(\mathbf{A^TA + I}\right)^{-1}\begin{bmatrix}\mathbf{A} \\ \mathbf{I}\end{bmatrix}^T$. Therefore,
$$\mathbf{x}_* = \begin{bmatrix}\mathbf{A}\\\mathbf{I}\end{bmatrix}^\dagger\begin{bmatrix}\mathbf{b}\\\mathbf{y}\end{bmatrix} = \left(\mathbf{A^TA + I}\right)^{-1}\begin{bmatrix}\mathbf{A} \\ \mathbf{I}\end{bmatrix}^T \begin{bmatrix}\mathbf{b}\\\mathbf{y}\end{bmatrix} = \mathbf{(I+A^TA)^{-1}(y+A^Tb)}.$$
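As a quick sanity check of that simplification, here is a minimal NumPy sketch on random placeholder data: the full-column-rank formula $(B^TB)^{-1}B^T$ applied to the stacked matrix matches `np.linalg.pinv`, and $B^TB$ indeed reduces to $\mathbf{A^TA+I}$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3
A = rng.standard_normal((m, n))

B = np.vstack([A, np.eye(n)])                     # stacked matrix [A; I], full column rank
explicit = np.linalg.solve(B.T @ B, B.T)          # (B^T B)^{-1} B^T

print(np.allclose(explicit, np.linalg.pinv(B)))   # True: same as the Moore-Penrose pseudo-inverse
print(np.allclose(B.T @ B, A.T @ A + np.eye(n)))  # True: B^T B = A^T A + I
```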






answered Jan 15 at 17:49 by LinAlg






















  • You are right: when a matrix $\mathbf{B}$ has full column rank, the pseudo-inverse $\mathbf{B}^\dagger$ can be calculated as $\mathbf{B^\dagger = (B^TB)^{-1}B^T}$. And it's easy to check that $\begin{bmatrix}\mathbf{A}\\ \mathbf{I}\end{bmatrix}$ has full column rank.
    – Yesid Fonseca V., Jan 15 at 19:54


































Differentiation gets you to the result quite directly. It's easier to see if you write it with indices. In general, when in doubt, write everything in index notation; it resolves any ambiguity. (I'm not using the Einstein convention here, to be more explicit, although skipping all the $\sum$ signs would make everything more readable.)

$$L=\sum_i (x_i-y_i)^2 + \sum_{i}\Big(\sum_j A_{ij}x_j-b_i\Big)^2$$
For each $k$, the partial derivative $\partial L/\partial x_k$ must be zero (you also know $\partial x_i/\partial x_k=\delta_{ik}$):
$$0=\frac{\partial L}{\partial x_k}=\sum_i 2(x_i-y_i)\delta_{ik}+\sum_{i}2\Big(\sum_j A_{ij}x_j-b_i\Big)\sum_{j}A_{ij}\delta_{jk}$$
The parts with the identities ($\delta$) can be resolved, and everything can be divided by 2:
$$0=(x_k-y_k)+\sum_{i}A_{ik}\Big(\sum_j A_{ij}x_j-b_i\Big)$$
This can be rewritten back into matrix form:
$$0={\bf x}-{\bf y}+{\bf A}^T ({\bf A x}-{\bf b})$$
Expand and collect the $\bf x$ terms:
$$({\bf I}+{\bf A}^T{\bf A}){\bf x}={\bf y}+{\bf A}^T{\bf b}$$
The last step requires an inverse, or a pseudoinverse if the matrix were not invertible (here ${\bf I}+{\bf A}^T{\bf A}$ is symmetric positive definite, so it always is).



The complications of the extended system are not strictly necessary, but once you have the extended representation, the procedure is the same.





This time, I can outline a less index-heavy way, using $\|x\|^2=x^Tx$:

$${\bf C}=\begin{bmatrix}{\bf A}\\{\bf I}\end{bmatrix},\qquad {\bf z}=\begin{bmatrix}{\bf b}\\{\bf y}\end{bmatrix}$$
Now the objective is
$$L=\|{\bf C}{\bf x}-{\bf z}\|^2={\bf x}^T {\bf C}^T {\bf C x} - {\bf x}^T{\bf C}^T{\bf z}-{\bf z}^T{\bf C x}+{\bf z}^T{\bf z}$$
which you differentiate like before, just leaving a "dangling index" where $x$ used to be. The derivative with respect to a vector is a vector: one equation for each component you differentiate over. These are matrices, so the order must be preserved; just leave a box where $x$ was (we differentiate the first term as a product, and the last term is a constant):
$$\frac{\partial L}{\partial \bf x}={\Box}^T {\bf C}^T {\bf C x}+{\bf x}^T {\bf C}^T {\bf C } \Box- {\Box}^T{\bf C}^T{\bf z}-{\bf z}^T{\bf C }\Box$$
You can imagine setting $\Box=(1,0,0,\ldots)$ to get $\partial L/\partial x_1$, and so on ($\Box$ is the direction of differentiation, in $\bf x$ space).
The result of this expression is still technically a scalar, so you can transpose the terms that have the box in the wrong place ($a^Tb=b^Ta$):
$$\frac{\partial L}{\partial \bf x}=2{\Box}^T {\bf C}^T {\bf C x}-2 {\Box}^T{\bf C}^T{\bf z}=2\Box^T({\bf C}^T {\bf C x}-{\bf C}^T{\bf z})$$
For this to be zero, the expression in parentheses must be zero, and you get the usual least-squares solution (solving the normal system, as we usually do in linear fitting).

If you simply require ${\bf C}{\bf x}={\bf z}$, trying to make the residual exactly zero, you of course get an overdetermined system without a proper inverse, and the pseudoinverse gives you a least-squares solution, leading to the simple naive expression
$${\bf x}={\bf C}^\dagger{\bf z}$$
which "behind the scenes" solves the normal system with $C^T C$ (or uses the SVD, which again amounts to the same thing). This duality between solving the normal system and the extended system shows up in every course on linear least squares.






answered Jan 15 at 16:15 by orion (edited Jan 15 at 16:38)













































Defining $\mathbf{A}^{\dagger} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T$ and performing the indicated algebraic operations, we have

\begin{equation}
\begin{split}
\begin{bmatrix} \mathbf{A} \\ \mathbf{I} \end{bmatrix}^{\dagger}
\begin{bmatrix} \mathbf{b} \\ \mathbf{y} \end{bmatrix}
= & \left( \begin{bmatrix} \mathbf{A}^T & \mathbf{I} \end{bmatrix}
\begin{bmatrix} \mathbf{A} \\ \mathbf{I} \end{bmatrix} \right)^{-1}
\left( \begin{bmatrix} \mathbf{A}^T & \mathbf{I} \end{bmatrix}
\begin{bmatrix} \mathbf{b} \\ \mathbf{y} \end{bmatrix} \right) \\
= & \left( \mathbf{A}^T \mathbf{A} + \mathbf{I} \right)^{-1}
\left( \mathbf{A}^T \mathbf{b} + \mathbf{y} \right)
\end{split}
\end{equation}
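For what it's worth, the two block products used in this computation can be sanity-checked numerically (a minimal NumPy sketch with random placeholders):

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 2
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
y = rng.standard_normal(n)

C = np.vstack([A, np.eye(n)])          # [A; I]
z = np.concatenate([b, y])             # [b; y]

print(np.allclose(C.T @ C, A.T @ A + np.eye(n)))  # [A^T  I] [A; I] = A^T A + I
print(np.allclose(C.T @ z, A.T @ b + y))          # [A^T  I] [b; y] = A^T b + y
```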






answered Jan 15 at 19:14 by Héctor Miguel Vargas García























