Minimum of $\|\mathbf{x-y}\|^2+\|\mathbf{Ax}-\mathbf{b}\|^2$


























Given a matrix $\mathbf{A}\in\mathbb{R}^{m\times n}$ and vectors $\mathbf{b}\in\mathbb{R}^m$ and $\mathbf{y}\in\mathbb{R}^n$, I need to find the vector $\mathbf{x_*}\in\mathbb{R}^n$ such that



$$\mathbf{x_*} = \text{arg}\min_{\mathbf{x}\in\mathbb{R}^n}\;\|\mathbf{x-y}\|^2+\|\mathbf{Ax}-\mathbf{b}\|^2$$



If $\mathbf{y}$ satisfies $\mathbf{Ay=b}$, then $\mathbf{x}_*=\mathbf{y}$; this is the trivial case. So, from here on we assume that $\mathbf{Ay\neq b}$.



It's easy to check that $\|\mathbf{x-y}\|^2+\|\mathbf{Ax}-\mathbf{b}\|^2 = \left\Vert\begin{bmatrix}\mathbf{A}\\\mathbf{I}\end{bmatrix}\mathbf{x}-\begin{bmatrix}\mathbf{b}\\\mathbf{y}\end{bmatrix} \right\Vert^2$. So, by Lemmas 2.1 and 2.2 in [1], we have that $\mathbf{x}_* = \begin{bmatrix}\mathbf{A}\\\mathbf{I}\end{bmatrix}^\dagger\begin{bmatrix}\mathbf{b}\\\mathbf{y}\end{bmatrix}$, where $^\dagger$ denotes the pseudoinverse of a matrix.



I found online that an alternative approach is to find the vector $\mathbf{x}$ at which the gradient of $\|\mathbf{x-y}\|^2+\|\mathbf{Ax}-\mathbf{b}\|^2$ vanishes. This yields the following system:



$$\mathbf{(I+A^TA)x=y+A^Tb}$$



My question is: why is



$$\mathbf{x}_* = \begin{bmatrix}\mathbf{A}\\\mathbf{I}\end{bmatrix}^\dagger\begin{bmatrix}\mathbf{b}\\\mathbf{y}\end{bmatrix} = \mathbf{(I+A^TA)^{-1}(y+A^Tb)}\;?$$





[1] Piziak, R., and P. L. Odell. "Affine projections." Computers & Mathematics with Applications 48.1-2 (2004): 177-190.
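(As a quick sanity check, not part of the original question: the claimed equality is easy to verify numerically. Below is a minimal sketch of mine, assuming NumPy and a random instance with arbitrary sizes.)

```python
# Minimal sketch: compare the stacked-pseudoinverse formula with the
# normal-equation formula on a random instance (sizes are arbitrary).
import numpy as np

rng = np.random.default_rng(0)
m, n = 5, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
y = rng.standard_normal(n)

# x via the pseudoinverse of the stacked matrix [A; I]
C = np.vstack([A, np.eye(n)])
z = np.concatenate([b, y])
x_pinv = np.linalg.pinv(C) @ z

# x via the normal equations (I + A^T A) x = y + A^T b
x_normal = np.linalg.solve(np.eye(n) + A.T @ A, y + A.T @ b)

print(np.allclose(x_pinv, x_normal))  # True
```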










linear-algebra optimization convex-optimization






asked Jan 15 at 15:53
Yesid Fonseca V.












  • My favorite definition of the pseudoinverse $M^\dagger$ is that it is the linear transformation that takes a vector $z$ as input and returns the least-norm solution of $Mx = \hat z$, where $\hat z$ is the projection of $z$ onto the column space of $M$. The pseudoinverse exists in order to solve least squares problems. The various formulas for the pseudoinverse, which hold in various situations, are all consequences of this conceptual definition of the pseudoinverse. Of course, you can also minimize $\|Mx - z\|^2$ by setting the derivative (gradient) equal to zero, like we do in calculus.
    – littleO, Jan 15 at 19:21
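(A small illustration of that definition, added as a sketch of my own with random data: for a rank-deficient $M$, `np.linalg.pinv(M) @ z` matches the minimum-norm least-squares solution that `np.linalg.lstsq` returns.)

```python
# Sketch: pinv(M) @ z equals the minimum-norm least-squares solution of
# M x = z, which np.linalg.lstsq also computes (via the SVD).
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((6, 4)) @ rng.standard_normal((4, 5))  # 6x5, rank <= 4
z = rng.standard_normal(6)

x_pinv = np.linalg.pinv(M) @ z
x_lstsq, *_ = np.linalg.lstsq(M, z, rcond=None)
print(np.allclose(x_pinv, x_lstsq))  # True
```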




















3 Answers






Here is a shorter answer. Wikipedia mentions that the generalized inverse of a matrix $\begin{bmatrix}\mathbf{A} \\ \mathbf{I}\end{bmatrix}$ can be expressed as $\left(\begin{bmatrix}\mathbf{A} \\ \mathbf{I}\end{bmatrix}^T\begin{bmatrix}\mathbf{A} \\ \mathbf{I}\end{bmatrix}\right)^{-1}\begin{bmatrix}\mathbf{A} \\ \mathbf{I}\end{bmatrix}^T$, if the inverse exists. This expression simplifies to $\left(\mathbf{A^TA + I}\right)^{-1}\begin{bmatrix}\mathbf{A} \\ \mathbf{I}\end{bmatrix}^T$. Therefore,
$$\mathbf{x}_* = \begin{bmatrix}\mathbf{A}\\\mathbf{I}\end{bmatrix}^\dagger\begin{bmatrix}\mathbf{b}\\\mathbf{y}\end{bmatrix} = \left(\mathbf{A^TA + I}\right)^{-1}\begin{bmatrix}\mathbf{A} \\ \mathbf{I}\end{bmatrix}^T \begin{bmatrix}\mathbf{b}\\\mathbf{y}\end{bmatrix} = \mathbf{(I+A^TA)^{-1}(y+A^Tb)}.$$

answered Jan 15 at 17:49
LinAlg
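(A quick numerical check of the identity used in this answer; my own sketch with a random $\mathbf{A}$. The stacked matrix always has full column rank thanks to the identity block, so the full-column-rank formula applies.)

```python
# Sketch: for C = [A; I], pinv(C) equals (C^T C)^{-1} C^T, which
# simplifies to (A^T A + I)^{-1} [A^T  I].
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 6  # works even for wide A, since [A; I] still has full column rank
A = rng.standard_normal((m, n))
C = np.vstack([A, np.eye(n)])

lhs = np.linalg.pinv(C)
rhs = np.linalg.solve(A.T @ A + np.eye(n), np.hstack([A.T, np.eye(n)]))

print(np.allclose(lhs, rhs))  # True
```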






  • You are right: when a matrix $\mathbf{B}$ has full column rank, the pseudoinverse $\mathbf{B}^\dagger$ can be calculated as $\mathbf{B^\dagger = (B^TB)^{-1}B^T}$. And it's easy to check that $\begin{bmatrix}\mathbf{A}\\ \mathbf{I}\end{bmatrix}$ has full column rank.
    – Yesid Fonseca V., Jan 15 at 19:54





















Differentiation gets you to the result quite directly. It's easier to see if you write it with indices. In general, when in doubt, always write everything in index notation - it solves everything (I'm not using the Einstein convention here, to be more explicit, though skipping all the $\sum$ signs would make everything more readable).

$$L=\sum_i (x_i-y_i)^2 + \sum_{i}\Big(\sum_j A_{ij}x_j-b_i\Big)^2$$
For each $k$, the partial derivative must be zero (you also know $\partial x_i/\partial x_k=\delta_{ik}$):
$$0=\frac{\partial L}{\partial x_k}=\sum_i 2(x_i-y_i)\delta_{ik}+\sum_{i}2\Big(\sum_j A_{ij}x_j-b_i\Big)\sum_{j}A_{ij}\delta_{jk}$$
The parts with the deltas ($\delta$) can be resolved, and everything can be divided by 2:
$$0=(x_k-y_k)+\sum_{i}A_{ik}\Big(\sum_j A_{ij}x_j-b_i\Big)$$
This can be rewritten back into matrix form:
$$0={\bf x}-{\bf y}+{\bf A}^T ({\bf A x}-{\bf b})$$
Expand and collect the $\bf x$ terms:
$$({\bf I}+{\bf A}^T{\bf A}){\bf x}={\bf y}+{\bf A}^T{\bf b}$$
The last step requires an inverse (here ${\bf I}+{\bf A}^T{\bf A}$ is always invertible, since ${\bf A}^T{\bf A}$ is positive semidefinite; otherwise a pseudoinverse would do).

The complications with the extended system are not strictly necessary, but once you have the extended representation, the procedure is the same.

This time, I can outline a less index-heavy way, using $\|x\|^2=x^Tx$:

$${\bf C}=\begin{bmatrix}{\bf A}\\{\bf I}\end{bmatrix}$$
$${\bf z}=\begin{bmatrix}{\bf b}\\{\bf y}\end{bmatrix}$$
Now it's
$$L=\|{\bf C}{\bf x}-{\bf z}\|^2={\bf x}^T {\bf C}^T {\bf C x} - {\bf x}^T{\bf C}^T{\bf z}-{\bf z}^T{\bf C x}+{\bf z}^T{\bf z}$$
which you differentiate as before, leaving a "dangling index" where $x$ used to be. The derivative with respect to a vector is a vector, one equation for each component to differentiate over. These are matrices, so the order must be preserved; just leave a box where $x$ was (we differentiate the first term as a product, and the last term is a constant):
$$\frac{\partial L}{\partial \bf x}={\Box}^T {\bf C}^T {\bf C x}+{\bf x}^T {\bf C}^T {\bf C } \Box- {\Box}^T{\bf C}^T{\bf z}-{\bf z}^T{\bf C }\Box$$
You can imagine setting $\Box=(1,0,0,\ldots)$ to get $\partial L/\partial x_1$ and so on ($\Box$ is the direction of differentiation, in $\bf x$ space).
The result of this expression is still technically a scalar, so you can transpose the terms that have the box in the wrong place ($a^Tb=b^Ta$):
$$\frac{\partial L}{\partial \bf x}=2{\Box}^T {\bf C}^T {\bf C x}-2 {\Box}^T{\bf C}^T{\bf z}=2\Box^T({\bf C}^T {\bf C x}-{\bf C}^T{\bf z})$$
For this to be zero for every $\Box$, the expression in parentheses must be zero, and you get the usual expression for the least-squares solution (solving the normal system, as we usually do in linear fitting).

If you simply require ${\bf C}{\bf x}={\bf z}$, trying to make the residual exactly zero, you of course get an overdetermined system without a proper inverse, and the pseudoinverse gives you a least-squares solution, leading you to the compact form
$${\bf x}={\bf C}^\dagger{\bf z}$$
which "behind the scenes" solves the normal system with $C^T C$ (or uses an SVD decomposition, which is again the same thing). This duality of solving the normal or the extended system is found in every course on linear minimization.

answered Jan 15 at 16:15, edited Jan 15 at 16:38
orion
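(A short numerical check of this derivation, my own sketch on random data: the analytic gradient matches a finite-difference gradient, and it vanishes at $\mathbf{x}_*$.)

```python
# Sketch: verify dL/dx = 2*(x - y) + 2*A^T (A x - b) by finite differences,
# and that the gradient vanishes at x_* = (I + A^T A)^{-1} (y + A^T b).
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 4
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
y = rng.standard_normal(n)

L = lambda x: np.sum((x - y) ** 2) + np.sum((A @ x - b) ** 2)
grad = lambda x: 2 * (x - y) + 2 * A.T @ (A @ x - b)

# finite-difference gradient at a random point
x0, eps = rng.standard_normal(n), 1e-6
fd = np.array([(L(x0 + eps * e) - L(x0 - eps * e)) / (2 * eps) for e in np.eye(n)])
print(np.allclose(fd, grad(x0), atol=1e-4))      # True

x_star = np.linalg.solve(np.eye(n) + A.T @ A, y + A.T @ b)
print(np.allclose(grad(x_star), 0, atol=1e-10))  # True
```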







Defining $\mathbf{A}^{\dagger} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T$ and performing the indicated algebraic operations, we have

\begin{equation}
\begin{split}
\begin{bmatrix} \mathbf{A} \\ \mathbf{I} \end{bmatrix}^{\dagger}
\begin{bmatrix} \mathbf{b} \\ \mathbf{y} \end{bmatrix}
= & \left(
\begin{bmatrix} \mathbf{A}^T & \mathbf{I} \end{bmatrix}
\begin{bmatrix} \mathbf{A} \\ \mathbf{I} \end{bmatrix}
\right)^{-1}
\left(
\begin{bmatrix} \mathbf{A}^T & \mathbf{I} \end{bmatrix}
\begin{bmatrix} \mathbf{b} \\ \mathbf{y} \end{bmatrix}
\right) \\
= & \left( \mathbf{A}^T \mathbf{A} + \mathbf{I} \right)^{-1}
\left( \mathbf{A}^T \mathbf{b} + \mathbf{y} \right)
\end{split}
\end{equation}

answered Jan 15 at 19:14
Héctor Miguel Vargas García
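(The two block products used above, checked numerically in a small sketch of mine with random data.)

```python
# Sketch: [A^T  I] [A; I] = A^T A + I  and  [A^T  I] [b; y] = A^T b + y.
import numpy as np

rng = np.random.default_rng(3)
m, n = 5, 3
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
y = rng.standard_normal(n)

At_I = np.hstack([A.T, np.eye(n)])  # the block row [A^T  I]
print(np.allclose(At_I @ np.vstack([A, np.eye(n)]), A.T @ A + np.eye(n)))  # True
print(np.allclose(At_I @ np.concatenate([b, y]), A.T @ b + y))             # True
```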





