Minimize $\|A-XB\|_F$ subject to $Xv=0$


























Assume we are given two matrices $A, B \in \mathbb R^{n \times m}$ and a vector $v \in \mathbb R^n$. Let $\|\cdot\|_F$ be the Frobenius norm of a matrix. How can we solve the following optimization problem in $X \in \mathbb R^{n \times n}$?



$$\begin{array}{ll} \text{minimize} & \|A-XB\|_F\\ \text{subject to} & Xv=0\end{array}$$



Can this problem be converted to a constrained least-squares problem with the optimization variable being a vector instead of a matrix? If so, does that approach work?



Are there any references on solving such constrained Frobenius-norm least-squares problems?



Thanks!










Tags: matrices, reference-request, convex-optimization, least-squares, quadratic-programming






asked Apr 1 '13 at 0:57 by Ethan; edited Mar 25 '18 at 19:55 by Rodrigo de Azevedo






















4 Answers


















Answer (6 votes)














As the others show, the answer to your question is affirmative. However, I don't see the point of doing so when the problem can actually be converted into an unconstrained least-squares problem.



Let $Q$ be a real orthogonal matrix that has $\frac{v}{\|v\|}$ as its last column. For instance, you may take $Q$ as a Householder matrix:
$$
Q = I - 2uu^T, \quad u = \frac{v-\|v\|e_n}{\|v-\|v\|e_n\|}, \quad
e_n=(0,0,\ldots,0,1)^T.
$$
Then $Xv=0$ means that $XQe_n=0$, which in turn implies that $XQ$ is a matrix of the form $XQ = (\widetilde{X},0)$, where $\widetilde{X}\in\mathbb{R}^{n\times(n-1)}$. So, if you partition $Q^TB$ as $Q^TB = \begin{pmatrix}\widetilde{B}\\ \ast\end{pmatrix}$, where $\widetilde{B}$ is $(n-1)\times m$, then your minimisation problem can be rewritten as the unconstrained least-squares problem
$$\min_{\widetilde{X}\in\mathbb{R}^{n\times(n-1)}} \|A-\widetilde{X}\widetilde{B}\|_F.$$
And the least-norm solution is given by $\widetilde{X} = A\widetilde{B}^+$, where $\widetilde{B}^+$ denotes the Moore-Penrose pseudoinverse of $\widetilde{B}$. Once $\widetilde{X}$ is obtained, you can recover $X = (\widetilde{X},0)Q^T$.
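For concreteness, here is a minimal numerical sketch of this reduction in plain MATLAB (no toolboxes), using small random matrices as stand-in data; it assumes $v$ is not already a positive multiple of $e_n$, so the Householder vector $u$ is well defined:

    n = 5; m = 7;
    A = randn(n,m); B = randn(n,m); v = randn(n,1);

    % Householder matrix Q = I - 2uu' with v/||v|| as its last column
    e_n = [zeros(n-1,1); 1];
    u = v - norm(v)*e_n;  u = u/norm(u);
    Q = eye(n) - 2*(u*u');

    Bt = Q'*B;                 % Q^T B
    Btilde = Bt(1:n-1,:);      % first n-1 rows of Q^T B
    Xtilde = A*pinv(Btilde);   % least-norm solution of the unconstrained problem
    X = [Xtilde, zeros(n,1)]*Q';

    norm(X*v)                  % ~ 0, so the constraint holds
    norm(A - X*B, 'fro')       % achieved objective value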



























• Thanks, this is a very clever way indeed. I very much appreciate it. (1) Is there any consideration in choosing $Q$ besides it being orthogonal and having $v/\|v\|$ as its last column? (2) Did you come up with it yourself, or is there some reference for it?
            – Ethan
            Apr 2 '13 at 16:09












• @Ethan There aren't any other considerations. Orthogonality is required to preserve the Frobenius norm. Setting $v/\|v\|$ as the last column is just for convenience. You may set it as the first column if you want. And matrices other than the Householder matrix can be used, but the Householder matrix is more convenient in this case because (1) it is symmetric (so you don't actually need to transpose $Q$ in $Q^T$) and (2) you only need to store $u$ instead of $n^2$ matrix entries.
            – user1551
            Apr 2 '13 at 17:24










          • And the Householder trick is well known because the Householder matrix is used in some other matrix algorithms (such as QR decomposition). But I don't have any references at hand.
            – user1551
            Apr 2 '13 at 17:28



















Answer (5 votes)














          [Major changes here, sorry! I realized the matrix calculus approach has some real advantages here after I submitted my first answer.]



          Yes, this problem can indeed be examined and solved in a variety of ways.



Kronecker products. Kronecker products provide a concise way to relate matrix equations and standard matrix-vector equations. Using them, we can say that
$$\mathop{\textrm{vec}}(A-XB) = \mathop{\textrm{vec}}(A) - (B^T\otimes I) \mathop{\textrm{vec}}(X)$$
and
$$\mathop{\textrm{vec}}(Xv) = (v^T\otimes I)\mathop{\textrm{vec}}(X),$$
where $\mathop{\textrm{vec}}(\cdot)$ stacks the columns of its input argument into a single column vector. If $X\in\mathbb{R}^{m\times n}$, then both of the identity matrices $I$ above are $m\times m$. Thus Kronecker products allow us to express this problem as
$$\begin{array}{ll} \text{minimize} & \| \tilde{b} - \tilde{A} x \|_2 \\
\text{subject to} & \tilde{C} x = 0 \end{array}$$ where $$
\tilde{A} \triangleq (B^T\otimes I), ~ \tilde{b} \triangleq \mathop{\textrm{vec}}(A), ~ \tilde{C} \triangleq (v^T\otimes I), ~ x \triangleq \mathop{\textrm{vec}}(X).$$
If you square the objective, then this model becomes an equality-constrained least-squares (EQLS) problem. You can derive the optimality conditions for the EQLS form using a simple Lagrange multiplier technique; finding an optimal $x$ requires the solution of a single symmetric indefinite linear system.
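As a quick, hedged sanity check of these two identities (using the OP's dimensions, where $X$ is $n\times n$), the following plain MATLAB snippet should return values near zero; reshape(Y,[],1) plays the role of vec(Y):

    n = 4; m = 6;
    A = randn(n,m); B = randn(n,m); X = randn(n,n); v = randn(n,1);
    vecX = reshape(X, [], 1);

    norm(reshape(A - X*B, [], 1) - (reshape(A,[],1) - kron(B', eye(n))*vecX))  % ~ 0
    norm(X*v - kron(v', eye(n))*vecX)                                          % ~ 0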



Matrix calculus. I actually think Kronecker products are clumsy. Thankfully, with some care we can avoid them, at least until we get to the EQLS optimality conditions. Noting that $\|Z\|_F^2 = \mathop{\textrm{Tr}}(Z^TZ)$, we can write the squared objective as follows:
$$\|A-XB\|_F^2 = \mathop{\textrm{Tr}}((A-XB)^T(A-XB)).$$
Now let $\lambda\in\mathbb{R}^m$ be the Lagrange multiplier for $Xv=0$. The Lagrangian is
$$L(X,\lambda)=\mathop{\textrm{Tr}}((A-XB)^T(A-XB))-\lambda^TXv$$
So let's just find the gradient of $L$ with respect to $X$ and set it to zero... but how? Well, I like to use a trace-centered approach. First, we do some rewriting, using the identities $\mathop{\textrm{Tr}}(Z+Y)=\mathop{\textrm{Tr}}(Z)+\mathop{\textrm{Tr}}(Y)$, $\mathop{\textrm{Tr}}(Z)=\mathop{\textrm{Tr}}(Z^T)$, and $\mathop{\textrm{Tr}}(ZY)=\mathop{\textrm{Tr}}(YZ)$. (The latter works only if $ZY$ and $YZ$ are both well-defined.)
$$\begin{aligned}
L(X,\lambda)&=\mathop{\textrm{Tr}}(A^TA)-\mathop{\textrm{Tr}}(A^TXB)-\mathop{\textrm{Tr}}(B^TX^TA)+\mathop{\textrm{Tr}}(B^TX^TXB)-\lambda^TXv\\
&=\mathop{\textrm{Tr}}(A^TA)-2\mathop{\textrm{Tr}}(B^TX^TA)+\mathop{\textrm{Tr}}(B^TX^TXB)-v^TX^T\lambda \\
&=\mathop{\textrm{Tr}}(A^TA)-2\mathop{\textrm{Tr}}(X^TAB^T)+\mathop{\textrm{Tr}}(X^TXBB^T)-\mathop{\textrm{Tr}}(X^T\lambda v^T) \\
&=\mathop{\textrm{Tr}}(A^TA)-\mathop{\textrm{Tr}}(X^T(2AB^T+\lambda v^T))+\mathop{\textrm{Tr}}(X^TXBB^T).
\end{aligned}$$
Now, I wish I could point you to a good reference for this---if someone else can, please do---but I did all this rewriting because I can now find the gradient above by rote:
$$\nabla_X L(X,\lambda) = -(2AB^T+\lambda v^T) + 2XBB^T.$$
Setting this to zero and solving, we find that the Lagrangian optimality condition is
$$2 XBB^T - \lambda v^T = 2AB^T$$
So the pair of optimality conditions for the constrained problem is
$$\boxed{2XBB^T - \lambda v^T = 2AB^T \quad \text{and} \quad Xv=0}$$
You can verify this is correct by examining the unconstrained case: if $BB^T$ is nonsingular, then
$$\mathop{\textrm{arg min}}_X\|A-XB\|_F = AB^\dagger = AB^T(BB^T)^{-1},$$ which clearly satisfies the Lagrangian optimality condition with $\lambda=0$. If $BB^T$ is singular, on the other hand, then the problem likely does not admit a unique solution.
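If it helps, here is a small finite-difference check of that gradient formula, written for the OP's dimensions ($X$ is $n\times n$, so $\lambda\in\mathbb{R}^n$ here); it is only a numerical sketch, not part of the derivation:

    n = 4; m = 6;
    A = randn(n,m); B = randn(n,m); v = randn(n,1);
    X = randn(n,n); lambda = randn(n,1);

    Lfun = @(X) trace((A-X*B)'*(A-X*B)) - lambda'*X*v;
    G = -(2*A*B' + lambda*v') + 2*X*(B*B');    % claimed gradient

    D = randn(n,n); h = 1e-6;                  % random direction, small step
    (Lfun(X + h*D) - Lfun(X - h*D))/(2*h) - sum(sum(G.*D))   % ~ 0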



Note that if we wish to express these conditions in vector form---perhaps we want to use a standard linear solver---then we're back to Kronecker products:
$$\begin{bmatrix} 2BB^T\otimes I & - v\otimes I \\ - v^T\otimes I & 0 \end{bmatrix}
\begin{bmatrix} \mathop{\textrm{vec}}(X) \\ \lambda \end{bmatrix} =
\begin{bmatrix} 2\mathop{\textrm{vec}}(AB^T) \\ 0 \end{bmatrix}$$
I've negated the second (block) row so that the symmetry of the system is revealed.
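For a dense, direct solve, one hedged way to assemble and solve this system in plain MATLAB (again with the OP's $n\times n$ variable, so $\lambda\in\mathbb{R}^n$) is:

    n = 5; m = 8;
    A = randn(n,m); B = randn(n,m); v = randn(n,1);
    In = eye(n);

    K   = [2*kron(B*B', In), -kron(v, In);
           -kron(v', In),     zeros(n)];
    rhs = [2*reshape(A*B', [], 1); zeros(n,1)];

    z = K \ rhs;                       % symmetric indefinite solve
    X = reshape(z(1:n^2), n, n);
    lambda = z(n^2+1:end);

    norm(X*v)                          % ~ 0
    norm(A - X*B, 'fro')               % minimal objective value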



          Convex optimization. If you are simply interested in the problem as you've expressed it, then what I am about to suggest is overkill---stick with the symmetric indefinite approach. But since the Frobenius norm is convex, this problem can be readily solved using a variety of extant convex optimization tools. In its original form, it can be expressed as a second-order cone program (SOCP); in EQLS form, it can be expressed as a quadratic program (QP). Solvers for both types of problems are readily available; indeed, SOCP packages can solve QPs as well.



I'm the developer of a MATLAB-based convex optimization toolbox called CVX that can handle this problem rather easily, and connects to solvers that can solve (among other things) QPs and SOCPs. The nice thing about using something like CVX is that you don't have to jump through the Kronecker hoops---it will do that for you. For instance, here's the CVX model for your problem (assuming $X\in\mathbb{R}^{m\times n}$):



          cvx_begin
          variable X(m,n)
          minimize(norm(A-X*B,'Fro'))
          subject to
          X * v == 0
          cvx_end


Like I said, this is overkill, so I am not recommending it for this application---unless you are considering applying more complex (but convex) constraints to your model besides just $Xv=0$. (I did use CVX to numerically verify my work above, though!)



Iterative methods. Getting back to the EQLS approach for a moment: if $A$, $X$, and $B$ start getting large, you're going to want to exploit the Kronecker structure somehow to improve performance. I think the large symmetric indefinite linear system will be sparse in this case, so an $LDL^T$ solver will do, as long as the optimality conditions are nonsingular. (If they are not, then there is not a unique solution.) But with Kronecker structure an iterative solver is often a better choice, something like SYMMLQ. What is important is that you never actually form the full symmetric indefinite matrix, which will involve products of $\tilde{C}$, $\tilde{A}$, etc. Instead, you treat it like a linear operator and implement code to compute its forward and adjoint operations. This will reduce the computational requirements considerably.
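As a hedged illustration of that operator view, the sketch below applies the boxed system matrix-free through an anonymous function and hands it to MATLAB's built-in symmlq; dimensions follow the OP's problem and the data are random stand-ins:

    n = 6; m = 9;
    A = randn(n,m); B = randn(n,m); v = randn(n,1);
    BBt = B*B';

    % z = [vec(X); lambda]  ->  [vec(2*X*BBt - lambda*v'); -X*v]
    op = @(z) [reshape(2*reshape(z(1:n^2),n,n)*BBt - z(n^2+1:end)*v', [], 1); ...
               -reshape(z(1:n^2),n,n)*v];
    rhs = [2*reshape(A*B', [], 1); zeros(n,1)];

    z = symmlq(op, rhs, 1e-10, 500);   % symmetric indefinite, matrix-free
    X = reshape(z(1:n^2), n, n);

    norm(X*v)                          % ~ 0
    norm(A - X*B, 'fro')               % objective value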





























• Thanks, Michael! I appreciate that. (1) Is the linear system (the optimality conditions) obtained by the first method, via Kronecker products, the same as the linear system from the second method, via matrix calculus? (2) In the last part, "Iterative methods", for solving the linear systems obtained by the Kronecker method or by the matrix calculus method, why do "you never actually form the full symmetric indefinite matrix"?
            – Ethan
            Apr 2 '13 at 15:06










• Yes, the matrix calculus and Kronecker methods should produce equivalent linear systems in the end. However, the matrix calculus approach does a better job of exposing the structure of the linear system calculations. As for the iterative method, I suggest that you apply an iterative method to the boxed linear system. If you negate $Xv=0$, then the resulting linear operator $(X,\lambda)\rightarrow(2XBB^T-\lambda v^T,\,-Xv)$ is symmetric.
            – Michael Grant
            Apr 2 '13 at 15:21










• The challenge here (using matrix calculus, applying an iterative method to solve for $(X,\lambda)$) is to stop thinking about linear systems in terms of matrices and vectors, and start thinking about them in terms of linear operators and more general vector spaces. If you can do that, you can ditch Kronecker products altogether.
            – Michael Grant
            Apr 2 '13 at 15:22










• Michael, thanks again! What do you think of the other reply, which reduces the problem to an unconstrained LS problem? I don't see any error there.
            – Ethan
            Apr 2 '13 at 15:29










          • Yes, it looks promising---in fact I'm one of the upvotes! I haven't verified the math, but I don't see any obvious issues.
            – Michael Grant
            Apr 2 '13 at 15:45



















Answer (2 votes)














Suppose $A = (a_{ij})$ and $B = (b_{ij})$; then $A - XB = (a_{ij} - \sum_{k}X_{ik}b_{kj})$.



Take the matrix $X$ as a $1\times n^2$ vector, arranged according to the rows from top to bottom:



$\hat{X} = (x_{1,1},x_{1,2},\cdots,x_{n,n-1},x_{n,n}) = (X_1,X_2,\cdots,X_n)$, where $X_k$ is the $k$th row of $X$.



Thus the square of the Frobenius norm is



$\mathrm{tr}\big[(A-XB)(A-XB)'\big] = \mathrm{tr}(AA' - XBA'-AB'X'+XBB'X') = \mathrm{tr}(AA') -2\,\mathrm{tr}(XBA') + \mathrm{tr}(XBB'X') = \mathrm{tr}(AA') - 2\sum_k \langle X_k,(AB')_k\rangle + \sum_k X_k(BB')X_k'$



          where $(AB')_k$ is the $k$th row of $AB'$.



Thus the squared norm is



$\mathrm{tr}(AA') - 2 \langle \hat{X},M\rangle + \hat{X}\left(I_n\otimes(BB')\right) \hat{X}'$



where $M = ((AB')_1,(AB')_2,\cdots,(AB')_n)$, i.e., the rows of $AB'$ arranged in one row.



          The constraint condition is



$Xv = 0$, that is, $\langle X_k, v\rangle = 0$ for each $k$.
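A brief numerical check of this row-stacked form (a sketch in plain MATLAB with random data; note the block-diagonal Kronecker factor $I_n\otimes(BB')$):

    n = 4; m = 6;
    A = randn(n,m); B = randn(n,m); X = randn(n,n);

    xhat = reshape(X', 1, []);          % rows of X laid out in one row vector
    M    = reshape((A*B')', 1, []);     % rows of A*B' laid out the same way

    q = trace(A*A') - 2*(xhat*M') + xhat*kron(eye(n), B*B')*xhat';
    norm(A - X*B, 'fro')^2 - q          % ~ 0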



































Answer (1 vote)














            The closed-form solution to this problem is
$$X = A\,(HB)^+H$$

            where $M^+$ denotes the pseudoinverse of $M$

and $\,H=(I-vv^+)= \big(I-\frac{vv^T}{v^Tv}\big)$.



            The derivation goes as follows.




Consider the linearly constrained, linear problem
$$\begin{array}{ll} \min & \|Cx-a\|_F\\ {\rm s.t.} & Ex=f\end{array}$$

            The projector into the nullspace of $E$ will be useful
$$P = (I-E^+E) \implies EP = 0$$
            The general solution of the constraint equation is
$$x = E^+f + Py$$
            where the $y$ vector is a free parameter.

            Substitute this into the objective function and solve for $y$, then revert back to the original $x$ variable
$$\begin{aligned}
y &= (CP)^+(a-CE^+f) \\
x &= E^+f + P\,(CP)^+(a-CE^+f)
\end{aligned}$$
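Before specializing to the current problem, here is a hedged check of this general recipe on a small random instance (the dimensions are arbitrary choices for illustration); it compares the projector formula against a KKT solve of the same equality-constrained least-squares problem:

    p = 10; q = 6; r = 2;
    C = randn(p,q); a = randn(p,1);
    E = randn(r,q); f = randn(r,1);

    P = eye(q) - pinv(E)*E;                     % projector onto the nullspace of E
    x = pinv(E)*f + P*pinv(C*P)*(a - C*pinv(E)*f);

    norm(E*x - f)                               % ~ 0, constraint satisfied
    K = [C'*C, E'; E, zeros(r)];                % KKT system of the same problem
    z = K \ [C'*a; f];
    norm(x - z(1:q))                            % ~ 0 when the minimizer is unique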

            We can apply this solution to the current problem by vectorization and making the following identifications
$$\begin{aligned}
C &= (B^T\otimes I) \\
E &= (v^T\otimes I) \\
f &= 0 \\
a &= {\rm vec}(A) \\
x &= {\rm vec}(X)
\end{aligned}$$

            This yields a closed-form solution of the problem (in terms of pseudoinverses)
$$\begin{aligned}
x &= P\,(CP)^+a \\
&= \big(I-E^+E\big)\,\big(C-CE^+E\big)^+a
\end{aligned}$$

            As a final step, reshape the vector $x$ into the matrix $X$.






            But let's see if we can calculate the $X$ matrix more directly.

            First recall that the pseudoinverse of the vector $v$ has a closed-form
$$v^+=\frac{v^T}{v^Tv}$$
            as does the pseudoinverse of a Kronecker product
$$(A\otimes B)^+ = A^+\otimes B^+$$
            Therefore we can do the following
$$\begin{aligned}
E^+ &= \frac{v}{v^Tv}\otimes I \\
P &= I-E^+E \\
&= I\otimes I - vv^+\otimes I\\
&= (I-vv^+)\otimes I\\
&= H\otimes I \\
CP &= (B^T\otimes I)\,(H\otimes I) = B^TH\otimes I \\
x &= P(CP)^+a \\
&= (H\otimes I)\Big((B^TH)^+\otimes I\Big)a \\
&= \big(H(B^TH)^+\otimes I\big)a \\
&= {\rm vec}\big(A(HB)^+H\big) \\
X &= A(HB)^+H
\end{aligned}$$

            As a quick sanity check, since $Hv=0$ one also has $Xv=0$.



And $\,XB=A(HB)^+(HB)\approx A\,$, since $(M^+M)$ is an orthoprojector for any $M$.
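A short numerical sketch of this closed form in plain MATLAB (random data, sizes chosen only for illustration):

    n = 5; m = 8;
    A = randn(n,m); B = randn(n,m); v = randn(n,1);

    H = eye(n) - (v*v')/(v'*v);    % H = I - v v^+, so Hv = 0
    X = A*pinv(H*B)*H;

    norm(X*v)                      % ~ 0, constraint satisfied
    norm(A - X*B, 'fro')           % achieved objective value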





























              Your Answer





              StackExchange.ifUsing("editor", function () {
              return StackExchange.using("mathjaxEditing", function () {
              StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
              StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
              });
              });
              }, "mathjax-editing");

              StackExchange.ready(function() {
              var channelOptions = {
              tags: "".split(" "),
              id: "69"
              };
              initTagRenderer("".split(" "), "".split(" "), channelOptions);

              StackExchange.using("externalEditor", function() {
              // Have to fire editor after snippets, if snippets enabled
              if (StackExchange.settings.snippets.snippetsEnabled) {
              StackExchange.using("snippets", function() {
              createEditor();
              });
              }
              else {
              createEditor();
              }
              });

              function createEditor() {
              StackExchange.prepareEditor({
              heartbeatType: 'answer',
              autoActivateHeartbeat: false,
              convertImagesToLinks: true,
              noModals: true,
              showLowRepImageUploadWarning: true,
              reputationToPostImages: 10,
              bindNavPrevention: true,
              postfix: "",
              imageUploader: {
              brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
              contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
              allowUrls: true
              },
              noCode: true, onDemand: true,
              discardSelector: ".discard-answer"
              ,immediatelyShowMarkdownHelp:true
              });


              }
              });














              draft saved

              draft discarded


















              StackExchange.ready(
              function () {
              StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmath.stackexchange.com%2fquestions%2f347753%2fminimize-a-xb-f-subject-to-xv-0%23new-answer', 'question_page');
              }
              );

              Post as a guest















              Required, but never shown

























              4 Answers
              4






              active

              oldest

              votes








              4 Answers
              4






              active

              oldest

              votes









              active

              oldest

              votes






              active

              oldest

              votes









              6














              As the others show, the answer to your question is affirmative. However, I don't see what's the point of doing so, when the problem can actually be converted into an unconstrained least square problem.



              Let $Q$ be a real orthogonal matrix that has $frac{v}{|v|}$ as its last column. For instance, you may take $Q$ as a Householder matrix:
              $$
              Q = I - 2uu^T, u = frac{v-|v|e_n}{|v-|v|e_n|},
              e_n=(0,0,ldots,0,1)^T.
              $$
              Then $Xv=0$ means that $XQe_n=0$, which in turn implies that $XQ$ is a matrix of the form $XQ = (widetilde{X},0)$, where $widetilde{X}inmathbb{R}^{ntimes(n-1)}$. So, if you partition $Q^TB$ as $Q^TB = pmatrix{widetilde{B}\ ast}$, where $widetilde{B}$ is $(n-1)times m$, then your minimisation problem can be rewritten as the unconstrained least-square problem
              $$min_{widetilde{X}inmathbb{R}^{ntimes(n-1)}} |A-widetilde{X}widetilde{B}|_F.$$
              And the least-norm solution is given by $widetilde{X} = Awidetilde{B}^+$, where $widetilde{B}^+$ denotes the Moore-Penrose pseudoinverse of $widetilde{B}$. Once $widetilde{X}$ is obtained, you can recover $X = (widetilde{X},0)Q^T$.






              share|cite|improve this answer





















              • Thanks, this is a very clever way indeed. I very much appreciate it. (1) Is there some consideration of choosing $Q$ besides it being orthogonal and has $v/|v|$ as its last column? (2) Did you come up with it yourself or is there some reference for it?
                – Ethan
                Apr 2 '13 at 16:09












              • @Ethan There aren't any other considerations. Orthogonality is required to preserve the Frobenius norm. Setting $v/|v|$ as the last column is just for convenience. You may set it as the first column if you want. And other matrices than the Householder matrix can be used, but the Householder matrix is more convenient in this case because (1) it is symmetric (so you don't actually need to transpose $Q$ in $QT$) and (2) you don't need to store $u$ instead of $n^2$ matrix entries.
                – user1551
                Apr 2 '13 at 17:24










              • And the Householder trick is well known because the Householder matrix is used in some other matrix algorithms (such as QR decomposition). But I don't have any references at hand.
                – user1551
                Apr 2 '13 at 17:28
















              6














              As the others show, the answer to your question is affirmative. However, I don't see what's the point of doing so, when the problem can actually be converted into an unconstrained least square problem.



              Let $Q$ be a real orthogonal matrix that has $frac{v}{|v|}$ as its last column. For instance, you may take $Q$ as a Householder matrix:
              $$
              Q = I - 2uu^T, u = frac{v-|v|e_n}{|v-|v|e_n|},
              e_n=(0,0,ldots,0,1)^T.
              $$
              Then $Xv=0$ means that $XQe_n=0$, which in turn implies that $XQ$ is a matrix of the form $XQ = (widetilde{X},0)$, where $widetilde{X}inmathbb{R}^{ntimes(n-1)}$. So, if you partition $Q^TB$ as $Q^TB = pmatrix{widetilde{B}\ ast}$, where $widetilde{B}$ is $(n-1)times m$, then your minimisation problem can be rewritten as the unconstrained least-square problem
              $$min_{widetilde{X}inmathbb{R}^{ntimes(n-1)}} |A-widetilde{X}widetilde{B}|_F.$$
              And the least-norm solution is given by $widetilde{X} = Awidetilde{B}^+$, where $widetilde{B}^+$ denotes the Moore-Penrose pseudoinverse of $widetilde{B}$. Once $widetilde{X}$ is obtained, you can recover $X = (widetilde{X},0)Q^T$.






              share|cite|improve this answer





















              • Thanks, this is a very clever way indeed. I very much appreciate it. (1) Is there some consideration of choosing $Q$ besides it being orthogonal and has $v/|v|$ as its last column? (2) Did you come up with it yourself or is there some reference for it?
                – Ethan
                Apr 2 '13 at 16:09












              • @Ethan There aren't any other considerations. Orthogonality is required to preserve the Frobenius norm. Setting $v/|v|$ as the last column is just for convenience. You may set it as the first column if you want. And other matrices than the Householder matrix can be used, but the Householder matrix is more convenient in this case because (1) it is symmetric (so you don't actually need to transpose $Q$ in $QT$) and (2) you don't need to store $u$ instead of $n^2$ matrix entries.
                – user1551
                Apr 2 '13 at 17:24










              • And the Householder trick is well known because the Householder matrix is used in some other matrix algorithms (such as QR decomposition). But I don't have any references at hand.
                – user1551
                Apr 2 '13 at 17:28














              6












              6








              6






              As the others show, the answer to your question is affirmative. However, I don't see what's the point of doing so, when the problem can actually be converted into an unconstrained least square problem.



              Let $Q$ be a real orthogonal matrix that has $frac{v}{|v|}$ as its last column. For instance, you may take $Q$ as a Householder matrix:
              $$
              Q = I - 2uu^T, u = frac{v-|v|e_n}{|v-|v|e_n|},
              e_n=(0,0,ldots,0,1)^T.
              $$
              Then $Xv=0$ means that $XQe_n=0$, which in turn implies that $XQ$ is a matrix of the form $XQ = (widetilde{X},0)$, where $widetilde{X}inmathbb{R}^{ntimes(n-1)}$. So, if you partition $Q^TB$ as $Q^TB = pmatrix{widetilde{B}\ ast}$, where $widetilde{B}$ is $(n-1)times m$, then your minimisation problem can be rewritten as the unconstrained least-square problem
              $$min_{widetilde{X}inmathbb{R}^{ntimes(n-1)}} |A-widetilde{X}widetilde{B}|_F.$$
              And the least-norm solution is given by $widetilde{X} = Awidetilde{B}^+$, where $widetilde{B}^+$ denotes the Moore-Penrose pseudoinverse of $widetilde{B}$. Once $widetilde{X}$ is obtained, you can recover $X = (widetilde{X},0)Q^T$.






              share|cite|improve this answer












              As the others show, the answer to your question is affirmative. However, I don't see what's the point of doing so, when the problem can actually be converted into an unconstrained least square problem.



              Let $Q$ be a real orthogonal matrix that has $frac{v}{|v|}$ as its last column. For instance, you may take $Q$ as a Householder matrix:
              $$
              Q = I - 2uu^T, u = frac{v-|v|e_n}{|v-|v|e_n|},
              e_n=(0,0,ldots,0,1)^T.
              $$
              Then $Xv=0$ means that $XQe_n=0$, which in turn implies that $XQ$ is a matrix of the form $XQ = (widetilde{X},0)$, where $widetilde{X}inmathbb{R}^{ntimes(n-1)}$. So, if you partition $Q^TB$ as $Q^TB = pmatrix{widetilde{B}\ ast}$, where $widetilde{B}$ is $(n-1)times m$, then your minimisation problem can be rewritten as the unconstrained least-square problem
              $$min_{widetilde{X}inmathbb{R}^{ntimes(n-1)}} |A-widetilde{X}widetilde{B}|_F.$$
              And the least-norm solution is given by $widetilde{X} = Awidetilde{B}^+$, where $widetilde{B}^+$ denotes the Moore-Penrose pseudoinverse of $widetilde{B}$. Once $widetilde{X}$ is obtained, you can recover $X = (widetilde{X},0)Q^T$.







              share|cite|improve this answer












              share|cite|improve this answer



              share|cite|improve this answer










              answered Apr 1 '13 at 10:04









              user1551user1551

              71.8k566125




              71.8k566125












              • Thanks, this is a very clever way indeed. I very much appreciate it. (1) Is there some consideration of choosing $Q$ besides it being orthogonal and has $v/|v|$ as its last column? (2) Did you come up with it yourself or is there some reference for it?
                – Ethan
                Apr 2 '13 at 16:09












              • @Ethan There aren't any other considerations. Orthogonality is required to preserve the Frobenius norm. Setting $v/|v|$ as the last column is just for convenience. You may set it as the first column if you want. And other matrices than the Householder matrix can be used, but the Householder matrix is more convenient in this case because (1) it is symmetric (so you don't actually need to transpose $Q$ in $QT$) and (2) you don't need to store $u$ instead of $n^2$ matrix entries.
                – user1551
                Apr 2 '13 at 17:24










              • And the Householder trick is well known because the Householder matrix is used in some other matrix algorithms (such as QR decomposition). But I don't have any references at hand.
                – user1551
                Apr 2 '13 at 17:28


















              • Thanks, this is a very clever way indeed. I very much appreciate it. (1) Is there some consideration of choosing $Q$ besides it being orthogonal and has $v/|v|$ as its last column? (2) Did you come up with it yourself or is there some reference for it?
                – Ethan
                Apr 2 '13 at 16:09












              • @Ethan There aren't any other considerations. Orthogonality is required to preserve the Frobenius norm. Setting $v/|v|$ as the last column is just for convenience. You may set it as the first column if you want. And other matrices than the Householder matrix can be used, but the Householder matrix is more convenient in this case because (1) it is symmetric (so you don't actually need to transpose $Q$ in $QT$) and (2) you don't need to store $u$ instead of $n^2$ matrix entries.
                – user1551
                Apr 2 '13 at 17:24










              • And the Householder trick is well known because the Householder matrix is used in some other matrix algorithms (such as QR decomposition). But I don't have any references at hand.
                – user1551
                Apr 2 '13 at 17:28
















              Thanks, this is a very clever way indeed. I very much appreciate it. (1) Is there some consideration of choosing $Q$ besides it being orthogonal and has $v/|v|$ as its last column? (2) Did you come up with it yourself or is there some reference for it?
              – Ethan
              Apr 2 '13 at 16:09






              Thanks, this is a very clever way indeed. I very much appreciate it. (1) Is there some consideration of choosing $Q$ besides it being orthogonal and has $v/|v|$ as its last column? (2) Did you come up with it yourself or is there some reference for it?
              – Ethan
              Apr 2 '13 at 16:09














              @Ethan There aren't any other considerations. Orthogonality is required to preserve the Frobenius norm. Setting $v/|v|$ as the last column is just for convenience. You may set it as the first column if you want. And other matrices than the Householder matrix can be used, but the Householder matrix is more convenient in this case because (1) it is symmetric (so you don't actually need to transpose $Q$ in $QT$) and (2) you don't need to store $u$ instead of $n^2$ matrix entries.
              – user1551
              Apr 2 '13 at 17:24




              @Ethan There aren't any other considerations. Orthogonality is required to preserve the Frobenius norm. Setting $v/|v|$ as the last column is just for convenience. You may set it as the first column if you want. And other matrices than the Householder matrix can be used, but the Householder matrix is more convenient in this case because (1) it is symmetric (so you don't actually need to transpose $Q$ in $QT$) and (2) you don't need to store $u$ instead of $n^2$ matrix entries.
              – user1551
              Apr 2 '13 at 17:24












              And the Householder trick is well known because the Householder matrix is used in some other matrix algorithms (such as QR decomposition). But I don't have any references at hand.
              – user1551
              Apr 2 '13 at 17:28




              And the Householder trick is well known because the Householder matrix is used in some other matrix algorithms (such as QR decomposition). But I don't have any references at hand.
              – user1551
              Apr 2 '13 at 17:28











              5














              [Major changes here, sorry! I realized the matrix calculus approach has some real advantages here after I submitted my first answer.]



              Yes, this problem can indeed be examined and solved in a variety of ways.



              Kronecker products. Kronecker products provide a concise way to relate matrix equations and standard matrix-vector equations. Using them, we can say that
              $$mathop{textrm{vec}}(A-XB) = mathop{textrm{vec}}(A) - (B^Totimes I) mathop{textrm{vec}}(X)$$
              and
              $$mathop{textrm{vec}}(Xv) = (v^Totimes I)mathop{textrm{vec}}(X),$$
              where $mathop{textrm{vec}}(cdot)$ stacks the columns of its input argument into a single column vector. If $Xinmathbb{R}^{mtimes n}$, then both of the identity matrices $I$ above are $mtimes m$. Thus Kronecker products can allow us to express this problem as
              $$begin{array}{ll} text{minimize} & | tilde{b} - tilde{A} x |_2 \
              text{subject to} & tilde{C} x = 0 end{array}$$ where $$
              tilde{A} triangleq (B^Totimes I), ~ tilde{b} triangleq mathop{textrm{vec}}(A), ~ tilde{C} triangleq (v^Ttimes I), ~ x triangleq mathop{textrm{vec}}(X).$$
              If you square the objective, then this model becomes an equality-constrained least-squares (EQLS) problem. You can derive the optimality conditions for the EQLS form using a simple Lagrange multiplier technique; finding an optimal $x$ requires the solution of a single symmetric indefinite linear system.



              Matrix calculus. I actually think Kronecker products are clumsy. Thankfully with some care we can avoid them, at least until we get to the EQLS optimality conditions. Noting that $|Z|_F^2 = mathop{textrm{Tr}}(Z^TZ)$, we can write the squared objective as follows:
              $$|A-XB|_F^2 = mathop{textrm{Tr}}((A-XB)^T(A-XB)).$$
              Now let $lambdainmathbb{R}^m$ be the Lagrange multiplier for $Xv=0$. The Lagrangian is
              $$L(X,lambda)=mathop{textrm{Tr}}((A-XB)^T(A-XB))-lambda^TXv$$
              So let's just find the gradient of $L$ with respect to $X$ and set it to zero... but how? Well, I like to use a trace-centered approach. First, we do some rewriting, using identities $mathop{textrm{Tr}}(Z+Y)=mathop{textrm{Tr}}(Z)+mathop{textrm{Tr}}(Y)$, $mathop{textrm{Tr}}(Z)=mathop{textrm{Tr}}(Z^T)$, and $mathop{textrm{Tr}}(ZY)=mathop{textrm{Tr}}(YZ)$. (The latter works only if $ZY$ and $YZ$ are both well-posed.)
              $$begin{aligned}
              L(X,lambda)&=mathop{textrm{Tr}}(A^TA)-mathop{textrm{Tr}}(A^TXB)-mathop{textrm{Tr}}(B^TX^TA)+mathop{textrm{Tr}}(B^TX^TXB)-lambda^TXv\
              &=mathop{textrm{Tr}}(A^TA)-2mathop{textrm{Tr}}(B^TX^TA)+mathop{textrm{Tr}}(B^TX^TXB)-v^TX^Tlambda \
              &=mathop{textrm{Tr}}(A^TA)-2mathop{textrm{Tr}}(X^TAB^T)+mathop{textrm{Tr}}(X^TXBB^T)-mathop{textrm{Tr}}(X^Tlambda v^T) \
              &=
              mathop{textrm{Tr}}(A^TA)-mathop{textrm{Tr}}(X^T(2AB^T+lambda v^T))+mathop{textrm{Tr}}(X^TXBB^T). end{aligned}
              $$
              Now, I wish I could point you to a good reference for this---if someone else can, please do---but I did all this rewriting because I can now find the gradient above by rote:
              $$nabla_X L(X,lambda) = -(2AB^T+lambda v^T) + 2XBB^T.$$
              Setting to zero and solving, we find that the Lagrangian optimality condition is
              $$2 XBB^T - lambda v^T = 2AB^T$$
              So the pair of optimality conditions for the constrained problem are
              $$boxed{2XBB^T - lambda v^T = 2AB^T quad text{and} quad Xv=0}$$
              You can verify this is correct by examining the unconstrained case: if $BB^T$ is nonsingular, then
              $$mathop{textrm{arg min}}_X|A-XB|_F = AB^dagger = AB^T(BB^T)^{-1}$$ which clearly satisfies the Lagrangian optimality condition with $lambda=0$. If $BB^T$ is singular, on the other hand, then the problem likely does not admit a unique solution.



              Note that if we wish to express these conditions in vector form---perhaps we want to use a standard linear solver---then we're back to Kronecker products:
              $$begin{bmatrix} 2BB^Totimes I & - votimes I \ - v^Totimes I & 0 end{bmatrix}
              begin{bmatrix} mathop{textrm{vec}}(X) \ lambda end{bmatrix} =
              begin{bmatrix} mathop{textrm{vec}}(AB) \ 0 end{bmatrix}$$
              I've negated the second (block) row so that the symmetry of the system is revealed.



              Convex optimization. If you are simply interested in the problem as you've expressed it, then what I am about to suggest is overkill---stick with the symmetric indefinite approach. But since the Frobenius norm is convex, this problem can be readily solved using a variety of extant convex optimization tools. In its original form, it can be expressed as a second-order cone program (SOCP); in EQLS form, it can be expressed as a quadratic program (QP). Solvers for both types of problems are readily available; indeed, SOCP packages can solve QPs as well.



              I'm the developer of a MATLAB-based convex optimization toolbox called CVX that can handle this problem rather easily, and connects to solvers that can solve (among other things) QPs and SOCPs. The nice thing about using something like CVX that you don't have to jump through the Kronecker hoops---it will do it for you. For instance, here's the CVX model for your problem (assuming $Xinmathbb{R}^{mtimes n}$):



              cvx_begin
              variable X(m,n)
              minimize(norm(A-X*B,'Fro'))
              subject to
              X * v == 0
              cvx_end


              Like I said, this is overkill, so I am not recommending it for this application---unless you are considering adding applying more complex (but convex) constraints to your model besides just $Xv=0$. (I did use CVX to numerically verify my work above, though!)



              Iterative methods. Getting back to the EQLS approach for a moment: if $A$, $X$, and $B$ start getting large, you're going to want to exploit the Kronecker structure somehow to improve performance. I think the large symmetric indefinite linear system will be sparse in this case, in which case a $LDL^T$ solver will do, as long as the optimality conditions are nonsingular. (If they are not, then there is not a unique solution.) But with Kronecker structure an iterative solver is often a better choice; something like SYMMLQ. What is important is that you never actually form the full symmetric indefinite matrix, which will involve products of $tilde{C}$, $tilde{A}$, etc. Instead, you treat it like a linear operator, and implement code to compute its forward and adjoint operation. This will reduce the computational requirements considerably.






              share|cite|improve this answer























              • Thanks, Michael! I appreciate that. (1)is the linear system (the optimal condition) by the first method by Kronecker product same as the linear system in the second method by matrix calculus? (2) In the last part "Iterative methods" for solving the linear systems obtained by the Kronecker method or by the Matrix calculus method, why "you never actually form the full symmetric indefinite matrix"?
                – Ethan
                Apr 2 '13 at 15:06










              • Yes, the matrix calculus and Kronecker methods should produce the equivalent linear systems in the end. However, the matrix calculus approach does a better job of exposing the structure of the linear systems calculations. As for the iterative method, I suggest that you apply an iterative method to the boxed linear system. If you negate $Xv=0$, then the resulting linear operator $(X,lambda)rightarrow(2XBB^T-lambda v^T,-Xv)$ is symmetric.
                – Michael Grant
                Apr 2 '13 at 15:21










              • The challenge here (using matrix calculus, applying an iterative method to solve for $(X,lambda)$) is to stop thinking about linear systems in terms of matrices and vectors, and start thinking about them in terms of linear operators and more general vector spaces. If you can do that, you can ditch Kronecker products altogether.
                – Michael Grant
                Apr 2 '13 at 15:22










              • Michael, Thanks again! How would you think of the other reply, which reduces the problem to an unconstrained LS problem. I don't see any error there.
                – Ethan
                Apr 2 '13 at 15:29










              • Yes, it looks promising---in fact I'm one of the upvotes! I haven't verified the math, but I don't see any obvious issues.
                – Michael Grant
                Apr 2 '13 at 15:45
















              5














              [Major changes here, sorry! I realized the matrix calculus approach has some real advantages here after I submitted my first answer.]



              Yes, this problem can indeed be examined and solved in a variety of ways.



              Kronecker products. Kronecker products provide a concise way to relate matrix equations and standard matrix-vector equations. Using them, we can say that
              $$mathop{textrm{vec}}(A-XB) = mathop{textrm{vec}}(A) - (B^Totimes I) mathop{textrm{vec}}(X)$$
              and
              $$mathop{textrm{vec}}(Xv) = (v^Totimes I)mathop{textrm{vec}}(X),$$
              where $mathop{textrm{vec}}(cdot)$ stacks the columns of its input argument into a single column vector. If $Xinmathbb{R}^{mtimes n}$, then both of the identity matrices $I$ above are $mtimes m$. Thus Kronecker products can allow us to express this problem as
              $$begin{array}{ll} text{minimize} & | tilde{b} - tilde{A} x |_2 \
              text{subject to} & tilde{C} x = 0 end{array}$$ where $$
              tilde{A} triangleq (B^Totimes I), ~ tilde{b} triangleq mathop{textrm{vec}}(A), ~ tilde{C} triangleq (v^Ttimes I), ~ x triangleq mathop{textrm{vec}}(X).$$
              If you square the objective, then this model becomes an equality-constrained least-squares (EQLS) problem. You can derive the optimality conditions for the EQLS form using a simple Lagrange multiplier technique; finding an optimal $x$ requires the solution of a single symmetric indefinite linear system.



              Matrix calculus. I actually think Kronecker products are clumsy. Thankfully with some care we can avoid them, at least until we get to the EQLS optimality conditions. Noting that $|Z|_F^2 = mathop{textrm{Tr}}(Z^TZ)$, we can write the squared objective as follows:
              $$|A-XB|_F^2 = mathop{textrm{Tr}}((A-XB)^T(A-XB)).$$
              Now let $lambdainmathbb{R}^m$ be the Lagrange multiplier for $Xv=0$. The Lagrangian is
              $$L(X,lambda)=mathop{textrm{Tr}}((A-XB)^T(A-XB))-lambda^TXv$$
              So let's just find the gradient of $L$ with respect to $X$ and set it to zero... but how? Well, I like to use a trace-centered approach. First, we do some rewriting, using identities $mathop{textrm{Tr}}(Z+Y)=mathop{textrm{Tr}}(Z)+mathop{textrm{Tr}}(Y)$, $mathop{textrm{Tr}}(Z)=mathop{textrm{Tr}}(Z^T)$, and $mathop{textrm{Tr}}(ZY)=mathop{textrm{Tr}}(YZ)$. (The latter works only if $ZY$ and $YZ$ are both well-posed.)
              $$begin{aligned}
              L(X,lambda)&=mathop{textrm{Tr}}(A^TA)-mathop{textrm{Tr}}(A^TXB)-mathop{textrm{Tr}}(B^TX^TA)+mathop{textrm{Tr}}(B^TX^TXB)-lambda^TXv\
              &=mathop{textrm{Tr}}(A^TA)-2mathop{textrm{Tr}}(B^TX^TA)+mathop{textrm{Tr}}(B^TX^TXB)-v^TX^Tlambda \
              &=mathop{textrm{Tr}}(A^TA)-2mathop{textrm{Tr}}(X^TAB^T)+mathop{textrm{Tr}}(X^TXBB^T)-mathop{textrm{Tr}}(X^Tlambda v^T) \
              &=
              mathop{textrm{Tr}}(A^TA)-mathop{textrm{Tr}}(X^T(2AB^T+lambda v^T))+mathop{textrm{Tr}}(X^TXBB^T). end{aligned}
              $$
              Now, I wish I could point you to a good reference for this---if someone else can, please do---but I did all this rewriting because I can now find the gradient above by rote:
              $$nabla_X L(X,lambda) = -(2AB^T+lambda v^T) + 2XBB^T.$$
              Setting to zero and solving, we find that the Lagrangian optimality condition is
              $$2 XBB^T - lambda v^T = 2AB^T$$
              So the pair of optimality conditions for the constrained problem are
              $$boxed{2XBB^T - lambda v^T = 2AB^T quad text{and} quad Xv=0}$$
              You can verify this is correct by examining the unconstrained case: if $BB^T$ is nonsingular, then
              $$mathop{textrm{arg min}}_X|A-XB|_F = AB^dagger = AB^T(BB^T)^{-1}$$ which clearly satisfies the Lagrangian optimality condition with $lambda=0$. If $BB^T$ is singular, on the other hand, then the problem likely does not admit a unique solution.



              Note that if we wish to express these conditions in vector form---perhaps we want to use a standard linear solver---then we're back to Kronecker products:
              $$begin{bmatrix} 2BB^Totimes I & - votimes I \ - v^Totimes I & 0 end{bmatrix}
              begin{bmatrix} mathop{textrm{vec}}(X) \ lambda end{bmatrix} =
              begin{bmatrix} mathop{textrm{vec}}(AB) \ 0 end{bmatrix}$$
              I've negated the second (block) row so that the symmetry of the system is revealed.



              Convex optimization. If you are simply interested in the problem as you've expressed it, then what I am about to suggest is overkill---stick with the symmetric indefinite approach. But since the Frobenius norm is convex, this problem can be readily solved using a variety of extant convex optimization tools. In its original form, it can be expressed as a second-order cone program (SOCP); in EQLS form, it can be expressed as a quadratic program (QP). Solvers for both types of problems are readily available; indeed, SOCP packages can solve QPs as well.



              I'm the developer of a MATLAB-based convex optimization toolbox called CVX that can handle this problem rather easily, and connects to solvers that can solve (among other things) QPs and SOCPs. The nice thing about using something like CVX that you don't have to jump through the Kronecker hoops---it will do it for you. For instance, here's the CVX model for your problem (assuming $Xinmathbb{R}^{mtimes n}$):



              cvx_begin
              variable X(m,n)
              minimize(norm(A-X*B,'Fro'))
              subject to
              X * v == 0
              cvx_end


              Like I said, this is overkill, so I am not recommending it for this application---unless you are considering adding applying more complex (but convex) constraints to your model besides just $Xv=0$. (I did use CVX to numerically verify my work above, though!)



              Iterative methods. Getting back to the EQLS approach for a moment: if $A$, $X$, and $B$ start getting large, you're going to want to exploit the Kronecker structure somehow to improve performance. I think the large symmetric indefinite linear system will be sparse in this case, in which case a $LDL^T$ solver will do, as long as the optimality conditions are nonsingular. (If they are not, then there is not a unique solution.) But with Kronecker structure an iterative solver is often a better choice; something like SYMMLQ. What is important is that you never actually form the full symmetric indefinite matrix, which will involve products of $tilde{C}$, $tilde{A}$, etc. Instead, you treat it like a linear operator, and implement code to compute its forward and adjoint operation. This will reduce the computational requirements considerably.






              share|cite|improve this answer























              • Thanks, Michael! I appreciate that. (1)is the linear system (the optimal condition) by the first method by Kronecker product same as the linear system in the second method by matrix calculus? (2) In the last part "Iterative methods" for solving the linear systems obtained by the Kronecker method or by the Matrix calculus method, why "you never actually form the full symmetric indefinite matrix"?
                – Ethan
                Apr 2 '13 at 15:06










              • Yes, the matrix calculus and Kronecker methods should produce the equivalent linear systems in the end. However, the matrix calculus approach does a better job of exposing the structure of the linear systems calculations. As for the iterative method, I suggest that you apply an iterative method to the boxed linear system. If you negate $Xv=0$, then the resulting linear operator $(X,lambda)rightarrow(2XBB^T-lambda v^T,-Xv)$ is symmetric.
                – Michael Grant
                Apr 2 '13 at 15:21










              • The challenge here (using matrix calculus, applying an iterative method to solve for $(X,lambda)$) is to stop thinking about linear systems in terms of matrices and vectors, and start thinking about them in terms of linear operators and more general vector spaces. If you can do that, you can ditch Kronecker products altogether.
                – Michael Grant
                Apr 2 '13 at 15:22










              • Michael, Thanks again! How would you think of the other reply, which reduces the problem to an unconstrained LS problem. I don't see any error there.
                – Ethan
                Apr 2 '13 at 15:29










              • Yes, it looks promising---in fact I'm one of the upvotes! I haven't verified the math, but I don't see any obvious issues.
                – Michael Grant
                Apr 2 '13 at 15:45














              5












              5








              5






              [Major changes here, sorry! I realized the matrix calculus approach has some real advantages here after I submitted my first answer.]



              Yes, this problem can indeed be examined and solved in a variety of ways.



              Kronecker products. Kronecker products provide a concise way to relate matrix equations and standard matrix-vector equations. Using them, we can say that
              $$mathop{textrm{vec}}(A-XB) = mathop{textrm{vec}}(A) - (B^Totimes I) mathop{textrm{vec}}(X)$$
              and
              $$mathop{textrm{vec}}(Xv) = (v^Totimes I)mathop{textrm{vec}}(X),$$
              where $mathop{textrm{vec}}(cdot)$ stacks the columns of its input argument into a single column vector. If $Xinmathbb{R}^{mtimes n}$, then both of the identity matrices $I$ above are $mtimes m$. Thus Kronecker products can allow us to express this problem as
              $$begin{array}{ll} text{minimize} & | tilde{b} - tilde{A} x |_2 \
              text{subject to} & tilde{C} x = 0 end{array}$$ where $$
              tilde{A} triangleq (B^Totimes I), ~ tilde{b} triangleq mathop{textrm{vec}}(A), ~ tilde{C} triangleq (v^Ttimes I), ~ x triangleq mathop{textrm{vec}}(X).$$
              If you square the objective, then this model becomes an equality-constrained least-squares (EQLS) problem. You can derive the optimality conditions for the EQLS form using a simple Lagrange multiplier technique; finding an optimal $x$ requires the solution of a single symmetric indefinite linear system.



              Matrix calculus. I actually think Kronecker products are clumsy. Thankfully with some care we can avoid them, at least until we get to the EQLS optimality conditions. Noting that $|Z|_F^2 = mathop{textrm{Tr}}(Z^TZ)$, we can write the squared objective as follows:
              $$|A-XB|_F^2 = mathop{textrm{Tr}}((A-XB)^T(A-XB)).$$
              [Major changes here, sorry! I realized the matrix calculus approach has some real advantages here after I submitted my first answer.]



              Yes, this problem can indeed be examined and solved in a variety of ways.



Kronecker products. Kronecker products provide a concise way to relate matrix equations and standard matrix-vector equations. Using them, we can say that
$$\mathop{\textrm{vec}}(A-XB) = \mathop{\textrm{vec}}(A) - (B^T\otimes I)\,\mathop{\textrm{vec}}(X)$$
and
$$\mathop{\textrm{vec}}(Xv) = (v^T\otimes I)\,\mathop{\textrm{vec}}(X),$$
where $\mathop{\textrm{vec}}(\cdot)$ stacks the columns of its input argument into a single column vector. If $X\in\mathbb{R}^{m\times n}$, then both of the identity matrices $I$ above are $m\times m$. Thus Kronecker products allow us to express this problem as
$$\begin{array}{ll} \text{minimize} & \| \tilde{b} - \tilde{A} x \|_2 \\
\text{subject to} & \tilde{C} x = 0 \end{array}$$ where $$
\tilde{A} \triangleq (B^T\otimes I), ~ \tilde{b} \triangleq \mathop{\textrm{vec}}(A), ~ \tilde{C} \triangleq (v^T\otimes I), ~ x \triangleq \mathop{\textrm{vec}}(X).$$
If you square the objective, then this model becomes an equality-constrained least-squares (EQLS) problem. You can derive the optimality conditions for the EQLS form using a simple Lagrange multiplier technique; finding an optimal $x$ requires the solution of a single symmetric indefinite linear system.
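For concreteness, here is a small MATLAB sketch of this vectorization (my own illustration, not part of the original answer): it builds $\tilde{A}$, $\tilde{b}$, $\tilde{C}$ with dense kron calls, so it is only sensible for modest $n$ and $m$. The names Atil, btil, Ctil are mine, and A, B, v are assumed to already exist with the question's dimensions ($A,B$ being $n\times m$, $v$ being $n\times 1$, and $X$ taken $n\times n$).

    % Dense vec/Kronecker reformulation (a sketch; small n, m only)
    n = size(A,1);

    Atil = kron(B', eye(n));     % maps vec(X) to vec(X*B)
    btil = A(:);                 % vec(A)
    Ctil = kron(v', eye(n));     % maps vec(X) to X*v

    % sanity check of the two identities on a random test matrix
    Xtest = randn(n);
    norm(Atil*Xtest(:) - reshape(Xtest*B, [], 1))   % ~ 0
    norm(Ctil*Xtest(:) - Xtest*v)                   % ~ 0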



Matrix calculus. I actually think Kronecker products are clumsy. Thankfully, with some care we can avoid them, at least until we get to the EQLS optimality conditions. Noting that $\|Z\|_F^2 = \mathop{\textrm{Tr}}(Z^TZ)$, we can write the squared objective as follows:
$$\|A-XB\|_F^2 = \mathop{\textrm{Tr}}((A-XB)^T(A-XB)).$$
Now let $\lambda\in\mathbb{R}^m$ be the Lagrange multiplier for $Xv=0$. The Lagrangian is
$$L(X,\lambda)=\mathop{\textrm{Tr}}((A-XB)^T(A-XB))-\lambda^TXv$$
So let's just find the gradient of $L$ with respect to $X$ and set it to zero... but how? Well, I like to use a trace-centered approach. First, we do some rewriting, using the identities $\mathop{\textrm{Tr}}(Z+Y)=\mathop{\textrm{Tr}}(Z)+\mathop{\textrm{Tr}}(Y)$, $\mathop{\textrm{Tr}}(Z)=\mathop{\textrm{Tr}}(Z^T)$, and $\mathop{\textrm{Tr}}(ZY)=\mathop{\textrm{Tr}}(YZ)$. (The latter works only if $ZY$ and $YZ$ are both well-posed.)
$$\begin{aligned}
L(X,\lambda)&=\mathop{\textrm{Tr}}(A^TA)-\mathop{\textrm{Tr}}(A^TXB)-\mathop{\textrm{Tr}}(B^TX^TA)+\mathop{\textrm{Tr}}(B^TX^TXB)-\lambda^TXv\\
&=\mathop{\textrm{Tr}}(A^TA)-2\mathop{\textrm{Tr}}(B^TX^TA)+\mathop{\textrm{Tr}}(B^TX^TXB)-v^TX^T\lambda \\
&=\mathop{\textrm{Tr}}(A^TA)-2\mathop{\textrm{Tr}}(X^TAB^T)+\mathop{\textrm{Tr}}(X^TXBB^T)-\mathop{\textrm{Tr}}(X^T\lambda v^T) \\
&=
\mathop{\textrm{Tr}}(A^TA)-\mathop{\textrm{Tr}}(X^T(2AB^T+\lambda v^T))+\mathop{\textrm{Tr}}(X^TXBB^T). \end{aligned}
$$
Now, I wish I could point you to a good reference for this---if someone else can, please do---but I did all this rewriting because I can now find the gradient above by rote:
$$\nabla_X L(X,\lambda) = -(2AB^T+\lambda v^T) + 2XBB^T.$$
Setting this to zero and solving, we find that the Lagrangian optimality condition is
$$2 XBB^T - \lambda v^T = 2AB^T$$
So the pair of optimality conditions for the constrained problem is
$$\boxed{2XBB^T - \lambda v^T = 2AB^T \quad \text{and} \quad Xv=0}$$
You can verify this is correct by examining the unconstrained case: if $BB^T$ is nonsingular, then
$$\mathop{\textrm{arg min}}_X\|A-XB\|_F = AB^\dagger = AB^T(BB^T)^{-1},$$ which clearly satisfies the Lagrangian optimality condition with $\lambda=0$. If $BB^T$ is singular, on the other hand, then the problem likely does not admit a unique solution.



Note that if we wish to express these conditions in vector form---perhaps we want to use a standard linear solver---then we're back to Kronecker products:
$$\begin{bmatrix} 2BB^T\otimes I & - v\otimes I \\ - v^T\otimes I & 0 \end{bmatrix}
\begin{bmatrix} \mathop{\textrm{vec}}(X) \\ \lambda \end{bmatrix} =
\begin{bmatrix} \mathop{\textrm{vec}}(2AB^T) \\ 0 \end{bmatrix}$$
I've negated the second (block) row so that the symmetry of the system is revealed.
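Here is a dense MATLAB sketch of assembling and solving that symmetric indefinite system (again my own illustration; variable names are mine, and it assumes the data have the question's dimensions, so $X$ is $n\times n$ and $\lambda\in\mathbb{R}^n$):

    % Dense KKT assembly and solve (a sketch; small n, m only)
    n  = size(A,1);
    In = eye(n);
    G  = B*B';                                   % B*B^T

    K   = [2*kron(G, In), -kron(v, In);          % [ 2(BB^T kron I)   -(v kron I) ]
           -kron(v', In),  zeros(n)];            % [ -(v^T kron I)         0      ]
    rhs = [reshape(2*A*B', [], 1); zeros(n,1)];  % [ vec(2AB^T); 0 ]

    z      = K \ rhs;                            % assumes the KKT matrix is nonsingular
    X      = reshape(z(1:n^2), n, n);
    lambda = z(n^2+1:end);

    norm(X*v)                                    % constraint residual, ~ 0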



              Convex optimization. If you are simply interested in the problem as you've expressed it, then what I am about to suggest is overkill---stick with the symmetric indefinite approach. But since the Frobenius norm is convex, this problem can be readily solved using a variety of extant convex optimization tools. In its original form, it can be expressed as a second-order cone program (SOCP); in EQLS form, it can be expressed as a quadratic program (QP). Solvers for both types of problems are readily available; indeed, SOCP packages can solve QPs as well.



I'm the developer of a MATLAB-based convex optimization toolbox called CVX that can handle this problem rather easily, and connects to solvers that can solve (among other things) QPs and SOCPs. The nice thing about using something like CVX is that you don't have to jump through the Kronecker hoops---it will do that for you. For instance, here's the CVX model for your problem (assuming $X\in\mathbb{R}^{m\times n}$):



cvx_begin
    variable X(m,n)
    minimize( norm(A - X*B, 'fro') )
    subject to
        X*v == 0
cvx_end


Like I said, this is overkill, so I am not recommending it for this application---unless you are considering applying more complex (but convex) constraints to your model besides just $Xv=0$. (I did use CVX to numerically verify my work above, though!)



Iterative methods. Getting back to the EQLS approach for a moment: if $A$, $X$, and $B$ start getting large, you're going to want to exploit the Kronecker structure somehow to improve performance. I think the large symmetric indefinite linear system will be sparse in this case, in which case an $LDL^T$ solver will do, as long as the optimality conditions are nonsingular. (If they are not, then there is not a unique solution.) But with Kronecker structure an iterative solver is often a better choice; something like SYMMLQ. What is important is that you never actually form the full symmetric indefinite matrix, which would involve products of $\tilde{C}$, $\tilde{A}$, etc. Instead, you treat it like a linear operator, and implement code to compute its forward and adjoint operations. This will reduce the computational requirements considerably.
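To make the operator viewpoint concrete, here is a matrix-free MATLAB sketch using MINRES (my own addition, untested): as noted in the comments below, negating the constraint row makes the map $(X,\lambda)\rightarrow(2XBB^T-\lambda v^T,\,-Xv)$ symmetric, so it can be handed to a symmetric iterative solver as a function handle. The helper name kktApply is mine.

    % Matrix-free solve of the boxed optimality conditions (a sketch)
    n   = size(A,1);
    G   = B*B';
    rhs = [reshape(2*A*B', [], 1); zeros(n,1)];   % [vec(2AB^T); 0]

    op = @(z) kktApply(z, G, v, n);
    z  = minres(op, rhs, 1e-10, 1000);

    X      = reshape(z(1:n^2), n, n);
    lambda = z(n^2+1:end);

    function w = kktApply(z, G, v, n)
        % apply (X,lambda) -> (2*X*G - lambda*v', -X*v) without Kronecker products
        X      = reshape(z(1:n^2), n, n);
        lambda = z(n^2+1:end);
        top    = 2*X*G - lambda*v';
        bot    = -X*v;
        w      = [top(:); bot];
    end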







edited Apr 1 '13 at 11:38
answered Apr 1 '13 at 3:29
Michael Grant
• Thanks, Michael! I appreciate that. (1) Is the linear system (the optimality condition) from the first method, via Kronecker products, the same as the linear system from the second method, via matrix calculus? (2) In the last part, "Iterative methods", for solving the linear systems obtained by either method, why do "you never actually form the full symmetric indefinite matrix"?
                – Ethan
                Apr 2 '13 at 15:06










• Yes, the matrix calculus and Kronecker methods should produce equivalent linear systems in the end. However, the matrix calculus approach does a better job of exposing the structure of the linear system calculations. As for the iterative method, I suggest that you apply an iterative method to the boxed linear system. If you negate $Xv=0$, then the resulting linear operator $(X,\lambda)\rightarrow(2XBB^T-\lambda v^T,\,-Xv)$ is symmetric.
                – Michael Grant
                Apr 2 '13 at 15:21










• The challenge here (using matrix calculus, applying an iterative method to solve for $(X,\lambda)$) is to stop thinking about linear systems in terms of matrices and vectors, and start thinking about them in terms of linear operators and more general vector spaces. If you can do that, you can ditch Kronecker products altogether.
                – Michael Grant
                Apr 2 '13 at 15:22










• Michael, thanks again! What do you think of the other reply, which reduces the problem to an unconstrained LS problem? I don't see any error there.
                – Ethan
                Apr 2 '13 at 15:29










              • Yes, it looks promising---in fact I'm one of the upvotes! I haven't verified the math, but I don't see any obvious issues.
                – Michael Grant
                Apr 2 '13 at 15:45


















              2














Suppose $A = (a_{ij})$, $B = (b_{ij})$; then $A - XB = (a_{ij} - \sum_{k}X_{ik}b_{kj})$.


Take the matrix $X$ as a $1\times n^2$ vector, arranging the rows from top to bottom:


$\hat{X} = (x_{1,1},x_{1,2},\cdots,x_{n,n-1},x_{n,n}) = (X_1,X_2,\cdots,X_n)$, where $X_k$ is the $k$th row of $X$.


Thus the square of the Frobenius norm is


$\mathrm{tr}\,(A-XB)(A-XB)' = \mathrm{tr}(AA' - XBA'-AB'X'+XBB'X') = \mathrm{tr}(AA') -2\,\mathrm{tr}(XBA') + \mathrm{tr}(XBB'X') = \mathrm{tr}(AA') - 2\sum_k \langle X_k,(AB')_k\rangle + \sum_k X_k(BB')X_k'$


where $(AB')_k$ is the $k$th row of $AB'$.


Thus the squared norm is


$\mathrm{tr}(AA') - 2 \langle \hat{X},M'\rangle + \hat{X}\left(I_n\otimes(BB')\right) \hat{X}'$


where $M = ((AB')_1,(AB')_2,\cdots,(AB')_n)$ arranges the rows of $AB'$ in one row. (Note that with row-wise stacking the middle matrix is $I_n\otimes(BB')$, i.e. block diagonal with copies of $BB'$.)


The constraint condition $Xv = 0$ becomes $\langle X_k, v'\rangle = 0$ for each $k$, i.e. every row of $X$ is orthogonal to $v$.
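A quick numerical check of this row-stacked expansion (my own addition; random data, names mine):

    % verify the row-stacked quadratic form against the Frobenius norm (a sketch)
    rng(1);
    n = 4; m = 6;
    A = randn(n,m);  B = randn(n,m);  X = randn(n);

    Xhat = reshape(X', 1, []);                 % rows of X, left to right
    M    = reshape((A*B')', 1, []);            % rows of A*B', same ordering
    q    = trace(A*A') - 2*(Xhat*M') + Xhat*kron(eye(n), B*B')*Xhat';

    abs(q - norm(A - X*B, 'fro')^2)            % ~ 1e-12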






answered Apr 1 '13 at 2:57
Yimin


                      1














The closed-form solution to this problem is
$$\eqalign{
X &= A\,(HB)^+H \cr
}$$

where $M^+$ denotes the pseudoinverse of $M$

and $\,H=(I-vv^+)= \big(I-\frac{vv^T}{v^Tv}\big)$.



The derivation goes as follows.



Consider the linearly constrained, linear problem
$$\eqalign{
&\min &\|Cx-a\|_F \cr
&{\rm s.t.} &Ex=f\cr
}$$

The projector onto the nullspace of $E$ will be useful:
$$\eqalign{P &= (I-E^+E) \implies EP = 0}$$
The general solution of the constraint equation is
$$\eqalign{
x &= E^+f + Py \cr
}$$
where the $y$ vector is a free parameter.

Substitute this into the objective function and solve for $y$, then revert back to the original $x$ variable:
$$\eqalign{
y &= (CP)^+(a-CE^+f) \cr
x &= E^+f + P (CP)^+(a-CE^+f) \cr
}$$

We can apply this solution to the current problem by vectorization, making the following identifications:
$$\eqalign{
C &= (B^T\otimes I) \cr
E &= (v^T\otimes I) \cr
f &= 0 \cr
a &= {\rm vec}(A) \cr
x &= {\rm vec}(X) \cr
}$$

This yields a closed-form solution of the problem (in terms of pseudoinverses)
$$\eqalign{
x
&= P (CP)^+a \cr
&= \big(I-E^+E\big)\,\big(C-CE^+E\big)^+a \cr
}$$

As a final step, reshape the vector $x$ into the matrix $X$.



But let's see if we can calculate the $X$ matrix more directly.

First recall that the pseudoinverse of the vector $v$ has a closed form
$$v^+=\frac{v^T}{v^Tv}$$
as does the pseudoinverse of a Kronecker product
$$(A\otimes B)^+ = A^+\otimes B^+ $$
Therefore we can do the following:
$$\eqalign{
E^+ &= \frac{v}{v^Tv}\otimes I \cr
P &= I-E^+E \cr
&= I\otimes I - vv^+\otimes I\cr
&= (I-vv^+)\otimes I\cr
&= H\otimes I \cr
CP &= (B^T\otimes I)\,(H\otimes I) \cr&= B^TH\otimes I \cr
x&= P(CP)^+a \cr
&= (H\otimes I)\Big((B^TH)^+\otimes I\Big)a \cr
&= \big(H(B^TH)^+\otimes I\big)a \cr
&= {\rm vec}\big(A(HB)^+H\big) \cr
X&= A(HB)^+H \cr\cr
}$$

As a quick sanity check, since $Hv=0$ one also has $Xv=0$.



And $\,XB=A(HB)^+(HB)\approx A\,$ since $M^+M$ is an orthogonal projector for any $M$.
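One can sanity-check this closed form numerically (my own addition; random data, variable names mine), e.g. against the constraint and the attained objective:

    % numerical check of X = A*(H*B)^+ * H on random data (a sketch)
    rng(0);
    n = 5; m = 7;
    A = randn(n,m);  B = randn(n,m);  v = randn(n,1);

    H = eye(n) - (v*v')/(v'*v);      % H = I - v v^+, projector orthogonal to v
    X = A * pinv(H*B) * H;           % closed-form minimizer

    norm(X*v)                        % ~ 0 : constraint satisfied
    norm(A - X*B, 'fro')             % attained objective value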






answered Jun 28 '18 at 17:09
greg