Minimize $\|A-XB\|_F$ subject to $Xv=0$
Assume we are given two matrices $A, B \in \mathbb{R}^{n \times m}$ and a vector $v \in \mathbb{R}^n$. Let $\|\cdot\|_F$ be the Frobenius norm of a matrix. How can we solve the following optimization problem in $X \in \mathbb{R}^{n \times n}$?
$$\begin{array}{ll} \text{minimize} & \|A-XB\|_F\\ \text{subject to} & Xv=0\end{array}$$
Can this problem be converted to a constrained least-squares problem with the optimization variable being a vector instead of a matrix? If so, how would that conversion work?
Are there any references on solving such constrained linear least-squares problems in the Frobenius norm?
Thanks!
Tags: matrices, reference-request, convex-optimization, least-squares, quadratic-programming
4 Answers
As the others show, the answer to your question is affirmative. However, I don't see the point of doing so, when the problem can actually be converted into an unconstrained least-squares problem.
Let $Q$ be a real orthogonal matrix that has $\frac{v}{\|v\|}$ as its last column. For instance, you may take $Q$ as a Householder matrix:
$$
Q = I - 2uu^T, \quad u = \frac{v-\|v\|e_n}{\|v-\|v\|e_n\|}, \quad
e_n=(0,0,\ldots,0,1)^T.
$$
Then $Xv=0$ means that $XQe_n=0$, which in turn implies that $XQ$ is a matrix of the form $XQ = (\widetilde{X},0)$, where $\widetilde{X}\in\mathbb{R}^{n\times(n-1)}$. So, if you partition $Q^TB$ as $Q^TB = \begin{pmatrix}\widetilde{B}\\ \ast\end{pmatrix}$, where $\widetilde{B}$ is $(n-1)\times m$, then your minimisation problem can be rewritten as the unconstrained least-squares problem
$$\min_{\widetilde{X}\in\mathbb{R}^{n\times(n-1)}} \|A-\widetilde{X}\widetilde{B}\|_F.$$
And the least-norm solution is given by $\widetilde{X} = A\widetilde{B}^+$, where $\widetilde{B}^+$ denotes the Moore-Penrose pseudoinverse of $\widetilde{B}$. Once $\widetilde{X}$ is obtained, you can recover $X = (\widetilde{X},0)Q^T$.
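For concreteness, here is a rough MATLAB sketch of this reduction (my own illustration, not code from the answer; it assumes $A,B\in\mathbb{R}^{n\times m}$, a nonzero $v\in\mathbb{R}^n$, and uses pinv for the least-norm solution):

    % Householder reduction of the constrained problem to an
    % unconstrained least-squares problem (sketch).
    n  = size(A, 1);
    en = [zeros(n-1, 1); 1];
    w  = v - norm(v) * en;            % reflector direction
    if norm(w) > eps
        u = w / norm(w);
        Q = eye(n) - 2 * (u * u');    % Householder matrix; Q*en = v/norm(v)
    else
        Q = eye(n);                   % v is already a positive multiple of en
    end
    QB     = Q' * B;                  % Q is symmetric, so Q' = Q
    Btilde = QB(1:n-1, :);            % first n-1 rows of Q'*B
    Xtilde = A * pinv(Btilde);        % least-norm unconstrained solution
    X      = [Xtilde, zeros(n, 1)] * Q';   % X*v = 0 by construction

On random data one can check that norm(X*v) is at the level of rounding error.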
Thanks, this is a very clever way indeed. I very much appreciate it. (1) Is there some consideration in choosing $Q$ besides it being orthogonal and having $v/\|v\|$ as its last column? (2) Did you come up with it yourself or is there some reference for it?
– Ethan
Apr 2 '13 at 16:09
@Ethan There aren't any other considerations. Orthogonality is required to preserve the Frobenius norm. Setting $v/\|v\|$ as the last column is just for convenience. You may set it as the first column if you want. And matrices other than the Householder matrix can be used, but the Householder matrix is more convenient in this case because (1) it is symmetric (so you don't actually need to transpose $Q$ in $Q^T$) and (2) you only need to store $u$ instead of $n^2$ matrix entries.
– user1551
Apr 2 '13 at 17:24
And the Householder trick is well known because the Householder matrix is used in some other matrix algorithms (such as QR decomposition). But I don't have any references at hand.
– user1551
Apr 2 '13 at 17:28
[Major changes here, sorry! I realized the matrix calculus approach has some real advantages here after I submitted my first answer.]
Yes, this problem can indeed be examined and solved in a variety of ways.
Kronecker products. Kronecker products provide a concise way to relate matrix equations and standard matrix-vector equations. Using them, we can say that
$$\mathop{\textrm{vec}}(A-XB) = \mathop{\textrm{vec}}(A) - (B^T\otimes I) \mathop{\textrm{vec}}(X)$$
and
$$\mathop{\textrm{vec}}(Xv) = (v^T\otimes I)\mathop{\textrm{vec}}(X),$$
where $\mathop{\textrm{vec}}(\cdot)$ stacks the columns of its input argument into a single column vector. If $X\in\mathbb{R}^{m\times n}$, then both of the identity matrices $I$ above are $m\times m$. Thus Kronecker products can allow us to express this problem as
$$\begin{array}{ll} \text{minimize} & \| \tilde{b} - \tilde{A} x \|_2 \\
\text{subject to} & \tilde{C} x = 0 \end{array}$$ where $$
\tilde{A} \triangleq (B^T\otimes I), ~ \tilde{b} \triangleq \mathop{\textrm{vec}}(A), ~ \tilde{C} \triangleq (v^T\otimes I), ~ x \triangleq \mathop{\textrm{vec}}(X).$$
If you square the objective, then this model becomes an equality-constrained least-squares (EQLS) problem. You can derive the optimality conditions for the EQLS form using a simple Lagrange multiplier technique; finding an optimal $x$ requires the solution of a single symmetric indefinite linear system.
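Here is a minimal MATLAB sketch of this vectorization, written in the question's dimensions ($A,B\in\mathbb{R}^{n\times m}$, $X\in\mathbb{R}^{n\times n}$, $v\in\mathbb{R}^n$); the variable names and the use of lsqlin (Optimization Toolbox) are my own choices, not prescribed by the answer:

    % Kronecker/vectorized EQLS form of the problem.
    n    = size(A, 1);
    Atil = kron(B', eye(n));              % (B^T kron I) * vec(X) = vec(X*B)
    btil = A(:);                          % vec(A)
    Ctil = kron(v', eye(n));              % (v^T kron I) * vec(X) = X*v
    % One way to solve the equality-constrained least-squares problem:
    x = lsqlin(Atil, btil, [], [], Ctil, zeros(n, 1));
    X = reshape(x, n, n);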
Matrix calculus. I actually think Kronecker products are clumsy. Thankfully, with some care we can avoid them, at least until we get to the EQLS optimality conditions. Noting that $\|Z\|_F^2 = \mathop{\textrm{Tr}}(Z^TZ)$, we can write the squared objective as follows:
$$\|A-XB\|_F^2 = \mathop{\textrm{Tr}}((A-XB)^T(A-XB)).$$
Now let $\lambda\in\mathbb{R}^m$ be the Lagrange multiplier for $Xv=0$. The Lagrangian is
$$L(X,\lambda)=\mathop{\textrm{Tr}}((A-XB)^T(A-XB))-\lambda^TXv.$$
So let's just find the gradient of $L$ with respect to $X$ and set it to zero... but how? Well, I like to use a trace-centered approach. First, we do some rewriting, using the identities $\mathop{\textrm{Tr}}(Z+Y)=\mathop{\textrm{Tr}}(Z)+\mathop{\textrm{Tr}}(Y)$, $\mathop{\textrm{Tr}}(Z)=\mathop{\textrm{Tr}}(Z^T)$, and $\mathop{\textrm{Tr}}(ZY)=\mathop{\textrm{Tr}}(YZ)$. (The latter works only if both products $ZY$ and $YZ$ are well-defined.)
$$\begin{aligned}
L(X,\lambda)&=\mathop{\textrm{Tr}}(A^TA)-\mathop{\textrm{Tr}}(A^TXB)-\mathop{\textrm{Tr}}(B^TX^TA)+\mathop{\textrm{Tr}}(B^TX^TXB)-\lambda^TXv\\
&=\mathop{\textrm{Tr}}(A^TA)-2\mathop{\textrm{Tr}}(B^TX^TA)+\mathop{\textrm{Tr}}(B^TX^TXB)-v^TX^T\lambda \\
&=\mathop{\textrm{Tr}}(A^TA)-2\mathop{\textrm{Tr}}(X^TAB^T)+\mathop{\textrm{Tr}}(X^TXBB^T)-\mathop{\textrm{Tr}}(X^T\lambda v^T) \\
&=\mathop{\textrm{Tr}}(A^TA)-\mathop{\textrm{Tr}}(X^T(2AB^T+\lambda v^T))+\mathop{\textrm{Tr}}(X^TXBB^T). \end{aligned}
$$
Now, I wish I could point you to a good reference for this---if someone else can, please do---but I did all this rewriting because I can now find the gradient above by rote:
$$\nabla_X L(X,\lambda) = -(2AB^T+\lambda v^T) + 2XBB^T.$$
Setting to zero and solving, we find that the Lagrangian optimality condition is
$$2 XBB^T - \lambda v^T = 2AB^T.$$
So the pair of optimality conditions for the constrained problem are
$$\boxed{2XBB^T - \lambda v^T = 2AB^T \quad \text{and} \quad Xv=0}$$
You can verify this is correct by examining the unconstrained case: if $BB^T$ is nonsingular, then
$$\mathop{\textrm{arg min}}_X\|A-XB\|_F = AB^\dagger = AB^T(BB^T)^{-1},$$ which clearly satisfies the Lagrangian optimality condition with $\lambda=0$. If $BB^T$ is singular, on the other hand, then the problem likely does not admit a unique solution.
Note that if we wish to express these conditions in vector form---perhaps we want to use a standard linear solver---then we're back to Kronecker products:
$$\begin{bmatrix} 2BB^T\otimes I & - v\otimes I \\ - v^T\otimes I & 0 \end{bmatrix}
\begin{bmatrix} \mathop{\textrm{vec}}(X) \\ \lambda \end{bmatrix} =
\begin{bmatrix} 2\mathop{\textrm{vec}}(AB^T) \\ 0 \end{bmatrix}$$
I've negated the second (block) row so that the symmetry of the system is revealed.
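As an illustration only (my own sketch in the question's dimensions, using a plain backslash solve rather than a dedicated $LDL^T$ routine), the symmetric indefinite system above can be assembled and solved as follows:

    % Assemble and solve the symmetric indefinite KKT system.
    n   = size(A, 1);
    K   = [2*kron(B*B', eye(n)), -kron(v, eye(n));
           -kron(v', eye(n)),     zeros(n)];
    rhs = [2 * reshape(A*B', [], 1); zeros(n, 1)];
    sol = K \ rhs;                        % or factor with ldl() explicitly
    X   = reshape(sol(1:n^2), n, n);      % minimizer
    lam = sol(n^2+1:end);                 % Lagrange multiplier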
Convex optimization. If you are simply interested in the problem as you've expressed it, then what I am about to suggest is overkill---stick with the symmetric indefinite approach. But since the Frobenius norm is convex, this problem can be readily solved using a variety of extant convex optimization tools. In its original form, it can be expressed as a second-order cone program (SOCP); in EQLS form, it can be expressed as a quadratic program (QP). Solvers for both types of problems are readily available; indeed, SOCP packages can solve QPs as well.
I'm the developer of a MATLAB-based convex optimization toolbox called CVX that can handle this problem rather easily, and connects to solvers that can solve (among other things) QPs and SOCPs. The nice thing about using something like CVX is that you don't have to jump through the Kronecker hoops---it will do it for you. For instance, here's the CVX model for your problem (assuming $X\in\mathbb{R}^{m\times n}$):
    cvx_begin
        variable X(m,n)
        minimize(norm(A-X*B,'fro'))
        subject to
            X * v == 0
    cvx_end
Like I said, this is overkill, so I am not recommending it for this application---unless you are considering applying more complex (but convex) constraints to your model besides just $Xv=0$. (I did use CVX to numerically verify my work above, though!)
Iterative methods. Getting back to the EQLS approach for a moment: if $A$, $X$, and $B$ start getting large, you're going to want to exploit the Kronecker structure somehow to improve performance. I think the large symmetric indefinite linear system will be sparse in this case, in which case an $LDL^T$ solver will do, as long as the optimality conditions are nonsingular. (If they are not, then there is not a unique solution.) But with Kronecker structure an iterative solver is often a better choice; something like SYMMLQ. What is important is that you never actually form the full symmetric indefinite matrix, which will involve products of $\tilde{C}$, $\tilde{A}$, etc. Instead, you treat it like a linear operator, and implement code to compute its forward and adjoint operation. This will reduce the computational requirements considerably.
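To make the operator view concrete, here is a matrix-free sketch in MATLAB using the built-in symmlq with a function handle (the helper kkt_apply and all names are my own; the operator is the boxed system with the $Xv=0$ row negated so that it is symmetric):

    % Matrix-free solve of the symmetric KKT system; z packs [vec(X); lambda].
    n   = size(A, 1);
    BBt = B * B';                                 % formed once, n-by-n
    rhs = [2 * reshape(A * B', [], 1); zeros(n, 1)];
    sol = symmlq(@(z) kkt_apply(z, BBt, v, n), rhs, 1e-10, 1000);
    X   = reshape(sol(1:n^2), n, n);

    function y = kkt_apply(z, BBt, v, n)
        Xz  = reshape(z(1:n^2), n, n);            % current X iterate
        lam = z(n^2+1:end);                       % current multiplier
        top = 2 * Xz * BBt - lam * v';            % 2*X*B*B' - lambda*v'
        bot = -Xz * v;                            % negated constraint row
        y   = [top(:); bot];
    end

Only products with $B$, $B^T$, and $v$ are needed, so the $n^2\times n^2$ Kronecker blocks are never formed.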
Thanks, Michael! I appreciate that. (1) Is the linear system (the optimality conditions) from the first method, via Kronecker products, the same as the linear system from the second method, via matrix calculus? (2) In the last part, "Iterative methods", for solving the linear systems obtained by the Kronecker method or by the matrix calculus method, why do "you never actually form the full symmetric indefinite matrix"?
– Ethan
Apr 2 '13 at 15:06
Yes, the matrix calculus and Kronecker methods should produce equivalent linear systems in the end. However, the matrix calculus approach does a better job of exposing the structure of the linear system calculations. As for the iterative method, I suggest that you apply an iterative method to the boxed linear system. If you negate $Xv=0$, then the resulting linear operator $(X,\lambda)\rightarrow(2XBB^T-\lambda v^T,-Xv)$ is symmetric.
– Michael Grant
Apr 2 '13 at 15:21
The challenge here (using matrix calculus, applying an iterative method to solve for $(X,lambda)$) is to stop thinking about linear systems in terms of matrices and vectors, and start thinking about them in terms of linear operators and more general vector spaces. If you can do that, you can ditch Kronecker products altogether.
– Michael Grant
Apr 2 '13 at 15:22
Michael, thanks again! What do you think of the other reply, which reduces the problem to an unconstrained LS problem? I don't see any error there.
– Ethan
Apr 2 '13 at 15:29
Yes, it looks promising---in fact I'm one of the upvotes! I haven't verified the math, but I don't see any obvious issues.
– Michael Grant
Apr 2 '13 at 15:45
Suppose $A = (a_{ij})$, $B = (b_{ij})$; then $A - XB = (a_{ij} - \sum_{k}X_{ik}b_{kj})$.
Take the matrix $X$ as a $1\times n^2$ row vector, with the rows arranged from top to bottom:
$\hat{X} = (x_{1,1},x_{1,2},\cdots,x_{n,n-1},x_{n,n}) = (X_1,X_2,\cdots,X_n)$, where $X_k$ is the $k$th row of $X$.
Thus the square of the Frobenius norm is
$\mathrm{tr}\,(A-XB)(A-XB)' = \mathrm{tr}(AA' - XBA'-AB'X'+XBB'X') = \mathrm{tr}(AA') -2\,\mathrm{tr}(XBA') + \mathrm{tr}(XBB'X') = \mathrm{tr}(AA') - 2\sum_k \langle X_k,(AB')_k\rangle + \sum_k X_k(BB')X_k',$
where $(AB')_k$ is the $k$th row of $AB'$.
Thus the squared norm is
$\mathrm{tr}(AA') - 2 \langle \hat{X},M'\rangle + \hat{X}\left(I_n\otimes(BB')\right) \hat{X}',$
where $M = ((AB')_1,(AB')_2,\cdots,(AB')_n)$, i.e., the rows of $AB'$ arranged into a single row vector.
The constraint condition $Xv = 0$ is equivalent to $\langle X_k, v\rangle = 0$ for each $k$.
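A quick MATLAB check of this row-stacked quadratic form on random data (my own illustration; note the block-diagonal factor is $I_n\otimes(BB')$):

    % Verify the row-stacked expression for the squared Frobenius norm.
    n = 4; m = 6;
    A = randn(n, m); B = randn(n, m); X = randn(n, n);
    Xhat = reshape(X', 1, []);            % rows of X laid out in one row
    M    = reshape((A*B')', 1, []);       % rows of A*B' laid out in one row
    lhs  = norm(A - X*B, 'fro')^2;
    rhs  = trace(A*A') - 2*(Xhat*M') + Xhat*kron(eye(n), B*B')*Xhat';
    disp(lhs - rhs)                       % ~ 0 up to rounding error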
The closed-form solution to this problem is
$$X = A\,(HB)^+H$$
where $M^+$ denotes the pseudoinverse of $M$,
and $H=(I-vv^+)= \big(I-\frac{vv^T}{v^Tv}\big).$
The derivation goes as follows.
Consider the linearly constrained, linear problem
$$\begin{array}{ll} \text{minimize} & \|Cx-a\|_F\\ \text{subject to} & Ex=f.\end{array}$$
The projector onto the nullspace of $E$ will be useful:
$$P = (I-E^+E) \implies EP = 0.$$
The general solution of the constraint equation is
$$x = E^+f + Py,$$ where the $y$ vector is a free parameter.
Substitute this into the objective function and solve for $y$, then revert back to the original $x$ variable:
$$\begin{aligned}
y &= (CP)^+(a-CE^+f) \\
x &= E^+f + P (CP)^+(a-CE^+f)
\end{aligned}$$
We can apply this solution to the current problem by vectorization and making the following identifications:
$$\begin{aligned}
C &= (B^T\otimes I) \\
E &= (v^T\otimes I) \\
f &= 0 \\
a &= {\rm vec}(A) \\
x &= {\rm vec}(X)
\end{aligned}$$
This yields a closed-form solution of the problem (in terms of pseudoinverses):
$$\begin{aligned}
x &= P (CP)^+a \\
&= \big(I-E^+E\big)\,\big(C-CE^+E\big)^+a
\end{aligned}$$
As a final step, reshape the vector $x$ into the matrix $X$.
But let's see if we can calculate the $X$ matrix more directly.
First recall that the pseudoinverse of the vector $v$ has a closed form,
$$v^+=\frac{v^T}{v^Tv},$$
as does the pseudoinverse of a Kronecker product,
$$(A\otimes B)^+ = A^+\otimes B^+.$$
Therefore we can do the following:
$$\begin{aligned}
E^+ &= \frac{v}{v^Tv}\otimes I \\
P &= I-E^+E \\
&= I\otimes I - vv^+\otimes I\\
&= (I-vv^+)\otimes I\\
&= H\otimes I \\
CP &= (B^T\otimes I)\,(H\otimes I) = B^TH\otimes I \\
x &= P(CP)^+a \\
&= (H\otimes I)\Big((B^TH)^+\otimes I\Big)a \\
&= \big(H(B^TH)^+\otimes I\big)a \\
&= {\rm vec}\big(A(HB)^+H\big) \\
X &= A(HB)^+H
\end{aligned}$$
(The last vectorization step uses $(N\otimes I)\,{\rm vec}(A) = {\rm vec}(AN^T)$ together with $H^T=H$ and $\big((B^TH)^+\big)^T = \big((B^TH)^T\big)^+ = (HB)^+$.)
As a quick sanity check, since $Hv=0$ one also has $Xv=0$.
And $XB=A(HB)^+(HB)\approx A$, since $M^+M$ is an orthogonal projector for any $M$.
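A quick numerical check of this closed form in MATLAB (random data; pinv is the Moore-Penrose pseudoinverse):

    % Sanity check of X = A*(H*B)^+ * H on random data.
    n = 5; m = 7;
    A = randn(n, m); B = randn(n, m); v = randn(n, 1);
    H = eye(n) - (v * v') / (v' * v);     % projector onto the complement of v
    X = A * pinv(H * B) * H;              % proposed closed-form minimizer
    disp(norm(X * v))                     % ~ 0: constraint satisfied
    disp(norm(A - X*B, 'fro'))            % achieved objective value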
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
As the others show, the answer to your question is affirmative. However, I don't see what's the point of doing so, when the problem can actually be converted into an unconstrained least square problem.
Let $Q$ be a real orthogonal matrix that has $frac{v}{|v|}$ as its last column. For instance, you may take $Q$ as a Householder matrix:
$$
Q = I - 2uu^T, u = frac{v-|v|e_n}{|v-|v|e_n|},
e_n=(0,0,ldots,0,1)^T.
$$
Then $Xv=0$ means that $XQe_n=0$, which in turn implies that $XQ$ is a matrix of the form $XQ = (widetilde{X},0)$, where $widetilde{X}inmathbb{R}^{ntimes(n-1)}$. So, if you partition $Q^TB$ as $Q^TB = pmatrix{widetilde{B}\ ast}$, where $widetilde{B}$ is $(n-1)times m$, then your minimisation problem can be rewritten as the unconstrained least-square problem
$$min_{widetilde{X}inmathbb{R}^{ntimes(n-1)}} |A-widetilde{X}widetilde{B}|_F.$$
And the least-norm solution is given by $widetilde{X} = Awidetilde{B}^+$, where $widetilde{B}^+$ denotes the Moore-Penrose pseudoinverse of $widetilde{B}$. Once $widetilde{X}$ is obtained, you can recover $X = (widetilde{X},0)Q^T$.
Thanks, this is a very clever way indeed. I very much appreciate it. (1) Is there some consideration of choosing $Q$ besides it being orthogonal and has $v/|v|$ as its last column? (2) Did you come up with it yourself or is there some reference for it?
– Ethan
Apr 2 '13 at 16:09
@Ethan There aren't any other considerations. Orthogonality is required to preserve the Frobenius norm. Setting $v/|v|$ as the last column is just for convenience. You may set it as the first column if you want. And other matrices than the Householder matrix can be used, but the Householder matrix is more convenient in this case because (1) it is symmetric (so you don't actually need to transpose $Q$ in $QT$) and (2) you don't need to store $u$ instead of $n^2$ matrix entries.
– user1551
Apr 2 '13 at 17:24
And the Householder trick is well known because the Householder matrix is used in some other matrix algorithms (such as QR decomposition). But I don't have any references at hand.
– user1551
Apr 2 '13 at 17:28
add a comment |
As the others show, the answer to your question is affirmative. However, I don't see what's the point of doing so, when the problem can actually be converted into an unconstrained least square problem.
Let $Q$ be a real orthogonal matrix that has $frac{v}{|v|}$ as its last column. For instance, you may take $Q$ as a Householder matrix:
$$
Q = I - 2uu^T, u = frac{v-|v|e_n}{|v-|v|e_n|},
e_n=(0,0,ldots,0,1)^T.
$$
Then $Xv=0$ means that $XQe_n=0$, which in turn implies that $XQ$ is a matrix of the form $XQ = (widetilde{X},0)$, where $widetilde{X}inmathbb{R}^{ntimes(n-1)}$. So, if you partition $Q^TB$ as $Q^TB = pmatrix{widetilde{B}\ ast}$, where $widetilde{B}$ is $(n-1)times m$, then your minimisation problem can be rewritten as the unconstrained least-square problem
$$min_{widetilde{X}inmathbb{R}^{ntimes(n-1)}} |A-widetilde{X}widetilde{B}|_F.$$
And the least-norm solution is given by $widetilde{X} = Awidetilde{B}^+$, where $widetilde{B}^+$ denotes the Moore-Penrose pseudoinverse of $widetilde{B}$. Once $widetilde{X}$ is obtained, you can recover $X = (widetilde{X},0)Q^T$.
Thanks, this is a very clever way indeed. I very much appreciate it. (1) Is there some consideration of choosing $Q$ besides it being orthogonal and has $v/|v|$ as its last column? (2) Did you come up with it yourself or is there some reference for it?
– Ethan
Apr 2 '13 at 16:09
@Ethan There aren't any other considerations. Orthogonality is required to preserve the Frobenius norm. Setting $v/|v|$ as the last column is just for convenience. You may set it as the first column if you want. And other matrices than the Householder matrix can be used, but the Householder matrix is more convenient in this case because (1) it is symmetric (so you don't actually need to transpose $Q$ in $QT$) and (2) you don't need to store $u$ instead of $n^2$ matrix entries.
– user1551
Apr 2 '13 at 17:24
And the Householder trick is well known because the Householder matrix is used in some other matrix algorithms (such as QR decomposition). But I don't have any references at hand.
– user1551
Apr 2 '13 at 17:28
add a comment |
As the others show, the answer to your question is affirmative. However, I don't see what's the point of doing so, when the problem can actually be converted into an unconstrained least square problem.
Let $Q$ be a real orthogonal matrix that has $frac{v}{|v|}$ as its last column. For instance, you may take $Q$ as a Householder matrix:
$$
Q = I - 2uu^T, u = frac{v-|v|e_n}{|v-|v|e_n|},
e_n=(0,0,ldots,0,1)^T.
$$
Then $Xv=0$ means that $XQe_n=0$, which in turn implies that $XQ$ is a matrix of the form $XQ = (widetilde{X},0)$, where $widetilde{X}inmathbb{R}^{ntimes(n-1)}$. So, if you partition $Q^TB$ as $Q^TB = pmatrix{widetilde{B}\ ast}$, where $widetilde{B}$ is $(n-1)times m$, then your minimisation problem can be rewritten as the unconstrained least-square problem
$$min_{widetilde{X}inmathbb{R}^{ntimes(n-1)}} |A-widetilde{X}widetilde{B}|_F.$$
And the least-norm solution is given by $widetilde{X} = Awidetilde{B}^+$, where $widetilde{B}^+$ denotes the Moore-Penrose pseudoinverse of $widetilde{B}$. Once $widetilde{X}$ is obtained, you can recover $X = (widetilde{X},0)Q^T$.
As the others show, the answer to your question is affirmative. However, I don't see what's the point of doing so, when the problem can actually be converted into an unconstrained least square problem.
Let $Q$ be a real orthogonal matrix that has $frac{v}{|v|}$ as its last column. For instance, you may take $Q$ as a Householder matrix:
$$
Q = I - 2uu^T, u = frac{v-|v|e_n}{|v-|v|e_n|},
e_n=(0,0,ldots,0,1)^T.
$$
Then $Xv=0$ means that $XQe_n=0$, which in turn implies that $XQ$ is a matrix of the form $XQ = (widetilde{X},0)$, where $widetilde{X}inmathbb{R}^{ntimes(n-1)}$. So, if you partition $Q^TB$ as $Q^TB = pmatrix{widetilde{B}\ ast}$, where $widetilde{B}$ is $(n-1)times m$, then your minimisation problem can be rewritten as the unconstrained least-square problem
$$min_{widetilde{X}inmathbb{R}^{ntimes(n-1)}} |A-widetilde{X}widetilde{B}|_F.$$
And the least-norm solution is given by $widetilde{X} = Awidetilde{B}^+$, where $widetilde{B}^+$ denotes the Moore-Penrose pseudoinverse of $widetilde{B}$. Once $widetilde{X}$ is obtained, you can recover $X = (widetilde{X},0)Q^T$.
answered Apr 1 '13 at 10:04
user1551user1551
71.8k566125
71.8k566125
Thanks, this is a very clever way indeed. I very much appreciate it. (1) Is there some consideration of choosing $Q$ besides it being orthogonal and has $v/|v|$ as its last column? (2) Did you come up with it yourself or is there some reference for it?
– Ethan
Apr 2 '13 at 16:09
@Ethan There aren't any other considerations. Orthogonality is required to preserve the Frobenius norm. Setting $v/|v|$ as the last column is just for convenience. You may set it as the first column if you want. And other matrices than the Householder matrix can be used, but the Householder matrix is more convenient in this case because (1) it is symmetric (so you don't actually need to transpose $Q$ in $QT$) and (2) you don't need to store $u$ instead of $n^2$ matrix entries.
– user1551
Apr 2 '13 at 17:24
And the Householder trick is well known because the Householder matrix is used in some other matrix algorithms (such as QR decomposition). But I don't have any references at hand.
– user1551
Apr 2 '13 at 17:28
add a comment |
Thanks, this is a very clever way indeed. I very much appreciate it. (1) Is there some consideration of choosing $Q$ besides it being orthogonal and has $v/|v|$ as its last column? (2) Did you come up with it yourself or is there some reference for it?
– Ethan
Apr 2 '13 at 16:09
@Ethan There aren't any other considerations. Orthogonality is required to preserve the Frobenius norm. Setting $v/|v|$ as the last column is just for convenience. You may set it as the first column if you want. And other matrices than the Householder matrix can be used, but the Householder matrix is more convenient in this case because (1) it is symmetric (so you don't actually need to transpose $Q$ in $QT$) and (2) you don't need to store $u$ instead of $n^2$ matrix entries.
– user1551
Apr 2 '13 at 17:24
And the Householder trick is well known because the Householder matrix is used in some other matrix algorithms (such as QR decomposition). But I don't have any references at hand.
– user1551
Apr 2 '13 at 17:28
Thanks, this is a very clever way indeed. I very much appreciate it. (1) Is there some consideration of choosing $Q$ besides it being orthogonal and has $v/|v|$ as its last column? (2) Did you come up with it yourself or is there some reference for it?
– Ethan
Apr 2 '13 at 16:09
Thanks, this is a very clever way indeed. I very much appreciate it. (1) Is there some consideration of choosing $Q$ besides it being orthogonal and has $v/|v|$ as its last column? (2) Did you come up with it yourself or is there some reference for it?
– Ethan
Apr 2 '13 at 16:09
@Ethan There aren't any other considerations. Orthogonality is required to preserve the Frobenius norm. Setting $v/|v|$ as the last column is just for convenience. You may set it as the first column if you want. And other matrices than the Householder matrix can be used, but the Householder matrix is more convenient in this case because (1) it is symmetric (so you don't actually need to transpose $Q$ in $QT$) and (2) you don't need to store $u$ instead of $n^2$ matrix entries.
– user1551
Apr 2 '13 at 17:24
@Ethan There aren't any other considerations. Orthogonality is required to preserve the Frobenius norm. Setting $v/|v|$ as the last column is just for convenience. You may set it as the first column if you want. And other matrices than the Householder matrix can be used, but the Householder matrix is more convenient in this case because (1) it is symmetric (so you don't actually need to transpose $Q$ in $QT$) and (2) you don't need to store $u$ instead of $n^2$ matrix entries.
– user1551
Apr 2 '13 at 17:24
And the Householder trick is well known because the Householder matrix is used in some other matrix algorithms (such as QR decomposition). But I don't have any references at hand.
– user1551
Apr 2 '13 at 17:28
And the Householder trick is well known because the Householder matrix is used in some other matrix algorithms (such as QR decomposition). But I don't have any references at hand.
– user1551
Apr 2 '13 at 17:28
add a comment |
[Major changes here, sorry! I realized the matrix calculus approach has some real advantages here after I submitted my first answer.]
Yes, this problem can indeed be examined and solved in a variety of ways.
Kronecker products. Kronecker products provide a concise way to relate matrix equations and standard matrix-vector equations. Using them, we can say that
$$mathop{textrm{vec}}(A-XB) = mathop{textrm{vec}}(A) - (B^Totimes I) mathop{textrm{vec}}(X)$$
and
$$mathop{textrm{vec}}(Xv) = (v^Totimes I)mathop{textrm{vec}}(X),$$
where $mathop{textrm{vec}}(cdot)$ stacks the columns of its input argument into a single column vector. If $Xinmathbb{R}^{mtimes n}$, then both of the identity matrices $I$ above are $mtimes m$. Thus Kronecker products can allow us to express this problem as
$$begin{array}{ll} text{minimize} & | tilde{b} - tilde{A} x |_2 \
text{subject to} & tilde{C} x = 0 end{array}$$ where $$
tilde{A} triangleq (B^Totimes I), ~ tilde{b} triangleq mathop{textrm{vec}}(A), ~ tilde{C} triangleq (v^Ttimes I), ~ x triangleq mathop{textrm{vec}}(X).$$
If you square the objective, then this model becomes an equality-constrained least-squares (EQLS) problem. You can derive the optimality conditions for the EQLS form using a simple Lagrange multiplier technique; finding an optimal $x$ requires the solution of a single symmetric indefinite linear system.
Matrix calculus. I actually think Kronecker products are clumsy. Thankfully with some care we can avoid them, at least until we get to the EQLS optimality conditions. Noting that $|Z|_F^2 = mathop{textrm{Tr}}(Z^TZ)$, we can write the squared objective as follows:
$$|A-XB|_F^2 = mathop{textrm{Tr}}((A-XB)^T(A-XB)).$$
Now let $lambdainmathbb{R}^m$ be the Lagrange multiplier for $Xv=0$. The Lagrangian is
$$L(X,lambda)=mathop{textrm{Tr}}((A-XB)^T(A-XB))-lambda^TXv$$
So let's just find the gradient of $L$ with respect to $X$ and set it to zero... but how? Well, I like to use a trace-centered approach. First, we do some rewriting, using identities $mathop{textrm{Tr}}(Z+Y)=mathop{textrm{Tr}}(Z)+mathop{textrm{Tr}}(Y)$, $mathop{textrm{Tr}}(Z)=mathop{textrm{Tr}}(Z^T)$, and $mathop{textrm{Tr}}(ZY)=mathop{textrm{Tr}}(YZ)$. (The latter works only if $ZY$ and $YZ$ are both well-posed.)
$$begin{aligned}
L(X,lambda)&=mathop{textrm{Tr}}(A^TA)-mathop{textrm{Tr}}(A^TXB)-mathop{textrm{Tr}}(B^TX^TA)+mathop{textrm{Tr}}(B^TX^TXB)-lambda^TXv\
&=mathop{textrm{Tr}}(A^TA)-2mathop{textrm{Tr}}(B^TX^TA)+mathop{textrm{Tr}}(B^TX^TXB)-v^TX^Tlambda \
&=mathop{textrm{Tr}}(A^TA)-2mathop{textrm{Tr}}(X^TAB^T)+mathop{textrm{Tr}}(X^TXBB^T)-mathop{textrm{Tr}}(X^Tlambda v^T) \
&=
mathop{textrm{Tr}}(A^TA)-mathop{textrm{Tr}}(X^T(2AB^T+lambda v^T))+mathop{textrm{Tr}}(X^TXBB^T). end{aligned}
$$
Now, I wish I could point you to a good reference for this---if someone else can, please do---but I did all this rewriting because I can now find the gradient above by rote:
$$nabla_X L(X,lambda) = -(2AB^T+lambda v^T) + 2XBB^T.$$
Setting to zero and solving, we find that the Lagrangian optimality condition is
$$2 XBB^T - lambda v^T = 2AB^T$$
So the pair of optimality conditions for the constrained problem are
$$boxed{2XBB^T - lambda v^T = 2AB^T quad text{and} quad Xv=0}$$
You can verify this is correct by examining the unconstrained case: if $BB^T$ is nonsingular, then
$$mathop{textrm{arg min}}_X|A-XB|_F = AB^dagger = AB^T(BB^T)^{-1}$$ which clearly satisfies the Lagrangian optimality condition with $lambda=0$. If $BB^T$ is singular, on the other hand, then the problem likely does not admit a unique solution.
Note that if we wish to express these conditions in vector form---perhaps we want to use a standard linear solver---then we're back to Kronecker products:
$$begin{bmatrix} 2BB^Totimes I & - votimes I \ - v^Totimes I & 0 end{bmatrix}
begin{bmatrix} mathop{textrm{vec}}(X) \ lambda end{bmatrix} =
begin{bmatrix} mathop{textrm{vec}}(AB) \ 0 end{bmatrix}$$
I've negated the second (block) row so that the symmetry of the system is revealed.
Convex optimization. If you are simply interested in the problem as you've expressed it, then what I am about to suggest is overkill---stick with the symmetric indefinite approach. But since the Frobenius norm is convex, this problem can be readily solved using a variety of extant convex optimization tools. In its original form, it can be expressed as a second-order cone program (SOCP); in EQLS form, it can be expressed as a quadratic program (QP). Solvers for both types of problems are readily available; indeed, SOCP packages can solve QPs as well.
I'm the developer of a MATLAB-based convex optimization toolbox called CVX that can handle this problem rather easily, and connects to solvers that can solve (among other things) QPs and SOCPs. The nice thing about using something like CVX that you don't have to jump through the Kronecker hoops---it will do it for you. For instance, here's the CVX model for your problem (assuming $Xinmathbb{R}^{mtimes n}$):
cvx_begin
variable X(m,n)
minimize(norm(A-X*B,'Fro'))
subject to
X * v == 0
cvx_end
Like I said, this is overkill, so I am not recommending it for this application---unless you are considering adding applying more complex (but convex) constraints to your model besides just $Xv=0$. (I did use CVX to numerically verify my work above, though!)
Iterative methods. Getting back to the EQLS approach for a moment: if $A$, $X$, and $B$ start getting large, you're going to want to exploit the Kronecker structure somehow to improve performance. I think the large symmetric indefinite linear system will be sparse in this case, in which case a $LDL^T$ solver will do, as long as the optimality conditions are nonsingular. (If they are not, then there is not a unique solution.) But with Kronecker structure an iterative solver is often a better choice; something like SYMMLQ. What is important is that you never actually form the full symmetric indefinite matrix, which will involve products of $tilde{C}$, $tilde{A}$, etc. Instead, you treat it like a linear operator, and implement code to compute its forward and adjoint operation. This will reduce the computational requirements considerably.
Thanks, Michael! I appreciate that. (1)is the linear system (the optimal condition) by the first method by Kronecker product same as the linear system in the second method by matrix calculus? (2) In the last part "Iterative methods" for solving the linear systems obtained by the Kronecker method or by the Matrix calculus method, why "you never actually form the full symmetric indefinite matrix"?
– Ethan
Apr 2 '13 at 15:06
Yes, the matrix calculus and Kronecker methods should produce the equivalent linear systems in the end. However, the matrix calculus approach does a better job of exposing the structure of the linear systems calculations. As for the iterative method, I suggest that you apply an iterative method to the boxed linear system. If you negate $Xv=0$, then the resulting linear operator $(X,lambda)rightarrow(2XBB^T-lambda v^T,-Xv)$ is symmetric.
– Michael Grant
Apr 2 '13 at 15:21
The challenge here (using matrix calculus, applying an iterative method to solve for $(X,lambda)$) is to stop thinking about linear systems in terms of matrices and vectors, and start thinking about them in terms of linear operators and more general vector spaces. If you can do that, you can ditch Kronecker products altogether.
– Michael Grant
Apr 2 '13 at 15:22
Michael, Thanks again! How would you think of the other reply, which reduces the problem to an unconstrained LS problem. I don't see any error there.
– Ethan
Apr 2 '13 at 15:29
Yes, it looks promising---in fact I'm one of the upvotes! I haven't verified the math, but I don't see any obvious issues.
– Michael Grant
Apr 2 '13 at 15:45
|
show 2 more comments
[Major changes here, sorry! I realized the matrix calculus approach has some real advantages here after I submitted my first answer.]
Yes, this problem can indeed be examined and solved in a variety of ways.
Kronecker products. Kronecker products provide a concise way to relate matrix equations and standard matrix-vector equations. Using them, we can say that
$$mathop{textrm{vec}}(A-XB) = mathop{textrm{vec}}(A) - (B^Totimes I) mathop{textrm{vec}}(X)$$
and
$$mathop{textrm{vec}}(Xv) = (v^Totimes I)mathop{textrm{vec}}(X),$$
where $mathop{textrm{vec}}(cdot)$ stacks the columns of its input argument into a single column vector. If $Xinmathbb{R}^{mtimes n}$, then both of the identity matrices $I$ above are $mtimes m$. Thus Kronecker products can allow us to express this problem as
$$begin{array}{ll} text{minimize} & | tilde{b} - tilde{A} x |_2 \
text{subject to} & tilde{C} x = 0 end{array}$$ where $$
tilde{A} triangleq (B^Totimes I), ~ tilde{b} triangleq mathop{textrm{vec}}(A), ~ tilde{C} triangleq (v^Ttimes I), ~ x triangleq mathop{textrm{vec}}(X).$$
If you square the objective, then this model becomes an equality-constrained least-squares (EQLS) problem. You can derive the optimality conditions for the EQLS form using a simple Lagrange multiplier technique; finding an optimal $x$ requires the solution of a single symmetric indefinite linear system.
Matrix calculus. I actually think Kronecker products are clumsy. Thankfully with some care we can avoid them, at least until we get to the EQLS optimality conditions. Noting that $|Z|_F^2 = mathop{textrm{Tr}}(Z^TZ)$, we can write the squared objective as follows:
$$|A-XB|_F^2 = mathop{textrm{Tr}}((A-XB)^T(A-XB)).$$
Now let $lambdainmathbb{R}^m$ be the Lagrange multiplier for $Xv=0$. The Lagrangian is
$$L(X,lambda)=mathop{textrm{Tr}}((A-XB)^T(A-XB))-lambda^TXv$$
So let's just find the gradient of $L$ with respect to $X$ and set it to zero... but how? Well, I like to use a trace-centered approach. First, we do some rewriting, using identities $mathop{textrm{Tr}}(Z+Y)=mathop{textrm{Tr}}(Z)+mathop{textrm{Tr}}(Y)$, $mathop{textrm{Tr}}(Z)=mathop{textrm{Tr}}(Z^T)$, and $mathop{textrm{Tr}}(ZY)=mathop{textrm{Tr}}(YZ)$. (The latter works only if $ZY$ and $YZ$ are both well-posed.)
$$begin{aligned}
L(X,lambda)&=mathop{textrm{Tr}}(A^TA)-mathop{textrm{Tr}}(A^TXB)-mathop{textrm{Tr}}(B^TX^TA)+mathop{textrm{Tr}}(B^TX^TXB)-lambda^TXv\
&=mathop{textrm{Tr}}(A^TA)-2mathop{textrm{Tr}}(B^TX^TA)+mathop{textrm{Tr}}(B^TX^TXB)-v^TX^Tlambda \
&=mathop{textrm{Tr}}(A^TA)-2mathop{textrm{Tr}}(X^TAB^T)+mathop{textrm{Tr}}(X^TXBB^T)-mathop{textrm{Tr}}(X^Tlambda v^T) \
&=
mathop{textrm{Tr}}(A^TA)-mathop{textrm{Tr}}(X^T(2AB^T+lambda v^T))+mathop{textrm{Tr}}(X^TXBB^T). end{aligned}
$$
Now, I wish I could point you to a good reference for this---if someone else can, please do---but I did all this rewriting because I can now find the gradient above by rote:
$$nabla_X L(X,lambda) = -(2AB^T+lambda v^T) + 2XBB^T.$$
Setting to zero and solving, we find that the Lagrangian optimality condition is
$$2 XBB^T - lambda v^T = 2AB^T$$
So the pair of optimality conditions for the constrained problem are
$$boxed{2XBB^T - lambda v^T = 2AB^T quad text{and} quad Xv=0}$$
You can verify this is correct by examining the unconstrained case: if $BB^T$ is nonsingular, then
$$mathop{textrm{arg min}}_X|A-XB|_F = AB^dagger = AB^T(BB^T)^{-1}$$ which clearly satisfies the Lagrangian optimality condition with $lambda=0$. If $BB^T$ is singular, on the other hand, then the problem likely does not admit a unique solution.
Note that if we wish to express these conditions in vector form---perhaps we want to use a standard linear solver---then we're back to Kronecker products:
$$begin{bmatrix} 2BB^Totimes I & - votimes I \ - v^Totimes I & 0 end{bmatrix}
begin{bmatrix} mathop{textrm{vec}}(X) \ lambda end{bmatrix} =
begin{bmatrix} mathop{textrm{vec}}(AB) \ 0 end{bmatrix}$$
I've negated the second (block) row so that the symmetry of the system is revealed.
Convex optimization. If you are simply interested in the problem as you've expressed it, then what I am about to suggest is overkill---stick with the symmetric indefinite approach. But since the Frobenius norm is convex, this problem can be readily solved using a variety of extant convex optimization tools. In its original form, it can be expressed as a second-order cone program (SOCP); in EQLS form, it can be expressed as a quadratic program (QP). Solvers for both types of problems are readily available; indeed, SOCP packages can solve QPs as well.
I'm the developer of a MATLAB-based convex optimization toolbox called CVX that can handle this problem rather easily, and connects to solvers that can solve (among other things) QPs and SOCPs. The nice thing about using something like CVX that you don't have to jump through the Kronecker hoops---it will do it for you. For instance, here's the CVX model for your problem (assuming $Xinmathbb{R}^{mtimes n}$):
cvx_begin
variable X(m,n)
minimize(norm(A-X*B,'Fro'))
subject to
X * v == 0
cvx_end
Like I said, this is overkill, so I am not recommending it for this application---unless you are considering adding applying more complex (but convex) constraints to your model besides just $Xv=0$. (I did use CVX to numerically verify my work above, though!)
Iterative methods. Getting back to the EQLS approach for a moment: if $A$, $X$, and $B$ start getting large, you're going to want to exploit the Kronecker structure somehow to improve performance. I think the large symmetric indefinite linear system will be sparse in this case, in which case a $LDL^T$ solver will do, as long as the optimality conditions are nonsingular. (If they are not, then there is not a unique solution.) But with Kronecker structure an iterative solver is often a better choice; something like SYMMLQ. What is important is that you never actually form the full symmetric indefinite matrix, which will involve products of $tilde{C}$, $tilde{A}$, etc. Instead, you treat it like a linear operator, and implement code to compute its forward and adjoint operation. This will reduce the computational requirements considerably.
Thanks, Michael! I appreciate that. (1)is the linear system (the optimal condition) by the first method by Kronecker product same as the linear system in the second method by matrix calculus? (2) In the last part "Iterative methods" for solving the linear systems obtained by the Kronecker method or by the Matrix calculus method, why "you never actually form the full symmetric indefinite matrix"?
– Ethan
Apr 2 '13 at 15:06
Yes, the matrix calculus and Kronecker methods should produce the equivalent linear systems in the end. However, the matrix calculus approach does a better job of exposing the structure of the linear systems calculations. As for the iterative method, I suggest that you apply an iterative method to the boxed linear system. If you negate $Xv=0$, then the resulting linear operator $(X,lambda)rightarrow(2XBB^T-lambda v^T,-Xv)$ is symmetric.
– Michael Grant
Apr 2 '13 at 15:21
The challenge here (using matrix calculus, applying an iterative method to solve for $(X,lambda)$) is to stop thinking about linear systems in terms of matrices and vectors, and start thinking about them in terms of linear operators and more general vector spaces. If you can do that, you can ditch Kronecker products altogether.
– Michael Grant
Apr 2 '13 at 15:22
Michael, Thanks again! How would you think of the other reply, which reduces the problem to an unconstrained LS problem. I don't see any error there.
– Ethan
Apr 2 '13 at 15:29
Yes, it looks promising---in fact I'm one of the upvotes! I haven't verified the math, but I don't see any obvious issues.
– Michael Grant
Apr 2 '13 at 15:45
|
show 2 more comments
[Major changes here, sorry! I realized the matrix calculus approach has some real advantages here after I submitted my first answer.]
Yes, this problem can indeed be examined and solved in a variety of ways.
Kronecker products. Kronecker products provide a concise way to relate matrix equations and standard matrix-vector equations. Using them, we can say that
$$mathop{textrm{vec}}(A-XB) = mathop{textrm{vec}}(A) - (B^Totimes I) mathop{textrm{vec}}(X)$$
and
$$mathop{textrm{vec}}(Xv) = (v^Totimes I)mathop{textrm{vec}}(X),$$
where $mathop{textrm{vec}}(cdot)$ stacks the columns of its input argument into a single column vector. If $Xinmathbb{R}^{mtimes n}$, then both of the identity matrices $I$ above are $mtimes m$. Thus Kronecker products can allow us to express this problem as
$$begin{array}{ll} text{minimize} & | tilde{b} - tilde{A} x |_2 \
text{subject to} & tilde{C} x = 0 end{array}$$ where $$
tilde{A} triangleq (B^Totimes I), ~ tilde{b} triangleq mathop{textrm{vec}}(A), ~ tilde{C} triangleq (v^Ttimes I), ~ x triangleq mathop{textrm{vec}}(X).$$
If you square the objective, then this model becomes an equality-constrained least-squares (EQLS) problem. You can derive the optimality conditions for the EQLS form using a simple Lagrange multiplier technique; finding an optimal $x$ requires the solution of a single symmetric indefinite linear system.
Matrix calculus. I actually think Kronecker products are clumsy. Thankfully with some care we can avoid them, at least until we get to the EQLS optimality conditions. Noting that $|Z|_F^2 = mathop{textrm{Tr}}(Z^TZ)$, we can write the squared objective as follows:
$$|A-XB|_F^2 = mathop{textrm{Tr}}((A-XB)^T(A-XB)).$$
Now let $lambdainmathbb{R}^m$ be the Lagrange multiplier for $Xv=0$. The Lagrangian is
$$L(X,lambda)=mathop{textrm{Tr}}((A-XB)^T(A-XB))-lambda^TXv$$
So let's just find the gradient of $L$ with respect to $X$ and set it to zero... but how? Well, I like to use a trace-centered approach. First, we do some rewriting, using identities $mathop{textrm{Tr}}(Z+Y)=mathop{textrm{Tr}}(Z)+mathop{textrm{Tr}}(Y)$, $mathop{textrm{Tr}}(Z)=mathop{textrm{Tr}}(Z^T)$, and $mathop{textrm{Tr}}(ZY)=mathop{textrm{Tr}}(YZ)$. (The latter works only if $ZY$ and $YZ$ are both well-posed.)
$$begin{aligned}
L(X,lambda)&=mathop{textrm{Tr}}(A^TA)-mathop{textrm{Tr}}(A^TXB)-mathop{textrm{Tr}}(B^TX^TA)+mathop{textrm{Tr}}(B^TX^TXB)-lambda^TXv\
&=mathop{textrm{Tr}}(A^TA)-2mathop{textrm{Tr}}(B^TX^TA)+mathop{textrm{Tr}}(B^TX^TXB)-v^TX^Tlambda \
&=mathop{textrm{Tr}}(A^TA)-2mathop{textrm{Tr}}(X^TAB^T)+mathop{textrm{Tr}}(X^TXBB^T)-mathop{textrm{Tr}}(X^Tlambda v^T) \
&=
mathop{textrm{Tr}}(A^TA)-mathop{textrm{Tr}}(X^T(2AB^T+lambda v^T))+mathop{textrm{Tr}}(X^TXBB^T). end{aligned}
$$
Now, I wish I could point you to a good reference for this---if someone else can, please do---but I did all this rewriting because I can now find the gradient above by rote:
$$nabla_X L(X,lambda) = -(2AB^T+lambda v^T) + 2XBB^T.$$
Setting to zero and solving, we find that the Lagrangian optimality condition is
$$2 XBB^T - lambda v^T = 2AB^T$$
So the pair of optimality conditions for the constrained problem are
$$boxed{2XBB^T - lambda v^T = 2AB^T quad text{and} quad Xv=0}$$
You can verify this is correct by examining the unconstrained case: if $BB^T$ is nonsingular, then
$$mathop{textrm{arg min}}_X|A-XB|_F = AB^dagger = AB^T(BB^T)^{-1}$$ which clearly satisfies the Lagrangian optimality condition with $lambda=0$. If $BB^T$ is singular, on the other hand, then the problem likely does not admit a unique solution.
Note that if we wish to express these conditions in vector form---perhaps we want to use a standard linear solver---then we're back to Kronecker products:
$$begin{bmatrix} 2BB^Totimes I & - votimes I \ - v^Totimes I & 0 end{bmatrix}
begin{bmatrix} mathop{textrm{vec}}(X) \ lambda end{bmatrix} =
begin{bmatrix} mathop{textrm{vec}}(AB) \ 0 end{bmatrix}$$
I've negated the second (block) row so that the symmetry of the system is revealed.
Convex optimization. If you are simply interested in the problem as you've expressed it, then what I am about to suggest is overkill---stick with the symmetric indefinite approach. But since the Frobenius norm is convex, this problem can be readily solved using a variety of extant convex optimization tools. In its original form, it can be expressed as a second-order cone program (SOCP); in EQLS form, it can be expressed as a quadratic program (QP). Solvers for both types of problems are readily available; indeed, SOCP packages can solve QPs as well.
I'm the developer of a MATLAB-based convex optimization toolbox called CVX that can handle this problem rather easily, and connects to solvers that can solve (among other things) QPs and SOCPs. The nice thing about using something like CVX that you don't have to jump through the Kronecker hoops---it will do it for you. For instance, here's the CVX model for your problem (assuming $Xinmathbb{R}^{mtimes n}$):
cvx_begin
variable X(m,n)
minimize(norm(A-X*B,'Fro'))
subject to
X * v == 0
cvx_end
Like I said, this is overkill, so I am not recommending it for this application---unless you are considering adding applying more complex (but convex) constraints to your model besides just $Xv=0$. (I did use CVX to numerically verify my work above, though!)
Iterative methods. Getting back to the EQLS approach for a moment: if $A$, $X$, and $B$ start getting large, you're going to want to exploit the Kronecker structure somehow to improve performance. I think the large symmetric indefinite linear system will be sparse in this case, in which case a $LDL^T$ solver will do, as long as the optimality conditions are nonsingular. (If they are not, then there is not a unique solution.) But with Kronecker structure an iterative solver is often a better choice; something like SYMMLQ. What is important is that you never actually form the full symmetric indefinite matrix, which will involve products of $tilde{C}$, $tilde{A}$, etc. Instead, you treat it like a linear operator, and implement code to compute its forward and adjoint operation. This will reduce the computational requirements considerably.
[Major changes here, sorry! I realized the matrix calculus approach has some real advantages here after I submitted my first answer.]
Yes, this problem can indeed be examined and solved in a variety of ways.
Kronecker products. Kronecker products provide a concise way to relate matrix equations and standard matrix-vector equations. Using them, we can say that
$$mathop{textrm{vec}}(A-XB) = mathop{textrm{vec}}(A) - (B^Totimes I) mathop{textrm{vec}}(X)$$
and
$$mathop{textrm{vec}}(Xv) = (v^Totimes I)mathop{textrm{vec}}(X),$$
where $mathop{textrm{vec}}(cdot)$ stacks the columns of its input argument into a single column vector. If $Xinmathbb{R}^{mtimes n}$, then both of the identity matrices $I$ above are $mtimes m$. Thus Kronecker products can allow us to express this problem as
$$begin{array}{ll} text{minimize} & | tilde{b} - tilde{A} x |_2 \
text{subject to} & tilde{C} x = 0 end{array}$$ where $$
tilde{A} triangleq (B^Totimes I), ~ tilde{b} triangleq mathop{textrm{vec}}(A), ~ tilde{C} triangleq (v^Ttimes I), ~ x triangleq mathop{textrm{vec}}(X).$$
If you square the objective, then this model becomes an equality-constrained least-squares (EQLS) problem. You can derive the optimality conditions for the EQLS form using a simple Lagrange multiplier technique; finding an optimal $x$ requires the solution of a single symmetric indefinite linear system.
Matrix calculus. I actually think Kronecker products are clumsy. Thankfully with some care we can avoid them, at least until we get to the EQLS optimality conditions. Noting that $|Z|_F^2 = mathop{textrm{Tr}}(Z^TZ)$, we can write the squared objective as follows:
$$|A-XB|_F^2 = mathop{textrm{Tr}}((A-XB)^T(A-XB)).$$
Now let $lambdainmathbb{R}^m$ be the Lagrange multiplier for $Xv=0$. The Lagrangian is
$$L(X,lambda)=mathop{textrm{Tr}}((A-XB)^T(A-XB))-lambda^TXv$$
So let's just find the gradient of $L$ with respect to $X$ and set it to zero... but how? Well, I like to use a trace-centered approach. First, we do some rewriting, using identities $mathop{textrm{Tr}}(Z+Y)=mathop{textrm{Tr}}(Z)+mathop{textrm{Tr}}(Y)$, $mathop{textrm{Tr}}(Z)=mathop{textrm{Tr}}(Z^T)$, and $mathop{textrm{Tr}}(ZY)=mathop{textrm{Tr}}(YZ)$. (The latter works only if $ZY$ and $YZ$ are both well-posed.)
$$begin{aligned}
L(X,lambda)&=mathop{textrm{Tr}}(A^TA)-mathop{textrm{Tr}}(A^TXB)-mathop{textrm{Tr}}(B^TX^TA)+mathop{textrm{Tr}}(B^TX^TXB)-lambda^TXv\
&=mathop{textrm{Tr}}(A^TA)-2mathop{textrm{Tr}}(B^TX^TA)+mathop{textrm{Tr}}(B^TX^TXB)-v^TX^Tlambda \
&=mathop{textrm{Tr}}(A^TA)-2mathop{textrm{Tr}}(X^TAB^T)+mathop{textrm{Tr}}(X^TXBB^T)-mathop{textrm{Tr}}(X^Tlambda v^T) \
&=
mathop{textrm{Tr}}(A^TA)-mathop{textrm{Tr}}(X^T(2AB^T+lambda v^T))+mathop{textrm{Tr}}(X^TXBB^T). end{aligned}
$$
Now, I wish I could point you to a good reference for this---if someone else can, please do---but I did all this rewriting because I can now find the gradient above by rote:
$$nabla_X L(X,lambda) = -(2AB^T+lambda v^T) + 2XBB^T.$$
Setting to zero and solving, we find that the Lagrangian optimality condition is
$$2 XBB^T - lambda v^T = 2AB^T$$
So the pair of optimality conditions for the constrained problem are
$$boxed{2XBB^T - lambda v^T = 2AB^T quad text{and} quad Xv=0}$$
You can verify this is correct by examining the unconstrained case: if $BB^T$ is nonsingular, then
$$mathop{textrm{arg min}}_X|A-XB|_F = AB^dagger = AB^T(BB^T)^{-1}$$ which clearly satisfies the Lagrangian optimality condition with $lambda=0$. If $BB^T$ is singular, on the other hand, then the problem likely does not admit a unique solution.
Note that if we wish to express these conditions in vector form---perhaps we want to use a standard linear solver---then we're back to Kronecker products:
$$begin{bmatrix} 2BB^Totimes I & - votimes I \ - v^Totimes I & 0 end{bmatrix}
begin{bmatrix} mathop{textrm{vec}}(X) \ lambda end{bmatrix} =
begin{bmatrix} mathop{textrm{vec}}(AB) \ 0 end{bmatrix}$$
I've negated the second (block) row so that the symmetry of the system is revealed.
Convex optimization. If you are simply interested in the problem as you've expressed it, then what I am about to suggest is overkill---stick with the symmetric indefinite approach. But since the Frobenius norm is convex, this problem can be readily solved using a variety of extant convex optimization tools. In its original form, it can be expressed as a second-order cone program (SOCP); in EQLS form, it can be expressed as a quadratic program (QP). Solvers for both types of problems are readily available; indeed, SOCP packages can solve QPs as well.
I'm the developer of a MATLAB-based convex optimization toolbox called CVX that can handle this problem rather easily, and connects to solvers that can solve (among other things) QPs and SOCPs. The nice thing about using something like CVX that you don't have to jump through the Kronecker hoops---it will do it for you. For instance, here's the CVX model for your problem (assuming $Xinmathbb{R}^{mtimes n}$):
cvx_begin
variable X(m,n)
minimize(norm(A-X*B,'Fro'))
subject to
X * v == 0
cvx_end
Like I said, this is overkill, so I am not recommending it for this application---unless you are considering adding applying more complex (but convex) constraints to your model besides just $Xv=0$. (I did use CVX to numerically verify my work above, though!)
Iterative methods. Getting back to the EQLS approach for a moment: if $A$, $X$, and $B$ start getting large, you're going to want to exploit the Kronecker structure somehow to improve performance. I think the large symmetric indefinite linear system will be sparse in this case, in which case a $LDL^T$ solver will do, as long as the optimality conditions are nonsingular. (If they are not, then there is not a unique solution.) But with Kronecker structure an iterative solver is often a better choice; something like SYMMLQ. What is important is that you never actually form the full symmetric indefinite matrix, which will involve products of $tilde{C}$, $tilde{A}$, etc. Instead, you treat it like a linear operator, and implement code to compute its forward and adjoint operation. This will reduce the computational requirements considerably.
edited Apr 1 '13 at 11:38
answered Apr 1 '13 at 3:29
Michael GrantMichael Grant
14.9k11944
14.9k11944
Thanks, Michael! I appreciate that. (1)is the linear system (the optimal condition) by the first method by Kronecker product same as the linear system in the second method by matrix calculus? (2) In the last part "Iterative methods" for solving the linear systems obtained by the Kronecker method or by the Matrix calculus method, why "you never actually form the full symmetric indefinite matrix"?
– Ethan
Apr 2 '13 at 15:06
Yes, the matrix calculus and Kronecker methods should produce the equivalent linear systems in the end. However, the matrix calculus approach does a better job of exposing the structure of the linear systems calculations. As for the iterative method, I suggest that you apply an iterative method to the boxed linear system. If you negate $Xv=0$, then the resulting linear operator $(X,lambda)rightarrow(2XBB^T-lambda v^T,-Xv)$ is symmetric.
– Michael Grant
Apr 2 '13 at 15:21
The challenge here (using matrix calculus, applying an iterative method to solve for $(X,lambda)$) is to stop thinking about linear systems in terms of matrices and vectors, and start thinking about them in terms of linear operators and more general vector spaces. If you can do that, you can ditch Kronecker products altogether.
– Michael Grant
Apr 2 '13 at 15:22
Michael, thanks again! What do you think of the other reply, which reduces the problem to an unconstrained LS problem? I don't see any error there.
– Ethan
Apr 2 '13 at 15:29
Yes, it looks promising---in fact I'm one of the upvotes! I haven't verified the math, but I don't see any obvious issues.
– Michael Grant
Apr 2 '13 at 15:45
Suppose $A = (a_{ij})$ and $B = (b_{ij})$; then $A - XB = (a_{ij} - sum_{k}X_{ik}b_{kj})$.
Treat the matrix $X$ as a $1times n^2$ row vector, with the rows of $X$ arranged from top to bottom:
$hat{X} = (x_{1,1},x_{1,2},cdots,x_{n,n-1},x_{n,n}) = (X_1,X_2,cdots,X_n)$, where $X_k$ is the $k$th row of $X$.
Then the squared Frobenius norm is
$mathrm{tr}(A-XB)(A-XB)' = mathrm{tr}(AA' - XBA'-AB'X'+XBB'X') = mathrm{tr}(AA') -2mathrm{tr}(XBA') + mathrm{tr}(XBB'X') = mathrm{tr}(AA') - 2sum_k langle X_k,(AB')_krangle + sum_k X_k(BB')X_k'$
where $(AB')_k$ is the $k$th row of $AB'$.
In terms of $hat{X}$, the squared norm is
$mathrm{tr}(AA') - 2 langle hat{X},M'rangle + hat{X}left(I_notimes(BB')right) hat{X}'$
where $M = ((AB')_1,(AB')_2,cdots,(AB')_n)$, i.e. the rows of $AB'$ arranged in one long row.
The constraint $Xv = 0$ says that each row of $X$ is orthogonal to $v$: $langle X_k, vrangle = 0$ for $k=1,dots,n$. This gives $n$ linear equality constraints on $hat{X}$, so the problem becomes an equality-constrained quadratic program in the vector $hat{X}$.
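A quick numerical check of this row-wise vectorization (my own sketch, not part of the answer), using $I_notimes(BB')$ for the quadratic term:

    n = 4; m = 3;
    A = randn(n,m); B = randn(n,m); X = randn(n,n);
    Xhat = reshape(X', 1, []);                 % rows of X laid out in one long row
    M    = reshape((A*B')', 1, []);            % rows of A*B' laid out the same way
    q1 = trace((A - X*B)*(A - X*B)');          % squared Frobenius norm, directly
    q2 = trace(A*A') - 2*Xhat*M' + Xhat*kron(eye(n), B*B')*Xhat';
    disp(q1 - q2)                              % ~0 up to roundoff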
answered Apr 1 '13 at 2:57
Yimin
The closed-form solution to this problem is
$$eqalign{
X &= A,(HB)^+H cr
}$$
where $M^+$ denotes the pseudoinverse of $M$
and $,H=(I-vv^+)= big(I-frac{vv^T}{v^Tv}big)$
The derivation goes as follows.
Consider the linearly constrained, linear problem
$$eqalign{
&min &|Cx-a|_F cr
&{rm s.t.} &Ex=fcr
}$$
The projector onto the nullspace of $E$ will be useful
$$eqalign{P &= (I-E^+E) implies EP = 0}$$
The general solution of the constraint equation is
$$eqalign{
x &= E^+f + Py cr
}$$ where the $y$ vector is a free parameter.
Substitute this into the objective function and solve for $y$, then revert back to the original $x$ variable
$$eqalign{
y &= (CP)^+(a-CE^+f) cr
x &= E^+f + P (CP)^+(a-CE^+f) cr
}$$
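In MATLAB this generic recipe is only a few lines; the helper below (my own sketch, with the hypothetical name clsq) returns exactly the particular solution above via pseudoinverses.

    function x = clsq(C, a, E, f)
        % Minimize ||C*x - a|| subject to E*x = f, via nullspace projection.
        P = eye(size(E,2)) - pinv(E)*E;        % projector onto null(E)
        x = pinv(E)*f + P*pinv(C*P)*(a - C*pinv(E)*f);
    end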
We can apply this solution to the current problem by vectorization and making the following identifications
$$eqalign{
C &= (B^Totimes I) cr
E &= (v^Totimes I) cr
f &= 0 cr
a &= {rm vec}(A) cr
x &= {rm vec}(X) cr
}$$
This yields a closed-form solution of the problem (in terms of pseudoinverses)
$$eqalign{
x
&= P (CP)^+a cr
&= big(I-E^+Ebig),big(C-CE^+Ebig)^+a cr
}$$
As a final step, reshape the vector $x$ into the matrix $X$.
But let's see if we can calculate the $X$ matrix more directly.
First recall that the pseudoinverse of the vector $v$ has a closed-form
$$v^+=frac{v^T}{v^Tv}$$
as does the pseudoinverse of a Kronecker product
$$(Aotimes B)^+ = A^+otimes B^+ $$
Therefore we can do the following
$$eqalign{
E^+ &= frac{v}{v^Tv}otimes I cr
P &= I-E^+E cr
&= Iotimes I - vv^+otimes Icr
&= (I-vv^+)otimes Icr
&= Hotimes I cr
CP &= (B^Totimes I),(Hotimes I) cr&= B^THotimes I cr
x&= P(CP)^+a cr
&= (Hotimes I)Big((B^TH)^+otimes IBig)a cr
&= big(H(B^TH)^+otimes Ibig)a cr
&= {rm vec}big(A(HB)^+Hbig) cr
X&= A(HB)^+H crcr
}$$
As a quick sanity check, since $Hv=0$ one also has $Xv=0$.
And $,XB=A(HB)^+(HB)approx A,$ since $(M^+M)$ is an orthoprojector for any $M$.
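As a further numerical sanity check (my own sketch, not part of the answer), one can verify the constraint and compare $A(HB)^+H$ against the vectorized formula $P(CP)^+a$:

    n = 5; m = 4;
    A = randn(n,m); B = randn(n,m); v = randn(n,1);
    H = eye(n) - v*v'/(v'*v);                  % H = I - v*pinv(v)
    X = A*pinv(H*B)*H;                         % proposed minimizer
    disp(norm(X*v))                            % ~0: constraint satisfied
    C = kron(B', eye(n));  E = kron(v', eye(n));
    P = eye(n^2) - pinv(E)*E;
    x = P*pinv(C*P)*reshape(A, [], 1);         % vectorized formula
    disp(norm(x - X(:)))                       % ~0: the two formulas agree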
edited 2 days ago
answered Jun 28 '18 at 17:09
greg