How to prove PCA using induction
In Deep Learning (Goodfellow et al.), the optimization objective of PCA is formulated as
$$D^* = \operatorname*{argmin}_D \|X - XDD^T\|_F^2 \quad \text{s.t. } D^TD = I.$$
The book proves the one-dimensional case, i.e. that
$$\operatorname*{argmin}_{d} \|X - X dd^T\|_F^2 \quad \text{s.t. } d^Td = 1$$
is the eigenvector of $X^TX$ with the largest eigenvalue, and the author says the general case (when $D$ is an $m \times l$ matrix with $l > 1$) can easily be proved by induction.
Could anyone please show me how to prove that by induction?
I know that when $D^TD = I$:
$$D^* = \operatorname*{argmin}_D \|X - XDD^T\|_F^2 = \operatorname*{argmax}_D \operatorname{tr}\left(D^T X^T X D\right)$$
and
$$\operatorname{tr}\left(D^T X^T X D\right) = \left(\sum_{i=1}^{l-1} \left(d^{(i)}\right)^T X^TX\, d^{(i)}\right) + \left(d^{(l)}\right)^T X^TX\, d^{(l)},$$
where the first summand reaches its maximum when $d^{(i)}$ is the eigenvector of $X^TX$ with the $i$-th largest eigenvalue, by the induction hypothesis. But how can I be sure that the sum as a whole is also maximal?
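For concreteness, here is a minimal NumPy sketch (random data, purely for illustration) of the identity behind this reformulation: for any $D$ with $D^TD = I$, $\|X - XDD^T\|_F^2 = \operatorname{tr}(X^TX) - \operatorname{tr}(D^TX^TXD)$, so minimizing the reconstruction error is the same as maximizing the trace term.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, l = 50, 6, 2                      # samples, features, target dimension (illustrative)
X = rng.standard_normal((n, m))         # random data, for illustration only

# Random m x l matrix with orthonormal columns, so D^T D = I.
D, _ = np.linalg.qr(rng.standard_normal((m, l)))

err = np.linalg.norm(X - X @ D @ D.T, 'fro') ** 2
identity = np.trace(X.T @ X) - np.trace(D.T @ X.T @ X @ D)
print(np.isclose(err, identity))        # True: minimizing the error maximizes tr(D^T X^T X D)
```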
linear-algebra
asked May 14 '17 at 3:23 by Lifu Huang
Perhaps the idea is this: define $X'^TX' = W'^T\Lambda' W'$, where $\Lambda'$ and $W'$ are identical to $\Lambda$ and $W$ from $X^TX = W^T\Lambda W$ but with the largest eigenvalue and its eigenvector removed. Solving the one-dimensional problem again then yields a $d'$ that is the eigenvector of $X'^TX'$ with the largest eigenvalue, i.e. the eigenvector of the original matrix $X^TX$ with the second largest eigenvalue.
– Kentzo, Nov 28 '18 at 0:46
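A minimal NumPy sketch of the deflation idea in this comment (random data, assuming a non-degenerate spectrum): subtract the top eigenpair from $X^TX$ and check that the leading eigenvector of the deflated matrix is the eigenvector of the original matrix with the second largest eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 5))       # random data, for illustration only
S = X.T @ X

vals, vecs = np.linalg.eigh(S)          # eigenvalues in ascending order
v1, v2 = vecs[:, -1], vecs[:, -2]       # top two eigenvectors of S

# Deflate: remove the contribution of the largest eigenpair.
S_deflated = S - vals[-1] * np.outer(v1, v1)
vals_d, vecs_d = np.linalg.eigh(S_deflated)

# The leading eigenvector of the deflated matrix is the old second eigenvector (up to sign).
print(np.isclose(abs(vecs_d[:, -1] @ v2), 1.0))   # True
```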
3 Answers
I am at the same point actually, and I wonder whether you found an answer.
My idea was to use Fan's theorem, which says:
Let $X \in S^n$ be a symmetric matrix with eigenvalues $\lambda_1 \geqslant \ldots \geqslant \lambda_n$. Then
$$\lambda_1 + \cdots + \lambda_k = \max\left\{\operatorname{Tr}(XDD^T) : D \in \mathbb{R}^{n \times k},\ D^TD = I_k\right\}.$$
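As a quick numerical sanity check of this theorem, here is a minimal NumPy sketch (random symmetric test matrix, illustrative only): the trace objective at the top-$k$ eigenvectors equals $\lambda_1 + \cdots + \lambda_k$, and random orthonormal $D$ never exceed that value.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 8, 3
A = rng.standard_normal((n, n))
S = A @ A.T                              # random symmetric PSD matrix, illustrative only

vals, vecs = np.linalg.eigh(S)           # eigenvalues in ascending order
top_k_sum = vals[-k:].sum()              # lambda_1 + ... + lambda_k

D_opt = vecs[:, -k:]                     # top-k eigenvectors as columns
print(np.isclose(np.trace(D_opt.T @ S @ D_opt), top_k_sum))   # True: the maximum is attained

# Random D with orthonormal columns never exceed the eigenvalue bound.
random_vals = []
for _ in range(1000):
    Q = np.linalg.qr(rng.standard_normal((n, k)))[0]
    random_vals.append(np.trace(Q.T @ S @ Q))
print(max(random_vals) <= top_k_sum + 1e-9)                   # True
```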
answered Jun 5 '17 at 12:19 (edited Jun 5 '17 at 18:32) by M. Ro
Suppose we need to project the data set $X$ onto a vector $u_{j}$, where $u_{j}$ is a unit vector, i.e. $u_{j}^{T}u_{j} = 1$.
Discarding this direction causes an error given by
$$J_{j} = \frac{1}{m}\left\| X^{T}u_{j} \right\|^{2}
= \frac{1}{m}\left( X^{T}u_{j} \right)^{T}\left( X^{T}u_{j} \right)
= \frac{1}{m}u_{j}^{T}XX^{T}u_{j}.$$
Let $S$ denote $\frac{1}{m}XX^{T}$, and suppose we need to discard $t$ dimensions. The cost function is
$$J = \sum_{j = n-t+1}^{n}u_{j}^{T}Su_{j}, \quad \text{s.t. } u_{j}^{T}u_{j} = 1.$$
Using a Lagrange multiplier for each constraint,
$$\widetilde{J} = \sum_{j = n-t+1}^{n}\left[u_{j}^{T}Su_{j} + \lambda_{j}\left( 1 - u_{j}^{T}u_{j} \right)\right].$$
Setting the gradient to zero,
$$\frac{\partial \widetilde{J}}{\partial u_{j}} = 2\left(Su_{j} - \lambda_{j}u_{j}\right) = 0,$$
hence
$$Su_{j} = \lambda_{j}u_{j}.$$
So each $u_{j}$ is an eigenvector of $S$ and $\lambda_{j}$ is the corresponding eigenvalue. Then
$$J = \sum_{j = n-t+1}^{n}u_{j}^{T}Su_{j}
= \sum_{j = n-t+1}^{n}u_{j}^{T}\lambda_{j}u_{j}
= \sum_{j = n-t+1}^{n}\lambda_{j}.$$
To minimize $J$, we keep the directions with large eigenvalues and discard the ones with small eigenvalues (these are the $\lambda_{j}$ in the previous equation).
Hope this helps.
Reference:
https://blog.csdn.net/Dark_Scope/article/details/53150883
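A minimal NumPy sketch of this answer's conclusion (random data; following the answer's convention that the observations are the columns of $X$ and $S = \frac{1}{m}XX^T$): the cost of discarding a set of directions equals the sum of the corresponding eigenvalues of $S$, so to minimize $J$ the discarded directions should be the ones with the smallest eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(3)
d, m, t = 5, 200, 2                      # features, observations, directions to discard
X = rng.standard_normal((d, m))          # observations are the columns, as in the answer
S = (X @ X.T) / m

vals, vecs = np.linalg.eigh(S)           # eigenvalues in ascending order
dropped = vecs[:, :t]                    # eigenvectors with the t smallest eigenvalues

# Cost of discarding these directions = sum of their eigenvalues.
J = sum(u @ S @ u for u in dropped.T)
print(np.isclose(J, vals[:t].sum()))     # True: discarding the smallest eigenvalues minimizes J
```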
answered Jun 1 '18 at 11:13 by David
We will start from
$$\begin{align}
D^* &= \underset{D}{\operatorname{argmax}}\;\operatorname{Tr}(D^TX^TXD)\\
&= \underset{D}{\operatorname{argmax}}\left[\operatorname{Tr}(D_{l-1}^TX^TXD_{l-1}) + d^{(l)T}X^TXd^{(l)}\right],
\end{align}$$
where $D_{l-1}$ denotes the matrix formed by the first $l-1$ columns of $D$.
The two summands share no common entries of $D$, so they can be maximized independently, subject to the constraints that $D_{l-1}$ has orthonormal columns and that $d^{(l)}$ is a unit vector orthogonal to all columns of $D_{l-1}$.
By the induction hypothesis, $\operatorname{Tr}(D_{l-1}^TX^TXD_{l-1})$ (with the constraint that the columns of $D_{l-1}$ are orthonormal) is maximized when $D_{l-1}$ consists of the orthonormal eigenvectors corresponding to the $l-1$ largest eigenvalues.
Notation:
Suppose $\lambda_1 \geqslant \ldots \geqslant \lambda_n$ are the eigenvalues and $v_1, \ldots, v_n$ are the corresponding orthonormal eigenvectors.
Denote $H_{l-1} = \operatorname{span}\{v_1, \ldots, v_{l-1}\}$ and let $H_{l-1}^{\bot}$ be the orthogonal complement of $H_{l-1}$, i.e. $H_{l-1}^{\bot} = \operatorname{span}\{v_l, \ldots, v_n\}$.
Lemma:
$$\begin{align}\lambda_l &= \underset{d^{(l)}}{\max}\ d^{(l)T}X^TXd^{(l)} \quad \text{s.t. } \Vert d^{(l)}\Vert = 1,\ d^{(l)} \in H_{l-1}^\bot \\
&= v_l^TX^TXv_l. \end{align}$$
Proof:
Let $\Sigma = X^TX$. Because $\Sigma$ is symmetric positive semidefinite, an eigendecomposition exists; write $\Sigma = V\Lambda V^T$, where the columns of $V$ are $v_1, \ldots, v_n$ in that order and $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$.
$$\begin{align}
d^{(l)T}\Sigma d^{(l)} &= d^{(l)T} V\Lambda V^T d^{(l)}\\
&= q^T \Lambda q \qquad [\text{where } q = V^Td^{(l)}]\\
&= \sum_{i=1}^n q_i^2 \lambda_i \qquad [\text{where } q_i = (V^Td^{(l)})_i = v_i^T d^{(l)}]\\
&= \sum_{i=l}^n q_i^2 \lambda_i \qquad [\text{because } d^{(l)} \in H_{l-1}^\bot \implies q_i = v_i^T d^{(l)} = 0\ \forall i < l].
\end{align}$$
Now the weights $q_i^2$ sum to one:
$$\begin{align}
\sum_{i=l}^n q_i^2 &= \sum_{i=1}^n q_i^2 = \Vert q \Vert^2 = \Vert V^T d^{(l)} \Vert^2 \\
&= \Vert d^{(l)} \Vert^2 \qquad [\text{because } V, \text{ and hence } V^T, \text{ is orthogonal}] \\
&= 1 \qquad\quad [\text{because } \Vert d^{(l)} \Vert = 1].
\end{align}$$
Therefore $d^{(l)T} \Sigma d^{(l)}$ is a convex combination of $\lambda_l, \ldots, \lambda_n$, so it is at most $\lambda_l$, and this value is attained at $d^{(l)} = v_l$:
$$\underset{d^{(l)}}{\max}\ d^{(l)T}\Sigma d^{(l)} = \underset{d^{(l)}}{\max}\ d^{(l)T}X^TXd^{(l)} = v_l^TX^TXv_l = \lambda_l. \qquad \text{(q.e.d.)}$$
We conclude that $D^*$ is obtained by augmenting $D_{l-1}$ with the column $v_l$ (which is orthogonal to the columns of $D_{l-1}$, so $D^TD = I$ still holds), which completes the original proof.
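To illustrate the lemma numerically, here is a minimal NumPy sketch (random $\Sigma$, assuming a non-degenerate spectrum): unit vectors constrained to $H_{l-1}^\bot$ never push the quadratic form above $\lambda_l$, and $v_l$ attains that value.

```python
import numpy as np

rng = np.random.default_rng(4)
n, l = 6, 3
A = rng.standard_normal((20, n))
Sigma = A.T @ A                                   # plays the role of X^T X (random, illustrative)

vals, vecs = np.linalg.eigh(Sigma)                # ascending order
vals, vecs = vals[::-1], vecs[:, ::-1]            # reorder so lambda_1 >= ... >= lambda_n
V_top = vecs[:, :l - 1]                           # v_1, ..., v_{l-1}, spanning H_{l-1}
lam_l = vals[l - 1]

# The maximum is attained at v_l.
print(np.isclose(vecs[:, l - 1] @ Sigma @ vecs[:, l - 1], lam_l))   # True

# Random unit vectors in H_{l-1}^perp never exceed lambda_l.
P = np.eye(n) - V_top @ V_top.T                   # orthogonal projector onto H_{l-1}^perp
for _ in range(1000):
    q = P @ rng.standard_normal(n)
    q /= np.linalg.norm(q)
    assert q @ Sigma @ q <= lam_l + 1e-9
print("lemma verified on 1000 random samples in the orthogonal complement")
```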
edited Jan 29 at 18:38, community wiki, 2 revs, Chayan Ghosh