How to prove PCA using induction












In Deep Learning (Goodfellow et al.), the optimization objective of PCA is formulated as

$$D^* = \underset{D}{\operatorname{arg\,min}} \; \|X - XDD^T\|_F^2 \quad \text{s.t. } D^TD = I.$$

The book proves the one-dimensional case, i.e. that

$$\underset{d}{\operatorname{arg\,min}} \; \|X - Xdd^T\|_F^2 \quad \text{s.t. } d^Td = 1$$

is the eigenvector of $X^TX$ with the largest eigenvalue, and the author says the general case (when $D$ is an $m \times l$ matrix with $l > 1$) can easily be proved by induction.

Could anyone please show me how to prove that by induction?

I know that when $D^TD = I$:

$$D^* = \underset{D}{\operatorname{arg\,min}} \; \|X - XDD^T\|_F^2 = \underset{D}{\operatorname{arg\,max}} \; \operatorname{Tr}\left(D^TX^TXD\right)$$

and

$$\operatorname{Tr}\left(D^TX^TXD\right) = \left(\sum_{i=1}^{l-1} \left(d^{(i)}\right)^T X^TX\, d^{(i)}\right) + \left(d^{(l)}\right)^T X^TX\, d^{(l)},$$

where, by the induction hypothesis, the first summand is maximized when $d^{(1)}, \dots, d^{(l-1)}$ are the eigenvectors of $X^TX$ corresponding to the $l-1$ largest eigenvalues. But how can I be sure that the sum as a whole is then also maximal?

linear-algebra

asked May 14 '17 at 3:23 by Lifu Huang
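
As a quick numerical sanity check of the claim in the question (an illustrative sketch; the data, dimensions, and variable names are my own, not from the book), the top-$l$ eigenvectors of $X^TX$ should achieve a reconstruction error no worse than any other orthonormal $D$:

    import numpy as np

    rng = np.random.default_rng(0)
    n, m, l = 200, 8, 3                      # samples, features, target dimension (made up)

    X = rng.standard_normal((n, m))          # rows are samples, as in the question
    C = X.T @ X

    # Top-l eigenvectors of X^T X, as columns, sorted by decreasing eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(C)
    D_eig = eigvecs[:, np.argsort(eigvals)[::-1][:l]]

    def recon_error(D):
        """Frobenius reconstruction error ||X - X D D^T||_F^2 for orthonormal D."""
        return np.linalg.norm(X - X @ D @ D.T, "fro") ** 2

    # Compare against many random orthonormal m x l candidates (Q factor of a Gaussian matrix).
    best_random = min(recon_error(np.linalg.qr(rng.standard_normal((m, l)))[0])
                      for _ in range(2000))

    print(recon_error(D_eig) <= best_random + 1e-9)   # expected: True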










  • Perhaps the idea is to define $X'^TX' = W'^T\Lambda' W'$, where $\Lambda'$ and $W'$ are identical to $\Lambda$ and $W$ from $X^TX = W^T\Lambda W$ but with the largest eigenvalue and its corresponding eigenvector removed. Solving the one-dimensional problem for this deflated matrix then yields a $d'$ that is again the eigenvector of $X'^TX'$ with the largest eigenvalue, i.e. the eigenvector of the original matrix $X^TX$ with the second-largest eigenvalue. – Kentzo, Nov 28 '18 at 0:46
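
The deflation idea in this comment can be checked numerically (a sketch with made-up data; `C_deflated` and the other names are my own):

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.standard_normal((100, 5))
    C = X.T @ X

    eigvals, eigvecs = np.linalg.eigh(C)          # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]
    lam, V = eigvals[order], eigvecs[:, order]    # descending eigenvalues, matching vectors

    # Deflate: remove the top eigenvalue/eigenvector pair, as the comment suggests.
    C_deflated = C - lam[0] * np.outer(V[:, 0], V[:, 0])

    top_of_deflated = np.linalg.eigh(C_deflated)[1][:, -1]   # eigenvector of the largest eigenvalue

    # It should match v_2 of the original matrix (up to sign).
    print(np.allclose(np.abs(top_of_deflated @ V[:, 1]), 1.0))   # expected: True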
















3 Answers






I am actually at the same point, and wondering if you found an answer.

My idea was to use the theorem of Fan, which says:

Let $A \in S^n$ be a symmetric matrix with eigenvalues $\lambda_1 \geqslant \ldots \geqslant \lambda_n$. Then

$$\lambda_1 + \cdots + \lambda_k = \max\left\{\operatorname{Tr}(ADD^T) : D \in \mathbb{R}^{n \times k},\ D^TD = I_k\right\}.$$

Applied with $A = X^TX$ (and using $\operatorname{Tr}(ADD^T) = \operatorname{Tr}(D^TAD)$), this gives the claim for every $k$ at once.

– M. Ro, answered Jun 5 '17 at 12:19, edited Jun 5 '17 at 18:32
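
A quick numerical check of the theorem as stated (an illustrative sketch; the random orthonormal candidates only sample the feasible set, they do not certify the maximum):

    import numpy as np

    rng = np.random.default_rng(2)
    n, k = 6, 2

    A = rng.standard_normal((n, n))
    A = A + A.T                              # symmetric test matrix

    eigvals = np.sort(np.linalg.eigvalsh(A))[::-1]
    top_k_sum = eigvals[:k].sum()            # left-hand side of Fan's theorem

    def trace_obj(D):
        return np.trace(D.T @ A @ D)         # Tr(A D D^T) = Tr(D^T A D)

    # The maximizing D: eigenvectors of the k largest eigenvalues.
    V = np.linalg.eigh(A)[1][:, ::-1][:, :k]
    print(np.isclose(trace_obj(V), top_k_sum))            # expected: True

    # No random orthonormal D should do better.
    samples = (trace_obj(np.linalg.qr(rng.standard_normal((n, k)))[0]) for _ in range(5000))
    print(max(samples) <= top_k_sum + 1e-9)               # expected: True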






Suppose we need to project the data set $X$ onto a vector $u_j$, where $u_j$ is a unit vector, i.e. $u_j^Tu_j = 1$. Deleting this dimension causes an error given by

$$J_j = \frac{1}{m}\left\|X^Tu_j\right\|^2 = \frac{1}{m}\left(X^Tu_j\right)^T\left(X^Tu_j\right) = \frac{1}{m}u_j^TXX^Tu_j.$$

Let $S$ denote $\frac{1}{m}XX^T$ and suppose we need to discard $t$ dimensions. The cost function is

$$J = \sum_{j=n-t+1}^{n}u_j^TSu_j \quad \text{s.t. } u_j^Tu_j = 1.$$

Using Lagrange multipliers,

$$\widetilde{J} = \sum_{j=n-t+1}^{n}\left[u_j^TSu_j + \lambda_j\left(1 - u_j^Tu_j\right)\right].$$

Setting the derivative to zero,

$$\frac{\partial \widetilde{J}}{\partial u_j} = Su_j - \lambda_ju_j = 0,$$

hence

$$Su_j = \lambda_ju_j.$$

So $u_j$ is an eigenvector of $S$ and $\lambda_j$ is the corresponding eigenvalue. Then

$$J = \sum_{j=n-t+1}^{n}u_j^TSu_j = \sum_{j=n-t+1}^{n}u_j^T\lambda_ju_j = \sum_{j=n-t+1}^{n}\lambda_j.$$

To make $J$ as small as possible, we keep the directions with the large eigenvalues and discard the ones with the small eigenvalues (these are the $\lambda_j$ in the previous equation).

Hope this helps.

Reference: https://blog.csdn.net/Dark_Scope/article/details/53150883

– David, answered Jun 1 '18 at 11:13
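
A small numerical check of this conclusion, under this answer's convention that the columns of $X$ are the $m$ centered samples (an illustrative sketch; dimensions and names are mine): the error from discarding the $t$ weakest directions should equal the sum of the $t$ smallest eigenvalues of $S$.

    import numpy as np

    rng = np.random.default_rng(3)
    n, m, t = 5, 400, 2                     # features, samples, discarded dimensions

    X = rng.standard_normal((n, m))
    X -= X.mean(axis=1, keepdims=True)      # center the samples
    S = X @ X.T / m

    eigvals, U = np.linalg.eigh(S)          # ascending eigenvalues, orthonormal eigenvectors
    U_discard = U[:, :t]                    # directions with the t smallest eigenvalues

    # Error from deleting those directions: mean squared norm of the discarded projections.
    J = np.mean(np.sum((U_discard.T @ X) ** 2, axis=0))

    print(np.isclose(J, eigvals[:t].sum()))   # expected: True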






We will start from
$$\begin{align}
D^* &= \underset{D}{\operatorname{arg\,max}}\;\operatorname{Tr}\left(D^TX^TXD\right)\\
&= \underset{D}{\operatorname{arg\,max}}\left[\operatorname{Tr}\left(D_{l-1}^TX^TXD_{l-1}\right) + d^{(l)T}X^TXd^{(l)}\right],
\end{align}$$
where $D_k$ denotes the matrix formed by the first $k$ columns of $D$ (so $D_{l-1}$ consists of the first $l-1$ columns of $D$).

The two summands share no variables and hence can be maximized independently, subject to the constraints that $D_{l-1}$ has orthonormal columns and that $d^{(l)}$ has unit norm and is orthogonal to all columns of $D_{l-1}$.

Using the induction hypothesis, we conclude that $\operatorname{Tr}\left(D_{l-1}^TX^TXD_{l-1}\right)$ (with the constraint that the columns of $D_{l-1}$ are orthonormal) is maximized when $D_{l-1}$ consists of the orthonormal eigenvectors corresponding to the $l-1$ largest eigenvalues.

Notation: suppose $\lambda_1 \geqslant \ldots \geqslant \lambda_n$ are the eigenvalues and $v_1, \ldots, v_n$ are the corresponding orthonormal eigenvectors. Denote $H_{l-1} = \operatorname{span}\{v_1, \ldots, v_{l-1}\}$ and let $H_{l-1}^{\bot}$ be the orthogonal complement of $H_{l-1}$, i.e. $H_{l-1}^{\bot} = \operatorname{span}\{v_l, \ldots, v_n\}$.

Lemma:
$$\begin{align}
\lambda_l &= \max_{d^{(l)}} d^{(l)T}X^TXd^{(l)} \quad \text{s.t. } \Vert d^{(l)}\Vert = 1,\ d^{(l)} \in H_{l-1}^\bot\\
&= v_l^TX^TXv_l.
\end{align}$$

Proof: let $\Sigma = X^TX$. Because it is a symmetric positive semidefinite matrix, an eigendecomposition exists; let it be $\Sigma = V\Lambda V^T$, where the columns of $V$ are $v_1, \ldots, v_n$ in that order, and hence $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$.
$$\begin{align}
d^{(l)T}\Sigma d^{(l)} &= d^{(l)T}V\Lambda V^Td^{(l)}\\
&= q^T\Lambda q &&[\text{where } q = V^Td^{(l)}]\\
&= \sum_{i=1}^{n} q_i^2\lambda_i &&[\text{where } q_i = (V^Td^{(l)})_i = v_i^Td^{(l)}]\\
&= \sum_{i=l}^{n} q_i^2\lambda_i &&[\text{because } d^{(l)} \in H_{l-1}^\bot \implies q_i = v_i^Td^{(l)} = 0\ \forall i < l].
\end{align}$$

Now,
$$\begin{align}
\sum_{i=l}^{n} q_i^2 &= \sum_{i=1}^{n} q_i^2 = \Vert q\Vert^2 = \Vert V^Td^{(l)}\Vert^2\\
&= \Vert d^{(l)}\Vert^2 &&[\text{because } V, \text{ and hence } V^T, \text{ is orthogonal}]\\
&= 1 &&[\text{because } \Vert d^{(l)}\Vert = 1].
\end{align}$$

Therefore $d^{(l)T}\Sigma d^{(l)}$ is a convex combination of $\lambda_l, \ldots, \lambda_n$, and
$$\max_{d^{(l)}} d^{(l)T}\Sigma d^{(l)} = \max_{d^{(l)}} d^{(l)T}X^TXd^{(l)} = v_l^TX^TXv_l = \lambda_l. \qquad \text{(q.e.d.)}$$

We conclude that $D^*$ is obtained by augmenting $D_{l-1}$ with the column $v_l$, which completes the original proof.

– Chayan Ghosh (community wiki, 2 revs), edited Jan 29 at 18:38
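
A numerical check of the lemma (an illustrative sketch with made-up data; candidates for $d^{(l)}$ are drawn by projecting random vectors onto $H_{l-1}^\bot$ and normalizing):

    import numpy as np

    rng = np.random.default_rng(4)
    n, l = 6, 3

    X = rng.standard_normal((50, n))
    Sigma = X.T @ X

    eigvals, V = np.linalg.eigh(Sigma)
    order = np.argsort(eigvals)[::-1]
    lam, V = eigvals[order], V[:, order]        # descending eigenvalues / matching eigenvectors

    # Projector onto H_{l-1}^perp = span{v_l, ..., v_n}.
    P_perp = np.eye(n) - V[:, :l-1] @ V[:, :l-1].T

    def quad(d):
        return d @ Sigma @ d

    # v_l itself attains lambda_l ...
    print(np.isclose(quad(V[:, l-1]), lam[l-1]))          # expected: True

    # ... and no unit vector in H_{l-1}^perp exceeds it.
    vals = []
    for _ in range(5000):
        d = P_perp @ rng.standard_normal(n)
        d /= np.linalg.norm(d)
        vals.append(quad(d))
    print(max(vals) <= lam[l-1] + 1e-9)                   # expected: True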





