Multivariate Conditional Entropy as a test of correlation between random variables

I use the word "column" to mean data from which a random variable can be estimated; a column is a sample of a random variable.



I am working with $N$ columns of weakly correlated data. Furthermore, I am assuming that column $A$ (within this data) is well correlated with the other $N-1$ columns; that is, I believe that column $A$ is (roughly, due to noise) a function of the other $N-1$ columns, and I would like to demonstrate this. We say that column 1, $C_1$, is sampled from a set $X_1$, but specifically we are sampling a function, as I describe next. Suppose we have a function $f: (X_1, X_2, \cdots, X_{N-1}) \rightarrow X_A$; then a sample is the $k$th row, $(C_1(k), \cdots, C_{N-1}(k), A(k))$. I have discovered that multivariate conditional entropy measures whether or not a column is a function of $M$ other columns. That is, when all columns are sampled, with no noise, from an exact function $f: (X_1, X_2, \cdots, X_{N-1}) \rightarrow X_A$, the multivariate conditional entropy is zero: $H(A \mid C_1, \cdots, C_{N-1}) = 0$. I want to use this fact on actual data where there is noise. What I expect is that, as I add columns, the conditional entropy should go down, so the sequence
$$H(A \mid C_1, \cdots, C_j)$$
goes to zero as $j \rightarrow N-1$. Also, if I were to throw in some random columns, i.e. columns randomly sampled from the same spaces as the $C_i$, the conditional entropy should increase, or at least not decrease, because $A$ is not a function of those random samples. Does this sound reasonable?
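For concreteness, here is a minimal sketch of the kind of plug-in estimator I have in mind, in Python with NumPy. The function name `conditional_entropy`, the equal-width binning, and the default bin count are my own illustrative choices, not a standard API; the estimator just applies the chain rule $H(A \mid C) = H(A, C) - H(C)$ to empirical bin frequencies.

```python
import numpy as np

def conditional_entropy(A, C, bins=8):
    """Plug-in estimate, in bits, of H(A | C_1, ..., C_j).

    A    : 1-D array of n samples of the target column.
    C    : 2-D array of shape (n, j), one conditioning column per slot.
    bins : every column is discretized into this many equal-width bins.
    """
    def discretize(x):
        # Map real values to integer bin indices 0 .. bins-1.
        edges = np.linspace(x.min(), x.max(), bins + 1)
        return np.clip(np.digitize(x, edges[1:-1]), 0, bins - 1)

    def entropy(rows):
        # Empirical Shannon entropy of the joint distribution of the rows.
        _, counts = np.unique(rows, axis=0, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    A_d = discretize(np.asarray(A, dtype=float))
    C_d = np.column_stack([discretize(c) for c in np.asarray(C, dtype=float).T])

    # Chain rule: H(A | C) = H(A, C) - H(C).
    return entropy(np.column_stack([A_d, C_d])) - entropy(C_d)
```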



What I have found is that, as you add columns (those actually sampled from the function), the entropy goes down. However, if you substitute random columns in place of the columns that are actual samples of the function, you see no increase in conditional entropy. I must conclude that conditional entropy cannot be used for this purpose, namely detecting such dependencies. Is that so?
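The experiment behind this observation looks roughly like the following (it reuses the `conditional_entropy` sketch above; the particular function and noise level are arbitrary illustrations, not my actual data):

```python
rng = np.random.default_rng(0)
n = 2000
C = rng.uniform(size=(n, 3))                  # columns A actually depends on
A = C[:, 0] + np.sin(3 * C[:, 1]) * C[:, 2] + 0.05 * rng.normal(size=n)
R = rng.uniform(size=(n, 3))                  # random columns, unrelated to A

for j in range(1, 4):
    print(j,
          conditional_entropy(A, C[:, :j]),   # decreases, as expected
          conditional_entropy(A, R[:, :j]))   # also decreases, which is the problem
```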



Update:
I believe what is going on here is the curse of dimensionality. As I add columns, the size of the domain space grows exponentially. Thus any fixed, finite set of samples tends toward occupying distinct points, which trivially satisfy the criterion of being "the evaluation of a function at an element": once each occupied cell of the discretized domain holds only one sample, the empirical conditional distribution looks deterministic, and the estimated conditional entropy collapses toward zero whether or not a true dependence exists. I am working with real numbers as data, so I cut my columns into bins, and the number of bins is a parameter that I can adjust according to the number of samples I have. With that said, I still think the problem with this method is the exponential explosion of the size of the domain space. I want to try PCA to address this. Does this sound like a good idea? Recall that I am trying to show that $A$ is a function of the $N-1$ other columns using multivariate conditional entropy.
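A minimal sketch of what I mean, assuming scikit-learn's `PCA` on top of the `conditional_entropy` function above (the choice of two components is arbitrary, and PCA only captures linear structure, so this is a heuristic for shrinking the binned domain rather than a fix for the estimator's bias):

```python
from sklearn.decomposition import PCA

def conditional_entropy_pca(A, C, n_components=2, bins=8):
    # Project the conditioning columns onto a few principal components
    # before binning, so that the discretized domain stays small
    # relative to the number of available samples.
    Z = PCA(n_components=n_components).fit_transform(C)
    return conditional_entropy(A, Z, bins=bins)
```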



Note: I have also posted this question on Cross Validated, but it has not received much attention: https://stats.stackexchange.com/questions/388609/multivariate-conditional-entropy-as-a-test-of-correlation-between-random-variabl



This might be an example of performing feature selection by minimizing multivariate conditional entropy. Information-theoretic feature-selection algorithms already exist; perhaps I am suggesting a new one, but I am not sure.
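If it is feature selection, then what I am doing amounts to a greedy forward search, roughly like this hypothetical sketch (the function `greedy_select` and its interface are my own naming, reusing `conditional_entropy` from above):

```python
def greedy_select(A, C, k, bins=8):
    # Repeatedly add the column that most reduces the estimated
    # conditional entropy of A, as in forward stepwise selection.
    chosen, remaining = [], list(range(C.shape[1]))
    for _ in range(k):
        best = min(remaining,
                   key=lambda i: conditional_entropy(A, C[:, chosen + [i]], bins))
        chosen.append(best)
        remaining.remove(best)
    return chosen
```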



I have added the category-theory tag because, on my working theory of what I am doing, this work actually involves the learning of finitely presented categories themselves. This specific problem is thus the learning of a function, which is at the core of all concrete categories. See my research here.

Tags: category-theory · random-variables · machine-learning · information-theory · conditional-probability

asked Jan 23 at 16:56 by Ben Sprott · edited Jan 31 at 15:30