Bound error due to approximation by Markov chain












Basic setting



I am working with a sequence of random variables $\mathbf{X} := X_1, X_2, \dots$, for which I know the Markov property does not hold exactly, but approximately:
$$
\Pr[X_{n+1}=x \mid X_{n}=x_{n}] \approx \Pr[X_{n+1}=x \mid X_1=x_1, \dots, X_{n}=x_{n}]
$$



I am approximating $\mathbf{X}$ by a Markov chain $\mathbf{Y}$, given by
$$
\Pr[Y_{n+1}=x \mid Y_{n}=x_{n}] := \Pr[X_{n+1}=x \mid X_{n}=x_{n}]
$$



Goal



I want to bound the error introduced by approximating $\mathbf{X}$ by $\mathbf{Y}$. One reasonable approach is to measure the KL divergence (I am open to other approaches if needed):
$$
D_\text{KL}(\mathbf{X} \,\|\, \mathbf{Y}) := \mathbb{E}_{x \sim \mathbf{X}}\left[\log \frac{\Pr[\mathbf{X}=x]}{\Pr[\mathbf{Y}=x]}\right]
$$
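
If I compute correctly, for a finite horizon $N$ (and giving $Y_1$ the same distribution as $X_1$, which the construction above leaves unspecified), the chain rule for KL divergence turns this into a per-step quantity:
$$
D_\text{KL}(\mathbf{X} \,\|\, \mathbf{Y})
= \sum_{n=1}^{N-1} \mathbb{E}_{x_{1:n} \sim \mathbf{X}}\Big[ D_\text{KL}\big(\Pr[X_{n+1} = \cdot \mid X_{1:n} = x_{1:n}] \,\big\|\, \Pr[X_{n+1} = \cdot \mid X_n = x_n]\big) \Big]
= \sum_{n=1}^{N-1} I\big(X_{n+1}\,;\, X_{1:n-1} \mid X_n\big),
$$
so the assumption I am after needs to make each conditional mutual information term small.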




Question



Is there a reasonable/interpretable assumption I can place on $\mathbf{X}$ that ensures that $\mathbf{X}$ is close to its Markov chain approximation $\mathbf{Y}$?




Details




  • Technically, I am asking for an assumption on $\mathbf{X}$ that implies that $D_\text{KL}(\mathbf{X} \,\|\, \mathbf{Y})$ is small.

  • Of course, I could bound $D_\text{KL}(\mathbf{X} \,\|\, \mathbf{Y})$ directly, but I would prefer to place a more fundamental, simpler assumption on $\mathbf{X}$ that implies a small $D_\text{KL}(\mathbf{X} \,\|\, \mathbf{Y})$.

  • Ideally, the assumption should be standard/well-established.

  • In my case, I am actually working with a Markov chain of order $m$, but I am assuming any answer for order $1$ can be generalized to order $m$.

  • In my case, my Markov chain is finite, but again, I am assuming this does not make much of a difference; a small numerical sketch of the finite case is included below.
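
Here is the numerical sketch of the finite case referred to above (the order-$2$ binary process is a toy model, made up purely for illustration): it builds the time-indexed one-step conditionals of $\mathbf{X}$, uses them as the Markov approximation $\mathbf{Y}$ (with $Y_1 \sim X_1$), and evaluates $D_\text{KL}(\mathbf{X} \,\|\, \mathbf{Y})$ over a short horizon by enumerating every trajectory.

```python
# Toy model, made up for illustration: a binary order-2 process X, its
# step-by-step first-order Markov approximation Y, and D_KL(X || Y) over a
# short horizon, computed by brute-force enumeration of all trajectories.
import itertools
import math

N = 6  # horizon

def p_one(prev2, prev1):
    """Pr[next symbol = 1 | second-to-last = prev2, last = prev1]."""
    base = 0.7 if prev1 == 1 else 0.3       # strong dependence on the last symbol
    tweak = 0.05 if prev2 == 1 else -0.05   # weak dependence on the older symbol
    return base + tweak

def prob_X(traj):
    """Trajectory probability under the (non-Markov) toy process X."""
    p = 0.5                                  # X_1 uniform on {0, 1}
    q = 0.7 if traj[0] == 1 else 0.3         # X_2 depends only on X_1
    p *= q if traj[1] == 1 else 1 - q
    for n in range(2, len(traj)):
        q = p_one(traj[n - 2], traj[n - 1])
        p *= q if traj[n] == 1 else 1 - q
    return p

trajs = list(itertools.product((0, 1), repeat=N))
pX = {t: prob_X(t) for t in trajs}

def one_step_kernel(n):
    """Pr[step n+1 = b | step n = a], marginalised over the past of X."""
    joint = {(a, b): 0.0 for a in (0, 1) for b in (0, 1)}
    for t, p in pX.items():
        joint[(t[n], t[n + 1])] += p
    return {(a, b): joint[(a, b)] / (joint[(a, 0)] + joint[(a, 1)])
            for a in (0, 1) for b in (0, 1)}

kernels = [one_step_kernel(n) for n in range(N - 1)]

def prob_Y(traj):
    """Trajectory probability under the Markov approximation Y."""
    p = 0.5                                  # Y_1 gets the same law as X_1
    for n in range(N - 1):
        p *= kernels[n][(traj[n], traj[n + 1])]
    return p

kl = sum(p * math.log(p / prob_Y(t)) for t, p in pX.items())
print(f"D_KL(X || Y) over {N} steps: {kl:.6f} nats")
```

The kernels are computed per time step because the one-step conditionals of $\mathbf{X}$ need not be time-homogeneous; for a homogeneous chain they would all coincide.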










soft-question markov-chains markov-process

asked Jan 25 at 11:44 (edited Feb 5 at 12:33) – Peter












  • I didn't understand the definition of $Y$. Specifically, what is $x_1$ used in the definition? – Micapps, Feb 5 at 11:15












  • @Micapps This was a typo, thanks for pointing it out. It should make more sense now. – Peter, Feb 5 at 12:34
















1 Answer

I can't see a simple solution to this, as the problems (or opportunities, perhaps) arise even when both $X$ and $Y$ are Markov.

For arbitrarily small $\varepsilon$ we can perturb the governing equation of the $X$ chain and get a $Y$ whose transition probabilities differ from those of $X$ everywhere by at most $\varepsilon$, yet whose equilibrium distribution is completely different, so that the KL divergence between the equilibrium distributions is large.

This can be done, for example, by taking Metropolis-Hastings chains with the same proposal distribution and two completely different target distributions, provided the targets vary slowly enough (e.g. they are Lipschitz with a sufficiently small constant and the proposal distribution decays quickly enough) to ensure that the disturbance in the transition matrix is small (cf. Wikipedia: Metropolis-Hastings algorithm).

In summary: if equilibrium behaviour is the thing you are trying to model, even a very small change in the equations can change it completely; if, on the other hand, you are only interested in local behaviour, there might be some hope.
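
For concreteness, a small numerical sketch of this construction (the grid, the random-walk proposal, and the two exponential targets are made-up specifics, chosen only to illustrate the point): the two Metropolis-Hastings kernels below agree entrywise to within about $0.05$, yet their equilibrium distributions are roughly $18$ nats apart in KL divergence.

```python
# Sketch of the construction above; the grid, the random-walk proposal and the
# two exponential targets are made-up specifics, chosen only for illustration.
import numpy as np

K = 200   # states 0 .. K-1 on a line
a = 0.1   # slope of the log-targets; small slope means slowly varying targets

def mh_kernel(log_target):
    """Metropolis-Hastings transition matrix for a +/-1 random-walk proposal."""
    P = np.zeros((K, K))
    for i in range(K):
        for j in (i - 1, i + 1):
            if 0 <= j < K:
                P[i, j] = 0.5 * min(1.0, np.exp(log_target[j] - log_target[i]))
        P[i, i] = 1.0 - P[i].sum()   # rejections and out-of-range proposals stay put
    return P

states = np.arange(K)
P1 = mh_kernel(a * states)    # target pi_1(i) proportional to exp(+a*i)
P2 = mh_kernel(-a * states)   # target pi_2(i) proportional to exp(-a*i)

# The stationary distributions are the targets themselves (that is what M-H buys us).
pi1 = np.exp(a * states); pi1 /= pi1.sum()
pi2 = np.exp(-a * states); pi2 /= pi2.sum()
assert np.allclose(pi1 @ P1, pi1) and np.allclose(pi2 @ P2, pi2)

print("max entrywise gap between the two kernels:", np.abs(P1 - P2).max())
print("KL(pi_1 || pi_2) between equilibria (nats):", float(np.sum(pi1 * np.log(pi1 / pi2))))
```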






answered Feb 5 at 18:30 – Radost
  • I am not sure I fully understand this answer. First, if $X$ and $Y$ are both Markov, then $X=Y$, so no problem can arise? Second, my question is not about the equilibrium distribution (maybe it is ambiguous?), so it seems the rest of the answer does not apply? Happy to be wrong though... – Peter, Feb 6 at 9:20










