Bound error due to approximation by Markov chain
Basic setting
I am working with a sequence of random variables $\mathbf{X} := X_1, X_2, \dots$, for which I know the Markov property does not hold exactly, but approximately:
$$
\Pr[X_{n+1}=x \mid X_{n}=x_{n}] \approx \Pr[X_{n+1}=x \mid X_1=x_1, \dots, X_{n}=x_{n}]
$$
I am approximating $\mathbf{X}$ by a Markov chain $\mathbf{Y}$, given by
$$
\Pr[Y_{n+1}=x \mid Y_{n}=x_{n}] := \Pr[X_{n+1}=x \mid X_{n}=x_{n}]
$$
Goal
I want to bound the error introduced by approximating $\mathbf{X}$ by $\mathbf{Y}$. One reasonable approach is measuring the KL divergence (I am open to other approaches if needed):
$$
D_\text{KL}(\mathbf{X} \,\|\, \mathbf{Y}) := \mathbb{E}_{x \sim \mathbf{X}}\left[\log \frac{\Pr[\mathbf{X}=x]}{\Pr[\mathbf{Y}=x]}\right]
$$
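For a finite chain and a fixed horizon, this divergence can be computed directly by enumerating paths. Here is a minimal Python sketch of that computation (my own illustration, not part of the question; the joint law `p_X` is an arbitrary toy placeholder, and the kernel is allowed to depend on $n$):

```python
from math import log

# Toy, hypothetical joint law of (X_1, X_2, X_3) on states {0, 1};
# any strictly positive path distribution summing to 1 would do.
p_X = {
    (0, 0, 0): 0.30,
    (0, 1, 0): 0.10,
    (0, 1, 1): 0.15,
    (1, 0, 1): 0.20,
    (1, 1, 0): 0.05,
    (1, 1, 1): 0.20,
}
horizon = 3

def prob_first(x1):
    """Pr[X_1 = x1], the shared initial distribution of X and Y."""
    return sum(p for seq, p in p_X.items() if seq[0] == x1)

def transition(n, x_n, x_next):
    """Pr[Y_{n+1} = x_next | Y_n = x_n] := Pr[X_{n+1} = x_next | X_n = x_n]
    (the kernel may depend on n; drop n for a homogeneous chain)."""
    num = sum(p for seq, p in p_X.items() if seq[n - 1] == x_n and seq[n] == x_next)
    den = sum(p for seq, p in p_X.items() if seq[n - 1] == x_n)
    return num / den

def p_Y(seq):
    """Path probability under the Markov approximation Y."""
    prob = prob_first(seq[0])
    for n in range(1, horizon):
        prob *= transition(n, seq[n - 1], seq[n])
    return prob

# D_KL(X || Y) = sum over paths x of Pr[X = x] * log(Pr[X = x] / Pr[Y = x]);
# paths with Pr[X = x] = 0 contribute nothing, and p_Y > 0 wherever p_X > 0.
dkl = sum(p * log(p / p_Y(seq)) for seq, p in p_X.items())
print(f"D_KL(X || Y) = {dkl:.6f}")
```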
Question
Is there a reasonable/interpretable assumption I can place on $\mathbf{X}$ that ensures that $\mathbf{X}$ is close to its Markov chain approximation $\mathbf{Y}$?
Details
- Technically, I am asking for an assumption on $\mathbf{X}$ that implies that $D_\text{KL}(\mathbf{X} \,\|\, \mathbf{Y})$ is small.
- Of course, I can directly bound $D_\text{KL}(\mathbf{X} \,\|\, \mathbf{Y})$, but I would prefer to place a more fundamental, simpler assumption on $\mathbf{X}$ that implies small $D_\text{KL}(\mathbf{X} \,\|\, \mathbf{Y})$.
- Ideally, the assumption should be standard/well-established.
- In my case, I am actually working with a Markov chain of order $m$, but I am assuming any answer for order $1$ can be generalized to order $m$ (via the standard state-augmentation reduction, sketched after this list).
- In my case, my Markov chain is finite, but again, I am assuming this does not make much of a difference.
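Regarding the order-$m$ point above: the standard reduction is to lift the chain to the state space of $m$-tuples, after which any order-$1$ result applies verbatim. A minimal sketch, assuming a hypothetical order-$2$ kernel on two states:

```python
from itertools import product

def lift_to_order_one(states, kernel_m, m):
    """Lift an order-m kernel Pr[X_{n+1}=x | last m states] to an order-1
    kernel on the enlarged state space of m-tuples (sliding windows)."""
    lifted = {}
    for history in product(states, repeat=m):
        for x in states:
            new_history = history[1:] + (x,)  # slide the window forward by one
            lifted[(history, new_history)] = kernel_m(history, x)
    return lifted

# Hypothetical order-2 kernel on states {0, 1}: the next state copies
# X_{n-1} (the older of the two remembered states) with probability 0.9.
def kernel_2(history, x):
    return 0.9 if x == history[0] else 0.1

P = lift_to_order_one((0, 1), kernel_2, m=2)
print(P[((0, 1), (1, 0))])  # 0.9: from history (0, 1) to next state 0
```

Pairs of tuples that do not overlap in $m-1$ coordinates are impossible transitions and are simply not stored in the dictionary.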
soft-question markov-chains markov-process
asked Jan 25 at 11:44 – Peter
edited Feb 5 at 12:33
I didn't understand the definition of $Y$. Specifically, what is $x_1$ used in the definition?
– Micapps, Feb 5 at 11:15
@Micapps This was a typo, thanks for pointing it out. It should make more sense now.
– Peter, Feb 5 at 12:34
1 Answer
I can't see a simple solution to this, as the problems (or opportunities, perhaps) arise even in the case when both $X$ and $Y$ are Markov.
For arbitrarily small $\varepsilon$ we can perturb the governing equation of the $X$ chain and get a chain $Y$ whose governing equation differs at any point by at most $\varepsilon$ (in the transition probabilities, for example), yet whose equilibrium distribution makes $D_{\text{KL}}(X \,\|\, Y)$ as large as desired.
This can be done, for example, by taking two Metropolis–Hastings chains with the same proposal distribution and two completely different target distributions, provided the targets vary slowly enough (e.g. they are Lipschitz with a sufficiently small constant and the proposal distribution decays quickly enough) to ensure that the disturbance in the transition matrix is small (cf. Wiki: Metropolis–Hastings algorithm).
In summary: if equilibrium behavior is the thing you are trying to model, even a very small change in the equations can change it completely; if, on the other hand, you are only interested in local behaviour, there might be some hope.
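To make this construction concrete, here is a small numerical sketch (my own illustration under the answer's assumptions, not part of the original post): two Metropolis–Hastings chains on $\{0,\dots,N-1\}$ share the same $\pm 1$ random-walk proposal but target two distributions that vary slowly locally yet concentrate at opposite ends of the state space. The transition matrices are uniformly $O(c/N)$-close, while the KL divergence between the equilibria is $O(c)$:

```python
import numpy as np

N, c = 200, 5.0

def mh_matrix(log_pi):
    """Metropolis-Hastings transition matrix for the +-1 random-walk
    proposal on {0, ..., N-1}; proposals off the ends are rejected."""
    P = np.zeros((N, N))
    for s in range(N):
        for t in (s - 1, s + 1):
            if 0 <= t < N:
                # propose t with prob 1/2, accept with prob min(1, pi(t)/pi(s))
                P[s, t] = 0.5 * min(1.0, np.exp(log_pi[t] - log_pi[s]))
        P[s, s] = 1.0 - P[s].sum()  # rejected / out-of-range mass stays put
    return P

i = np.arange(N)
# Two targets with neighbor ratios exp(+-c/N) (locally slowly varying),
# concentrated at opposite ends of the state space (globally very different).
log_pi1, log_pi2 = c * i / N, -c * i / N
pi1 = np.exp(log_pi1); pi1 /= pi1.sum()
pi2 = np.exp(log_pi2); pi2 /= pi2.sum()

P1, P2 = mh_matrix(log_pi1), mh_matrix(log_pi2)

print("max |P1 - P2|    =", np.abs(P1 - P2).max())                   # ~ c/(2N), small
print("D_KL(pi1 || pi2) =", float((pi1 * np.log(pi1 / pi2)).sum()))  # ~ c, large
```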
answered Feb 5 at 18:30 – Radost
I am not sure I fully understand this answer. First, if $X$ and $Y$ are both Markov, then $X=Y$, so no problem can arise? Second, my question is not about the equilibrium distribution (maybe it is ambiguous?), so it seems the rest of the answer does not apply? Happy to be wrong though...
– Peter, Feb 6 at 9:20