Figuring out distribution from adding smaller distributions?
$begingroup$
Suppose Sam wants to know how long it usually takes him to get to work. He wants to know the 50th percentile, 90th percentile, and 99th percentile of how long, in minutes, it takes him to get to work.
Sam's route to work is split up into N segments. For each segment, the time it takes him to traverse that segment is drawn from some distribution over minutes.
Suppose I know the distribution of each segment. That is, for each segment, I know how long it takes to traverse that segment at the 50th percentile, the 90th percentile, the 99th percentile, etc.
How can I figure out the distribution of how long it takes Sam to get to work from knowing the distributions of the segments?
(Sorry if something doesn't make sense -- edits are welcome. For software engineers: I'm actually trying to figure out how to estimate the latency of a service call composed of several other service calls)
statistics probability-distributions
$endgroup$
$begingroup$
If each segment is normal, then you can deduce its mean and variance from the 50th, 90th and 99th percentiles. Then you can add the N means to get the overall mean and (assuming the segments are independent) add the N variances to get the overall variance. The sum of normal segments is normal, so from the overall mean and variance you can deduce any desired percentiles. // This is a harder problem if your segments are not normal. // Another approach: if you know the distributions of independent segments, it is easy to simulate the distribution of the total and then find quantiles.
$endgroup$
– BruceET
Jan 18 at 8:50
$begingroup$
In your travelling example Sam may walk quicker if the train in the last section was delayed. Could something similar be true in your actual application? If so, you have correlations and it all gets more complicated.
$endgroup$
– user121049
Jan 18 at 9:04
$begingroup$
Each segment is independent of the others, N is somewhat small, and the segments are not normal, and their distributions are unknown.
$endgroup$
– Andre
Jan 18 at 17:10
$begingroup$
The mean of a sum of random variables is the sum of the individual means. Unfortunately, there is no such relationship for medians. If the distributions are symmetric, medians are near means and you might get a rough approximation. In my answer, this works roughly for the symmetric normal and uniform distributions but not at all for the highly skewed exponential distributions. // I'd try to get data to learn something about the distributions and then use simulation.
$endgroup$
– BruceET
Jan 18 at 18:57
asked Jan 18 at 3:41 by Andre
1 Answer
$begingroup$
Comment continued: Here are three specific examples to illustrate that finding the exact quantiles of sums of distributions is not trivial.
Ten normal segments. Suppose each of ten independent segments is
distributed $\mathsf{Norm}(\mu = 10, \sigma=1).$ Then quantiles .5, .9 and .99 for each segment are about 10, 11.3, and 12.3, respectively. (Computations in R.)
qnorm(c(.5,.9,.99), 10, 1)
10.00000 11.28155 12.32635
The sum of ten such segments is distributed $\mathsf{Norm}(\mu=100, \sigma=\sqrt{10}).$
The corresponding quantiles of this distribution are about 100, 104, and 107.
qnorm(c(.5,.9,.99), 100, sqrt(10))
100.0000 104.0526 107.3566
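The normal case can also be worked backwards from reported percentiles, as suggested in the comments. The following is only a sketch under the assumption that every segment really is normal and that the segments are independent; the variable names (`q50`, `q90`, `total_q`) are illustrative and not part of the original answer.

```r
# Assumed inputs: the 50th and 90th percentiles reported for one normal segment.
q50 <- 10.00000
q90 <- 11.28155
n   <- 10                           # number of independent, identical segments

# For a normal distribution the median equals the mean,
# and q90 = mu + sigma * qnorm(0.9).
mu    <- q50
sigma <- (q90 - q50) / qnorm(0.9)

# Means add; variances add when the segments are independent.
total_mu <- n * mu
total_sd <- sqrt(n * sigma^2)

# Quantiles of the (normal) total.
total_q <- qnorm(c(.5, .9, .99), total_mu, total_sd)
print(total_q)
```

Up to rounding in the inputs, this should reproduce the quantiles of the sum shown above. With segments that have different means and variances, the same idea applies by summing the individual means and variances instead of multiplying by `n`.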
Ten exponential segments. Suppose each of ten independent segments is
distributed $\mathsf{Exp}(\text{rate} = .1).$ Then quantiles .5, .9 and .99 for each segment are about 6.9, 23, and 46, respectively. (The mean and standard deviation are both $10.$)
qexp(c(.5,.9,.99), .1)
6.931472 23.025851 46.051702
The sum of ten such segments is distributed $\mathsf{Gamma}(\text{shape}=10, \text{rate}=.1).$ (The mean is 100 and the variance is 1000.) The corresponding quantiles of the sum are about 97, 142, and 188.
qgamma(c(.5,.9,.99), 10, .1)
96.68715 142.05990 187.83117
Similar approximate results from a simulation:
set.seed(118)
x = replicate(10^6, sum(rexp(10,.1))) # vector of a million sums of ten
mean(x); var(x); quantile(x, c(.5,.9,.99))
99.97961
1000.581
50% 90% 99%
96.67126 142.06101 188.03925
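For contrast, simply adding up the per-segment percentiles does not give the percentiles of the total. A minimal sketch, reusing the same exponential setup (the names `naive` and `exact` are illustrative):

```r
p <- c(.5, .9, .99)

# Naive guess: ten times each per-segment quantile.
naive <- 10 * qexp(p, rate = .1)

# Exact quantiles of the sum, which is Gamma(shape = 10, rate = .1).
exact <- qgamma(p, shape = 10, rate = .1)

print(rbind(naive, exact))
```

The naive values are about 69, 230, and 461, against exact values of about 97, 142, and 188: the naive median is too low and the naive upper percentiles are far too high, because it is unlikely that all ten segments are slow at the same time.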
Sum of a dozen uniform segments.
Suppose each of twelve independent segments is
distributed $\mathsf{Unif}(0,1).$ Then quantiles .5, .9 and .99 for each segment are .5, .9, and .99, respectively. (The mean is 1/2 and the variance is 1/12.)
qunif(c(.5,.9,.99))
0.50 0.90 0.99
According to the Central Limit Theorem, the sum of 12 such segments is distributed nearly as $\mathsf{Norm}(\mu=6, \sigma=1).$ From simulation, the corresponding quantiles of the sum are about 6, 7.3, and 8.3.
set.seed(2019)
x = replicate(10^6, sum(runif(12)))
mean(x); var(x); quantile(x, c(.5,.9,.99))
6.001354
1.000741
50% 90% 99%
6.002158 7.289854 8.310085
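The simulated quantiles can be compared with those of the normal approximation itself. This is a sketch only; `clt_mu`, `clt_sd`, and `clt_q` are illustrative names.

```r
n <- 12

# Each Unif(0,1) segment has mean 1/2 and variance 1/12.
clt_mu <- n * (1/2)
clt_sd <- sqrt(n * (1/12))

# Quantiles of the approximating normal distribution.
clt_q <- qnorm(c(.5, .9, .99), clt_mu, clt_sd)
print(clt_q)
```

The normal approximation gives about 6, 7.28, and 8.33, close to the simulated values but not identical to them; the small gaps reflect both simulation error and the fact that the sum of twelve uniforms is only approximately normal.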
Note: In general, if the segments are independent and identically distributed with known mean and variance, and there are enough of them for the Central Limit Theorem to apply, then you can find the mean and variance of the nearly normal sum, and from them the desired quantiles.
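When the segment distributions are unknown and N is small, as in the situation described in the comments, one option is to simulate from observed timings instead. The sketch below assumes that a sample of measured times is available for each segment and that the segments are independent; the `segments` list is filled with made-up data purely as a stand-in for real measurements.

```r
set.seed(1)

# Hypothetical observed timings per segment (placeholders for real data).
segments <- list(
  a = rexp(500, rate = 0.2),
  b = runif(500, 1, 3),
  c = rgamma(500, shape = 2, rate = 0.5)
)

n_sim <- 1e5

# Resample each segment's observations independently, with replacement;
# the result is an n_sim x 3 matrix, one column per segment.
draws <- sapply(segments, function(obs) sample(obs, n_sim, replace = TRUE))

# Each row is one simulated trip; its total is the row sum.
total <- rowSums(draws)

total_q <- quantile(total, c(.5, .9, .99))
print(total_q)
```

The accuracy of the 99th percentile depends on how well the tails of each segment are represented in the observed data, so a few hundred observations per segment may not be enough for the extreme quantiles.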
$endgroup$
answered Jan 18 at 9:51 by BruceET (edited Jan 18 at 10:29)