Figuring out distribution from adding smaller distributions?

Suppose Sam wants to know how long it usually takes him to get to work. Specifically, he wants the 50th, 90th, and 99th percentiles of his commute time in minutes.

Sam's route to work is split up into N segments. For each segment, the time it takes him to traverse that segment is drawn from some distribution over minutes.

Suppose I know the distribution of each segment. That is, for each segment, I know how long it takes to traverse that segment at the 50th percentile, the 90th percentile, the 99th percentile, etc.

How can I figure out the distribution of the time it takes Sam to get to work from the distributions of the segments?

(Sorry if something doesn't make sense; edits are welcome. For software engineers: I'm actually trying to figure out how to estimate the latency of a service call composed of several other service calls.)

asked Jan 18 at 3:41

Andre

If each segment is normal, then you can deduce its mean and variance from the 50th, 90th and 99th percentiles. Then you can add the N means to get the overall mean and (assuming segments are independently distributed) add the N variances to get the overall variance. The sum of normal segments will be normal. From the overall mean and variance you can deduce any desired percentiles. // This is a harder problem if your segments are not normal. // Another approach: if you know the distributions of independent segments, it is easy to simulate the distribution of the total and then find its quantiles.
– BruceET
Jan 18 at 8:50
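A minimal R sketch of the normal-segment recipe in the comment above, assuming each segment is normal and independent and that only its 50th and 90th percentiles are known. The function name `total_quantiles_normal` is hypothetical. For a normal distribution the median equals the mean, and the 90th percentile lies `qnorm(0.9)` (about 1.28) standard deviations above it.

```r
# Sketch: assumes each segment time is normal and the segments are independent.
# q50, q90: vectors of per-segment 50th and 90th percentiles (hypothetical inputs).
total_quantiles_normal <- function(q50, q90, p = c(.5, .9, .99)) {
  mu    <- q50                         # for a normal, the median equals the mean
  sigma <- (q90 - q50) / qnorm(0.9)    # 90th percentile is qnorm(0.9) sds above the mean
  # Independence: means add and variances add; the sum of normals is normal
  qnorm(p, mean = sum(mu), sd = sqrt(sum(sigma^2)))
}

# Ten Norm(10, 1) segments, described only by their percentiles
q50 <- rep(10, 10)
q90 <- rep(qnorm(0.9, 10, 1), 10)
total_quantiles_normal(q50, q90)
## [1] 100.0000 104.0526 107.3566
```

If the segments really are normal, recovering each standard deviation from the 99th percentile instead of the 90th gives the same answer; a mismatch between the two estimates is a cheap warning sign that they are not normal.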

In your travelling example, Sam may walk faster if the train on the previous segment was delayed. Could something similar be true in your actual application? If so, you have correlations and it all gets more complicated.
– user121049
Jan 18 at 9:04

Each segment is independent of the others, N is fairly small, the segments are not normal, and their distributions are unknown.
– Andre
Jan 18 at 17:10

The mean of a sum of random variables is the sum of the individual means. Unfortunately, there is no such relationship for medians. If the distributions are symmetric (or nearly so), medians equal (or are near) means, and you might get at least a rough approximation. In my answer, this works roughly for the symmetric normal and uniform distributions but not at all for the highly skewed exponential distributions. // I'd try to get data to learn something about the distributions and then use simulation.
– BruceET
Jan 18 at 18:57
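A sketch of the "get data, then simulate" suggestion, assuming you have logged timings for each segment separately and that the segments are independent (the question's setup, not the correlated case raised above). Each segment's observed values are resampled with replacement and the draws are summed. The data, and the names `segment_samples` and `simulate_total`, are made up for illustration.

```r
# Hypothetical per-segment timings in minutes; replace with real measurements.
set.seed(1)
segment_samples <- list(
  a = rexp(500, rate = 1/20),          # skewed latencies, mean about 20
  b = rnorm(500, mean = 15, sd = 3),   # roughly symmetric, mean about 15
  c = 5 + rexp(500, rate = 1/2)        # fixed 5 minutes plus a small tail
)

simulate_total <- function(samples, n_sims = 1e5) {
  # Draw n_sims values with replacement from each segment's own data,
  # independently across segments (one column per segment), then add rows.
  draws <- sapply(samples, function(s) sample(s, n_sims, replace = TRUE))
  rowSums(draws)
}

total <- simulate_total(segment_samples)
quantile(total, c(.5, .9, .99))
```

Resampling can never produce a segment time larger than the largest one logged, so with only a few hundred observations per segment the simulated 99th percentile of the total is only as trustworthy as the tails of the data.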


Comment continued: Here are three specific examples to illustrate that finding the exact quantiles of sums of distributions is not trivial.

Ten normal segments. Suppose each of ten independent segments is distributed $\mathsf{Norm}(\mu = 10, \sigma=1).$ Then quantiles .5, .9 and .99 for each segment are about 10, 11.3, and 12.3, respectively. (Computations in R.)

```r
qnorm(c(.5,.9,.99), 10, 1)
## [1] 10.00000 11.28155 12.32635
```

The sum of ten such segments is distributed $\mathsf{Norm}(\mu=100, \sigma=\sqrt{10}).$ The corresponding quantiles of this distribution are about 100, 104, and 107.

```r
qnorm(c(.5,.9,.99), 100, sqrt(10))
## [1] 100.0000 104.0526 107.3566
```
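For the latency use case, one tempting shortcut is to add the segments' percentiles. A short check with the same ten normal segments shows why that fails.

```r
# Ten independent Norm(10, 1) segments
p     <- c(.5, .9, .99)
naive <- 10 * qnorm(p, 10, 1)          # sum of the ten per-segment quantiles
exact <- qnorm(p, 100, sqrt(10))       # quantiles of the actual sum
rbind(naive, exact)
```

The medians agree here because the segments are symmetric, but the naive 90th and 99th percentiles (about 112.8 and 123.3) are far above the exact values (about 104.1 and 107.4). Adding percentiles is exact only when the segments are perfectly positively dependent, that is, when all of them are slow at the same time.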

Ten exponential segments. Suppose each of ten independent segments is distributed $\mathsf{Exp}(\text{rate} = 0.1).$ Then quantiles .5, .9 and .99 for each segment are about 6.9, 23, and 46, respectively. (The mean and standard deviation are both 10.)

```r
qexp(c(.5,.9,.99), .1)
## [1]  6.931472 23.025851 46.051702
```

The sum of ten such segments is distributed $\mathsf{Gamma}(\text{shape}=10,\ \text{rate}=0.1).$ (The mean is 100 and the variance is 1000.) The corresponding quantiles of the sum are about 97, 142, and 188.

```r
qgamma(c(.5,.9,.99), 10, .1)
## [1]  96.68715 142.05990 187.83117
```
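To see how much skewness matters, one can compare these exact Gamma quantiles with a normal approximation that uses the same mean (100) and variance (1000), which is what the Central Limit Theorem would suggest.

```r
# Ten independent Exp(rate = 0.1) segments: the sum is Gamma(shape = 10, rate = 0.1)
p     <- c(.5, .9, .99)
exact <- qgamma(p, shape = 10, rate = .1)
clt   <- qnorm(p, mean = 100, sd = sqrt(1000))  # normal with the same mean and variance
round(rbind(exact, clt), 1)
```

The normal approximation gives roughly 100, 140.5, and 173.6 versus the exact 96.7, 142.1, and 187.8, so with only ten exponential segments it understates the 99th percentile by about 14 minutes.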

Similar approximate results from a simulation:

```r
set.seed(118)
x = replicate(10^6, sum(rexp(10,.1)))  # vector of a million sums of ten
mean(x); var(x); quantile(x, c(.5,.9,.99))
## [1] 99.97961
## [1] 1000.581
##       50%       90%       99% 
##  96.67126 142.06101 188.03925
```
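In the latency setting the segments are unlikely to share one distribution, and simulation handles that just as easily. The three segment distributions below are hypothetical, chosen only to combine a symmetric, a skewed, and a bounded segment, and are assumed independent.

```r
# Hypothetical mix of three independent segments (all times in minutes)
set.seed(2019)
n    <- 10^6
seg1 <- rnorm(n, mean = 10, sd = 1)   # symmetric, mean 10
seg2 <- rexp(n, rate = .1)            # skewed, mean 10
seg3 <- runif(n, min = 0, max = 2)    # bounded, mean 1
total <- seg1 + seg2 + seg3           # element-wise: one million simulated trips
mean(total)                           # close to 10 + 10 + 1 = 21
quantile(total, c(.5, .9, .99))
```

Adding whole vectors of draws gives the same kind of sample as `replicate`-ing individual sums, but is much faster in R.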

Sum of a dozen uniform segments. Suppose each of twelve independent segments is distributed $\mathsf{Unif}(0,1).$ Then quantiles .5, .9 and .99 for each segment are .5, .9, and .99, respectively. (The mean is 1/2 and the variance is 1/12.)

```r
qunif(c(.5,.9,.99))
## [1] 0.50 0.90 0.99
```

According to the Central Limit Theorem, the sum of 12 such segments is distributed nearly as $\mathsf{Norm}(\mu=6, \sigma=1).$ From simulation, the approximate corresponding quantiles of the sum are about 6, 7.3, and 8.3.

```r
set.seed(2019)
x = replicate(10^6, sum(runif(12)))
mean(x); var(x); quantile(x, c(.5,.9,.99))
## [1] 6.001354
## [1] 1.000741
##      50%      90%      99% 
## 6.002158 7.289854 8.310085
```

Note: In general, if the segments are independent and identically distributed with known mean and variance, and there are enough of them that the Central Limit Theorem applies, then you might find the mean and variance of the nearly normal sum, and from them the desired quantiles.
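The note can be turned into a small helper. This is a sketch with a hypothetical function name, `clt_quantiles`, that only needs each segment's mean and variance. Under standard conditions (for example Lyapunov's), the Central Limit Theorem also covers independent segments that are not identically distributed, so the helper does not require identical segments; how good the approximation is still depends on the number of segments and their skewness.

```r
# Normal (CLT) approximation to quantiles of a sum of independent segments.
# means, vars: per-segment means and variances (hypothetical inputs).
clt_quantiles <- function(means, vars, p = c(.5, .9, .99)) {
  # Means add; for independent segments, variances add too
  qnorm(p, mean = sum(means), sd = sqrt(sum(vars)))
}

# Twelve Unif(0, 1) segments: mean 1/2 and variance 1/12 each
clt_quantiles(rep(1/2, 12), rep(1/12, 12))
## [1] 6.000000 7.281552 8.326348
```

For the twelve uniform segments this gives about 6, 7.28, and 8.33, close to the simulated 6.00, 7.29, and 8.31 above.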

edited Jan 18 at 10:29

answered Jan 18 at 9:51

BruceET
