Using Tweets as a Random seed












15














I would like to start by saying I know nothing about cryptography. I was reading up on how to choose a random seed, and this link is something that I found. What I basically understood is that the seed has to be random enough that guessing it would be hard.



So the question is: would the hash of a tweet, at any given time, be a good candidate for a random seed? My reasoning is that the content of a tweet can be practically anything, since tweets are generated by a huge percentage of the world's population.



That said, I understand it is possible to game this by mass-tweeting a specific string continuously from multiple accounts, flooding the tweet stream with predictable seeds. If that can be mitigated by blacklisting the bad usernames, is using tweets for seeds a viable option?


























  • 17




    Tweets aren't secret, so the attacker will know them.
    – James K Polk
    2 days ago










  • The idea was that they would be secret at any given point in time, since multiple people across the world could be producing them in any given microsecond.
    – aa8y
    yesterday






  • 1




    Do not use the account twitter.com/big_ben_clock as a seed ;-)
    – ddddavidee
    2 hours ago
















randomness pseudo-random-generator















asked 2 days ago









aa8y











4 Answers


















18














What you are suggesting is not a good idea for a general purpose random number generator. It could be meaningful for very specific use cases if you need a random number generator whose output can be verified independently by a third party.



Even in those cases there are other sources of entropy which are potentially more suitable. The oldest mention of this approach known to me is RFC 2777. The suggested sources of entropy listed in RFC 2777 are:




  • lottery winning numbers

  • closing price of a stock on a particular day

  • daily balance in the US Treasury on a specified day

  • the volume of trading on the New York Stock exchange on a specified day

  • Sporting events


Every one of those looks less likely to be subject to manipulation than posts on Twitter.



Reasons it's not a good general purpose approach



You'll have a cyclic dependency. Before you can retrieve posts from Twitter you'll need random numbers for a number of different purposes including:




  • If you use IPv4 you'll need randomness for the IPID header field.

  • If you use IPv6 you'll very likely need randomness for address configuration.

  • You need randomness to assign request IDs.

  • You need randomness for TCP sequence numbers.

  • You need randomness for SSL session setup.


Moreover, the entropy of a Twitter post is hard to estimate. Some individual posts may have sufficient entropy on their own, but many will not. It's probably a safe estimate that posts have at least one bit of entropy on average, so if you were to hash together a thousand posts, you'd probably get sufficient entropy.
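To make "hash together a thousand posts" concrete, here is a minimal sketch, assuming SHA-256 as the hash and a made-up list of posts. The function name `seed_from_posts` and the sample posts are hypothetical, not part of any Twitter API. Note that the result can never contain more entropy than the inputs, and since the inputs are public, anyone who knows which posts you used can recompute the same seed.

```python
import hashlib


def seed_from_posts(posts):
    """Hash a sequence of posts into one 32-byte value (illustrative only)."""
    h = hashlib.sha256()
    for post in posts:
        data = post.encode("utf-8")
        # Length-prefix each post so that ["ab", "c"] and ["a", "bc"]
        # do not feed the same byte stream into the hash.
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return h.digest()


# Hypothetical stand-in for a thousand collected posts.
posts = [f"hypothetical post number {i}" for i in range(1000)]
seed = seed_from_posts(posts)
print(seed.hex())
```

The length prefix is a design choice to keep post boundaries unambiguous; it does nothing about the public-input problem described in the rest of this answer.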



The resulting output is subject to manipulation by Twitter users. If your algorithm is known, a user can compute what seed you'd calculate for different contents of their latest post, and then choose the contents that produce randomness which somehow suits that user.
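A rough sketch of that manipulation, under stated assumptions: the victim hashes a single post with SHA-256, seeds Python's `random.Random` with it, and draws a number from 0 to 99. The names `first_draw` and `grind` and the post text are made up for illustration. The attacker simply varies a suffix until the draw comes out the way they want.

```python
import hashlib
import random


def seed_from_post(text):
    """Assumed victim algorithm: SHA-256 of the post, read as an integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")


def first_draw(text):
    """First number in 0..99 the victim would draw after seeding from this post."""
    return random.Random(seed_from_post(text)).randrange(100)


def grind(base_text, wanted, max_tries=100_000):
    """Try post variants until one yields the wanted first draw."""
    for i in range(max_tries):
        candidate = f"{base_text} #{i}"
        if first_draw(candidate) == wanted:
            return candidate
    return None


chosen = grind("good morning", wanted=7)
print(chosen, first_draw(chosen))
```

With 100 possible outcomes, roughly a hundred attempts are expected, which costs the attacker nothing because all of the trial and error happens offline before anything is posted.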



The resulting output is also subject to manipulation by Twitter. Twitter employees will surely have access to information that makes the manipulation, already possible for any Twitter user, even easier to pull off.



All of the input to the random number generator will be publicly known. That is bad for a general purpose random number generator, but can be useful in a few very specific use cases.























  • 1




    TCP sequence numbers should be random to prevent TCP hijacking, but SSL defends against man-in-the-middle. A simple counter to avoid reuse of the same sequence number between TCP sessions with the same IP and port pairs would be sufficient, although maybe not as resistant to a DoS? But only if you have good randomness for SSL. So maybe your bullet list should say you want (not need) randomness for TCP. Similarly, the IP fragment ID just needs to be unique, not random, for each packet that isn't a fragment of a larger packet. SSL assumes lower layers are insecure.
    – Peter Cordes
    yesterday






  • 1




    @PeterCordes However, the randomness for securing the connection cannot easily be removed, especially not for common services such as Twitter. Once those switch over to TLS 1.3 I'd say you'd at least need 64 bits of randomness. Maybe you could get around the issue by implementing a special version of TLS, but at least the ephemeral DH key pair generation would be affected, and the hello contains 32 bytes of randomness as well.
    – Maarten Bodewes
    yesterday










  • @MaartenBodewes: Oh yeah, there's definitely a catch22 or chicken/egg problem here, and this answer is correct that you do need secure random numbers at some point to communicate securely over the Internet. But not at every layer, just in TLS I think. Crappy random numbers or fixed seeds or non-random sequences work for IP and TCP, especially if you're securing the data with crypto that authenticates the server to the client. But it won't work for TLS if you want TLS to actually protect you.
    – Peter Cordes
    yesterday










  • The random examples in your answer are called Markov processes.
    – Pedro Lobito
    yesterday












  • Why would you use public data like stocks as a seed instead of directly publishing the seed? One would need to know your code to verify your results anyway, therefore just putting the seed in there (hardcoded or as text) seems far easier and less error-prone.
    – Sebb
    5 hours ago



















20














The other answers provide very good lists of reasons not to use Twitter as an entropy source. What follows is the flip side of your question:-



Why would you want to?



Tweets are typically read on tablets, PCs and phones. All of those have access to hardware entropy sources that can produce oodles of truly random bits for seeding anything. The zeitgeist is that you aim for 128 or 256 bits of entropy and then seed a cryptographically secure pseudo random number generator. That will meet all of your common random number needs.



You have seeding sources such as:-




  • The RdRand instruction built into most modern CPUs.


  • /dev/*random as part of the various flavours of *nix.

  • Microsoft's Cryptography API.

  • The cameras built into phones and tablets.


There's not a lot of merit in pursuing Tweety entropy, other than for academic purposes.
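As a minimal sketch of the usual alternative, the following uses Python's standard `secrets` module, which draws from the operating system's CSPRNG (the same facility that /dev/urandom or the Windows API exposes). The variable names are illustrative.

```python
import random
import secrets

# 256 bits from the operating system's CSPRNG, usable as a seed or key.
seed = secrets.token_bytes(32)

# For security-sensitive values it is usually simpler to draw directly
# from the OS rather than to manage a seed yourself.
token = secrets.token_hex(16)   # 32 hex characters
n = secrets.randbelow(100)      # uniform integer in 0..99

# A non-cryptographic simulation PRNG can still be seeded from OS randomness.
# random.Random is a Mersenne Twister and is NOT suitable for secrets.
rng = random.Random(int.from_bytes(seed, "big"))
print(token, n, rng.random())
```

No network access, no third party and no public inputs are involved, which is the point being made above.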

























  • 2




    I understand that there are multiple better sources for seeding a random number generator. What I wanted to understand was why one can or cannot use a user generated social media content as a source for seeding.
    – aa8y
    yesterday










  • @aa8y I hoped to offer more secure alternatives, within the context of your opening " I know nothing about Cryptography". Quoting from a 1996 edition IT book led me to believe that you'd be unaware of (now) common hardware like RdRand and the ability to use camera devices as TRNGs.
    – Paul Uszak
    yesterday










  • /dev/urandom is not a "seeding source", it's PRNG output. Even /dev/random is not actually raw randomness; it's the same mechanism as urandom, but rate limited based on the estimated rate of incoming entropy from hardware. I assume the same applies to the MS Crypto API. So only the first and last list items count as real "seeding sources".
    – dn3s
    11 hours ago



















15














How are you going to decide which tweet to use? Randomly? This quickly leads to a chicken / egg problem.



What if the chosen tweet is one word? That would not add a lot of entropy.



What if Twitter is unavailable? Are you going to stop the service that relies on the entropy, or are you going to continue regardless?



How are you going to keep the chosen tweet secret? You can use TLS, but TLS requires a random number generator to operate.



How are you going to blacklist in advance? You don't know the attackers in advance, right?



What if Twitter changes its API? Would you keep running if the tweet collection agent crashes or returns bad results?



What if your government decides to block Twitter? There are plenty of governments doing that.



What if you choose a heavily retweeted tweet? How much entropy would that contain?



Having something that provides entropy is just the first step. In general you want something that is local and hard to influence and easy to understand / validate. Twitter doesn't seem to be a good option for any of those requirements.























  • 1




    I would take slight issue with "What if the chosen tweet is one word? That would not add a lot of entropy." as the definition of entropy needs to be considered in terms of the entire selection process, not the end result. The empty string '' returned amongst other unique possibilities each with true randomness $p = 2^{-1000}$ would be part of an entropy source 1000 bits strong. That may be a separate issue to accidentally loading a string that others will test against - i.e. the assumed generation model that an attacker might use.
    – Neil Slater
    yesterday










  • @NeilSlater That's a good point; however unless you're actually enumerating all possibilities in the domain I'm guessing that you'd still be left with a small seed.
    – Maarten Bodewes
    yesterday










  • The idea was to select the first tweet we see at the time we want to seed the random number generator to avoid the need for selecting the tweet at random. As far as selecting a tweet of a certain size or one which has not been highly retweeted, that should be easy, right? And for govt banning twitter, I didn't think of that, but honestly, it doesn't have to be Twitter. It can be any user generated content which is hard to predict. And I wanted to know if that is good enough. Twitter was just an example.
    – aa8y
    yesterday










  • Yeah, OK, I can see you want to generalize this idea. But you'd still have boot problems, availability problems, problems with the secrecy: actually most of the list. As for choosing just the latest tweet: that's so vulnerable to being influenced by an adversary that I don't even want to think about it.
    – Maarten Bodewes
    yesterday



















3














Other answers have already pointed out the chicken/egg catch-22 problem of securely communicating over the Internet before you have a random number, as well as other showstoppers and possible problems. But you're screwed even against a fully remote attacker who can't sniff your packets.




The OP commented:

The idea was to select the first tweet we see at the time we want to seed the random number generator to avoid the need for selecting the tweet at random. [...]




Tweets are public, and thus your pool of seeds is available to the attacker.



On average, tweet throughput is around 6000 tweets per second (source). An attacker who can guess your tweet-query time to within one second has a search space of about 6000 tweets. You could say that's equivalent to about 12.5 bits of entropy, vastly smaller than the hash length. Or the attacker can widen the window to 1 minute for an equivalent entropy of about 18.5 bits, still trivial to brute-force in seconds, and probably limited only by the time to download all those tweets.
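The arithmetic behind those figures is just the base-2 logarithm of the number of candidate tweets. A small sketch, taking the 6000 tweets/second rate above as given (the function name `search_space_bits` is made up for illustration):

```python
import math

TWEETS_PER_SECOND = 6000  # average rate quoted above


def search_space_bits(window_seconds, rate=TWEETS_PER_SECOND):
    """Equivalent entropy, in bits, of picking one tweet from the window."""
    return math.log2(rate * window_seconds)


one_second = search_space_bits(1)    # about 12.55 bits
one_minute = search_space_bits(60)   # about 18.46 bits
one_day = search_space_bits(86_400)  # even a full day stays under 30 bits
print(round(one_second, 2), round(one_minute, 2), round(one_day, 2))
```

Compare that with the 128 or 256 bits normally expected of a seed: widening the window costs the attacker very little.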



If an attacker controls or knows when a seed was generated, you're screwed. The tighter a time bound they can put on it, the smaller their search space. Even worse, the attacker can simply keep widening their time window with earlier and earlier tweets if they don't find a hit in the first 1-second window they check.



Many use-cases for secure seeding of PRNGs expose the sequence to the attacker, so they can test guesses of the seed. They try each guess with the same PRNG your software uses and check whether the resulting sequence matches what they've already seen. If it does, then with high probability they can predict the next number they'll see.



There can be false-positive matches that lead to the same initial sequence, for multiple reasons:




  • They can only see (or work backwards to) rng() & 0xff (low 8 bits) or rng() % 100 (or some better way of generating a 0..99 range), not the full 32 or 64-bit random number value of each PRNG step.

  • The PRNG has a large hidden internal state, and multiple initial states lead to the same sequence of random numbers. (This is already necessary so that knowing one rng result doesn't uniquely determine the next.)


But by observing enough random data from the same seed, an attacker can confirm a seed guess to a very high probability.



With only 6000 possible candidates, the chance of a different candidate giving the same initial sequence you observed is negligible.



And if you test them all over a likely window (and are right about that time window), you can detect when you've uniquely identified the one tweet that produces the sequence you're seeing, so you can potentially "lock on" quite quickly even if you don't get many bits of data per observation of the sequence.
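The search described above can be sketched as follows. Everything here is an assumption for illustration: the victim is taken to hash one tweet with SHA-256, seed Python's `random.Random` (a Mersenne Twister standing in for whatever PRNG the real software uses), and expose only `rng.randrange(100)` outputs. The 6000 candidate tweets are made-up strings.

```python
import hashlib
import random


def seed_from_tweet(text):
    """Assumed victim algorithm: SHA-256 of the tweet, read as an integer."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")


def outputs(text, count):
    """The first `count` values in 0..99 produced after seeding from `text`."""
    rng = random.Random(seed_from_tweet(text))
    return [rng.randrange(100) for _ in range(count)]


def matching_tweets(candidates, observed):
    """Return every candidate tweet whose output sequence matches `observed`."""
    return [t for t in candidates if outputs(t, len(observed)) == observed]


# Hypothetical one-second window of public tweets.
candidates = [f"hypothetical tweet {i}" for i in range(6000)]
secret_tweet = candidates[4242]          # the one the victim actually used
observed = outputs(secret_tweet, 8)      # what the attacker got to see

matches = matching_tweets(candidates, observed)
print(matches)
```

Eight observations of a value in 0..99 carry about 53 bits, far more than the roughly 12.5 bits needed to single out one of 6000 candidates, so a single surviving match is the expected result.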





If the random number was used as an encryption key, an attacker that can detect "sane looking" plaintext can still attack this way, even if the "sane-looking" check is very weak / inclusive.




  • Check which (of the ~6000) tweets as seeds lead to sane-looking plaintext from the first key.

  • Of those few candidate tweets, check which produce sane-looking plaintext from the second key generated from the same sequence. If there were multiple different possibly-sane plaintexts from the first key, this probably rules out most of them. Repeat as necessary.


This might not be the most plausible example, but this kind of idea is applicable for other kinds of things where you don't directly see the random sequence, only a cryptographically-secure use of it. But if you have any mechanism for testing a guess by going through all the steps the target of the attack would take, you can still attack.



Or if you can trigger a re-seed at some known time, and use the service with your own known data to get (probably) some of the first random values generated with that seed, you might be able to work out the seed that it will continue to use for other users' requests.



Only 6000 tweets is a small enough search space that you can start to expand your search space in other dimensions, like allowing for the possibility that other users' requests might have slipped in between yours while you're using it as an oracle to encrypt known plaintext that lets you check. (Or some equivalent thing that lets you really check your PRNG sequence guesses.)

















































    4 Answers
    4






    active

    oldest

    votes








    4 Answers
    4






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    18














    What you are suggesting is not a good idea for a general purpose random number generator. It could be meaningful for very specific use cases if you need a random number generator whose output can be verified independently by a third party.



    Even in those cases there are other sources of entropy which are potentially more suitable. The oldest mention of this approach known to me is RFC 2777. The suggested sources of entropy listed in RFC 2777 are:




    • lottery winning numbers

    • closing price of a stock on a particular day

    • daily balance in the US Treasury on a specified day

    • the volume of trading on the New York Stock exchange on a specified day

    • Sporting events


    Every one of those looks like they are less likely to be subject to manipulation than posts on Twitter.



    Reasons it's not a good general purpose approach



    You'll have a cyclic dependency. Before you can retrieve posts from Twitter you'll need random numbers for a number of different purposes including:




    • If you use IPv4 you'll need randomness for the IPID header field.

    • If you use IPv6 you'll very likely need randomness for address configuration.

    • You need randomness to assign request IDs.

    • You need randomness for TCP sequence numbers.

    • You need randomness for SSL session setup.


    Moreover the entropy of a Twitter post is hard to estimate. Some individual posts may have sufficient entropy on their own, but many will not. It's probably a safe estimate that posts have at least one bit of entropy on average, so if you were to hash together a thousand posts, you'd probably get sufficient entropy.



    The resulting output is subject to manipulation by Twitter users. If your algorithm is known a user can compute what seed you'd calculate with different contents of their latest post and choose contents producing randomness that somehow suits that user.



    The resulting output is also subject to manipulation by Twitter. Surely there will be Twitter employees who have access to information which will make the manipulation possible by any Twitter user even easier to pull off.



    All of the input to the random number generator will be publicly known. That is bad for a general purpose random number generator, but can be useful in a few very specific use cases.






    share|improve this answer

















    • 1




      TCP sequence numbers should be random to prevent TCP hijacking, but SSL defends against man-in-the-middle. A simple counter to avoid reuse of the same sequence number between TCP sessions with same same IP and port pairs would be sufficient, although maybe not as resistant to a DOS? But only if you have good randomness for SSL. So maybe your bullet list should say you want (not need) randomness for TCP. Similarly, The IP fragment ID just needs to be unique, not random, for each packet that isn't a fragment of a larger packet. SSL assumes lower layers are insecure.
      – Peter Cordes
      yesterday






    • 1




      @PeterCordes However, the randomness for securing the connection cannot be easily be removed, especially not for common services as twitter. Once those switch over to TLS 1.3 I'd say you'd at least need 64 bits of randomness. Maybe you could get around the issue by implementing a special version of TLS, but at least the ephemeral DH key pair generation would be affected, and the hello contains 32 bits of randomness as well.
      – Maarten Bodewes
      yesterday










    • @MaartenBodewes: Oh yeah, there's definitely a catch22 or chicken/egg problem here, and this answer is correct that you do need secure random numbers at some point to communicate securely over the Internet. But not at every layer, just in TLS I think. Crappy random numbers or fixed seeds or non-random sequences work for IP and TCP, especially if you're securing the data with crypto that authenticates the server to the client. But it won't work for TLS if you want TLS to actually protect you.
      – Peter Cordes
      yesterday










    • The random examples on your answer are called markov processes.
      – Pedro Lobito
      yesterday












    • Why would you use public data like stocks as a seed instead of directly publishing the seed? One would need to know your code to verify your results anyway, therefore just putting the seed in there (hardcoded or as text) seems far easier and less error-prone.
      – Sebb
      5 hours ago
















    18














    What you are suggesting is not a good idea for a general purpose random number generator. It could be meaningful for very specific use cases if you need a random number generator whose output can be verified independently by a third party.



    Even in those cases there are other sources of entropy which are potentially more suitable. The oldest mention of this approach known to me is RFC 2777. The suggested sources of entropy listed in RFC 2777 are:




    • lottery winning numbers

    • closing price of a stock on a particular day

    • daily balance in the US Treasury on a specified day

    • the volume of trading on the New York Stock exchange on a specified day

    • Sporting events


    Every one of those looks like they are less likely to be subject to manipulation than posts on Twitter.



    Reasons it's not a good general purpose approach



    You'll have a cyclic dependency. Before you can retrieve posts from Twitter you'll need random numbers for a number of different purposes including:




    • If you use IPv4 you'll need randomness for the IPID header field.

    • If you use IPv6 you'll very likely need randomness for address configuration.

    • You need randomness to assign request IDs.

    • You need randomness for TCP sequence numbers.

    • You need randomness for SSL session setup.


    Moreover the entropy of a Twitter post is hard to estimate. Some individual posts may have sufficient entropy on their own, but many will not. It's probably a safe estimate that posts have at least one bit of entropy on average, so if you were to hash together a thousand posts, you'd probably get sufficient entropy.



    The resulting output is subject to manipulation by Twitter users. If your algorithm is known a user can compute what seed you'd calculate with different contents of their latest post and choose contents producing randomness that somehow suits that user.



    The resulting output is also subject to manipulation by Twitter. Surely there will be Twitter employees who have access to information which will make the manipulation possible by any Twitter user even easier to pull off.



    All of the input to the random number generator will be publicly known. That is bad for a general purpose random number generator, but can be useful in a few very specific use cases.






    share|improve this answer

















    • 1




      TCP sequence numbers should be random to prevent TCP hijacking, but SSL defends against man-in-the-middle. A simple counter to avoid reuse of the same sequence number between TCP sessions with same same IP and port pairs would be sufficient, although maybe not as resistant to a DOS? But only if you have good randomness for SSL. So maybe your bullet list should say you want (not need) randomness for TCP. Similarly, The IP fragment ID just needs to be unique, not random, for each packet that isn't a fragment of a larger packet. SSL assumes lower layers are insecure.
      – Peter Cordes
      yesterday






    • 1




      @PeterCordes However, the randomness for securing the connection cannot be easily be removed, especially not for common services as twitter. Once those switch over to TLS 1.3 I'd say you'd at least need 64 bits of randomness. Maybe you could get around the issue by implementing a special version of TLS, but at least the ephemeral DH key pair generation would be affected, and the hello contains 32 bits of randomness as well.
      – Maarten Bodewes
      yesterday










    • @MaartenBodewes: Oh yeah, there's definitely a catch22 or chicken/egg problem here, and this answer is correct that you do need secure random numbers at some point to communicate securely over the Internet. But not at every layer, just in TLS I think. Crappy random numbers or fixed seeds or non-random sequences work for IP and TCP, especially if you're securing the data with crypto that authenticates the server to the client. But it won't work for TLS if you want TLS to actually protect you.
      – Peter Cordes
      yesterday










    • The random examples in your answer are called Markov processes.
      – Pedro Lobito
      yesterday












    • Why would you use public data like stocks as a seed instead of directly publishing the seed? One would need to know your code to verify your results anyway, therefore just putting the seed in there (hardcoded or as text) seems far easier and less error-prone.
      – Sebb
      5 hours ago














    answered yesterday









    kasperd



















    20














    The other answers provide very good lists of reasons not to use Twitter as an entropy source. What follows is the flip side of your question:



    Why would you want to?



    Tweets are typically read on tablets, PCs and phones. All of those have access to hardware entropy sources that can produce oodles of truly random bits for seeding anything. The zeitgeist is that you aim for 128 or 256 bits of entropy and then seed a cryptographically secure pseudorandom number generator (CSPRNG). That will meet all of your common random number needs.



    You have seeding sources such as:




    • The RdRand instruction built into most modern CPUs.


    • /dev/*random as part of the various flavours of *nix.

    • Microsoft's Cryptography API.

    • The cameras built into phones and tablets.


    There's not a lot of merit in pursuing Tweety entropy, other than for academic purposes.
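    In practice, drawing on the operating system's generator takes a line or two. A minimal sketch in Python, using the standard-library `secrets` module (which reads from the OS CSPRNG); the variable names are only illustrative:

```python
import secrets

# 32 bytes = 256 bits drawn from the operating system's CSPRNG.
seed = secrets.token_bytes(32)

# For most needs, skip manual seeding and use the OS-backed generator directly.
session_token = secrets.token_hex(16)   # 16 random bytes as 32 hex characters
dice_roll = secrets.randbelow(6) + 1    # uniform integer in 1..6

print(len(seed), session_token, dice_roll)
```

    Nothing here touches the network, so it keeps working when Twitter (or any other remote source) is down, and no outside party gets to see or influence the bytes.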

























    • 2




      I understand that there are multiple better sources for seeding a random number generator. What I wanted to understand was why one can or cannot use user-generated social media content as a source for seeding.
      – aa8y
      yesterday










    • @aa8y I hoped to offer more secure alternatives, within the context of your opening "I know nothing about Cryptography". Quoting from a 1996 edition IT book led me to believe that you'd be unaware of (now) common hardware like RdRand and the ability to use camera devices as TRNGs.
      – Paul Uszak
      yesterday










    • /dev/urandom is not a "seeding source", it's PRNG output. Even /dev/random is not actually raw randomness; it's the same mechanism as urandom, but rate-limited based on the estimated rate of incoming entropy from hardware. I assume the same applies to the MS Crypto API. So only the first and last list items count as real "seeding sources".
      – dn3s
      11 hours ago
























    edited yesterday

























    answered 2 days ago









    Paul Uszak



















    15














    How are you going to decide which tweet to use? Randomly? This quickly leads to a chicken / egg problem.



    What if the chosen tweet is one word? That would not add a lot of entropy.



    What if Twitter is unavailable? Will you stop the service that relies on the entropy, or continue regardless?



    How are you going to keep the chosen tweet secret? You can use TLS, but TLS requires a random number generator to operate.



    How are you going to blacklist in advance? You don't know the attackers in advance, right?



    What if Twitter changes its API? Would you keep running if the tweet collection agent crashes or returns bad results?



    What if your government decides to block Twitter? There are plenty of governments doing that.



    What if you choose a heavily retweeted tweet? How much entropy would that contain?



    Having something that provides entropy is just the first step. In general you want something that is local and hard to influence and easy to understand / validate. Twitter doesn't seem to be a good option for any of those requirements.
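    To make the "hard to influence" requirement concrete, here is a small sketch of how a poster could steer a tweet-derived seed. It assumes a hypothetical scheme in which the seed is simply SHA-256 of the tweet text; `seed_from_tweet` and `grind` are names invented for this example, and the target property (first byte zero) stands in for whatever outcome would suit the attacker.

```python
import hashlib
from itertools import count

def seed_from_tweet(text):
    """Hypothetical victim scheme: the seed is SHA-256 of the tweet text."""
    return hashlib.sha256(text.encode("utf-8")).digest()

def grind(base_text, wanted):
    """Append a counter to base_text until the resulting seed satisfies `wanted`."""
    for n in count():
        candidate = f"{base_text} #{n}"
        if wanted(seed_from_tweet(candidate)):
            return candidate, n

# The attacker wants a seed whose first byte is zero (a 1-in-256 chance per try).
tweet, tries = grind("good morning", lambda s: s[0] == 0)
print(tweet, tries)
```

    A few hundred hash evaluations are enough on average for this toy property, and each extra bit the attacker wants to control only doubles the work. Hashing does nothing to stop this, because the attacker chooses the input.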























    • 1




      I would take slight issue with "What if the chosen tweet is one word? That would not add a lot of entropy." as the definition of entropy needs to be considered in terms of the entire selection process, not the end result. The empty string '' returned amongst other unique possibilities each with true randomness $p = 2^{-1000}$ would be part of an entropy source 1000 bits strong. That may be a separate issue to accidentally loading a string that others will test against - i.e. the assumed generation model that an attacker might use.
      – Neil Slater
      yesterday










    • @NeilSlater That's a good point; however unless you're actually enumerating all possibilities in the domain I'm guessing that you'd still be left with a small seed.
      – Maarten Bodewes
      yesterday










    • The idea was to select the first tweet we see at the time we want to seed the random number generator, to avoid the need for selecting the tweet at random. As far as selecting a tweet of a certain size or one which has not been highly retweeted, that should be easy, right? And for govt banning Twitter, I didn't think of that, but honestly, it doesn't have to be Twitter. It can be any user-generated content which is hard to predict. And I wanted to know if that is good enough. Twitter was just an example.
      – aa8y
      yesterday










    • Yeah, OK, I can see you want to generalize this idea. But you'd still have boot problems, availability problems, problems with the secrecy - actually most of the list. As for choosing just the latest tweet: that's so vulnerable to being influenced by an adversary that I don't even want to think about it.
      – Maarten Bodewes
      yesterday


























    answered 2 days ago









    Maarten Bodewes








    • I would take slight issue with "What if the chosen tweet is one word? That would not add a lot of entropy." as the definition of entropy needs to be considered in terms of the entire selection process, not the end result. The empty string '' returned amongst other unique possibilities each with true randomness $p = 2^{-1000}$ would be part of an entropy source 1000 bits strong. That may be a separate issue to accidentally loading a string that others will test against - i.e. the assumed generation model that an attacker might use.
      – Neil Slater
      yesterday










    • @NeilSlater That's a good point; however unless you're actually enumerating all possibilities in the domain I'm guessing that you'd still be left with a small seed.
      – Maarten Bodewes
      yesterday










    • The idea was to select the first tweet we see at the time we want to seed the random number generator to avoid the need for selecting the tweet at random. As far as selecting a tweet of a certain size or one which has not been highly retweeted, that should be easy, right? And for govt banning twitter, I didn't think of that, but honestly, it doesn't have to be Twitter. It can be any user generated content which is hard to predict. And I wanted to know if that is good enough. Twitter was just an example.
      – aa8y
      yesterday










    • Yeah, OK, I can see you want to generalize this idea. But you'd still have boot problems, availability problems, problems with the secrecy - actually most of the list. As for choosing just the latest tweet: that's so vulnerable to being influenced by an adversary that I don't even want to think about it.
      – Maarten Bodewes
      yesterday







































    Other answers have already pointed out the chicken/egg catch-22 problem of securely communicating over the Internet before you have a random number, and other showstoppers and possible problems. But you're screwed even against a fully-remote attacker that can't sniff your packets.




    The OP commented:

    The idea was to select the first tweet we see at the time we want to seed the random number generator to avoid the need for selecting the tweet at random. [...]




    Tweets are public, and thus your pool of seeds is available to the attacker.



    On average, Tweet throughput is around 6000 tweets per second (source). An attacker that can guess your tweet-query time within one second has a search space of about 6000 tweets. You could say that's equivalent to 12.5 bits of entropy, vastly smaller than the hash length. Or an attacker can widen the window to 1 minute for an equivalent entropy of 18.4 bits, still trivial to brute force in seconds, probably only limited by the time to download all those tweets.
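The arithmetic behind those figures can be checked with a short sketch. It assumes the 6000 tweets/second average quoted above and treats every tweet in the window as equally likely, which is the most generous case for the defender; the function name is made up for illustration.

```python
import math

TWEETS_PER_SECOND = 6000  # average rate quoted above; an assumption, not a constant


def window_entropy_bits(window_seconds, rate=TWEETS_PER_SECOND):
    """Upper bound on the attacker's uncertainty, in bits, if every tweet
    posted inside the window is an equally likely seed."""
    return math.log2(rate * window_seconds)


for seconds in (1, 60, 3600):
    print(f"{seconds:>5} s window: about {window_entropy_bits(seconds):.2f} bits")
```

Each doubling of the window adds only one bit, which is why widening the guess from a second to a minute barely helps the defender.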



    If an attacker controls or knows when a seed was generated, you're screwed. The tighter a time bound they can put on it, the smaller their search space. Even worse, the attacker can simply keep widening their time window with earlier and earlier tweets if they don't find a hit in the first 1-second window they check.



    Many use-cases for secure seeding of PRNGs expose the output sequence to the attacker, which lets them test guesses of the seed: they try each candidate with the same PRNG your software uses and check whether the resulting sequence matches what they've already seen. Once a candidate matches, they can predict, with high probability, the next number they'll see.



    There can be false-positive matches that lead to the same initial sequence, for multiple reasons:




    • They can only see (or work backwards to) rng() & 0xff (low 8 bits) or rng() % 100 (or some better way of generating a 0..99 range), not the full 32 or 64-bit random number value of each PRNG step.

    • The PRNG has a large hidden internal state, and multiple initial states lead to the same sequence of random numbers. (This is already necessary so that knowing one rng result doesn't uniquely determine the next.)


    But by observing enough random data from the same seed, an attacker can test a seed to a very high probability.



    With only 6000 possible candidates, the chance of a different candidate giving the same initial sequence you observed is negligible.



    And if you test them all over a likely window (and are right about that time window), you can detect when you've uniquely identified the one tweet that produces the sequence you're seeing, so you can potentially "lock on" quite quickly even if you don't get many bits of data per observation of the sequence.
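A rough sketch of that "lock on" process, under loudly stated assumptions: the 6000 candidate tweets are synthetic strings, the seed is the SHA-256 hash of the tweet text, Python's `random.Random` (Mersenne Twister, not a CSPRNG) stands in for whatever PRNG the target uses, and the attacker sees only the low 8 bits of each output. All function names are hypothetical.

```python
import hashlib
import random


def seed_from_tweet(text):
    """Hash the tweet text into an integer seed (assumed scheme)."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")


def low_bytes(seed, count):
    """What the attacker can observe: rng() & 0xff for the first `count` steps."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) & 0xFF for _ in range(count)]


def surviving_candidates(candidates, observed):
    """Keep only the tweets whose seed reproduces every observed value."""
    return [t for t in candidates
            if low_bytes(seed_from_tweet(t), len(observed)) == observed]


# Stand-in for "every tweet posted in the one-second window".
candidates = [f"synthetic tweet #{i}" for i in range(6000)]
secret_tweet = candidates[4242]          # the one the victim happened to pick
observed = low_bytes(seed_from_tweet(secret_tweet), 8)

for n in (1, 2, 4, 8):
    left = surviving_candidates(candidates, observed[:n])
    print(n, "observation(s) ->", len(left), "candidate(s) left")
```

Each 8-bit observation is expected to cut the false candidates by roughly a factor of 256, so a handful of observations is normally enough to leave a single tweet.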





    If the random number was used as an encryption key, an attacker that can detect "sane looking" plaintext can still attack this way, even if the "sane-looking" check is very weak / inclusive.




    • Check which (of the ~6000) tweets as seeds lead to sane-looking plaintext from the first key.

    • Of those few candidate tweets, check which produce sane-looking plaintext from the second key generated from the same sequence. If there were multiple different possibly-sane plaintexts from the first key, this probably rules out most of them. Repeat as necessary.


    This might not be the most plausible example, but this kind of idea is applicable for other kinds of things where you don't directly see the random sequence, only a cryptographically-secure use of it. But if you have any mechanism for testing a guess by going through all the steps the target of the attack would take, you can still attack.
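The two-step filtering described in the bullets above can be sketched as follows. Everything here is an illustrative assumption: the candidates are synthetic, keys are drawn from Python's `random.Random` seeded with the tweet's SHA-256 hash, the cipher is a toy XOR stream cipher built on SHAKE-256, and "sane-looking" means nothing more than printable ASCII.

```python
import hashlib
import random


def keys_from_tweet(text, count):
    """Derive `count` 16-byte keys the way the hypothetical victim does."""
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest(), "big")
    rng = random.Random(seed)
    return [rng.getrandbits(128).to_bytes(16, "big") for _ in range(count)]


def toy_xor_cipher(key, data):
    """Toy stream cipher: XOR with a SHAKE-256 keystream (same call decrypts)."""
    keystream = hashlib.shake_256(key).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, keystream))


def looks_sane(plaintext):
    """Deliberately weak check: every byte is printable ASCII."""
    return all(32 <= b < 127 for b in plaintext)


def survivors(candidates, ciphertexts):
    """Tweets whose derived keys decrypt every ciphertext to sane-looking text."""
    kept = []
    for tweet in candidates:
        keys = keys_from_tweet(tweet, len(ciphertexts))
        if all(looks_sane(toy_xor_cipher(k, c)) for k, c in zip(keys, ciphertexts)):
            kept.append(tweet)
    return kept


candidates = [f"synthetic tweet #{i}" for i in range(6000)]
secret_tweet = candidates[1234]
messages = [b"pay Bob ", b"10 coins", b"by noon."]
victim_keys = keys_from_tweet(secret_tweet, len(messages))
ciphertexts = [toy_xor_cipher(k, m) for k, m in zip(victim_keys, messages)]

for n in range(1, len(ciphertexts) + 1):
    left = survivors(candidates, ciphertexts[:n])
    print(n, "ciphertext(s) ->", len(left), "candidate(s) left")
```

The attacker never sees a key or a PRNG output directly; being able to replay the victim's steps for each candidate tweet is enough.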



    Or if you can trigger a re-seed at some known time, and use the service with your own known data to get (probably) some of the first random values generated with that seed, you might be able to work out the seed that it will continue to use for other users' requests.



    Only 6000 tweets is a small enough search space that you can start to expand your search space in other dimensions, like allowing for the possibility that other users' requests might have slipped in between yours while you're using it as an oracle to encrypt known plaintext that lets you check. (Or some equivalent thing that lets you really check your PRNG sequence guesses.)















        answered yesterday









        Peter Cordes





















