Human RNG: An Impractical Idea | Blog

Published July 07, 2016

OK, this is a silly idea, but I was curious. Can people generate random numbers? Specifically, I had a few questions in mind:

Can people generate seemingly random numbers?
Do people generate lower numbers more often?
Do people like or dislike certain numbers?

As an experiment, I decided to test these questions with Amazon's Mechanical Turk (MTurk) service last weekend. MTurk allows you to post cheap, one-off jobs for a small fee. So I came up with a simple, clearly stated task to post:

A few notes before we get started:

I chose to initially limit answers to integers between 0 and 100 (inclusive) for simplicity. Further testing with answer ranges would be an interesting experiment.
I used a sample size of 100 people for all tests primarily due to costs. I'm curious about this, but not that curious.
MTurk workers can only respond to a given job once, so the same person can't submit their answer multiple times. This should help keep the data pseudo-random.
I had no protection against a bot answering the question on MTurk, but let's just assume there are no bots looking for questions like this on MTurk yet.

In my initial test, I paid a rate of $0.01 per answer. Amazon charges another cent on top of the 1-cent reward, which brought the total cost per random number generated to $0.02. We're not off to a great start scalability-wise...

An hour and 39 minutes later, I had 100 responses.

Before posting the results though, let's see what random numbers from 0 to 100 generated by JavaScript look like. Using this function:

function getRandomInteger() {
  return Math.floor(Math.random() * (100 + 1));
}

we'll get the following distribution (from a previous execution):

From a cursory glance, the computer's random number generation seems pretty random. It happened to choose the number 12 a bit more than anything else and a number of values are untouched, but that's not unexpected with a small sample size.
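
For reference, a distribution like this can be tallied in a few lines. This is a minimal sketch, not necessarily the script behind the chart above; `randomIntInclusive` and `tally` are illustrative names, and the upper bound is passed in as a parameter so the same code covers other ranges.

```javascript
// Random integer between 0 and max (inclusive), same approach as getRandomInteger above.
function randomIntInclusive(max) {
  return Math.floor(Math.random() * (max + 1));
}

// Draw `samples` numbers and count how often each value from 0 to max appears.
function tally(samples, max) {
  const counts = new Array(max + 1).fill(0);
  for (let i = 0; i < samples; i++) {
    counts[randomIntInclusive(max)] += 1;
  }
  return counts;
}

const sampleCounts = tally(100, 100);
const untouched = sampleCounts.filter((c) => c === 0).length;
console.log(`${untouched} of 101 values were never chosen`);
```

With 100 draws over 101 possible values, roughly 37 values are expected to go unchosen (101 × (100/101)^100 ≈ 37.3), so the gaps in a chart like this are normal for a uniform generator.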

Now for the Mechanical Turk results:

These results don't look a whole lot different really, but there are a couple exceptions.

While responses are spread out, they seem a bit skewed towards numbers in the lower half of the range. With this sample size, that's pretty inconclusive. As a random number set, it's still kind of usable though!

While playing around with Mechanical Turk on this set, I realized you could actually create tasks with no payment reward (though Amazon still charges a $0.01 fee per response). Neat. That cuts our cost in half! I wonder if people will provide less random numbers for free though. Or if they'll respond at all. Let's find out. Again with no payment reward:

Interesting. Again there appears to be a slight skew towards numbers in the lower half of our range. Without detailed analysis, the numbers still seem somewhat random.

I'll get to a bit more analysis at the end, but given the limited sample size, I thought it might be better to try using a smaller answer range for responses. I tried the same tests with numbers generated between 0 and 10 (inclusive) and sample sizes remaining at 100.

First, the computer-generated random number distribution:

This looks reasonable. It's not an even distribution by any means, but remember we're only using a sample size of 100.

How about the paid results from Mechanical Turk for this task?

OK, this is starting to get interesting. There are a couple of things to notice.

First, 7 is really popular. In fact, with 28 responses, 7 is twice as favored as any other number (8 is the next most common, with 14 responses).

Secondly, there appears to be a curve favoring numbers in the middle of the range. 0 and 10 each only have 1 response, though I suspect that may be a flaw of the question on MTurk. For users unsure of the task's inclusive numerical requirements, but wanting to guarantee that they meet the task's requirements, it would be reasonable not to choose either extreme.
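
One common way to put a number on "how uneven is this?" is Pearson's chi-square statistic against a uniform expectation. The sketch below is not part of the original experiment. The function names are illustrative, and since the full per-number counts aren't listed in the text, the example only uses the one figure given above (28 responses of 7 out of 100, over 11 possible values).

```javascript
// Pearson chi-square statistic for observed counts against a uniform expectation.
function chiSquareUniform(observed) {
  const total = observed.reduce((sum, c) => sum + c, 0);
  const expected = total / observed.length;
  return observed.reduce((sum, c) => sum + (c - expected) ** 2 / expected, 0);
}

// Contribution of a single bucket to that statistic.
function bucketContribution(count, total, buckets) {
  const expected = total / buckets; // about 9.09 for 100 responses over 11 values
  return (count - expected) ** 2 / expected;
}

console.log(bucketContribution(28, 100, 11).toFixed(1)); // "39.3"
```

For 11 buckets (10 degrees of freedom), the standard 5% critical value is about 18.31. The 7 bucket alone contributes about 39.3, so this distribution would fail a uniformity test even before the other numbers are counted.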

Finally, let's look at the unpaid task results from Mechanical Turk:

This surprised me. This task shares the same general trends as the previous paid MTurk task. Most surprisingly, people still seem to really like the number 7. Numbers 0 and 10 are unpopular again, reinforcing my suspicion that users were hesitant to choose extrema.

I've run all tests that I have planned (at least for now). Here is a summary of the results that are displayed above:

	Cost per response	Response time (average)	Mean response	Even/Odd responses	Invalid responses
JS 0-100	Free¹	15 nanoseconds²	47.29	41/59	0
MTurk 0-100 (paid)	$0.02	59 seconds	41.51	39/61	0
MTurk 0-100 (unpaid)	$0.01	3 minutes and 26 seconds	42.67	47/53	0
JS 0-10	Free¹	15 nanoseconds²	4.54	52/48	0
MTurk 0-10 (paid)	$0.02	1 minute and 37 seconds	5.89	31/69	5³
MTurk 0-10 (unpaid)	$0.01	5 minutes and 5 seconds	5.72	40/60	1³

There are some interesting numbers here. Here are some figures I noticed from the above results:

I spent a total of $6.00 obtaining responses from 400⁴ people. Mechanical Turk can really be a cheap method for getting simple feedback from people.
Humans are astronomically slower than computers (obviously). While it typically took workers on MTurk longer than a minute to discover the task and complete it, the average time spent per task, as reported by Amazon, was 8 seconds. 8 seconds to read the question and submit a response is a very reasonable amount of time, but that's still roughly 500 million times slower than the computer's 15 nanoseconds.
Humans are definitely faster to respond to paid tasks on MTurk. About 3 times faster actually. Then again, I was a little surprised people responded for free at all.
Humans generated odd numbers more often in every test. Pooled across the four tasks, odd numbers were 54.8% more common than even numbers! I have no explanation here.
Humans generated the number 7 more often than any other number. 14.8% of all responses were 7! I don't have any explanation here either.
More paid responses were invalid than free responses. However, I suspect that this may be from workers that previously completed the 0-100 number generation task, as all invalid responses were between 11 and 100 (27, 28, 50, 55, 69, and 69). I was happy to see that fewer than 2% of all responses were invalid though.
Humans generated numbers in the lower half of the range more often when selecting between 0 and 100, while generating numbers in the upper half of the range more when selecting between 0 and 10. Even in the 0-100 tasks, responses between 6 and 10 were also more common than 0 through 5, matching the trend in the 0-10 tasks.
18 numbers were never generated by humans (14, 15, 26, 31, 32, 40, 53, 58, 60, 70, 74, 80, 90, 91, 92, 93, 98, and 100), and numbers in the 90s were less likely to be generated than any other "decade."
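
The arithmetic behind a few of these figures can be checked with a short script. This is a sketch using only numbers from the summary table; the variable names are illustrative. The paid 0-10 row is entered as 31 even and 69 odd, the split that adds up to 100 responses and reproduces the 54.8% figure; treat that as an assumption.

```javascript
// Figures from the summary table; costs are in cents to avoid floating-point drift.
const tasks = [
  { name: "0-100 paid", costCents: 2, responses: 100, even: 39, odd: 61 },
  { name: "0-100 unpaid", costCents: 1, responses: 100, even: 47, odd: 53 },
  { name: "0-10 paid", costCents: 2, responses: 100, even: 31, odd: 69 }, // assumed split
  { name: "0-10 unpaid", costCents: 1, responses: 100, even: 40, odd: 60 },
];

const totalCents = tasks.reduce((sum, t) => sum + t.costCents * t.responses, 0);
const totalEven = tasks.reduce((sum, t) => sum + t.even, 0);
const totalOdd = tasks.reduce((sum, t) => sum + t.odd, 0);

// How much more common odd responses were than even ones, pooled over all tasks.
const oddExcessPercent = (totalOdd / totalEven - 1) * 100;

// 8 seconds per human response versus 15 nanoseconds per function call.
const slowdown = 8 / 15e-9;

console.log((totalCents / 100).toFixed(2)); // "6.00"
console.log(oddExcessPercent.toFixed(1)); // "54.8"
console.log(Math.round(slowdown / 1e6)); // 533 (million)
```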

Conclusions:

Humans are bad at RNG, at least when they haven't been told to be as random as possible. If I were to test further, I'd probably hire individual people and specifically direct them to make responses as random as possible.
Humans seem to have weird number biases about odd numbers and the number 7. A quick Google search suggests this trend is not uncommon. If there has been real research into why humans generate numbers with this bias, I'd love to know about it.
Random number generation is actually sort of scalable with humans, but extremely costly and slow when compared to computers. If I started putting thousands or tens of thousands of tasks for random numbers on Amazon, I'd fully expect workers to start writing scripts to answer them though. A more cost-effective method might be to just hire individuals at minimum wage to continuously generate numbers.

¹ JavaScript RNG is considered free because costs are negligible.

² The RNG JavaScript function used in testing benchmarked on the computer used at approximately 15 nanoseconds per execution.

³ Some people entered numbers larger than 10. Those results were thrown out and replaced with real responses from someone else via MTurk's approval/rejection process.

⁴ Approximately 400 workers were tasked, though some workers may have completed multiple separate tasks.
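
A per-call figure like the 15 nanoseconds above could be measured along these lines. This is a rough sketch, not the benchmark actually used; `benchmarkNs` is an illustrative name, and it assumes an environment where `performance.now()` is available as a global (browsers and recent Node.js versions). Results will vary by machine and JavaScript engine.

```javascript
function getRandomInteger() {
  return Math.floor(Math.random() * (100 + 1));
}

// Average time per call in nanoseconds over `iterations` calls.
function benchmarkNs(fn, iterations) {
  let sink = 0; // accumulate results so the calls are less likely to be optimized away
  const start = performance.now(); // milliseconds, with a fractional part
  for (let i = 0; i < iterations; i++) {
    sink += fn();
  }
  const elapsedMs = performance.now() - start;
  return { nsPerCall: (elapsedMs * 1e6) / iterations, sink };
}

const result = benchmarkNs(getRandomInteger, 1e6);
console.log(`${result.nsPerCall.toFixed(1)} ns per call`);
```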
