At work we have a standard coding test that we give all of our prospective software developers to complete during their interview day. The coding test revolves around a simple card game, and one of the things candidates have to do is shuffle a deck of cards.
Shuffling a deck of cards, or more generally randomising the order of an array, is a pretty simple task and almost every candidate does it the same way. Here’s my (very simple) version, in Perl:
sub shuffle {
my @array = @_;
my $length = scalar(@array);
for (my $i = 0; $i < $length; $i++) {
my $r = int(rand() * $length);
@array[$i, $r] = @array[$r, $i];
}
return @array;
}
That's pretty much the algorithm that every candidate writes, including me when I sat the test. We call it the Knuth shuffle, because it was described by Donald Knuth in his book The Art of Computer Programming Vol 2: Seminumerical Algorithms.
So, everyone writes it like that. And everyone gets it wrong.
The first clue that the above implementation is flawed is in the numbers. Generating a random number between 1 and N and repeating that N times gives NN possible execution paths, but there are only N! ways you can order the numbers 1 to N. This is a great example of where intuition can be confusing: it feels right that "every card gets the chance to be swapped with every other card", but the numbers don't back up the gut feeling.
I prefer to think of it another way; instead of thinking of a magician shuffling a deck of cards, think of lottery balls coming out of a lottery machine. You start with N balls. One is selected at random (N possible outcomes), then another (N-1), then another (N-2), and so on. If you continue until all the balls are selected you have this many possible outcomes:
N * (N-1) * (N-2) * ... * 1
Or to put it another way:
1 * 2 * ... * N = N!
Hurrah, N! outcomes!
In other words, for each element in the array, choose a random number between that element's index and the last index in the deck, and swap the cards at those two indices. An easier way to do this in code is to iterate backwards from the last index to zero and generate a random number between zero and the current index, then swap those two indices:
sub shuffle {
my @array = @_;
my $length = scalar(@array);
for (my $i = $length - 1; $i > 0; $i--) {
my $r = int(rand() * ($i + 1));
@array[$i, $r] = @array[$r, $i];
}
return @array;
}
And that's the correct Knuth shuffle, exactly as he describes it himself:
Algorithm P (Shuffling). Let X1, X2, ..., Xt be a set of t numbers to be shuffed.
P1. [Initialize.] Set j ← t.
P2. [Generate U.] Generate a random number U, uniformly distributed between zero and one.
P3. [Exchange.] Set k ← [jU] + 1. (Now k is a random integer, between 1 and j. Exchange Xk and Xj.
P4. [Decrease j.] Decrease j by 1. If j > 1, return to step P2.
The Art of Computer Programming, Volume 2: Seminumerical Algorithms, p145. Third Edition. DE Knuth.
7 comments ↓
perldoc -q shuffle has even simpler code for the Fisher-Yates shuffle (which starts at the end and works backwards).
Thanks chromatic. What perlfaq calls the Fisher-Yates shuffle is what I call the Good Knuth shuffle – it’s identical in semantics to the “correct” implementation shown here. The version in perlfaq is more concise, and takes a reference to an array which is certainly how I’d write this function for real-world use, but I’ve tried to write my examples here to be as simple (and hence readable) as possible.
Programmer test, sure, but when would you actually use this in daily programming? Seems quite Ivory Tower to me.
I’d be more concerned about a potential programmer knowing about properly indexing database migrations or adhering to current programming standards. $0.02.
Your correct shuffle is wrong, int(rand() * $i) is always smaller than $i, so it will never leave the current element in place and so will never generate a permutation where an element stays at the same position.
It is obvious from the numbers, too: the loop goes from $length – 1 to 1, so it has $length – 1 iterations, in the first iteration there are $length – 1 possible values for $r, in the second $length – 2, and the last iteration will always swap elements 0 and 1, for a total of (N-1)! different paths through the program, which are clearly not sufficient to generate N! different permutations.
Final proof:
for my $i (1..1000000) {
my @a = (0..10);
my @b = shuffle(@a);
for $j (0..10) {
if ($b[$j] == $j) {
die “Criticism is wrong $i”;
}
}
}
(I hope your blog system can handle code).
Well spotter Florian – I’ve updated the example code accoringly. :)
Not quite, you are still only generating (N-1)! different permutations (ignoring additional limitations from the PRNG you use):
my @a = (0..10);
for my $i (1..1000000) {
my @b = shuffle(@a);
for $j (0..10) {
if ($b[0] != $a[0]) {
die “Criticism is wrong $i”;
}
}
}
Knuth’s arrays are one based, not zero based like perl’s.
I should have put $i+1 in parentheses – fixed now. The code does now generate N! different permutations.
Leave a Comment