I recently read a paper called “A Sample of Brilliance” by Jon Bentley – it was published for September 1987 column of Programming Pearls. The column deals with generating random sequences if we are already given an random integer generating function (called RandInt(i, j)).

The column is small and advanced, and starts with the following implementation:

1
2
3
4
5
6
7
8
9
import random.randint as randint
def random_set(m, n):
    "Create a set of random m elements, between 1 and n"
    s = set()
    while len(s) < m:
        i = randint(1, n + 1)
        if i not in s:
            s.add(i)
    return s

While this looks like a great implementation, it kind of goes impossible when m = n, and the number is large. In such case, as the algorithm is finding the last few elements, the chances of every random element being inside the set are very high resulting in non-termination of the loop in some cases.

Floyd suggested a very nice approach: never include the last element of the range and include it only if the random generated value is repeated in the set:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
import random.randint as randint

def random_set_floyd(m, n):
    "Create a set of random m elements, between 1 and n"
    s = set()
    for i in range(n - m + 1, n + 1):
        t = randint(1, i)
        if t not in s:
            s.add(t)
        else:
            s.add(i)

    return s

The insight is that the set will always some numbers, and each new iteration is going to increase the range of numbers by 1. At any such iteration, if the random number generated is already in the set, just include the new range’s last number, which is guaranteed to be not present in the set as of the current iteration.

Based on this knowledge, we can also generate a random sequence instead of set:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
import random.randint as randint

def random_sequence_floyd(m, n):
    "Create a set of random m elements, between 1 and n"
    s = []
    for i in range(n - m + 1, n + 1):
        t = randint(1, i)
        if t not in s:
            s.insert(0, t) # prepend
        else:
            # insert i before t in the sequence
            idx = s.index(t)
            s.insert(t, i)

    return s