Updated answer in response to bounty: see *Is This Your Final Answer?* at the end, and other changes – basically the answer has been significantly rewritten.

To break your problem down into requirements:

- you need a set of random numbers
- the numbers need to be unique
- the order of the returned numbers needs to be random

Your current code indicates that the range of random numbers is specified by `Random.Next()`, which returns values in the range `[0 .. Int32.MaxValue)` (note that it excludes `Int32.MaxValue`). This is significant for the purpose of this question, because other answers have assumed that the range is configurable, and ‘small’.

If the range should be configurable, then the recommended algorithm would potentially be much larger.

Based on those assumptions, let’s do a code review…

# Code Style

### do … while

The most glaring problem here is the un-braced `do-while` loop. You already knew it, but this code is ugly:

`do number = random.Next(); while (randomNumbers.Contains(number));`

It really should be braced:

```
do
{
    number = random.Next();
} while (randomNumbers.Contains(number));
```

This makes the statement clear, and significantly reduces confusion. Always use braces, even for one-liners.

### List Construction

The `List` class allows an initial capacity to be specified. Since the capacity needs to be exactly `count`, it makes sense to initialize the list with this capacity:

```
List<int> randomNumbers = new List<int>(count);
```

# Current Algorithm

This is where the most interesting observations can be made. Let’s analyze your current algorithm:

- Create a container for the results
- Repeat until you have selected N values:
  - Select a random value
  - Check whether it has been previously selected
  - If it is ‘new’, add it to the container

This algorithm will produce random values, in a random order, and with good random characteristics (no skews, biases, gaps, etc.).

In other words, your results are good.

The problem is with performance….

There are two performance concerns here, one small, the other large:

- the do-while loop to avoid collisions
- the List container

## do-while performance

The do-while has a very low impact on performance… almost negligible. This is hotly debated, but the reality is that you would need a very, very large `count` before it becomes a problem. The reasoning is as follows:

Collisions happen when the random value was previously selected. For the specified range of `[0 .. Int32.MaxValue)`, you would need a very large `count` before collisions actually happen. For example, `count` would have to be about 65,000 before there was better than a 50% chance of even a single collision.

In a general sense, given a range of $N$ values, select $M$ numbers. If $M < \sqrt{N}$ then the probability of a single collision is less than 50%. Since the range is very large, the probability is small.
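As a side note (not part of the original answer), the ‘about 65,000’ figure can be sanity-checked with the standard birthday-problem approximation $p \approx 1 - e^{-M(M-1)/(2N)}$. A minimal sketch, with the class and method names made up for illustration:

```csharp
using System;

public class BirthdayEstimate
{
    // Approximate probability of at least one collision when drawing
    // M values uniformly at random from a range of N possible values.
    public static double CollisionProbability(long m, long n)
    {
        return 1.0 - Math.Exp(-(double)m * (m - 1) / (2.0 * n));
    }

    public static void Main()
    {
        long range = Int32.MaxValue;
        // Around 65,000 draws, the chance of at least one collision
        // is already past 50%...
        Console.WriteLine(CollisionProbability(65000, range));
        // ...while at 10,000 draws it is still only a few percent.
        Console.WriteLine(CollisionProbability(10000, range));
    }
}
```

For $N = 2^{31}$ this gives roughly a 63% collision chance at 65,000 draws, and only about 2% at 10,000 draws.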

Obviously, if the range were small, the probabilities would be significantly affected. But the range is fixed at `Int32.MaxValue`, so that’s OK.

Additionally, if the `count` were large, the probabilities would also be affected. How large is very large? Well, you would be running into very large arrays before you run into significant problems… I would venture that `count` is getting close to $\frac{Int32.MaxValue}{2}$ before you run into a significant performance issue.

## List performance

This is without doubt your largest concern. You use the `randomNumbers.Contains(number)` call to determine whether a value was previously selected. This requires a scan of all previously-selected values. As mentioned, it will almost always return false, and will thus have to scan the entire list.

As `count` increases, each `Contains` call takes longer, and the total running time grows at a quadratic rate, $O(n^2)$, where `n` is `count`.

This performance problem will become critical much sooner than the random-collision problem.
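The gap between the two containers is easy to see in isolation. A rough sketch (not from the original answer; the sizes and probe counts are arbitrary) that probes for absent values, forcing `List<int>.Contains` to scan the whole list each time, while `HashSet<int>.Contains` does a hash lookup:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public class ContainsComparison
{
    public static void Main()
    {
        const int size = 100000;
        var list = new List<int>(size);
        var set = new HashSet<int>();
        for (int i = 0; i < size; i++)
        {
            list.Add(i);
            set.Add(i);
        }

        // Probing for a value that is absent is the worst case for the
        // List: every Contains call scans all `size` elements.
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 1000; i++)
        {
            list.Contains(-1);
        }
        sw.Stop();
        Console.WriteLine("List:    " + sw.Elapsed);

        // The HashSet answers the same question with a hash lookup.
        sw.Restart();
        for (int i = 0; i < 1000; i++)
        {
            set.Contains(-1);
        }
        sw.Stop();
        Console.WriteLine("HashSet: " + sw.Elapsed);
    }
}
```

On a typical machine the `List` loop takes visible milliseconds while the `HashSet` loop is effectively instant; the exact numbers will vary.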

# Putting it together

The problem in your code is that you are trying to do too much at once: you are using a List because that is your return value, when a HashSet would be better. If you break the problem down into stages, you will be able to solve things more elegantly.

If you add a duplicate value to a HashSet, it does not grow, and the operation’s performance does not depend on the amount of data already in the HashSet (it is $O(1)$). You can use the `Count` of the HashSet to manage the uniqueness of the data.

Once you have a clean set of unique random numbers, you can dump them into a List, then shuffle the list using an efficient shuffle.

Combining these data structures, in the right way, leads to an overall $O(n)$ solution, which will scale fairly well.

Here is some code, which **works in Ideone too**. Note that my C# is weak, so I have tried to make the logic clear.

```
using System;
using System.Collections.Generic;

public class Test
{
    static Random random = new Random();

    public static List<int> GenerateRandom(int count)
    {
        // generate count random values.
        HashSet<int> candidates = new HashSet<int>();
        while (candidates.Count < count)
        {
            // May strike a duplicate.
            candidates.Add(random.Next());
        }

        // load them into a list.
        List<int> result = new List<int>();
        result.AddRange(candidates);

        // shuffle the results:
        int i = result.Count;
        while (i > 1)
        {
            i--;
            int k = random.Next(i + 1);
            int value = result[k];
            result[k] = result[i];
            result[i] = value;
        }
        return result;
    }

    public static void Main()
    {
        List<int> vals = GenerateRandom(10);
        Console.WriteLine("Result: " + vals.Count);
        vals.ForEach(Console.WriteLine);
    }
}
```

The above code is my initial recommendation; it will work well, and scale well, for any reasonable number of values to return.

# Second Alternate Algorithm

The problem with the above algorithm is threefold:

- When count is very large, the probability of collision is increased, and performance may be affected
- Data will need to be in both the HashSet and the List at some point, so the space usage is doubled.
- The shuffle at the end is needed to keep the data in a random order (a HashSet does not keep the data in any specific order, and the hashing algorithm will cause the order to become biased and skewed).

These are only performance issues when the count is very large. Note that only the collisions at large counts will impact the scalability of the solution (at large counts it is no longer quite $O(n)$, and it will become progressively worse as the count approaches `Int32.MaxValue`). Note that in real life this will likely never happen… memory will become a problem before performance does.

@JerryCoffin pointed to an alternate algorithm from Bob Floyd, where a trick is played to ensure that collisions never happen. This solves the problem of scalability at very large counts. It does not remove the need for both a HashSet and a List, and it does not remove the need for the shuffle.

The algorithm as presented:

```
initialize set S to empty
for J := N-M + 1 to N do
    T := RandInt(1, J)
    if T is not in S then
        insert T in S
    else
        insert J in S
```

assumes that `RandInt(1, J)` returns values **inclusive** of `J`.

To understand the above algorithm, you need to realize that you choose a random value from a range that is smaller than the full range, and after each selection you extend that range by one value. In the event of a collision, you can safely insert the new maximum, because it was never possible to include it before.

The chances of a collision increase at the same rate that the number of available values decreases, so the probability of any one number being in the result is not skewed or biased.

# Is this almost a final answer? No

So, using the above solution in C# would look like this **(in Ideone)** (note, the code here is now different from the Ideone version):

```
public static List<int> GenerateRandom(int count)
{
    // generate count random values.
    HashSet<int> candidates = new HashSet<int>();
    for (Int32 top = Int32.MaxValue - count; top < Int32.MaxValue; top++)
    {
        // May strike a duplicate.
        if (!candidates.Add(random.Next(top + 1)))
        {
            candidates.Add(top);
        }
    }

    // load them into a list.
    List<int> result = candidates.ToList();

    // shuffle the results:
    int i = result.Count;
    while (i > 1)
    {
        i--;
        int k = random.Next(i + 1);
        int value = result[k];
        result[k] = result[i];
        result[i] = value;
    }
    return result;
}
```

Note that you still need to shuffle the results, to resolve the HashSet ordering issue. Also note that the Ideone version needs the fancy loop-condition `top > 0`, because things get messy at overflow time.

# Can you avoid the shuffle?

So, this solves the need for the collision loop, but what about the shuffle? Can that be avoided by maintaining the HashSet and the List **at the same time**? **No!** Consider **this function (in Ideone too)**:

```
public static List<int> GenerateRandom(int count)
{
    List<int> result = new List<int>(count);

    // generate count random values.
    HashSet<int> candidates = new HashSet<int>();
    for (Int32 top = Int32.MaxValue - count; top < Int32.MaxValue; top++)
    {
        // May strike a duplicate.
        int value = random.Next(top + 1);
        if (candidates.Add(value))
        {
            result.Add(value);
        }
        else
        {
            result.Add(top);
            candidates.Add(top);
        }
    }
    return result;
}
```

The above function will never allow the first value in the result to be any of the `Max - Count` to `Max` values (because there can never be a collision on the first value, and the range is not complete at that point), and it is thus a broken random generator.

**Even with this collision-avoiding algorithm, you still need to shuffle the results in order to ensure the numbers are free of bias.**

*TL;DR*

# Is This Your Final Answer? Yes!

Having played with this code a lot, it is apparent that it is useful to have a range-based input, as well as one that defaults to the full `Int32.MaxValue` range. Messing with large ranges also leads to potential overflows in the 32-bit integer space.

Working with @mjolka, the following code would be the best of both worlds:

```
static Random random = new Random();

// Note, max is exclusive here!
public static List<int> GenerateRandom(int count, int min, int max)
{
    // initialize set S to empty
    // for J := N-M + 1 to N do
    //     T := RandInt(1, J)
    //     if T is not in S then
    //         insert T in S
    //     else
    //         insert J in S
    //
    // adapted for C# which does not have an inclusive Next(..)
    // and to make it from a configurable range, not just from 1.

    if (max <= min || count < 0 ||
            // max - min > 0 required to avoid overflow
            (count > max - min && max - min > 0))
    {
        // need to use 64-bit to support big ranges (negative min, positive max)
        throw new ArgumentOutOfRangeException("Range " + min + " to " + max +
                " (" + ((Int64)max - (Int64)min) + " values), or count " + count + " is illegal");
    }

    // generate count random values.
    HashSet<int> candidates = new HashSet<int>();

    // start count values before max, and end at max
    for (int top = max - count; top < max; top++)
    {
        // May strike a duplicate.
        // Need to add +1 to make an inclusive generator
        // +1 is safe even for MaxVal max value because top < max
        if (!candidates.Add(random.Next(min, top + 1)))
        {
            // collision, add inclusive max,
            // which could not possibly have been added before.
            candidates.Add(top);
        }
    }

    // load them into a list, to sort
    List<int> result = candidates.ToList();

    // shuffle the results because HashSet has messed
    // with the order, and the algorithm does not produce
    // random-ordered results (e.g. max-1 will never be the first value)
    for (int i = result.Count - 1; i > 0; i--)
    {
        int k = random.Next(i + 1);
        int tmp = result[k];
        result[k] = result[i];
        result[i] = tmp;
    }
    return result;
}

public static List<int> GenerateRandom(int count)
{
    return GenerateRandom(count, 0, Int32.MaxValue);
}
```

Yes, there definitely is.

You generate a collection of elements, mash it around, and start pulling items out of it. A quick one-liner would be:

```
Enumerable.Range(0,100).OrderBy(x => Guid.NewGuid()).Take(20);
```

or alternatively

```
Enumerable.Range(0,100).OrderBy(x => random.Next()).Take(20);
```

This will give you 20 unique random values from the 0 to 100 range.

The difference with your approach is that yours has a worst-case scenario of infinity: what if you are reaaaaally unlucky and keep getting the same value constantly? You’ll never get your required amount of random values.

**My approach on the other hand will first generate 100 values and then take a subset of them, which you can criticize for its memory impact.** Considering you used `random.Next()`, which draws from half the integer range, you should in fact be wary of this, since it will have a huge memory impact.

It also depends on your specific situation: if you have a very large pool of values (1,000,000) and you need 2 random values, then your approach will be much better. But when you need 999,999 values from that same pool, my approach will ~~be much better~~ still be debatable.

It will take some time to generate those last values; a *lot* as you can test for yourself with this:

```
void Main()
{
    var random = new Random();
    var times = new TimeSpan[512];
    var values = new bool[512];
    var sw = new Stopwatch();

    for (int i = 0; i < times.Length; i++)
    {
        sw.Restart();
        while (true)
        {
            int rand = random.Next();
            if (rand > 7894500 && rand < 7894512)
            {
                int index = rand - 7894500;
                if (!values[index])
                {
                    values[index] = true;
                    break;
                }
            }
        }
        sw.Stop();
        times[i] = sw.Elapsed;
        Console.WriteLine("Elapsed time: " + sw.Elapsed);
    }

    var orderedTime = times.OrderBy(x => x);
    for (int i = 0; i < 512; i++)
    {
        Console.WriteLine(orderedTime.ElementAt(i));
    }
}
```

It will loop 512 times through random values, considering an element found once it hits one of the (arbitrarily picked) values between `7894500` and `7894512`. Afterwards that value is marked as visited, to correctly mimic reality (in an earlier version, all 512 turns had all 512 values available to them). When you execute this, you’ll notice it takes a long time to find the last values (sometimes it’s fast and takes 39 ms, other times it takes over a minute). Evidently it’s fast at the start and slow at the end.

Compare that to the memory overhead of my approach which will first allocate 32 million integers, guids, padding, object overhead, etc and you’re out of a big chunk of memory.

You might be able to improve it by using a more “real” shuffling algorithm which doesn’t have the object and guid overhead.

Ultimately in an extreme situation where you need 32 million unique values in a randomized order out of a total population of 32 million + 1, you will either have to accept a big memory overhead or a big execution time overhead.

One last edit before this topic can be laid to rest on my part: I talked about it with @rolfl in chat, and we have come to the conclusion that both of our solutions have their uses, depending on what your situation is exactly. Summarized, it comes down to this:

If you have a big range of values (like 0 to `int.MaxValue`), then my solution will **eat** any memory your PC has. You’re looking at two collections, each with 2.1 billion integers, from which you then take a slice.

In my solution you first allocate this entire population, then you sort it (a different data structure), and then you take some of it. If this “some” is not close to 2.1 billion, you will have incurred a huge cost allocating data without using it.

How does this compare to @rolfl’s answer? He basically allocates the data as it is needed: if he needs 32 million values, then he will only allocate those 32 million (×2) and no more. If he needs 2.1 billion, then he’ll end up with a memory footprint like mine, but that’s an uncommon worst-case scenario, whereas for me it is the standard behaviour.

The main disadvantage of his approach is that as the amount of values you want approaches the total population, it becomes increasingly harder to get those last values, because there will be many collisions. Yet again, this is only a problem when the population is actually really big.

So when should you use my solution? Like most things, there is a tradeoff between readability and performance. With a relatively small population, the readability weighs up against the performance impact, in my opinion. And when the amount of values you want is near the size of that population, my solution will have a memory impact similar to the other approach, but it will also avoid many collisions.

Take your pick depending on what your situation is.

Instead of using a `List<int>`, you should use a `HashSet<int>`. The `HashSet<>` prohibits multiple identical values. And the `Add` method returns a `bool` that indicates whether the element was added to the set; this way you could change this code:

```
public static List<int> GetRandomNumbers(int count)
{
    List<int> randomNumbers = new List<int>();
    for (int i = 0; i < count; i++)
    {
        int number;
        do number = random.Next();
        while (randomNumbers.Contains(number));
        randomNumbers.Add(number);
    }
    return randomNumbers;
}
```

to this :

```
public static IEnumerable<int> GetRandomNumbers(int count)
{
    HashSet<int> randomNumbers = new HashSet<int>();
    for (int i = 0; i < count; i++)
        while (!randomNumbers.Add(random.Next())) ;
    return randomNumbers;
}
```

Note that I changed the return value from `List<int>` to `IEnumerable<int>`. If you don’t plan to add/remove values from your list, you should return `IEnumerable<int>`; if you do plan to, return `ICollection<int>`. You shouldn’t return a `List<int>`, because it is an implementation detail, and `List<>` isn’t designed to be extended.