[API Proposal]: System.Linq.Shuffle()
#111221
Comments
See #78419 (comment), #73864 (comment).
|
Even if you say "you can easily implement it yourself," it is not efficient, and if someone (including me) has implemented it themselves many times, I think that is a reason to include it in the official API. I don't think it's a problem even if If "Random, not the collection, should be responsible", would that be |
Yes, the key point is how to implement the shuffle effect with randomness.
The
I agree, but similar problems aren't always the responsibility of the runtime. As I mentioned above, higher-level developers and libraries can provide their own encapsulation on top of the BCL. You can add the following extension method, which I think covers 99% of use cases.
public static void Shuffle<T>(this IList<T> source)
{
if (source is T[] array)
{
Random.Shared.Shuffle(array);
return;
}
if (source is List<T> list)
{
Random.Shared.Shuffle(CollectionsMarshal.AsSpan(list));
return;
}
ShuffleSlow(source);
static void ShuffleSlow(IList<T> values)
{
int n = values.Count;
var rand = Random.Shared;
for (int i = 0; i < n - 1; i++)
{
int j = rand.Next(i, n);
if (j != i)
{
(values[j], values[i]) = (values[i], values[j]);
}
}
}
}
Even so, you should realize:
So, as you can see, the answer depends on your use case. You have to adapt the implementation to your requirements. |
How could this be implemented without bias in cases where the length of the enumerable isn't known ahead of time or could potentially be infinite? |
To answer my own question, this is typically addressed using reservoir sampling. This necessarily introduces bias in the generated permutations, however it might be a good-enough compromise for some use cases. One potential implementation could involve implementing the classical shuffling algorithm for sources implementing |
As far as I understand, reservoir sampling is an algorithm that extracts elements partially, so it may be difficult to apply it to the current application, which requires shuffling the entire array. In my opinion, it would be implemented like Of course, there may be better ideas. |
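To make the distinction in this exchange concrete, here is a minimal sketch of reservoir sampling (Algorithm R), assuming .NET 6 or later. The class and method names are hypothetical and not part of any proposal in this thread. It keeps at most `k` elements from a single pass over a sequence of unknown length, which illustrates why it selects a subset rather than shuffling the whole input. The order of the returned elements should not be treated as a uniform shuffle.

```csharp
using System;
using System.Collections.Generic;

public static class ReservoirSketch
{
    // Hypothetical helper: returns up to k elements chosen uniformly at
    // random from a sequence whose length is not known in advance.
    public static T[] Sample<T>(IEnumerable<T> source, int k, Random random)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));
        if (random is null) throw new ArgumentNullException(nameof(random));
        if (k < 0) throw new ArgumentOutOfRangeException(nameof(k));

        var reservoir = new List<T>(k);
        long seen = 0;
        foreach (T item in source)
        {
            seen++;
            if (reservoir.Count < k)
            {
                // Fill phase: the first k elements are always kept.
                reservoir.Add(item);
            }
            else
            {
                // Replace phase: the new element is kept with probability
                // k / seen, evicting a uniformly chosen reservoir slot.
                long j = random.NextInt64(seen); // 0 <= j < seen
                if (j < k)
                {
                    reservoir[(int)j] = item;
                }
            }
        }
        return reservoir.ToArray();
    }
}
```

Because memory use is bounded by `k`, this works on very long inputs, but it never yields more than `k` elements, so it does not replace a full shuffle.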
That is my expectation as well. It would do the simple/obvious thing, basically:
T[] data = source.ToArray();
Random.Shared.Shuffle(data);
Which begs the question how much value adding this actually provides. |
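For illustration, the "simple/obvious thing" could be wrapped as a deferred LINQ-style operator roughly like this. This is a sketch under assumptions, not the proposed API: the names `ShuffleSketch` and `ShuffleBuffered` and the optional `Random` parameter are hypothetical, and it relies on `Random.Shuffle<T>(T[])`, which is available from .NET 8.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ShuffleSketch
{
    // Hypothetical name and signature, used only for this illustration.
    public static IEnumerable<T> ShuffleBuffered<T>(this IEnumerable<T> source, Random? random = null)
    {
        // Validate eagerly so a null source fails at the call site.
        if (source is null) throw new ArgumentNullException(nameof(source));
        return Iterator(source, random ?? Random.Shared);

        static IEnumerable<T> Iterator(IEnumerable<T> items, Random rng)
        {
            // Runs when enumeration starts: copy the source into a buffer,
            // then shuffle the copy. The source itself is never mutated.
            T[] buffer = items.ToArray();
            rng.Shuffle(buffer);
            foreach (T item in buffer)
            {
                yield return item;
            }
        }
    }
}
```

Because the copy is taken when enumeration begins, each enumeration yields a fresh permutation of whatever the source contains at that moment, and the caller's collection stays untouched. Beyond that deferred behavior, the wrapper adds little over the two lines it wraps.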
The only benefit I can think of is avoiding copying when the source is a list. |
That would violate expectations set by other LINQ APIs like Order, OrderBy, Reverse, Except, Intersect, etc. |
In what way? |
Maybe I misunderstood your suggestion, which I thought was to mutate the source rather than copying. That said, it would still deviate from Reverse, for example, which snapshots the contents when calling GetEnumerator, such that changes to the source after that aren't reflected in the reversed data. There's a long-standing debate about that behavior, though. |
My thinking was that we could replicate the permutation algorithm for
I can see how changing |
Sure, if we believe that to be a common case, it could be optimized in combination with subsequent operations if the source is of a known length. |
I use this a little in Paint.NET. There is one location in my code where I randomize the order that tiles are rendered so as to reduce locking dependencies (spatially adjacent tiles often share an underlying cache node, which means they are tied to the same mutex). This helps rendering throughput quite a bit in this case. Here's my implementation of the standard Fisher-Yates shuffling algorithm. It may be possible to modernize some of this more; I wrote it a long time ago, but the core algorithm is dirt simple.
internal static class ListAlgorithms
{
public static void FisherYatesShuffle<T, TList>(TList list, int startIndex, int length, Random random)
where TList : IList<T>
{
if (length == 0)
{
return;
}
// http://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle
for (int i = startIndex + length - 1; i >= startIndex + 1; --i)
{
int j = startIndex + random.Next(i - startIndex + 1);
SwapElements<T, TList>(list, i, j);
}
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static void FisherYatesShuffle<T>(IList<T> list)
{
FisherYatesShuffle(list, Random.Shared);
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static void FisherYatesShuffle<T>(IList<T> list, Random random)
{
FisherYatesShuffle(list, 0, list.Count, random);
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static void FisherYatesShuffle<T>(IList<T> list, int startIndex, int length, Random random)
{
FisherYatesShuffle<T, IList<T>>(list, startIndex, length, random);
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static void FisherYatesShuffle<T, TList>(TList list)
where TList : IList<T>
{
FisherYatesShuffle<T, TList>(list, Random.Shared);
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static void FisherYatesShuffle<T, TList>(TList list, Random random)
where TList : IList<T>
{
FisherYatesShuffle<T, TList>(list, 0, list.Count, random);
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static void SwapElements<T, TList>(TList a, int i, int j)
where TList : IList<T>
{
if (i != j)
{
T local = a[i];
a[i] = a[j];
a[j] = local;
}
}
} |
Background and motivation
Shuffling is a common operation, for example in games and in preprocessing for machine learning.
LINQ includes APIs for changing the order, such as OrderBy() and Reverse(). However, there is no dedicated API for shuffling an IEnumerable<T> yet. (There is System.Random.Shuffle() for Span<T>, but if you want to take an arbitrary IEnumerable<T> sequence, you'll probably need a custom implementation.)
On StackOverflow, we often see examples of shuffling implemented with code like .OrderBy(_ => Guid.NewGuid()). However, this implementation is inefficient, and there are more efficient ways to shuffle.
Therefore, I propose to implement System.Linq.Shuffle().
API Proposal
API Usage
Alternative Designs
Possible considerations:
IOrderedEnumerable<T> for OrderBy().
Random instance, which allows for reproducible shuffling, but requires a fixed implementation.
Besides, Shuffle() allows for an easy (though not optimal) solution to #102229.
Risks
No response
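As a rough illustration of the contrast drawn in the motivation, the snippet below puts the commonly seen `OrderBy(_ => Guid.NewGuid())` idiom next to a copy followed by `Random.Shuffle`. It is a sketch assuming .NET 8 or later, and the variable names are illustrative only. The first approach generates a GUID per element and then sorts by it, while the second performs a single linear pass over a copied array.

```csharp
using System;
using System.Linq;

int[] source = Enumerable.Range(1, 10).ToArray();

// Common idiom: generate one GUID key per element and sort by those keys.
// This costs a comparison sort on top of the key generation.
int[] viaSort = source.OrderBy(_ => Guid.NewGuid()).ToArray();

// Alternative: copy once, then shuffle the copy in place in linear time.
int[] viaShuffle = source.ToArray();
Random.Shared.Shuffle(viaShuffle);

Console.WriteLine(string.Join(", ", viaSort));
Console.WriteLine(string.Join(", ", viaShuffle));
```

Both arrays contain the same elements as `source` in some order. Passing a seeded `Random` to the second approach would make it repeatable for a given runtime implementation, which the GUID idiom cannot offer.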