Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shuffle integrity question. #630

Open
jbu4github opened this issue Feb 11, 2025 · 11 comments
Open

Shuffle integrity question. #630

jbu4github opened this issue Feb 11, 2025 · 11 comments

Comments

@jbu4github
Copy link

I suspect that something is wrong with the deck shuffle randomisation mechanic pre-game. In 9/10 games I played yesterday, I had both copies of a card I only have 2 of in the deck within the first two hands.

I want to take a look at the integrity of the algorithm used to shuffle, but I cant find it in the code, could someone point me to the file? I'm a javascript / typescript dev, not php, but I really want to test it.

@brubraz
Copy link
Collaborator

brubraz commented Feb 12, 2025

@brubraz
Copy link
Collaborator

brubraz commented Feb 12, 2025

Karabast uses the Fisher-Yates shuffle.

  • The chance of at least 1 repeated card is about 49.9%.
  • The chance of at least 2 repeated cards is around 8.12%.
  • The chance of a three-of-a-kind (triplet) is approximately 1.63%.

@jbu4github
Copy link
Author

Hello, I have spent all day simulating and analysing the shuffle as you have implemented.

Just theoretically, this modern one pass inline Fisher-Yates shuffle has a slight bias toward placing cards initially recorded as being at the bottom of the list, towards the top, and vice versa. You can imagine why this is so, as the inline swapping starts from the back, and any card which is thrust forward by a swap has another chance to further thrust forward when it is inevitably selected again.

I have replicated this in 1 million sample trials, where - on average - the card at the back appears 1 place higher than the card at the front, with a smooth gradation for all cards in-between.

The problems really compound if the initial deck list is recorded in order, where many deck lists have many cards in triples - or in some cases, 4s. I am assuming that you are reading in the list in this way, but I have not verified this in any way.

For a 51 card deck - each of 3 copies completely ordered - The probability of at least one of the first 3 triples appearing in a 6 card opening hand is 5% lower than any other triple, at 45.6%. This is a strange quirk of a single pass back to front inline swap of an ordered deck.

Probability of any of 0,1,2 in opening hand: 45.60%
Probability of any of 3,4,5 in opening hand: 45.60%
Probability of any of 6,7,8 in opening hand: 45.60%
Probability of any of 9,10,11 in opening hand: 48.30%
Probability of any of 12,13,14 in opening hand: 49.67%
Probability of any of 15,16,17 in opening hand: 49.50%
Probability of any of 18,19,20 in opening hand: 49.57%
Probability of any of 21,22,23 in opening hand: 49.66%
Probability of any of 24,25,26 in opening hand: 49.52%
Probability of any of 27,28,29 in opening hand: 49.63%
Probability of any of 30,31,32 in opening hand: 49.60%
Probability of any of 33,34,35 in opening hand: 49.59%
Probability of any of 36,37,38 in opening hand: 49.61%
Probability of any of 39,40,41 in opening hand: 49.69%
Probability of any of 42,43,44 in opening hand: 49.56%
Probability of any of 45,46,47 in opening hand: 49.54%
Probability of any of 48,49,50 in opening hand: 49.56%

Fortunately, this algorithm is extremely computationally efficient, and if you pass the shuffled list once more through the same algorithm, all of the anomalies statistically flatten out and disappear.

I therefore recommend that you implement a repeat call to this shuffle wherever it is necessary.

p.s I am also assuming that your seeding or random number generator is sound, and if it is seeded, I also recommend to use a second seed for the second shuffle, though looking at the numbers, its probably fine if you don't.

@brubraz
Copy link
Collaborator

brubraz commented Feb 12, 2025

The deck is shuffled three times using the Fisher-Yates algorithm. The seed remains unchanged during a mulligan to prevent players from reverting the turn to get a different starting hand.

https://discord.com/channels/1220057752961814568/1225597332301680640/1324583331928739870

@jbu4github
Copy link
Author

I've joined the discord, but those connection links aren't working for me, they just go to web and break.

Regarding the shuffle, that is worrying to hear, as its going to be very difficult to investigate further with my limited knowledge of a php development setup. I'm sure my TS implementation is identical to yours, and there's no other theoretical issues that I can see would reproduce what I'm seeing - just with the algorithm that is.

The only other possible culprits that I can think of are either the seed, or a general bug somewhere that those 3 subsequent shuffles aren't actually happening.

I'm 90% ish sure something is off though.

@OotTheMonk
Copy link
Collaborator

@jbu4github can you repeat your experiment and run each iteration through the algorithm 3 times, to replicate what Karabast does? Admittedly I'm surprised to hear that even with the first shuffle there's such a bias for the first ten cards. How did you do the testing? Was it with the actual shuffler that is used by Karabast, or just another implementation of Fisher-Yates?

@jbu4github
Copy link
Author

jbu4github commented Feb 16, 2025

First I implemented my own version of a shuffle (in typescript) using two arrays ensuring cards that are selected are never reselected, this shuffles perfectly first time showing an average bias of 0.07 positions over 1million trials.

https://pastebin.com/Hc46HwZv

Then I implemented karrabasts' shuffle (in typescript), as I understood the PHP code that was given to me. This uses one array, and swaps values inline, from the end of the array backwards. This shows that the item in position 0 (the start of the array) has an average position of 25.48 over 1million trials, wheras item 51 (in a 51 card deck) has an average position of 24.48, a bias of a whole 1 place lower closer toward the front. The average position of each item inbetween those forms a steady linear gradient between the two positions.

If I run the karrabast shuffled array, through the karrabast implementation again I get the same bias as my own shuffle of 0.07, 0.077, 0.069, 0.075. (each 1 million trials)

If I do it a third time, I get a bias of 0.064, 0.05, 0.049, 0.067, 0.061 (each 1 million trials) - roughly the same as two shuffles - almost negligible.

It is worth noting, that I am using the Math.random() in javascript, and I am not using a seed.

@jbu4github
Copy link
Author

jbu4github commented Feb 16, 2025

I have just implemented the seed generator, or the way that you are randomly selecting the indexes to swap according to the seed, and I can see the bias of 0.97 after 3 shuffles, I will investigate further.

Indeed it seems the bias remains even after 10 shuffles....

If the positions to swap are being generated by a fixed seed, and the seed isnt changing, its the same positions being swapped no?

@OotTheMonk
Copy link
Collaborator

Think of a seed as an index into a very long sequence of pseudorandom numbers. If you seed once and shuffle three times, the swapped positions should be different each iteration. The positions would only be the same if prior to each shuffle you're re-seeding the random number generating with the same value.

@jbu4github
Copy link
Author

jbu4github commented Feb 16, 2025

I can see how the seed is rotated each time by calling SeedRandom, if the the global $randomSeeded is set to false which it appears is always the case.

Either way, the first 3 shuffles are passing SKIPSEED, so they are using random_int, and therefore they are truely random, and not seeded. So seeding should have no bearing on the issue here.

@brubraz
Copy link
Collaborator

brubraz commented Feb 17, 2025

@jbu4github I understand, but in this case, we shouldn’t compare the results of random_int() with Math.random(). This is because Math.random() uses a Pseudo-Random Number Generator (PRNG), while random_int() uses a Cryptographically Secure Pseudo-Random Number Generator (CSPRNG). To ensure true randomness, I recommend running tests in PHP using random_int().

As for the initial shuffle, we don’t use a seed. We shuffle three times without a seed using Fisher-Yates, ensuring an unbiased randomization.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants