Start the sequence at zero and add scrambling #31

dcarrera · 2022-08-04T16:45:05Z

First of all, thank youfor writing Sobol.jl. I have been learning about Quasi Monte Carlo (QMC) recently and I'm so glad that there is a Sobol module in Julia. I want to suggest a fairly significant API change, but I hope to convince you that it's a very good idea.

1) Include the initial $(0,0,...)$ point in the Sobol sequence

The first value in the Sobol sequence is zero. This recent paper by Art Owen shows that the common practice of dropping the $(0,0,...)$ point (as done in Sobol.jl) ruins the digital net property of Sobol and makes its convergence worse. What makes Sobol a "digital net" is the fact that if you take the first $2^n$ Sobol points (in any dimension) you can partition your unit interval / square / hypercube into $2^k (k \le n)$ even slices and each slice will get its fair share $2^(n-k)$ of the points. This is the key to its value in QMC. If you skip the $(0,0,...)$ and draw $2^n$ points, the result is not a digital net. Let me illustrate:

using PyPlot

for x in 0:(1/4):1
    for y in 0:(1/4):1
        plot([x,x],[0,1],"C1--")
        plot([0,1],[y,y],"C1--")
    end
end

dim = 2
seq = SobolSeq(dim)

# Include the (0,0) point, as per Owen (2021).
plot([0],[0],"ko")

for j in 1:(N-1)
    local x,y = next!(seq)
    plot([x],[y],"C0o")
end

# The 'extra' point.
x,y = next!(seq)
plot([x],[y],"ro")

Notice how Sobol.jl leaves the bottom-left square empty and instead puts two points in another square. So that ruins the digital net, and the convergence. You can fix this by skipping the next $2^n - 1$ so that the next $2^n$ points do make a digital net. I suspect that this might be the origin of the common practice of skipping $2^n - 1$ points. This also addresses the concern that $(0,0,...)$ is a corner case that could lead to weird results. But as Owen's paper shows (Figure 1), an even better solution is to scramble the points.

2) Add support for scrambled Sobol points

Back in the 90's Owen proposed scrambling digital nets to create a hybrid method that brings the best features of MC and QMC. In Randomized QMC (RQMC) you scramble the points in a way that preserves the digital net but creates a random instance and removes the corner points like $(0,0,...)$. Figure 1 of Owen and Rudolf (2021) has a nice illustration of MC vs QMC vs RQMC.

Since Owen's proposal, several authors have proposed various ways to scramble the points and have shown empirically that RQMC has better convergence than QMC. But more importantly, RQMC recovers a feature from MC that is lost in QMC: The ability to measure uncertainty. In MC, if you draw $N$ points to compute some integral, you can repeat the experiment many times to measure the variance of that result. This is lost in QMC because the points are not random. But RQMC lets you do something similar by generating multiple scrambles.

A scrambled Sobol is already implemented in scipy.stats.qmc.Sobol and in Owen's R package (rsobol). Unfortunately I was not able to figure out either method works, so I could not translate them to Julia. It doesn't look complicated at all; I just don't understand SciPy, R, or Sobol direction vectors well enough. They don't seem to be doing exactly the same thing, but they both seem to be doing something short and clever with the direction vectors.

I should mention that the R code from Owen is BSD-licensed, so if you can read R, you can just take his method and convert it to Julia.

Notice that the SciPy and R implementations both default to scrambled points and you have to explicitly ask for unscrambled points.

3) Don't produce Sobol points one at a time, draw $2^n$ at a time

At this point I hope I have convinced you that drawing anything other than a power of 2 worth of Sobol points ruins the digital net, and misses what makes Sobol powerful. In Owen's article he says

Using 1000 points of a Sobol’ sequence may well be less accurate than using 512.

The SciPy API strongly encourages you to draw $2^n$ points, and while they allow you to draw a different number, they give you a big warning to not do that. Owen's R code goes even farther --- it doesn't have an option to draw anything but a power of two. The input parameters are the exponent and the dimensionality. So I think Sobol.jl should do the same thing. Ditch the next!() method and instead draw a power of two.

I realize that this was a long request and that I'm suggesting breaking the API, but I hope I have made a convincing case that this would turn Sobol.jl into a very powerful tool for anyone computing high-dimensional integrals with RQMC methods. Not coincidentally, this is precisely what I am hoping to do with it.

Let me know what you think.

References:

The text was updated successfully, but these errors were encountered:

dmetivie · 2022-11-07T15:39:23Z

I agree with your points 1) and 3). I am not sure why a lot of package do like that. It feels like the literature (including a lot of paper by A. Owen) do not present things like they are coded.

I coded 2) and packaged it RandomizedQuasiMonteCarlo.jl.
I converted the Owen R code to Julia (with a factor 10 without any Julia tricks).
I added support for scrambling in different base, which can be combined with Faure sequence, for example QuasiMonteCarlo.jl.
Note that I aim to merge my package into QuasiMonteCarlo.jl.
I hope it helps.

stevengj · 2022-11-07T23:51:14Z

Point 1 makes sense to me. I agree with recommending 2^n points, but I'm not sure about forcing it with the API.

Point 2 sounds like it would be a nice feature addition.

A PR for (1) would be a good starting point, and a PR for (2) would be welcome if someone can figure out the algorithm.

C.f. JuliaMath/Sobol.jl#31

ParadaCarleton · 2022-12-05T01:34:34Z

Point 1 makes sense to me. I agree with recommending 2^n points, but I'm not sure about forcing it with the API.

Point 2 sounds like it would be a nice feature addition.

A PR for (1) would be a good starting point, and a PR for (2) would be welcome if someone can figure out the algorithm.

Some comments:

It's very easy to slightly modify the Sobol sequence to use points that don't start at 0 but are still a proper QMC net, even if the final size is unknown. After setting the nth bit, set the n+1th bit to be 1. e.g. instead of .0, .1, .01, .11, .001, .011 ... use .01, .11, .011, .111, .0011, .0111... (This is equivalent to skipping the first 2^n points, but can be done ahead of time.)
I think changing the API to require powers of two is a good idea. The big problem is many users don't realize that QMC points aren't just drop-in replacements for Monte Carlo points, and as a result, bigger sample sizes aren't always better. Sobol points will consistently be biased towards sampling low values of the integrand when generating points larger than 2^n. For randomized points, this worsens the asymptotic error to be no better than Monte Carlo, but for unrandomized points, the result is usually just flat-out inconsistent. It's difficult to imagine a situation where this is something users would actually want.

PieterjanRobbe · 2023-04-28T22:00:47Z

Maybe point (1) is related to this comment in the source code about skipping 2^m - 1 points instead of 2^m?

devmotion · 2024-08-21T12:51:54Z

A PR for (1) would be a good starting point

I opened #35.

ParadaCarleton mentioned this issue Oct 21, 2022

Better Sampling Interface SciML/QuasiMonteCarlo.jl#44

Open

krivenko added a commit to krivenko/QInchworm.jl that referenced this issue Nov 21, 2022

Add a Sobol sequence wrapper that includes the initial point (0, 0, ...)

04d004b

C.f. JuliaMath/Sobol.jl#31

dmetivie mentioned this issue May 22, 2023

Sobol' Sequence Missing First 4 Points SciML/QuasiMonteCarlo.jl#78

Closed

devmotion mentioned this issue May 30, 2024

Use randomized sobol sampling SciML/GlobalSensitivity.jl#167

Open

5 tasks

devmotion mentioned this issue Jun 12, 2024

Start sequence at zero #35

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Start the sequence at zero and add scrambling #31

Start the sequence at zero and add scrambling #31

dcarrera commented Aug 4, 2022 •

edited

Loading

dmetivie commented Nov 7, 2022

stevengj commented Nov 7, 2022

ParadaCarleton commented Dec 5, 2022 •

edited

Loading

PieterjanRobbe commented Apr 28, 2023

devmotion commented Aug 21, 2024

Start the sequence at zero and add scrambling #31

Start the sequence at zero and add scrambling #31

Comments

dcarrera commented Aug 4, 2022 • edited Loading

dmetivie commented Nov 7, 2022

stevengj commented Nov 7, 2022

ParadaCarleton commented Dec 5, 2022 • edited Loading

PieterjanRobbe commented Apr 28, 2023

devmotion commented Aug 21, 2024

dcarrera commented Aug 4, 2022 •

edited

Loading

ParadaCarleton commented Dec 5, 2022 •

edited

Loading