Posted to dev@datasketches.apache.org by Jon Malkin <jo...@gmail.com> on 2019/08/22 21:49:12 UTC

java and Random

We got a PR to change reservoir sampling from using java.util.Random to
java.util.concurrent.ThreadLocalRandom, saying it sped up a multithreaded app
significantly. In a scenario with multiple threads writing to independent
reservoirs, ThreadLocalRandom really should avoid a lot of locks, so that's
fine.
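
A minimal sketch of the scenario above, assuming a simplified reservoir
(classic "Algorithm R") rather than the actual DataSketches class. The names
Reservoir, update, seen and size are hypothetical and exist only for this
illustration. Each thread writes to its own reservoir and draws from
ThreadLocalRandom.current(), so no generator state is shared between threads:

```java
import java.util.concurrent.ThreadLocalRandom;

class ReservoirSketchDemo {

    // Hypothetical, simplified reservoir of longs; NOT the DataSketches API.
    static final class Reservoir {
        private final long[] items;
        private long seen = 0;

        Reservoir(int k) {
            items = new long[k];
        }

        void update(long item) {
            if (seen < items.length) {
                // Fill phase: keep the first k items unconditionally.
                items[(int) seen] = item;
            } else {
                // Replacement phase: this is item number seen + 1, so draw j
                // uniformly from [0, seen + 1) and keep the item if j < k.
                long j = ThreadLocalRandom.current().nextLong(seen + 1);
                if (j < items.length) {
                    items[(int) j] = item;
                }
            }
            seen++;
        }

        long seen() {
            return seen;
        }

        int size() {
            return (int) Math.min(seen, items.length);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final int numThreads = 4;
        final Reservoir[] reservoirs = new Reservoir[numThreads];
        final Thread[] threads = new Thread[numThreads];

        for (int t = 0; t < numThreads; t++) {
            final Reservoir r = new Reservoir(16);
            reservoirs[t] = r;
            // One independent reservoir per thread; only the RNG choice matters here.
            threads[t] = new Thread(() -> {
                for (long i = 0; i < 100_000; i++) {
                    r.update(i);
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) {
            th.join();
        }
        for (Reservoir r : reservoirs) {
            System.out.println("seen=" + r.seen() + " retained=" + r.size());
        }
    }
}
```

With a single shared java.util.Random instead, every update in the
replacement phase would contend on the same seed state across all threads.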

The only drawback I see is that it makes deterministic, reproducible
testing impossible since you can't set the seed for ThreadLocalRandom.
That's ultimately a good thing -- it'd be a terrible design to have
separate reservoirs initialized with the same seed! And our tests already
don't pick a seed.
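
To illustrate the seeding point, here is a small sketch (the class and method
names are made up for this example). It relies on the documented behavior
that ThreadLocalRandom.setSeed throws UnsupportedOperationException, while
two java.util.Random instances built from the same seed produce the same
sequence:

```java
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

class SeedDemo {

    // Returns true if ThreadLocalRandom refuses an explicit seed.
    static boolean threadLocalRandomRejectsSeed() {
        try {
            ThreadLocalRandom.current().setSeed(42L);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    // Returns true if two Random instances with the same seed agree on
    // their first 100 outputs.
    static boolean seededRandomIsReproducible(long seed) {
        Random a = new Random(seed);
        Random b = new Random(seed);
        for (int i = 0; i < 100; i++) {
            if (a.nextLong() != b.nextLong()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("TLR rejects setSeed: " + threadLocalRandomRejectsSeed());
        System.out.println("Seeded Random reproducible: " + seededRandomIsReproducible(42L));
    }
}
```

The second method also shows the hazard mentioned above: two reservoirs
seeded identically would make identical random choices.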

I'm inclined to approve the PR, and even to go farther and recommend that
we (eventually) move to TLRandom more generally. But I first wanted to see
if there's some unknown drawback of which I am not currently aware?

  jon

Re: java and Random

Posted by Jon Malkin <jo...@gmail.com>.
Is there any performance difference between ThreadLocal<SplittableRandom>
and ThreadLocalRandom?

The places where we rely on randomness end up being fundamental to the
correctness of the algorithms. Allowing the seed to be set, at least by end
users, is actually a risk to correctness in production. We had someone
trying to create deterministic output from a quantiles sketch, for
instance, which ends up breaking any sort of guarantees we can make about
the error properties!

Aside from debugging, where I'm willing to say it's ok to hack in a
fixed-seed RNG, I'd generally vote for making it harder to break sketch
results in production systems (and for making us less lazy with tests of
stochastic data structures).  Which means I'd find it harder to support
ThreadLocal<SplittableRandom> if there's even a fairly modest performance
impact. At the same time, my intensity of support in this case isn't very
high.

  jon

On Thu, Aug 22, 2019, 3:24 PM Roman Leventov <le...@gmail.com> wrote:

> ThreadLocal<SplittableRandom> may be used to avoid locks and retain
> determinism in tests.

Re: java and Random

Posted by Roman Leventov <le...@gmail.com>.
ThreadLocal<SplittableRandom> may be used to avoid locks and retain
determinism in tests.
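
One way this could look, as a rough sketch only: the PerThreadRandom wrapper
and its constructors are hypothetical, not an existing DataSketches or JDK
class. The seeded variant assumes each thread should get its own stream split
from one root generator, so that threads never share an identical sequence.
SplittableRandom.split() is not thread-safe, hence the synchronization, which
is paid only once per thread:

```java
import java.util.SplittableRandom;

final class PerThreadRandom {

    private final ThreadLocal<SplittableRandom> local;

    // Production-style use: each thread gets an independently seeded generator.
    PerThreadRandom() {
        local = ThreadLocal.withInitial(SplittableRandom::new);
    }

    // Test-style use: every per-thread generator is split from one seeded root.
    // Which split a thread receives depends on the order in which threads
    // first call current(), so results are only reproducible when that order
    // is fixed (for example in a single-threaded test).
    PerThreadRandom(long seed) {
        final SplittableRandom root = new SplittableRandom(seed);
        local = ThreadLocal.withInitial(() -> {
            synchronized (root) {
                return root.split();
            }
        });
    }

    // Returns the calling thread's generator; no locking after the first call.
    SplittableRandom current() {
        return local.get();
    }

    public static void main(String[] args) {
        PerThreadRandom a = new PerThreadRandom(1234L);
        PerThreadRandom b = new PerThreadRandom(1234L);
        // Same seed, same thread, same call order: the two values match.
        System.out.println(a.current().nextLong() == b.current().nextLong());
    }
}
```

Whether this costs anything measurable compared to ThreadLocalRandom would
need a benchmark; the sketch makes no claim either way.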
