You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Gilles (JIRA)" <ji...@apache.org> on 2018/07/27 10:35:00 UTC
[jira] [Comment Edited] (RNG-50) PoissonSampler single use speed improvements

    [ https://issues.apache.org/jira/browse/RNG-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559070#comment-16559070 ] 

Gilles edited comment on RNG-50 at 7/27/18 10:34 AM:
-----------------------------------------------------

{quote}It doesn't check the RNG is not null.
{quote}
Passing \{{null} is a programming error, and the check is going to be performed by the JVM; since the call is quite low-level, the NPE will be thrown fairly close to the error; so indeed no need (IMHO) to duplicate such checks.
{quote}AhrensDieterMarsagliaTsangGammaSampler where the constructors params are not checked.
{quote}
Oh. That was not intentional; I'd be inclined to introduce sanity checks. ;)
 Do you think that it would make such a big difference?
{quote}use case of the Poisson sampling is to model photon shot noise on a camera image, e.g 512x512 is 262,144 pixels. Each pixel value is likely to be unique during a simulation that creates floating-point images.
{quote}
Interesting!
 I don't quite understand why "each pixel value is likely to be unique" entails that the sampler is to be thrown away after a single call to the {{sample()}} method. Is the {{mean}} different for each pixel and frame?
 I'm asking because if there is no other way than instantiating an sampler for a single sample, the API is not good for that: there is more overhead than the factorial cache (instantiation of 2 other objects). A new class should provide a {{sample(double mean)}} method (It would not fit the {{DiscreteSampler}} API).

bq. This is a sub-optimal cache.

I know; it was under the assumption that the number of calls to {{sample()}}, with all its non-trivial computations, would generally overwhelm the running time.

bq. Subsequent uses would not suffer.

Sure, I agree. But the concern was to avoid multi-thread complication.

bq. A better solution \[...\] LogFactorial \[...\]

Using {{synchronized}} has a performance impact.
I'm no multi-thread expert but I suspect that your code does not synchronize consistently: Couldn't {code}getLogF(n){code} read the table before a new value has been written to slot "n", thus returning 0?

bq. The master table can be increased in size and reduced.

When would one want to reduce it?

bq. If the FactorialLog class is modified then when it is used by other samplers then they can take advantage of any already computed values.

Sure. But is the performance gain worth the added complexity? And how much is that gain? Aren't there cases where there is a loss in performance due to the synchronization?
Wouldn't it be sufficient to define a constructor where the (precomputed, immutable) cache is passed as an argument?



was (Author: erans):
{quote}It doesn't check the RNG is not null.
{quote}
Passing \{{null} is a programming error, and the check is going to be performed by the JVM; since the call is quite low-level, the NPE will be thrown fairly close to the error; so indeed no need (IMHO) to duplicate such checks.
{quote}AhrensDieterMarsagliaTsangGammaSampler where the constructors params are not checked.
{quote}
Oh. That was not intentional; I'd be inclined to introduce sanity checks. ;)
 Do you think that it would make such a big difference?
{quote}use case of the Poisson sampling is to model photon shot noise on a camera image, e.g 512x512 is 262,144 pixels. Each pixel value is likely to be unique during a simulation that creates floating-point images.
{quote}
Interesting!
 I don't quite understand why "each pixel value is likely to be unique" entails that the sampler is to be thrown away after a single call to the {{sample()}} method. Is the {{mean}} different for each pixel and frame?
 I'm asking because if there is no other way than instantiating an sampler for a single sample, the API is not good for that: there is more overhead than the factorial cache (instantiation of 2 other objects). A new class should provide a {{sample(double mean)}} method (It would not fit the {{DiscreteSampler}} API).

bq. This is a sub-optimal cache.

I know; it was under the assumption that the number of calls to {{sample()}}, with all its non-trivial computations, would generally overwhelm the running time.

bq. Subsequent uses would not suffer.

Sure, I agree. But the concern was to avoid multi-thread complication.

bq. A better solution \[...\] LogFactorial \[...\]

Using {{synchronized}} has a performance impact.
I'm no multi-thread expert but I suspect that your code does not synchronize consistently: Couldn't {{getLogF(n)}} read the table before a new value has been written to slot "n", thus returning 0?

bq. The master table can be increased in size and reduced.

When would one want to reduce it?

bq. If the FactorialLog class is modified then when it is used by other samplers then they can take advantage of any already computed values.

Sure. But is the performance gain worth the added complexity? And how much is that gain? Aren't there cases where there is a loss in performance due to the synchronization?
Wouldn't it be sufficient to define a constructor where the (precomputed, immutable) cache is passed as an argument?


> PoissonSampler single use speed improvements
> --------------------------------------------
>
>                 Key: RNG-50
>                 URL: https://issues.apache.org/jira/browse/RNG-50
>             Project: Commons RNG
>          Issue Type: Improvement
>    Affects Versions: 1.0
>            Reporter: Alex D Herbert
>            Priority: Minor
>         Attachments: PoissonSamplerTest.java
>
>
> The Sampler architecture of {{org.apache.commons.rng.sampling.distribution}} is nicely written for fast sampling of small dataset sizes. The constructors for the samplers do not check the input parameters are valid for the respective distributions (in contrast to the old {{org.apache.commons.math3.random.distribution}} classes). I assume this is a design choice for speed. Thus most of the samplers can be used within a loop to sample just one value with very little overhead.
> The {{PoissonSampler}} precomputes log factorial numbers upon construction if the mean is above 40. This is done using the {{InternalUtils.FactorialLog}} class. As of version 1.0 this internal class is currently only used in the {{PoissonSampler}}.
> The cache size is limited to 2*PIVOT (where PIVOT=40). But it creates and precomputes the cache every time a PoissonSampler is constructed if the mean is above the PIVOT value.
> Why not create this once in a static block for the PoissonSampler?
> {code:java}
> /** {@code log(n!)}. */
> private static final FactorialLog factorialLog;
>      
> static 
> {
>     factorialLog = FactorialLog.create().withCache((int) (2 * PoissonSampler.PIVOT));
> }
> {code}
> This will make the construction cost of a new {{PoissonSampler}} negligible. If the table is computed dynamically as a static construction method then the overhead will be in the first use. Thus the following call will be much faster:
> {code:java}
> UniformRandomProvider rng = ...;
> int value = new PoissonSampler(rng, 50).sample();
> {code}
> I have tested this modification (see attached file) and the results are:
> {noformat}
> Mean 40  Single construction ( 7330792) vs Loop construction                          (24334724)   (3.319522.2x faster)
> Mean 40  Single construction ( 7330792) vs Loop construction with static FactorialLog ( 7990656)   (1.090013.2x faster)
> Mean 50  Single construction ( 6390303) vs Loop construction                          (19389026)   (3.034132.2x faster)
> Mean 50  Single construction ( 6390303) vs Loop construction with static FactorialLog ( 6146556)   (0.961857.2x faster)
> Mean 60  Single construction ( 6041165) vs Loop construction                          (21337678)   (3.532047.2x faster)
> Mean 60  Single construction ( 6041165) vs Loop construction with static FactorialLog ( 5329129)   (0.882136.2x faster)
> Mean 70  Single construction ( 6064003) vs Loop construction                          (23963516)   (3.951765.2x faster)
> Mean 70  Single construction ( 6064003) vs Loop construction with static FactorialLog ( 5306081)   (0.875013.2x faster)
> Mean 80  Single construction ( 6064772) vs Loop construction                          (26381365)   (4.349935.2x faster)
> Mean 80  Single construction ( 6064772) vs Loop construction with static FactorialLog ( 6341274)   (1.045591.2x faster)
> {noformat}
> Thus the speed improvements would be approximately 3-4 fold for single use Poisson sampling.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)