Posted to dev@commons.apache.org by Gilles <gi...@harfang.homelinux.org> on 2016/09/02 22:36:50 UTC

[rng] Usefulness of benchmarks

Hi.

This page
   
http://commons.apache.org/proper/commons-rng/userguide/rng.html#a3._Performance
presents two sets of benchmarks.

For a long time, I've raised the issue that it might be important
to understand why they don't seem to agree.

In the end, it seems that performance is not as great a concern
as I was led to believe when I initially questioned a few statements
about the then Commons Math code.

The discrepancy between "PerfTestUtils" and JMH could be a bug (in
"PerfTestUtils" of course!) or ... measuring different use-cases:
Use of several RNGs at the same time vs using a single one; the
latter could allow for more aggressive optimizations.

Lacking input as to what the benchmarks purport to demonstrate, I'm
about to simply delete the "PerfTestUtils" column.
The result will be a simplified (maybe simplistic) view of the
relative performance of the RNGs in the "single use" use-case.

Any comment, objection, explanation, suggestion?
[E.g. set up JMH to benchmark the other use case, or a reason why
this is in fact not necessary.]
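
As an aside, the "more aggressive optimizations" point can be
sketched in plain Java (hypothetical toy generators, not the
Commons Rng implementations): when a single implementation of an
interface ever reaches a call site, the JIT can inline it; once
several implementations share that site, it becomes polymorphic
and the timings can change.

```java
// Toy sketch (hypothetical generators, not Commons Rng code): the
// rng.next() call site inside drain() is monomorphic when a single
// implementation is benchmarked, but becomes polymorphic once several
// types pass through it, which can change JIT inlining decisions and
// hence the measured timings.
interface Rng { int next(); }

public class CallSiteSketch {
    static final class XorShift implements Rng {
        private int s = 0x9E3779B9;
        public int next() { s ^= s << 13; s ^= s >>> 17; s ^= s << 5; return s; }
    }
    static final class Lcg implements Rng {
        private long s = 12345;
        public int next() {
            s = s * 6364136223846793005L + 1442695040888963407L;
            return (int) (s >>> 32);
        }
    }

    static long drain(Rng rng, int calls) {
        long sum = 0;
        for (int i = 0; i < calls; i++) sum += rng.next();
        return sum; // consumed by the caller so the loop cannot be eliminated
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        drain(new XorShift(), 1_000_000);   // one type only: monomorphic site
        long t1 = System.nanoTime();
        drain(new XorShift(), 500_000);
        drain(new Lcg(), 500_000);          // second type: site is now bimorphic
        long t2 = System.nanoTime();
        System.out.println("single-type: " + (t1 - t0) + " ns, mixed: " + (t2 - t1) + " ns");
    }
}
```

The absolute numbers are meaningless here; the point is only that
"several RNGs at the same time" and "a single one" are different
workloads for the JIT.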


Thanks,
Gilles


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [rng] Usefulness of benchmarks

Posted by Gilles <gi...@harfang.homelinux.org>.
Hi.

On Sun, 4 Sep 2016 15:20:39 +0300, Artem Barger wrote:
> On Sun, Sep 4, 2016 at 3:01 PM, Gilles <gi...@harfang.homelinux.org> 
> wrote:
>
>> "RandomStressTester" is in Commons _Rng_ "src/userguide" example
>> code, while I've just committed "RandomNumberGeneratorBenchmark"
>> in Commons _Math_ userguide examples (branch "develop").
>>
>> There is a README in "src/userguide"; basically, you'd run with a
>> command similar to:
>>
>>   $ mvn -q exec:java \
>>     -Dexec.mainClass=org.apache.commons.math4.userguide.rng.RandomNumberGeneratorBenchmark \
>>     -Dexec.args="MT ISAAC JDK"
>>
>> where the "exec.args" contains a list of "RandomSource" identifiers.
>>
>>
> I've got the following results and they look similar to the ones
> you have in the table :/
>
> [...]
>

It seems that the behaviour changes with the number of generators
being tested. I don't know what to make of that...

Personally, I think that these codes are so small/fast (wrt
common use of Java) that such micro-benchmarking measurements
border on an exercise in futility.

Concerning the subject of this thread, I guess that we are back to
square one, i.e. no more knowledge about it was gathered here than
we had in December 2015, even though back then the benchmark score
was the sole argument put forward for constraining the design.

:-(


Regards,
Gilles




Re: [rng] Usefulness of benchmarks

Posted by Artem Barger <ar...@bargr.net>.
On Sun, Sep 4, 2016 at 3:01 PM, Gilles <gi...@harfang.homelinux.org> wrote:

> "RandomStressTester" is in Commons _Rng_ "src/userguide" example
> code, while I've just committed "RandomNumberGeneratorBenchmark"
> in Commons _Math_ userguide examples (branch "develop").
>
> There is a README in "src/userguide"; basically, you'd run with a
> command similar to:
>
>   $ mvn -q exec:java \
>     -Dexec.mainClass=org.apache.commons.math4.userguide.rng.RandomNumberGeneratorBenchmark \
>     -Dexec.args="MT ISAAC JDK"
>
> where the "exec.args" contains a list of "RandomSource" identifiers.
>
>
I've got the following results and they look similar to the ones you
have in the table :/

Java 1.8.0_101
Runtime 1.8.0_101-b13
JVM Java HotSpot(TM) 64-Bit Server VM 25.101-b13
nextInt() (calls per timed block: 1000000, timed blocks: 500, time unit: ms)
                       name time/call std dev total time ratio   cv  difference
                 j.u.Random 1.390e-05 4.2e-06 6.9486e+03 1.000 0.30  0.0000e+00
o.a.c.r.i.s.MersenneTwister 1.129e-05 3.9e-06 5.6437e+03 0.812 0.34 -1.3049e+03
    o.a.c.r.i.s.ISAACRandom 1.611e-05 3.7e-06 8.0537e+03 1.159 0.23  1.1052e+03
      o.a.c.r.i.s.JDKRandom 1.634e-05 3.4e-06 8.1704e+03 1.176 0.21  1.2218e+03

nextDouble() (calls per timed block: 1000000, timed blocks: 500, time unit: ms)
                       name time/call std dev total time ratio   cv  difference
                 j.u.Random 2.608e-05 4.3e-06 1.3042e+04 1.000 0.17  0.0000e+00
o.a.c.r.i.s.MersenneTwister 2.099e-05 3.9e-06 1.0497e+04 0.805 0.19 -2.5450e+03
    o.a.c.r.i.s.ISAACRandom 2.381e-05 3.8e-06 1.1905e+04 0.913 0.16 -1.1367e+03
      o.a.c.r.i.s.JDKRandom 2.922e-05 4.5e-06 1.4612e+04 1.120 0.15  1.5705e+03

nextLong() (calls per timed block: 1000000, timed blocks: 500, time unit: ms)
                       name time/call std dev total time ratio   cv  difference
                 j.u.Random 2.436e-05 3.9e-06 1.2180e+04 1.000 0.16  0.0000e+00
o.a.c.r.i.s.MersenneTwister 1.890e-05 3.4e-06 9.4501e+03 0.776 0.18 -2.7297e+03
    o.a.c.r.i.s.ISAACRandom 2.169e-05 4.1e-06 1.0845e+04 0.890 0.19 -1.3352e+03
      o.a.c.r.i.s.JDKRandom 2.743e-05 4.7e-06 1.3713e+04 1.126 0.17  1.5330e+03


Best regards,
                      Artem Barger.

Re: [rng] Usefulness of benchmarks

Posted by Gilles <gi...@harfang.homelinux.org>.
On Sun, 4 Sep 2016 14:22:04 +0300, Artem Barger wrote:
> On Sun, Sep 4, 2016 at 2:06 PM, Gilles <gi...@harfang.homelinux.org> 
> wrote:
>
>> A few quick runs seem to produce numbers that are now much
>> closer to what JMH produces.[1]
>>
>> What do you think?
>>
>
> How can I produce results with PerfTestUtils? Is there a script to
> run RandomStressTester?

"RandomStressTester" is in Commons _Rng_ "src/userguide" example
code, while I've just committed "RandomNumberGeneratorBenchmark"
in Commons _Math_ userguide examples (branch "develop").

There is a README in "src/userguide"; basically, you'd run with a
command similar to:

   $ mvn -q exec:java \
     -Dexec.mainClass=org.apache.commons.math4.userguide.rng.RandomNumberGeneratorBenchmark \
     -Dexec.args="MT ISAAC JDK"

where the "exec.args" contains a list of "RandomSource" identifiers.

> It's possible that the previous results were produced on a loaded
> computer, hence the bias.

Unlikely to be the reason because:
  1. the computer was not overloaded,
  2. the same load would be seen by all benchmarked codes (that is
     precisely the purpose of "PerfTestUtils").

Anyway, this is all becoming rather moot since the previous
figures are not reproducible...


Best regards,
Gilles




Re: [rng] Usefulness of benchmarks

Posted by Artem Barger <ar...@bargr.net>.
On Sun, Sep 4, 2016 at 2:06 PM, Gilles <gi...@harfang.homelinux.org> wrote:

> A few quick runs seem to produce numbers that are now much
> closer to what JMH produces.[1]
>
> What do you think?
>

How can I produce results with PerfTestUtils? Is there a script to run
RandomStressTester?
It's possible that the previous results were produced on a loaded
computer, hence the bias.


Best regards,
                      Artem Barger.

Re: [rng] Usefulness of benchmarks

Posted by Gilles <gi...@harfang.homelinux.org>.
Hi Artem.

I've just updated the usage examples of Commons Math:
  commit ae8f5f04574be75727f5e948e04bc649cbdbbb3b

A few quick runs seem to produce numbers that are now much
closer to what JMH produces.[1]

What do you think?

I'm now fairly comfortable dropping the "PerfTestUtils" column
from the table in the userguide (i.e. IMHO it's sufficiently
close to what JMH provides to not raise concern).


Thanks,
Gilles


[1] Although I don't recall having made any code update since
     the previous numbers were produced that could account for
     this change. :-{


On Sun, 04 Sep 2016 00:17:17 +0200, Gilles wrote:
> On Sun, 4 Sep 2016 00:33:32 +0300, Artem Barger wrote:
>> On Sat, Sep 3, 2016 at 1:36 AM, Gilles 
>> <gi...@harfang.homelinux.org> wrote:
>>
>>> The discrepancy between "PerfTestUtils" and JMH could be a bug (in
>>> "PerfTestUtils" of course!) or ... measuring different use-cases:
>>> Use of several RNGs at the same time vs using a single one; the
>>> latter could allow for more aggressive optimizations.
>>>
>>
>> I'm not really familiar with PerfTestUtils, but I know that JMH
>> does a great job of avoiding the various pitfalls of building
>> microbenchmarks to measure performance.
>> It also looks a bit suspicious that, when comparing the JDK random
>> generator against itself, PerfTestUtils does not show a ratio of 1.0.
>
> That is easy to explain: the ratio was computed wrt a
> "java.util.Random" object, while "RandomSource.JDK" wraps
> an instance of "java.util.Random".
>
>
>>> Lacking input as to what the benchmarks purport to demonstrate, I'm
>>> about to simply delete the "PerfTestUtils" column.
>>> The result will be a simplified (maybe simplistic) view of the
>>> relative performance of the RNGs in the "single use" use-case.
>>>
>>>
>> I can try to take a look at PerfTestUtils to understand the main
>> cause of such a difference.
>
> AFAICS, the main difference is that JMH benchmarks one code
> (marked with "@Benchmark") after the other while "PerfTestUtils"
> benchmarks all the codes together.
>
>>> Any comment, objection, explanation, suggestion?
>>> [E.g. set up JMH to benchmark the other use case, or a reason why
>>> this is in fact not necessary.]
>>>
>>
>> We can play with different numbers of warm-up rounds in JMH to see
>> whether there is a degradation to results similar to PerfTestUtils,
>> for example.
>
> I'm pretty sure it's not related to warmup because the
> "PerfTestUtils" benchmarks were running for much longer
> than JMH.
>
> What would be interesting is to see whether JMH performance
> degrades when multiple RNGs (of different types) have been
> instantiated and run in the same scope.
>
> Regards,
> Gilles




Re: [rng] Usefulness of benchmarks

Posted by Gilles <gi...@harfang.homelinux.org>.
On Sun, 4 Sep 2016 00:33:32 +0300, Artem Barger wrote:
> On Sat, Sep 3, 2016 at 1:36 AM, Gilles <gi...@harfang.homelinux.org> 
> wrote:
>
>> The discrepancy between "PerfTestUtils" and JMH could be a bug (in
>> "PerfTestUtils" of course!) or ... measuring different use-cases:
>> Use of several RNGs at the same time vs using a single one; the
>> latter could allow for more aggressive optimizations.
>>
>
> I'm not really familiar with PerfTestUtils, but I know that JMH
> does a great job of avoiding the various pitfalls of building
> microbenchmarks to measure performance.
> It also looks a bit suspicious that, when comparing the JDK random
> generator against itself, PerfTestUtils does not show a ratio of 1.0.

That is easy to explain: the ratio was computed wrt a
"java.util.Random" object, while "RandomSource.JDK" wraps
an instance of "java.util.Random".
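
To make that concrete, here is a toy sketch (a hypothetical wrapper
class, not the actual Commons Rng code) of why a wrapper over
"java.util.Random" produces the same stream but not necessarily the
same cost:

```java
import java.util.Random;

// Toy sketch (hypothetical class, not Commons Rng code): the baseline
// was a bare java.util.Random, while the benchmarked "JDK" source is a
// wrapper that delegates to one -- same output stream, but one extra
// method call per draw.
public class JdkWrapperSketch {
    static final class JdkWrapper {
        private final Random delegate;
        JdkWrapper(long seed) { delegate = new Random(seed); }
        int nextInt() { return delegate.nextInt(); } // the extra hop
    }

    public static void main(String[] args) {
        Random bare = new Random(123);
        JdkWrapper wrapped = new JdkWrapper(123);
        // Identical seeds give identical values; only the call cost
        // differs, which keeps the measured ratio away from exactly 1.0.
        System.out.println(bare.nextInt() == wrapped.nextInt()); // prints "true"
    }
}
```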


>> Lacking input as to what the benchmarks purport to demonstrate, I'm
>> about to simply delete the "PerfTestUtils" column.
>> The result will be a simplified (maybe simplistic) view of the
>> relative performance of the RNGs in the "single use" use-case.
>>
>>
> I can try to take a look at PerfTestUtils to understand the main
> cause of such a difference.

AFAICS, the main difference is that JMH benchmarks one code
(marked with "@Benchmark") after the other while "PerfTestUtils"
benchmarks all the codes together.
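
If it helps the discussion, the "all the codes together" scheme can
be sketched roughly in plain Java (this is my reading of it, not the
actual "PerfTestUtils" code): every candidate is timed block by
block within the same run, so all candidates see the same machine
conditions but also share JIT state.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntSupplier;

// Rough sketch (not the actual PerfTestUtils code) of timing all
// candidate generators interleaved in one run, as opposed to JMH's
// one-benchmark-method-at-a-time approach.
public class InterleavedTiming {
    static volatile int sink; // keeps the JIT from discarding the loops

    static Map<String, Long> timeAllTogether(Map<String, IntSupplier> candidates,
                                             int blocks, int callsPerBlock) {
        Map<String, Long> nanos = new LinkedHashMap<>();
        for (String name : candidates.keySet()) nanos.put(name, 0L);
        for (int b = 0; b < blocks; b++) {
            // Each timed block runs every candidate before moving on, so
            // any transient machine load is shared by all of them.
            for (Map.Entry<String, IntSupplier> e : candidates.entrySet()) {
                long t0 = System.nanoTime();
                int s = 0;
                for (int i = 0; i < callsPerBlock; i++) s += e.getValue().getAsInt();
                long t1 = System.nanoTime();
                sink = s;
                nanos.merge(e.getKey(), t1 - t0, Long::sum);
            }
        }
        return nanos;
    }

    public static void main(String[] args) {
        Map<String, IntSupplier> candidates = new LinkedHashMap<>();
        candidates.put("j.u.Random", new java.util.Random(42)::nextInt);
        candidates.put("constant", () -> 7); // trivially cheap baseline
        System.out.println(timeAllTogether(candidates, 10, 100_000));
    }
}
```

The side effect of the interleaving is exactly the polymorphic
call-site issue discussed earlier in this thread, which a
one-method-per-run harness like JMH avoids by default.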

>> Any comment, objection, explanation, suggestion?
>> [E.g. set up JMH to benchmark the other use case, or a reason why
>> this is in fact not necessary.]
>>
>
> We can play with different numbers of warm-up rounds in JMH to see
> whether there is a degradation to results similar to PerfTestUtils,
> for example.

I'm pretty sure it's not related to warmup because the
"PerfTestUtils" benchmarks were running for much longer
than JMH.

What would be interesting is to see whether JMH performance
degrades when multiple RNGs (of different types) have been
instantiated and run in the same scope.

Regards,
Gilles




Re: [rng] Usefulness of benchmarks

Posted by Artem Barger <ar...@bargr.net>.
On Sat, Sep 3, 2016 at 1:36 AM, Gilles <gi...@harfang.homelinux.org> wrote:

> The discrepancy between "PerfTestUtils" and JMH could be a bug (in
> "PerfTestUtils" of course!) or ... measuring different use-cases:
> Use of several RNGs at the same time vs using a single one; the
> latter could allow for more aggressive optimizations.
>

I'm not really familiar with PerfTestUtils, but I know that JMH does
a great job of avoiding the various pitfalls of building
microbenchmarks to measure performance.
It also looks a bit suspicious that, when comparing the JDK random
generator against itself, PerfTestUtils does not show a ratio of 1.0.


> Lacking input as to what the benchmarks purport to demonstrate, I'm
> about to simply delete the "PerfTestUtils" column.
> The result will be a simplified (maybe simplistic) view of the
> relative performance of the RNGs in the "single use" use-case.
>
>
I can try to take a look at PerfTestUtils to understand the main
cause of such a difference.



> Any comment, objection, explanation, suggestion?
> [E.g. set up JMH to benchmark the other use case, or a reason why
> this is in fact not necessary.]
>

We can play with different numbers of warm-up rounds in JMH to see
whether there is a degradation to results similar to PerfTestUtils,
for example.



Best regards,
                      Artem Barger.