Posted to issues@commons.apache.org by "Alex Herbert (Jira)" <ji...@apache.org> on 2019/10/28 14:13:00 UTC

[jira] [Comment Edited] (RNG-86) PractRand

[ https://issues.apache.org/jira/browse/RNG-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961037#comment-16961037 ]

Alex Herbert edited comment on RNG-86 at 10/28/19 2:12 PM:
-----------------------------------------------------------

h1. PractRand

PractRand is a test suite for the output of RNGs. It takes a different approach from Dieharder and TestU01. Those test suites choose a list of tests and then run each test in turn on the RNG output. Each test requires an approximately fixed size of output. Thus in general each test uses different output from the RNG and the entire test suite takes about the same length of time irrespective of the RNG.

A run of PractRand must choose the set of tests. The output of the RNG is passed to *all* of the tests concurrently and assessed at intervals. The intervals successively double until the maximum length of the test is reached or failure occurs. The result is that PractRand can fail very fast or run a long time. The JDK random fails in 4 seconds; 4TB of output from PCG RS 32 takes over 12.7 hours when PractRand is run multi-threaded (for an approximate speed-up of 3-fold). In contrast the same machine will run BigCrush in 5.2 hours.
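
The doubling schedule described above can be illustrated with a minimal Java sketch. This is not PractRand's code: the class and method names are hypothetical, the starting size is an assumption, and the predicate is only a stand-in for running the whole set of tests on the output seen so far.

```java
import java.util.function.IntPredicate;

/** Hypothetical sketch of a doubling assessment schedule (not PractRand's code). */
public final class DoublingSchedule {
    private DoublingSchedule() {}

    /**
     * Assesses the output at sizes 2^log2Min, 2^(log2Min+1), ..., 2^log2Max bytes.
     *
     * @param log2Min log2 of the first assessed size (an assumed starting point)
     * @param log2Max log2 of the maximum test length
     * @param failsAt stand-in for running all tests on the first 2^n bytes
     * @return log2 of the size at the first failure, or -1 if no failure occurred
     */
    public static int firstFailure(int log2Min, int log2Max, IntPredicate failsAt) {
        for (int log2Size = log2Min; log2Size <= log2Max; log2Size++) {
            if (failsAt.test(log2Size)) {
                return log2Size; // stop early: no further output is consumed
            }
        }
        return -1; // reached the maximum length without a failure
    }

    public static void main(String[] args) {
        // A defect that becomes detectable at 2^19 bytes ends the run early
        System.out.println(firstFailure(10, 42, n -> n >= 19));
        // No detectable defect: the run continues to the maximum (2^42 bytes = 4 TiB)
        System.out.println(firstFailure(10, 42, n -> false));
    }
}
```

This is why the run time depends so strongly on the generator: a weak generator returns after a few small intervals, whereas a generator that passes must produce the full maximum length.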

PractRand can test 8, 16, 32 or 64-bit output. The output size of the RNG is important as PractRand has the concept of folding, where sections of the output are extracted into smaller sequences and assessed.

The folding options test the following least significant bits of the output:
||Option||Output||Folding transforms||
|-tf 0 (no folding)|ignored|none|
|-tf 1 (smart folding)|32-bit|1/32,8/32|
|-tf 1 (smart folding)|64-bit|1/64, 4/64, 16/64|
|-tf 2 (extended folding)|ignored|1/8,1/16,1/32,1/64,4/16,4/32,4/64,8/32,8/64|

The default mode is smart folding (-tf 1).
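
As an illustration of what a folding such as 1/32 or 8/32 means, here is a minimal Java sketch that keeps only the least significant bits of each 32-bit output and repacks them into a new sequence. The class name, method name and packing order are assumptions made for the example; PractRand's own implementation may differ.

```java
import java.util.Arrays;

/** Hypothetical sketch of folding: keep the low bits of each output and repack them. */
public final class FoldingSketch {
    private FoldingSketch() {}

    /**
     * Extracts the {@code lowBits} least significant bits of each 32-bit value and
     * packs them into new 32-bit words, the first value in the lowest position.
     * Trailing values that do not fill a whole word are dropped.
     */
    public static int[] foldLowBits(int[] values, int lowBits) {
        if (lowBits < 1 || lowBits > 32 || 32 % lowBits != 0) {
            throw new IllegalArgumentException("lowBits must divide 32: " + lowBits);
        }
        final int perWord = 32 / lowBits;
        // (1 << 32) wraps to 1 in Java, so the full-width mask is handled separately
        final int mask = lowBits == 32 ? -1 : (1 << lowBits) - 1;
        final int[] folded = new int[values.length / perWord];
        for (int i = 0; i < folded.length; i++) {
            int word = 0;
            for (int j = 0; j < perWord; j++) {
                word |= (values[i * perWord + j] & mask) << (j * lowBits);
            }
            folded[i] = word;
        }
        return folded;
    }

    public static void main(String[] args) {
        final int[] values = new int[32];
        Arrays.fill(values, 0x12345601); // low bit is 1, low 8 bits are 0x01
        // 1/32 folding: 32 outputs contribute one bit each to a single word
        System.out.println(Integer.toHexString(foldLowBits(values, 1)[0]));
        // 8/32 folding: 4 outputs contribute one byte each to every word
        System.out.println(Integer.toHexString(foldLowBits(values, 8)[0]));
    }
}
```

The folded sequence is much shorter than the original output (32 times shorter for 1/32), which is one reason weaknesses in the low bits only show up after a large amount of raw output.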

There are 2 main options for test suites.
||Option||Tests||
|-te 0|core|
|-te 1|extended|

The default is the core test set (-te 0). This uses a set of tests which are orthogonal, i.e. they test different characteristics of the output. The extended test set contains repeats of many of the tests with different parameters. It is also variable enough between versions that the author notes benchmarks may need to be re-run.

The documentation contains notes about the expected run time of the different modes:
{noformat}
normal test set, no folding: 9.6 seconds per GB
normal test set, standard folding: 13.5 seconds per GB (recommended)
normal test set, extra folding: 23 seconds per GB
expanded test set, no folding: 25 seconds per GB
expanded test set, standard folding: 32 seconds per GB
expanded test set, extra folding: 54 seconds per GB
{noformat}
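
These rates can be turned into a rough run-time estimate. The sketch below is only arithmetic on the documented figures: it assumes 1 TB = 1024 GB, it ignores multi-threading, and the rates are machine dependent, so measured times will differ. The class and method names are hypothetical.

```java
import java.util.Locale;

/** Hypothetical helper: estimate run time from a documented seconds-per-GB rate. */
public final class RunTimeEstimate {
    /** Assumption: binary units, 1 TB = 1024 GB. */
    private static final double GB_PER_TB = 1024;

    private RunTimeEstimate() {}

    /** Estimated run time in hours for the given rate and output size. */
    public static double hours(double secondsPerGB, double outputTB) {
        return secondsPerGB * outputTB * GB_PER_TB / 3600;
    }

    public static void main(String[] args) {
        // Normal test set, standard folding, 4 TB of output
        System.out.println(String.format(Locale.ROOT, "%.2f h", hours(13.5, 4)));
        // Expanded test set, extra folding, 1 TB of output
        System.out.println(String.format(Locale.ROOT, "%.2f h", hours(54, 1)));
    }
}
```

With these rates the two configurations give the same estimate, since 13.5 x 4 = 54 x 1: the recommended settings run to 4 TB in the time the expanded test set with extra folding needs for 1 TB.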
The documentation notes the following about the extra foldings:
{noformat}
"This may cause excessive amounts of memory to be used on longer data streams"
{noformat}
Example memory usage, which I noted from the resident set size (RSS) of the process when run using the core tests (-te 0):
||Folding||Output (TB)||Mode||Resident Set Size (KB)||GB||
|1|1|32-bit| 732792|0.733|
|2|1|32-bit|1271424|1.27|
|1|1|64-bit|800148|0.800|
|2|1|64-bit|1304116|1.30|
|1|4|32-bit|1239440|1.24|
|2|4|32-bit|2314232|2.31|
|1|4|64-bit|1324436|1.32|
|2|4|64-bit|?|?|

So the 64-bit smart folding (-tf 1) requires a bit more memory than the 32-bit. The extended foldings (-tf 2) use roughly 60-75% more memory for a maximum of 1 TB and nearly 2-fold more memory for 4 TB. I do not have a trial for extended folding in 64-bit mode at 4 TB but since the mode is ignored (all foldings are always done) it will probably be approximately the same as the 32-bit result.
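
The relative memory cost of the extended foldings can be recomputed directly from the table. The following Java sketch is just that arithmetic; the class and method names are hypothetical and the KB to GB conversion uses the same decimal factor as the table's GB column.

```java
import java.util.Locale;

/** Hypothetical helper: compare memory usage of extended and smart folding. */
public final class MemoryRatio {
    private MemoryRatio() {}

    /** Converts KB to GB using the decimal factor of the table's GB column. */
    public static double toGB(long kb) {
        return kb / 1e6;
    }

    /** Ratio of extended folding (-tf 2) to smart folding (-tf 1) memory usage. */
    public static double ratio(long extendedKB, long smartKB) {
        return (double) extendedKB / smartKB;
    }

    public static void main(String[] args) {
        // Values copied from the table above (KB)
        System.out.println(String.format(Locale.ROOT, "1 TB, 32-bit: x%.2f", ratio(1271424, 732792)));
        System.out.println(String.format(Locale.ROOT, "1 TB, 64-bit: x%.2f", ratio(1304116, 800148)));
        System.out.println(String.format(Locale.ROOT, "4 TB, 32-bit: x%.2f", ratio(2314232, 1239440)));
    }
}
```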
h1. Results

In the following results the number is the size of the output where failure occurred expressed as a power of 2. Hence higher is better. No failure is marked with a dash.
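
For reference, a power of 2 from the results can be converted to a byte size with a small helper like the one below. It is a minimal sketch with hypothetical names, assuming the exponent is the base-2 logarithm of the output size in bytes, which matches the sizes listed next to each result (e.g. 19 is 512 KiB).

```java
/** Hypothetical helper: format 2^n bytes using binary unit prefixes. */
public final class Pow2Size {
    private static final String[] UNITS = {"B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"};

    private Pow2Size() {}

    /** Formats 2^log2Bytes bytes, e.g. 19 as "512 KiB". */
    public static String format(int log2Bytes) {
        if (log2Bytes < 0 || log2Bytes > 62) {
            throw new IllegalArgumentException("Unsupported exponent: " + log2Bytes);
        }
        // Every 10 powers of 2 moves up one binary unit
        final int unit = log2Bytes / 10;
        final long value = 1L << (log2Bytes % 10);
        return value + " " + UNITS[unit];
    }

    public static void main(String[] args) {
        System.out.println(format(19)); // 512 KiB
        System.out.println(format(32)); // 4 GiB
        System.out.println(format(42)); // 4 TiB
    }
}
```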
h2. PractRand v0.93 -tf 2 -te 1 -tlmax 1TB:

Extended foldings.
Extended tests.
{noformat}
RNG                    PractRand  ∩
JDK                    19         512 KiB
WELL_512_A             21         2 MiB
WELL_1024_A            24         16 MiB
WELL_19937_A           36         64 GiB
WELL_19937_C           36         64 GiB
WELL_44497_A           39         512 GiB
WELL_44497_B           39         512 GiB
MT                     32         4 GiB
ISAAC                  -
SPLIT_MIX_64           -
XOR_SHIFT_1024_S       28         256 MiB
TWO_CMRES              21         2 MiB
MT_64                  36         64 GiB
MWC_256                -
KISS                   -
XOR_SHIFT_1024_S_PHI   30         1 GiB
XO_RO_SHI_RO_64_S      18         256 KiB
XO_RO_SHI_RO_64_SS     -
XO_SHI_RO_128_PLUS     21         2 MiB
XO_SHI_RO_128_SS       -
XO_RO_SHI_RO_128_PLUS  22         4 MiB
XO_RO_SHI_RO_128_SS    -
XO_SHI_RO_256_PLUS     24         16 MiB
XO_SHI_RO_256_SS       -
XO_SHI_RO_512_PLUS     27         128 MiB
XO_SHI_RO_512_SS       -
PCG_XSH_RR_32          -
PCG_XSH_RS_32          -
PCG_RXS_M_XS_64        -
PCG_MCG_XSH_RR_32      -
PCG_MCG_XSH_RS_32      -
MSWS                   -
SFC_32                 -
SFC_64                 -
JSF_32                 -
JSF_64                 -
XO_SHI_RO_128_PP       -
XO_RO_SHI_RO_128_PP    -
XO_SHI_RO_256_PP       -
XO_SHI_RO_512_PP       -
XO_RO_SHI_RO_1024_PP   -
XO_RO_SHI_RO_1024_S    30         1 GiB
XO_RO_SHI_RO_1024_SS   -
{noformat}
This configuration was run for a single trial only; multiple trials were abandoned. The extended test set with extra folding increases run time considerably. It also increases memory usage to the extent that parallel jobs on the same machine can run out of memory, as each task can use >2GB of RAM.
h2. PractRand v0.93 -tf 2 -te 0 -tlmax 1TB:

Extended foldings.
Core tests.
{noformat}
RNG                    PractRand       ∩
JDK                    18,19,19,19,19  512 KiB
WELL_512_A             24,24,24,24,24  16 MiB
WELL_1024_A            27,27,27,27,27  128 MiB
WELL_19937_A           39,39,39,39,39  512 GiB
WELL_19937_C           39,39,39,39,39  512 GiB
WELL_44497_A           -,-,-,-,-
WELL_44497_B           -,-,-,-,-
MT                     35,35,35,35,35  32 GiB
ISAAC                  -,-,-,-,-
SPLIT_MIX_64           -,-,-,-,-
XOR_SHIFT_1024_S       31,31,31,31,31  2 GiB
TWO_CMRES              21,21,21,21,21  2 MiB
MT_64                  39,39,39,39,39  512 GiB
MWC_256                -,-,-,-,-
KISS                   -,-,-,-,-
XOR_SHIFT_1024_S_PHI   33,33,33,33,33  8 GiB
XO_RO_SHI_RO_64_S      21,21,21,21,21  2 MiB
XO_RO_SHI_RO_64_SS     -,-,-,-,-
XO_SHI_RO_128_PLUS     24,24,24,24,24  16 MiB
XO_SHI_RO_128_SS       -,-,-,-,-
XO_RO_SHI_RO_128_PLUS  25,25,25,25,25  32 MiB
XO_RO_SHI_RO_128_SS    -,-,-,-,-
XO_SHI_RO_256_PLUS     27,27,27,27,27  128 MiB
XO_SHI_RO_256_SS       -,-,-,-,-
XO_SHI_RO_512_PLUS     30,30,30,30,30  1 GiB
XO_SHI_RO_512_SS       -,-,-,-,-
PCG_XSH_RR_32          -,-,-,-,-
PCG_XSH_RS_32          -,-,-,-,-
PCG_RXS_M_XS_64        -,-,-,-,-
PCG_MCG_XSH_RR_32      -,-,-,-,-
PCG_MCG_XSH_RS_32      -,-,-,-,-
MSWS                   -,-,-,-,-
SFC_32                 -,-,-,-,-
SFC_64                 -,-,-,-,-
JSF_32                 -,-,-,-,-
JSF_64                 -,-,-,-,-
XO_SHI_RO_128_PP       -,-,-,-,-
XO_RO_SHI_RO_128_PP    -,-,-,-,-
XO_SHI_RO_256_PP       -,-,-,-,-
XO_SHI_RO_512_PP       -,-,-,-,-
XO_RO_SHI_RO_1024_PP   -,-,-,-,-
XO_RO_SHI_RO_1024_S    33,33,33,33,33  8 GiB
XO_RO_SHI_RO_1024_SS   -,-,-,-,-
{noformat}
This test contains multiple trials. However note that:
* Almost all failures occurred using foldings that are present in the default smart folding set
* The WELL_44497 generators do not fail. These are known to fail TestU01 BigCrush.

Here are failures from the extended folding set:
{noformat}
MT PractRand 32 GiB
MT PractRand [Low4/16]BRank(12):6K(1)
TWO_CMRES PractRand 2 MiB
TWO_CMRES PractRand [Low4/32]Gap-16:A
{noformat}
h2. PractRand v0.94 -tf 1 -te 0 -tlmax 4TB:

Smart foldings.
Core tests.

Note: These are the default settings but reduced from 32TB of output to 4TB. Using the smart folding and core test set allows the test to execute twice as fast so a higher maximum output was possible.

This is a PractRand version switch from v0.93 to v0.94. The v0.94 distribution download does not build on Linux so I initially tested using v0.93. I then created a patch to fix v0.94 to allow testing with the newer version. The patch has been added to the examples-stress src folder.
{noformat}
RNG                    PractRand  ∩
JDK                    20,20,20   1 MiB
WELL_512_A             24,24,24   16 MiB
WELL_1024_A            27,27,27   128 MiB
WELL_19937_A           39,39,39   512 GiB
WELL_19937_C           39,39,39   512 GiB
WELL_44497_A           42,42,42   4 TiB
WELL_44497_B           42,42,42   4 TiB
MT                     38,38,38   256 GiB
ISAAC                  -,-,-
SPLIT_MIX_64           -,-,-
XOR_SHIFT_1024_S       31,31,31   2 GiB
TWO_CMRES              32,32,32   4 GiB
MT_64                  39,39,39   512 GiB
MWC_256                -,-,-
KISS                   -,-,-
XOR_SHIFT_1024_S_PHI   33,33,33   8 GiB
XO_RO_SHI_RO_64_S      21,21,21   2 MiB
XO_RO_SHI_RO_64_SS     -,-,-
XO_SHI_RO_128_PLUS     24,24,24   16 MiB
XO_SHI_RO_128_SS       -,-,-
XO_RO_SHI_RO_128_PLUS  25,25,25   32 MiB
XO_RO_SHI_RO_128_SS    -,-,-
XO_SHI_RO_256_PLUS     27,27,27   128 MiB
XO_SHI_RO_256_SS       -,-,-
XO_SHI_RO_512_PLUS     30,30,30   1 GiB
XO_SHI_RO_512_SS       -,-,-
PCG_XSH_RR_32          -,-,-
PCG_XSH_RS_32          41,-,-
PCG_RXS_M_XS_64        -,-,-
PCG_MCG_XSH_RR_32      -,-,-
PCG_MCG_XSH_RS_32      40,41,41   2 TiB
MSWS                   -,-,-
SFC_32                 -,-,-
SFC_64                 -,-,-
JSF_32                 -,-,-
JSF_64                 -,-,-
XO_SHI_RO_128_PP       -,-,-
XO_RO_SHI_RO_128_PP    -,-,-
XO_SHI_RO_256_PP       -,-,-
XO_SHI_RO_512_PP       -,-,-
XO_RO_SHI_RO_1024_PP   -,-,-
XO_RO_SHI_RO_1024_S    33,33,33   8 GiB
XO_RO_SHI_RO_1024_SS   -,-,-
{noformat}
Note that with the switch to v0.94 the core test suite was updated. The test where JDK failed in v0.93 has been removed. Thus JDK gets to twice the output before failing a different test (still at 4 seconds). This test was replaced by a new test that mainly targets LCGs. Interestingly this makes PCG XSH RS fail once at 2TB. The variant PCG MCG XSH RS systematically fails at 2TB.

Also note that the failures that occurred on extended folding are now observed later on the smart foldings:
{noformat}
MT 32 GiB -> 256 GiB
TWO_CMRES 2 MiB -> 4 GiB
{noformat}
In particular TWO_CMRES gets about 2000 times (2^11) as much output before failing, although the output size is still relatively small.

The WELL_44497 generators now fail, identifying problems that are also found by TestU01 BigCrush.
h1. Conclusion

I tried a few variants of testing. Using extended folding was much slower and failed few generators that would not fail anyway on the smart foldings. Using the extended test suite is not recommended by the author. The tests are more comprehensive but are not orthogonal; there is a lot of overlap. The extra tests make the run-time longer and significantly increase the memory overhead. The extra tests only seem to make the MT generator fail much faster. Using the smart folding with the core tests allows running to 4TB of output in the same length of time as 1TB with extended folding and extra tests. At 4TB we see failures of the WELL_44497 generators and PCG_MCG_XSH_RS so this extra length is useful.

I only did 3 runs. Computation time is large compared to TestU01 BigCrush. In contrast to TestU01 the entire output of the generator is put through every test. I believe it is a better use of resources to try to extend the run-time to longer output rather than to include more runs at the same length. The seed becomes largely irrelevant when 4TB of output is generated (i.e. 2^39 long values). So I can commit the current results to the user guide and then spend a few weeks running PractRand to 32TB output for a single trial run to see what happens.
h1. Stats

The total testing times for each of the test suites in the current set of results are:
||Tests||Total||Cores||Notes||
|Dieharder|13 days 19:05:49.59|2|Workstation 1 |
|TestU01 BigCrush|47 days 00:50:11.79|2|Workstation 1 |
|PractRand|35 days 12:43:34.94|4|Workstation 1 for some generators; Workstation 2 for the other generators.
The second machine runs approximately 2-fold faster than workstation 1. This may be due to improved hardware or the switch from g++ 4 to g++ 5 which updates support from C++11 to C++14.|

Note that to work around the high memory usage of PractRand (which uses up to 1.3GB per job when run to 4TB of output on default settings) I use the multi-threaded mode of PractRand. It can use up to 5 threads but usage is variable and there is an approximate 3-fold increase in speed when multi-threaded. Thus results have been generated using 3 threads per testing process. This allows long-running parallel jobs on a benchmarking machine to not saturate memory.

> PractRand
> ---------
>
> Key: RNG-86
> URL: https://issues.apache.org/jira/browse/RNG-86
> Project: Commons RNG
> Issue Type: Wish
> Components: examples
> Reporter: Gilles Sadowski
> Priority: Minor
>
> Integrate another test suite to the {{RandomStressTester}} application:
> [http://pracrand.sourceforge.net/]
> The library also contains many RNG implementations (C++).
> FTR: https://markmail.org/message/74zmora4jrhwb5hu
