Posted to issues@commons.apache.org by "Alex Herbert (Jira)" <ji...@apache.org> on 2019/10/28 14:13:00 UTC

[jira] [Comment Edited] (RNG-86) PractRand

    [ https://issues.apache.org/jira/browse/RNG-86?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16961037#comment-16961037 ] 

Alex Herbert edited comment on RNG-86 at 10/28/19 2:12 PM:
-----------------------------------------------------------

h1. PractRand

PractRand is a test suite that can test the output of RNGs. Testing is performed using a different approach to Dieharder and TestU01. Those test suites choose a list of tests. The RNG output is then used for each test in turn. Each test requires an approximately fixed size of output. Thus in general each test uses different output from the RNG and the entire test suite takes the same length of time irrespective of the RNG.

A run of PractRand must choose the set of tests. The output of the RNG is passed to *all* of the tests concurrently and assessed at intervals. The intervals successively double until the maximum length of the test is reached or failure occurs. The result is that PractRand can fail very fast or run a long time. The JDK random fails in 4 seconds; 4TB of output from PCG RS 32 takes over 12.7 hours when PractRand is run multi-threaded (for an approximate speed-up of 3-fold). In contrast the same machine will run BigCrush in 5.2 hours.
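The doubling of the assessment intervals can be sketched as follows. This is only an illustration of the schedule described above; the starting size passed in is an assumed value, and the function name is hypothetical rather than anything in PractRand.

```python
def checkpoint_sizes(start_bytes, max_bytes):
    """Return the output sizes (in bytes) at which results would be assessed.

    The size doubles each time until the maximum test length is reached.
    A failure at any checkpoint would stop the run early.
    """
    sizes = []
    size = start_bytes
    while size <= max_bytes:
        sizes.append(size)
        size *= 2
    return sizes


# Assumed example: start at 1 GiB (2^30 bytes) and stop at 4 TiB (2^42 bytes).
points = checkpoint_sizes(2**30, 2**42)
print(len(points), points[0], points[-1])
```

Because each interval is as large as all the output before it, a generator that only fails near the maximum length accounts for most of the total run time.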

PractRand can test 8, 16, 32 or 64-bit output. The output size of the RNG is important as PractRand has the concept of folding, where sections of the output are extracted into smaller sequences and assessed.

The folding options test the following least significant bits of the output:
||Option||Output||Folding transforms||
|-tf 0 (no folding)|ignored|none|
|-tf 1 (smart folding)|32-bit|1/32,8/32|
|-tf 1 (smart folding)|64-bit|1/64, 4/64, 16/64|
|-tf 2 (extended folding)|ignored|1/8,1/16,1/32,1/64,4/16,4/32,4/64,8/32,8/64|

The default mode is smart folding (-tf 1).
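To make the idea of a folding transform concrete, here is a minimal sketch that keeps the lowest k bits of each output word and packs them into a new byte stream, for example 4/32 keeps the low 4 bits of each 32-bit value. The function name and the bit-packing order are assumptions made for illustration; PractRand's own implementation may pack the bits differently.

```python
def fold_low_bits(values, keep_bits):
    """Concatenate the lowest `keep_bits` bits of each value into bytes.

    Bits are packed most-significant first (an assumed ordering).
    Trailing bits that do not fill a whole byte are dropped.
    """
    mask = (1 << keep_bits) - 1
    acc = 0      # bit accumulator
    nbits = 0    # number of bits currently held in acc
    out = bytearray()
    for v in values:
        acc = (acc << keep_bits) | (v & mask)
        nbits += keep_bits
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1
    return bytes(out)


# 4/32 folding of two 32-bit values: low nibbles 0xA and 0xB form one byte.
print(fold_low_bits([0x1234567A, 0x0000000B], 4).hex())
```

The folded stream is much shorter than the original output (1/32 folding yields one byte per eight 32-bit values), which is why weaknesses in the low bits show up only after a lot of raw output.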

There are 2 main options for test suites.
||Option||Tests||
|-te 0|core|
|-te 1|extended|

The default mode is the core test set (-te 0). This uses a set of tests which are orthogonal, i.e. they test different characteristics of the output. The extended test set contains repeats of many of the tests with different parameters. It is also variable enough that the author notes that between versions it may require a re-run of benchmarks.
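As an illustration of how the folding, test set and length options combine on a command line, the sketch below assembles an argument list. The executable name RNG_test, the stdin32/stdin64 style input modes and the -multithreaded flag are assumptions that are not stated in this comment, and the helper practrand_args is hypothetical.

```python
def practrand_args(word_bits, folding, test_set, max_length, multithreaded=False):
    """Build an argument list for a PractRand run reading from stdin.

    word_bits:  8, 16, 32 or 64 (output size of the RNG)
    folding:    0 (none), 1 (smart) or 2 (extended), passed to -tf
    test_set:   0 (core) or 1 (extended), passed to -te
    max_length: maximum output length, e.g. "4TB", passed to -tlmax
    """
    if word_bits not in (8, 16, 32, 64):
        raise ValueError("unsupported output size: %r" % (word_bits,))
    if folding not in (0, 1, 2):
        raise ValueError("unsupported folding option: %r" % (folding,))
    if test_set not in (0, 1):
        raise ValueError("unsupported test set option: %r" % (test_set,))
    # "RNG_test" and "stdinNN" are assumed names, not taken from the text above.
    args = ["RNG_test", "stdin%d" % word_bits,
            "-tf", str(folding), "-te", str(test_set), "-tlmax", max_length]
    if multithreaded:
        args.append("-multithreaded")  # assumed flag name
    return args


print(" ".join(practrand_args(32, 1, 0, "4TB", multithreaded=True)))
```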

The documentation contains notes about the expected run time of the different modes: 
{noformat}
normal test set, no folding: 9.6 seconds per GB
normal test set, standard folding: 13.5 seconds per GB (recommended)
normal test set, extra folding: 23 seconds per GB
expanded test set, no folding: 25 seconds per GB
expanded test set, standard folding: 32 seconds per GB
expanded test set, extra folding: 54 seconds per GB
{noformat}
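These figures can be turned into rough run-time estimates. The sketch below assumes that "GB" and "TB" are binary units (1 TB = 1024 GB) and that the documented rates apply unchanged to the machine in question, which will not be true in general; the function name is hypothetical.

```python
# Seconds per GB as quoted from the PractRand documentation above.
SECONDS_PER_GB = {
    ("normal", "none"): 9.6,
    ("normal", "standard"): 13.5,
    ("normal", "extra"): 23.0,
    ("expanded", "none"): 25.0,
    ("expanded", "standard"): 32.0,
    ("expanded", "extra"): 54.0,
}


def estimated_hours(test_set, folding, output_gb):
    """Estimate the run time in hours for a given amount of output."""
    return SECONDS_PER_GB[(test_set, folding)] * output_gb / 3600.0


# Assumes 1 TB = 1024 GB.
print(estimated_hours("normal", "standard", 4 * 1024))   # default settings, 4 TB
print(estimated_hours("expanded", "extra", 1 * 1024))    # expanded + extra folding, 1 TB
```

With these rates the two printed estimates are equal, since 13.5 x 4096 = 54 x 1024 seconds.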
The documentation notes about the extra foldings:
{noformat}
"This may cause excessive amounts of memory to be used on longer data streams"
{noformat}
Example memory usage I have noted using the residual state size of the process when run using the core tests (-te 0):
||Folding||Output (TB)||Mode||Residual State Size (KB)||GB||
|1|1|32-bit| 732792|0.733|
|2|1|32-bit|1271424|1.27|
|1|1|64-bit|800148|0.800|
|2|1|64-bit|1304116|1.30|
|1|4|32-bit|1239440|1.24|
|2|4|32-bit|2314232|2.31|
|1|4|64-bit|1324436|1.32|
|2|4|64-bit|?|?|

So the 64-bit smart folding (-tf 1) requires a bit more memory than the 32-bit mode. From the table, the extended foldings (-tf 2) use roughly 60-75% more memory than smart folding for a maximum of 1 TB and nearly 2-fold more memory for 4 TB. I do not have a trial for extended folding in 64-bit mode at 4 TB, but since the mode is ignored (all foldings are always done) it will probably be approximately the same as the 32-bit result.
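The relative memory cost of extended folding can be read off the table by dividing the -tf 2 measurement by the -tf 1 measurement for the same output length and mode. A small sketch using the values from the table (the dictionary layout and function name are illustrative only):

```python
# Residual state size in KB, keyed by (folding option, output in TB, mode in bits).
RSS_KB = {
    (1, 1, 32): 732792,
    (2, 1, 32): 1271424,
    (1, 1, 64): 800148,
    (2, 1, 64): 1304116,
    (1, 4, 32): 1239440,
    (2, 4, 32): 2314232,
    (1, 4, 64): 1324436,
    # (2, 4, 64) was not measured
}


def folding_overhead(output_tb, bits):
    """Ratio of memory used with extended folding to smart folding."""
    return RSS_KB[(2, output_tb, bits)] / RSS_KB[(1, output_tb, bits)]


for key in [(1, 32), (1, 64), (4, 32)]:
    print(key, round(folding_overhead(*key), 2))
```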
h1. Results

In the following results the number is the size of the output where failure occurred expressed as a power of 2. Hence higher is better. No failure is marked with a dash.
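The conversion between the power of 2 and the size shown in the last column of each table can be sketched as below. The function name is hypothetical and binary units are assumed, matching the KiB/MiB/GiB/TiB labels used in the tables.

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]


def format_power_of_two(exponent):
    """Format 2**exponent bytes using binary units, e.g. 19 -> '512 KiB'."""
    unit_index, remainder = divmod(exponent, 10)
    if exponent < 0 or unit_index >= len(UNITS):
        raise ValueError("exponent out of range: %r" % (exponent,))
    return "%d %s" % (1 << remainder, UNITS[unit_index])


for e in (19, 21, 36, 42):
    print(e, format_power_of_two(e))
```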
h2. PractRand v0.93 -tf 2 -te 1 -tlmax 1TB:

Extended foldings.
Extended tests.
{noformat}
RNG                     PractRand       ∩      
JDK                     19              512 KiB
WELL_512_A              21              2 MiB  
WELL_1024_A             24              16 MiB 
WELL_19937_A            36              64 GiB 
WELL_19937_C            36              64 GiB 
WELL_44497_A            39              512 GiB
WELL_44497_B            39              512 GiB
MT                      32              4 GiB  
ISAAC                   -                      
SPLIT_MIX_64            -                      
XOR_SHIFT_1024_S        28              256 MiB
TWO_CMRES               21              2 MiB  
MT_64                   36              64 GiB 
MWC_256                 -                      
KISS                    -                      
XOR_SHIFT_1024_S_PHI    30              1 GiB  
XO_RO_SHI_RO_64_S       18              256 KiB
XO_RO_SHI_RO_64_SS      -                      
XO_SHI_RO_128_PLUS      21              2 MiB  
XO_SHI_RO_128_SS        -                      
XO_RO_SHI_RO_128_PLUS   22              4 MiB  
XO_RO_SHI_RO_128_SS     -                      
XO_SHI_RO_256_PLUS      24              16 MiB 
XO_SHI_RO_256_SS        -                      
XO_SHI_RO_512_PLUS      27              128 MiB
XO_SHI_RO_512_SS        -                      
PCG_XSH_RR_32           -                      
PCG_XSH_RS_32           -                      
PCG_RXS_M_XS_64         -                      
PCG_MCG_XSH_RR_32       -                      
PCG_MCG_XSH_RS_32       -                      
MSWS                    -                      
SFC_32                  -                      
SFC_64                  -                      
JSF_32                  -                      
JSF_64                  -                      
XO_SHI_RO_128_PP        -                      
XO_RO_SHI_RO_128_PP     -                      
XO_SHI_RO_256_PP        -                      
XO_SHI_RO_512_PP        -                      
XO_RO_SHI_RO_1024_PP    -                      
XO_RO_SHI_RO_1024_S     30              1 GiB  
XO_RO_SHI_RO_1024_SS    -  
{noformat}
Only a single trial was run; further trials of this test were aborted. The extended test set with extra folding increases run time considerably. It also increases memory usage to the extent that parallel jobs on the same machine can run out of memory as each task can use >2GB of RAM.
h2. PractRand v0.93 -tf 2 -te 0 -tlmax 1TB:

Extended foldings.
 Core tests.
{noformat}
RNG                     PractRand       ∩      
JDK                     18,19,19,19,19  512 KiB
WELL_512_A              24,24,24,24,24  16 MiB 
WELL_1024_A             27,27,27,27,27  128 MiB
WELL_19937_A            39,39,39,39,39  512 GiB
WELL_19937_C            39,39,39,39,39  512 GiB
WELL_44497_A            -,-,-,-,-              
WELL_44497_B            -,-,-,-,-              
MT                      35,35,35,35,35  32 GiB 
ISAAC                   -,-,-,-,-              
SPLIT_MIX_64            -,-,-,-,-              
XOR_SHIFT_1024_S        31,31,31,31,31  2 GiB  
TWO_CMRES               21,21,21,21,21  2 MiB  
MT_64                   39,39,39,39,39  512 GiB
MWC_256                 -,-,-,-,-              
KISS                    -,-,-,-,-              
XOR_SHIFT_1024_S_PHI    33,33,33,33,33  8 GiB  
XO_RO_SHI_RO_64_S       21,21,21,21,21  2 MiB  
XO_RO_SHI_RO_64_SS      -,-,-,-,-              
XO_SHI_RO_128_PLUS      24,24,24,24,24  16 MiB 
XO_SHI_RO_128_SS        -,-,-,-,-              
XO_RO_SHI_RO_128_PLUS   25,25,25,25,25  32 MiB 
XO_RO_SHI_RO_128_SS     -,-,-,-,-              
XO_SHI_RO_256_PLUS      27,27,27,27,27  128 MiB
XO_SHI_RO_256_SS        -,-,-,-,-              
XO_SHI_RO_512_PLUS      30,30,30,30,30  1 GiB  
XO_SHI_RO_512_SS        -,-,-,-,-              
PCG_XSH_RR_32           -,-,-,-,-              
PCG_XSH_RS_32           -,-,-,-,-              
PCG_RXS_M_XS_64         -,-,-,-,-              
PCG_MCG_XSH_RR_32       -,-,-,-,-              
PCG_MCG_XSH_RS_32       -,-,-,-,-              
MSWS                    -,-,-,-,-              
SFC_32                  -,-,-,-,-              
SFC_64                  -,-,-,-,-              
JSF_32                  -,-,-,-,-              
JSF_64                  -,-,-,-,-              
XO_SHI_RO_128_PP        -,-,-,-,-              
XO_RO_SHI_RO_128_PP     -,-,-,-,-              
XO_SHI_RO_256_PP        -,-,-,-,-              
XO_SHI_RO_512_PP        -,-,-,-,-              
XO_RO_SHI_RO_1024_PP    -,-,-,-,-              
XO_RO_SHI_RO_1024_S     33,33,33,33,33  8 GiB  
XO_RO_SHI_RO_1024_SS    -,-,-,-,-
{noformat}
This test contains multiple trials. However note that:
 * Almost all failures occurred using foldings that are present in the default smart folding set
 * The WELL_44497 generators do not fail. These are known to fail TestU01 BigCrush.

Here are failures from the extended folding set:
{noformat}
MT                      PractRand       32 GiB                    
MT                      PractRand       [Low4/16]BRank(12):6K(1)  
TWO_CMRES               PractRand       2 MiB                     
TWO_CMRES               PractRand       [Low4/32]Gap-16:A 
{noformat}
h2. PractRand v0.94 -tf 1 -te 0 -tlmax 4TB:

Smart foldings.
Core tests.

Note: These are the default settings but reduced from 32TB of output to 4TB. Using the smart folding and core test set allows the test to execute twice as fast so a higher maximum output was possible.

This is a PractRand version switch from v0.93 to v0.94. The v0.94 distribution download does not build on Linux so I initially tested using v0.93. Then I created a patch to fix v0.94 to allow testing using the newer version. The patch has been added to the examples-stress src folder.
{noformat}
RNG                     PractRand       ∩      
JDK                     20,20,20        1 MiB  
WELL_512_A              24,24,24        16 MiB 
WELL_1024_A             27,27,27        128 MiB
WELL_19937_A            39,39,39        512 GiB
WELL_19937_C            39,39,39        512 GiB
WELL_44497_A            42,42,42        4 TiB  
WELL_44497_B            42,42,42        4 TiB  
MT                      38,38,38        256 GiB
ISAAC                   -,-,-                  
SPLIT_MIX_64            -,-,-                  
XOR_SHIFT_1024_S        31,31,31        2 GiB  
TWO_CMRES               32,32,32        4 GiB  
MT_64                   39,39,39        512 GiB
MWC_256                 -,-,-                  
KISS                    -,-,-                  
XOR_SHIFT_1024_S_PHI    33,33,33        8 GiB  
XO_RO_SHI_RO_64_S       21,21,21        2 MiB  
XO_RO_SHI_RO_64_SS      -,-,-                  
XO_SHI_RO_128_PLUS      24,24,24        16 MiB 
XO_SHI_RO_128_SS        -,-,-                  
XO_RO_SHI_RO_128_PLUS   25,25,25        32 MiB 
XO_RO_SHI_RO_128_SS     -,-,-                  
XO_SHI_RO_256_PLUS      27,27,27        128 MiB
XO_SHI_RO_256_SS        -,-,-                  
XO_SHI_RO_512_PLUS      30,30,30        1 GiB  
XO_SHI_RO_512_SS        -,-,-                  
PCG_XSH_RR_32           -,-,-                  
PCG_XSH_RS_32           41,-,-                 
PCG_RXS_M_XS_64         -,-,-                  
PCG_MCG_XSH_RR_32       -,-,-                  
PCG_MCG_XSH_RS_32       40,41,41        2 TiB  
MSWS                    -,-,-                  
SFC_32                  -,-,-                  
SFC_64                  -,-,-                  
JSF_32                  -,-,-                  
JSF_64                  -,-,-                  
XO_SHI_RO_128_PP        -,-,-                  
XO_RO_SHI_RO_128_PP     -,-,-                  
XO_SHI_RO_256_PP        -,-,-                  
XO_SHI_RO_512_PP        -,-,-                  
XO_RO_SHI_RO_1024_PP    -,-,-                  
XO_RO_SHI_RO_1024_S     33,33,33        8 GiB  
XO_RO_SHI_RO_1024_SS    -,-,-           
{noformat}
Note that with the switch to v0.94 the core test suite was updated. The test that JDK failed in v0.93 has been removed, so JDK gets to twice the output before failing a different test (still at 4 seconds). The removed test was replaced by a new test that mainly targets LCGs. Interestingly, this makes PCG XSH RS fail once at 2TB, while the variant PCG MCG XSH RS systematically fails at 2TB.
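The tables do not define the ∩ column. One reading that is consistent with every row shown is that it reports a systematic failure: it is filled in only when all trials fail, and then shows the largest output size at which a trial failed. The sketch below encodes that reading; it is an inference from the tables, not a documented rule, and the function name is hypothetical.

```python
def systematic_failure(trials):
    """Return the power of 2 for a systematic failure, or None.

    `trials` holds one entry per trial: the power of 2 at which the
    trial failed, or None when the trial did not fail (a dash).
    Assumed rule: report a value only if every trial failed.
    """
    if not trials or any(t is None for t in trials):
        return None
    return max(trials)


print(systematic_failure([40, 41, 41]))      # PCG_MCG_XSH_RS_32 row
print(systematic_failure([41, None, None]))  # PCG_XSH_RS_32 row
```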

Also note that the failures that occurred on extended folding are now observed later on the smart foldings:
{noformat}
MT           32 GiB -> 256 GiB
TWO_CMRES     2 MiB ->   4 GiB
{noformat}
In particular TWO_CMRES gets about 2000 times as much output before failing (2 MiB to 4 GiB is a factor of 2^11). The output size is still relatively small.

The WELL_44497 generators now fail, identifying problems that are also found by TestU01 BigCrush.
h1. Conclusion

I tried a few variants of testing. Using extended folding was much slower and did not fail many generators that would not fail anyway on the smart foldings. Using the extended test suite is not recommended by the PractRand author: the tests are more comprehensive but are not orthogonal, so there is a lot of overlap. The extra tests make the run-time longer and significantly increase the memory overhead. The extra tests only seem to make the MT generator fail much faster. Using the smart folding with the core tests allows running to 4TB of output in the same length of time as 1TB with extended folding and extra tests. At 4TB we see failures of the WELL_44497 generators and PCG_MCG_XSH_RS so this extra length is useful.

I only did 3 runs. Computation time is large compared to TestU01 BigCrush. In contrast to TestU01 the entire output of the generator is put through every test. I believe it is a better use of resources to try to extend the run-time to longer output rather than to include more runs at the same length. The seed becomes largely irrelevant when 4TB of output is generated (i.e. 2^39 long values, since 4TB is 2^42 bytes and a long is 8 bytes). So I can commit the current results to the user guide and then spend a few weeks running PractRand to 32TB output for a single trial run to see what happens.
h1. Stats

The total testing time for each of the test suites in the current set of results is:
||Tests||Total||Cores||Notes||
|Dieharder|13 days 19:05:49.59|2|Workstation 1 |
|TestU01 BigCrush|47 days 00:50:11.79|2|Workstation 1 |
|PractRand|35 days 12:43:34.94|4|Workstation 1 for some generators; Workstation 2 for the other generators. 

 The second machine runs approximately 2-fold faster than workstation 1. This may be due to improved hardware or the switch from g++ 4 to g++ 5 which updates support from C++11 to C++14.|

Note that to work around the high memory usage of PractRand (which uses up to 1.3GB per job when run to 4TB of output on default settings) I use the multi-threaded mode of PractRand. It can use up to 5 threads but usage is variable and there is an approximate 3-fold increase in speed when multi-threaded. Thus results have been generated using 3 threads per testing process. This allows long-running parallel jobs on a benchmarking machine to not saturate memory.
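The trade-off can be sketched with simple arithmetic: giving each testing process several threads reduces the number of processes needed to occupy the machine, and so the total memory in use. The core count below is a hypothetical example, and the function name is illustrative; only the 1.3GB per job and 3 threads per job come from the text.

```python
def parallel_jobs(cores, threads_per_job, gb_per_job):
    """Return (number of jobs, total memory in GB) when filling `cores`."""
    jobs = max(1, cores // threads_per_job)
    return jobs, jobs * gb_per_job


# Hypothetical 12-core machine, 1.3 GB per PractRand job.
print(parallel_jobs(12, 1, 1.3))  # one thread per job
print(parallel_jobs(12, 3, 1.3))  # three threads per job
```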




> PractRand
> ---------
>
>                 Key: RNG-86
>                 URL: https://issues.apache.org/jira/browse/RNG-86
>             Project: Commons RNG
>          Issue Type: Wish
>          Components: examples
>            Reporter: Gilles Sadowski
>            Priority: Minor
>
> Integrate another test suite to the {{RandomStressTester}} application:
>  [http://pracrand.sourceforge.net/]
> The library also contains many RNG implementations (C++).
> FTR: https://markmail.org/message/74zmora4jrhwb5hu


