You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by Alex Herbert <al...@gmail.com> on 2019/03/14 11:51:06 UTC

[Rng] User Guide Quality section

I've just added a Jira task to add the new XorShiRo generaters to the 
user guide.

I note that the order of the RNGs in the *Quality* section does not 
match the order of the RandomSource enum, nor is it alphabetical. The 
list order is defined in 
org.apache.commons.rng.examples.stress.GeneratorsList. Nearly all the 
IntProviders are first with the exception of MWC_256 and KISS.

So:

- Should the new XoShiRo generators be added to the end?
- Should all the *Quality* results be reordered to match the 
RandomSource enum?
- Should results be alphabetical?
- Should the MWC_256 and KISS be moved so the generators are listed for 
IntProvider then LongProvider?

Looking at the history for RandomSource enum and the GeneratorsList it 
seems the order has always been the same in both, although not between 
them. If new generators are added to the end of the RandomSource enum 
then perhaps all the other lists used in the test suites and 
benchmarking suites should match the same order.


I would vote for requiring that new generators are added to the end of 
the RandomSource enum. All other lists through the code should match 
this order.


The order in PR #30 for the XorShift1024StarPhi generator adds to the 
RandomSource enum directly after XOR_SHIFT_1024_S.

The order in PR #20 for the new XorShiRo generators adds to the 
RandomSource enum at the end in alphabetical order for the IntProviders 
then again for the new LongProviders.

These can be fixed depending on the order decision when the PRs are merged.


Alex



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Rng] User Guide Quality section

Posted by Alex Herbert <al...@gmail.com>.

> On 24 Mar 2019, at 23:57, Gilles Sadowski <gi...@gmail.com> wrote:
> 
>> [...]
> 
> I think that we mostly agreed in another thread.
> 
>> 
>> Summary of my proposals:
>> 
>> - Keep the order in the user guide as is but move MT next to MT_64
>> - Auto generate the GeneratorsList from the RandomSource enum
>> - Update all the file names and the user guide to point to the renamed files for the new order
>> - New generators can go anywhere in the user guide table if related to others, or at the end
>> - Assert that the ProviderList in [simple] matches RandomSource in size
>> - Add test that a null seed passed to RandomSource.create can build a generator that passes the test for randomness
>> - Update the ProviderBuilder to handle zero filled seeds for generators that cannot handle them
>> - Update the ProviderBuilder to handle an empty array as if it were a null seed
> 
> I'm not sure about the last two points; I imagined that one would want to be
> able to explicitly see the limitation of some algorithm.
> A "null" seed is convention for telling the library to provide a good sequence
> without much hassle, but if a user says that the seed is zero, then it might
> be useful to honor that.  One should read the doc. :-)

OK. So only try and guarantee a functional provider if no seed is given and the seed is ‘correctly' made by the library. Let the user mess up their own seed. That follows from having no checks on the input seeds (beyond length) in the actual generators in the core module too.

I’ll do this when I have an attempt to update the ProviderBuilder code for building generators.

Currently I am still refining the command line program to run the stress test. It started as a small job that got big. I’ll create a PR for discussion and testing when it is done. Hopefully this week.

> 
> Gilles
> 
>> [...]
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Rng] User Guide Quality section

Posted by Gilles Sadowski <gi...@gmail.com>.
> [...]

I think that we mostly agreed in another thread.

>
> Summary of my proposals:
>
> - Keep the order in the user guide as is but move MT next to MT_64
> - Auto generate the GeneratorsList from the RandomSource enum
> - Update all the file names and the user guide to point to the renamed files for the new order
> - New generators can go anywhere in the user guide table if related to others, or at the end
> - Assert that the ProviderList in [simple] matches RandomSource in size
> - Add test that a null seed passed to RandomSource.create can build a generator that passes the test for randomness
> - Update the ProviderBuilder to handle zero filled seeds for generators that cannot handle them
> - Update the ProviderBuilder to handle an empty array as if it were a null seed

I'm not sure about the last two points; I imagined that one would want to be
able to explicitly see the limitation of some algorithm.
A "null" seed is convention for telling the library to provide a good sequence
without much hassle, but if a user says that the seed is zero, then it might
be useful to honor that.  One should read the doc. :-)

Gilles

> [...]

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Rng] User Guide Quality section

Posted by Alex Herbert <al...@gmail.com>.

> On 14 Mar 2019, at 18:22, Gilles Sadowski <gi...@gmail.com> wrote:
> 
> Hello.
> 
> Le jeu. 14 mars 2019 à 12:58, Alex Herbert <al...@gmail.com> a écrit :
>> 
>> I've just added a Jira task to add the new XorShiRo generaters to the
>> user guide.
>> 
>> I note that the order of the RNGs in the *Quality* section does not
>> match the order of the RandomSource enum, nor is it alphabetical. The
>> list order is defined in
>> org.apache.commons.rng.examples.stress.GeneratorsList. Nearly all the
>> IntProviders are first with the exception of MWC_256 and KISS.
>> 
>> So:
>> 
>> - Should the new XoShiRo generators be added to the end?
> 
> +1
> [To minimize source code changes.]

Agreed.

> 
>> - Should all the *Quality* results be reordered to match the
>> RandomSource enum?
> 
> +0
> [Unless some other orders make sense: from fast to slow, from good
> to bad, …]

Since the order in the rng.apt file does not have to match the underlying GeneratorsList or RandomSource enum then sorting/ordering the user guide table is the easiest option.

If we use total failures for BigCrush this makes the order:

RNG identifier	Dieharder	BigCrush
JDK	11, 12, 13	74, 72, 75
WELL_512_A	0, 0, 0	7, 6, 6
WELL_1024_A	0, 0, 0	4, 4, 5
WELL_44497_A	0, 0, 0	2, 3, 3
WELL_19937_A	0, 0, 0	3, 2, 3
WELL_19937_C	0, 1, 0	2, 2, 3
MT_64	0, 0, 1	3, 2, 3
MT	0, 1, 0	3, 2, 2
WELL_44497_B	0, , 0	2, 2, 2
KISS	0, 0, 0	1, 2, 0
ISAAC	0, 0, 1	0, 1, 0
TWO_CMRES	1, 1, 1	0, 0, 1
SPLIT_MIX_64	0, 0, 0	2, 0, 0
XOR_SHIFT_1024_S	0, 0, 0	2, 0, 0
MWC_256	0, 0, 0	0, 0, 0


I am not sure I like this. It appears more of a mess to my eye. At least in the current order the WELL family is grouped together.

Maybe the only thing I would change is to put the MT and MT_64 together.

> 
>> - Should results be alphabetical?
> 
> -0
> [The names are not especially telling…]

Agreed. It was logical but not useful. I’ve just done it in my spreadsheet copy and it also looks a mess to my eye.

> 
>> - Should the MWC_256 and KISS be moved so the generators are listed for
>> IntProvider then LongProvider?
> 
> ?

Since the end-user does not have to know if the RNG is a based on making int or long then this suggestion is ruled out.

> 
>> Looking at the history for RandomSource enum and the GeneratorsList it
>> seems the order has always been the same in both, although not between
>> them. If new generators are added to the end of the RandomSource enum
>> then perhaps all the other lists used in the test suites and
>> benchmarking suites should match the same order.
> 
> The order in "RandomSource" is immaterial to the te user (i.e. an application
> should certainly not rely on that order).
> 
>> I would vote for requiring that new generators are added to the end of
>> the RandomSource enum.
> 
> +1
> 
>> All other lists through the code should match
>> this order.
> 
> What when the list split in two ("Int" vs "Long"), then combined for running
> a test/benchmark?  Results are not going to be sorted.  At first sight, I don't
> think it's worth making this sort of requirements (unless everything can be
> automated to provide the sorted output for free).

Keeping the order the same through the codebase was an idea just for maintenance. There is very similar functionality here:

[test] org.apache.commons.rng.core.ProvidersList
[test] org.apache.commons.rng.simple.ProvidersList
org.apache.commons.rng.examples.stress.GeneratorsList
org.apache.commons.rng.examples.jmh.ConstructionPerformance$Sources
org.apache.commons.rng.examples.jmh.GenerationPerformance$Sources
org.apache.commons.rng.examples.jmh.distribution.SamplersPerformance$Sources

The first two build a list with a seed. Core uses a random seed from SecureRandom. Simple uses a fixed seed. It still tests randomness using copied code from the tests in core. However it does have a note: "check whether it is possible to have a single implementation.” As an aside I think this test for randomness should stay but the generator should be built with a null seed. Currently it tests empty seeds and an empty seed array is not processed specially by the ProviderBuilder so this actually tests self-seeding of the generator. Self-Seeding is already testing in the core module. I suggest the ProviderBuilder be updated to detect empty seed arrays and do something better, like handle an empty array as if it were null. The test in simple should also add test for randomness using a null seed, effectively testing the RandomSource.create method creates a provider that works given no seed, i.e. like testEmptyIntArraySeed but with a null seed:

@Test
public void testNullSeed() {
   // … Do the checkNextIntegerInRange test
}

I also note that there is this:

@Ignore@Test
public void testZeroIntArraySeed()

Currently the XorShift1024Star generator would fail this. This would also be failed by all the new XoShiRo generators. I think this should be fixed and the test reinstated. Sooner or later someone will try RandomSource.create(RandomSource.XYZ, new long[50]) and get a bad generator. I am not sure how to fix it yet[1] but I would like to try.

The third list does not have any seeds. This arguably could be generated automatically using the RandomSource enum for all generators that do not have multiple constructor arguments (i.e. all but TWO_CMRES_SELECT).

The other lists are built using JMH annotations from strings. These cannot be auto-generated.

This makes only 2 key lists to maintain: one in core and one in simple. These should be updated when a new RNG is added just to get tests working. The order does not matter. However I think the test in simple should add an assertion that the list of providers is the expected length given the RandomSource enum. Currently this would be in the static initialisation for the ProvidersList:

if (LIST.size() != RandomSource.values().length) {
    throw new RuntimeException("Providers list is not the expected size");
}

This would make PRs for a new generator fail if the functionality in simple has not be added to the tests but was added to RandomSource.

The third list could be rearranged to match the RandomSource enum and so would be autogenerated. That would mean rearranging the results files from Dieharder and BigCrush. I’m happy to do this with some scripted git mv commands.

The JMH lists I suppose would be updated when the benchmarks need to be redone for new generators. This is manual labour.

Summary of my proposals:

- Keep the order in the user guide as is but move MT next to MT_64
- Auto generate the GeneratorsList from the RandomSource enum
- Update all the file names and the user guide to point to the renamed files for the new order
- New generators can go anywhere in the user guide table if related to others, or at the end
- Assert that the ProviderList in [simple] matches RandomSource in size
- Add test that a null seed passed to RandomSource.create can build a generator that passes the test for randomness
- Update the ProviderBuilder to handle zero filled seeds for generators that cannot handle them
- Update the ProviderBuilder to handle an empty array as if it were a null seed

[1] The new XoShiRo generators break when the entire seed is zero. However depending on the first sequence iteration they can handle a lot of zeros. For one generator I seem to recall that only the final long of the seed needs to have a single bit set in the least significant X bits (I cannot recall X). If bits are set above the least significant X then even though the seed is not zero the generator breaks (the first step shifted all the bits using << (64-X)). I tried to build a method to test for bad seeds but then it was decided that broken generators were OK as they match the reference implementation. However we should do better for the [simple] module and try to make it work, at least as far as possible.

> 
> Regards,
> Gilles
> 
>> 
>> 
>> The order in PR #30 for the XorShift1024StarPhi generator adds to the
>> RandomSource enum directly after XOR_SHIFT_1024_S.
>> 
>> The order in PR #20 for the new XorShiRo generators adds to the
>> RandomSource enum at the end in alphabetical order for the IntProviders
>> then again for the new LongProviders.
>> 
>> These can be fixed depending on the order decision when the PRs are merged.
>> 
>> 
>> Alex
>> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 


Re: [Rng] User Guide Quality section

Posted by Gilles Sadowski <gi...@gmail.com>.
Hello.

Le jeu. 14 mars 2019 à 12:58, Alex Herbert <al...@gmail.com> a écrit :
>
> I've just added a Jira task to add the new XorShiRo generaters to the
> user guide.
>
> I note that the order of the RNGs in the *Quality* section does not
> match the order of the RandomSource enum, nor is it alphabetical. The
> list order is defined in
> org.apache.commons.rng.examples.stress.GeneratorsList. Nearly all the
> IntProviders are first with the exception of MWC_256 and KISS.
>
> So:
>
> - Should the new XoShiRo generators be added to the end?

+1
[To minimize source code changes.]

> - Should all the *Quality* results be reordered to match the
> RandomSource enum?

+0
[Unless some other orders make sense: from fast to slow, from good
to bad, ...]

> - Should results be alphabetical?

-0
[The names are not especially telling...]

> - Should the MWC_256 and KISS be moved so the generators are listed for
> IntProvider then LongProvider?

?

> Looking at the history for RandomSource enum and the GeneratorsList it
> seems the order has always been the same in both, although not between
> them. If new generators are added to the end of the RandomSource enum
> then perhaps all the other lists used in the test suites and
> benchmarking suites should match the same order.

The order in "RandomSource" is immaterial to the te user (i.e. an application
should certainly not rely on that order).

> I would vote for requiring that new generators are added to the end of
> the RandomSource enum.

+1

> All other lists through the code should match
> this order.

What when the list split in two ("Int" vs "Long"), then combined for running
a test/benchmark?  Results are not going to be sorted.  At first sight, I don't
think it's worth making this sort of requirements (unless everything can be
automated to provide the sorted output for free).

Regards,
Gilles

>
>
> The order in PR #30 for the XorShift1024StarPhi generator adds to the
> RandomSource enum directly after XOR_SHIFT_1024_S.
>
> The order in PR #20 for the new XorShiRo generators adds to the
> RandomSource enum at the end in alphabetical order for the IntProviders
> then again for the new LongProviders.
>
> These can be fixed depending on the order decision when the PRs are merged.
>
>
> Alex
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org