You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Alex D Herbert (JIRA)" <ji...@apache.org> on 2019/02/23 20:10:00 UTC
[jira] [Comment Edited] (RNG-72) Create a RandomSource.create benchmark

    [ https://issues.apache.org/jira/browse/RNG-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16775619#comment-16775619 ] 

Alex D Herbert edited comment on RNG-72 at 2/23/19 8:09 PM:
------------------------------------------------------------

Initial benchmark to create 500 RNGs, seeds are pre-built where appropriate. Test the following construction methods:
 * Create using the native constructor with the {{new}} keyword
 * Create using a cached {{Constructor<Object>}} with newInstance
 * Create using a cached {{Class<?>}}, lookup the {{Constructor<Object>}} and use newInstance
 * Create using {{RandomSource}} with the native seed
 * Create using {{RandomSource}} with a {{null}} seed
 * Create using {{RandomSource}} with a truncated native seed forcing self seeding
 * Create using {{RandomSource}} with a {{byte[]}} of the appropriate length

The table shows the average time for the {{new}} method and then the relative time for each of the other methods. Table is sorted by the speed of the native constructor (time in microseconds):
||RandomSource||new||newInstance||lookupNewInstance||createNativeSeed||createNullSeed||createSelfSeed||createByteArray||
|SPLIT_MIX_64|3.28|1.93|6.10|6.68|25.52|0.00|9.65|
|KISS|7.26|1.55|3.22|3.54|186.63|3.64|6.08|
|WELL_512_A|9.14|1.33|2.71|3.01|146.15|5.16|12.56|
|JDK|10.14|1.14|2.64|2.84|8.95|0.00|4.20|
|WELL_1024_A|14.45|1.11|2.06|2.23|92.51|5.62|14.19|
|XOR_SHIFT_1024_S|14.91|1.09|1.97|2.20|164.19|3.09|8.20|
|WELL_19937_A|157.55|1.00|1.04|1.37|10.74|8.32|23.58|
|WELL_19937_C|158.58|1.00|1.05|1.37|10.46|8.30|22.10|
|MWC_256|159.91|0.98|1.02|1.25|9.30|3.61|9.64|
|WELL_44497_B|358.09|1.00|1.03|1.22|5.70|8.05|22.80|
|WELL_44497_A|363.88|1.01|1.01|1.18|5.62|7.94|21.50|
|ISAAC|1055.40|1.00|1.06|1.11|2.30|1.52|2.38|
|MT_64|3290.11|1.02|1.03|1.06|1.21|0.44|0.92|
|MT|5327.08|1.00|1.01|1.04|0.93|0.68|1.35|
|TWO_CMRES|73908.06|1.03|1.03|1.06|1.07|0.00|1.08|

For reference here are the native seed types and length (for arrays):
||RandomSource||Type||Length||
|SPLIT_MIX_64|Long|-|
|KISS|int[]|4|
|WELL_512_A|int[]|16|
|JDK|Long|-|
|WELL_1024_A|int[]|32|
|XOR_SHIFT_1024_S|long[]|16|
|WELL_19937_A|int[]|624|
|MWC_256|int[]|257|
|WELL_44497_A|int[]|1391|
|ISAAC|int[]|256|
|MT_64|long[]|312|
|MT|int[]|624|
|TWO_CMRES|Integer|-|

Notes:

The following generators have a self-seeding routine:
 * ISAAC
 * MT
 * MT_64
 * TWO_CMRES

This self-seeding routine is the main overhead during construction. In the case of the TWO_CMRES the self-seeding is very computationally intensive as it involves on average 2^16 calls to advance the sub-cycle generators.

Construction of the SPLIT_MIX_64 and JDK just copies the long value as the seed state.

Construction with a native seed for the other generators is basically an array copy.

 

Analysis:

Using Reflection

The use of the {{new}} keyword is optimal. Using a cached constructor has a performance improvement over using the class to find the constructor. This is a large part of the construction cost for the fastest {{RandomSource}}, e.g. SPLIT_MIX_64.

The factory method using the native seed (which uses the class to find the constructor) is only about 10% slower than the hard coded benchmark code without any seed validation. So the factory is not impacting the construction speed over just using reflection when a native seed is available.

It would be possible to move the creation using reflection to the {{RandomSourceInternal}} enum. The default implementation could be overidden with a method using the {{new}} keyword to avoid reflection where this is a performance benefit, e.g.

 
{code:java}
// Method added to RandomSourceInternal to be used when the seed has been converted to the native seed.

// Default
RestorableUniformRandomProvider createInstance(Object seed, Object[] args) {
    final List<Object> all = new ArrayList<Object>();
    all.add(nativeSeed);
    if (args != null) {
        all.addAll(Arrays.asList(args));
    }
    return create(createConstructor(this), all.toArray())
}
// Optimised for performance
RestorableUniformRandomProvider createInstance(Object seed, Object[] args) {
    return new SplitMix64((Long)seed);
}{code}
This would show the most benefit for the fast to construct RNGs at the cost of extra code in the factory (and test methods to ensure that createInstance returns a type matching the class returned by getRng()).

 

 

SPLIT_MIX_64 (single number seed)

Construction with a native seed shows the relative cost of instantiation using reflection is 6x slower. This could be improved by removing the use of ArrayList within the method that builds the {{Object[]}} to pass to the constructor. In most cases the arguments are null and an {{Object[]}} of length 1 can be created directly using just the seed. This may explain part of the 10% slowdown between the factory method and the benchmark using reflection.

If the seed must be generated then the construction is 22x slower. This is due to the use of synchronisation around a provider of seeds. 

This could be improved by changing the provider of seeds to be a faster generator. Currently it is a long period {{Well44497b}}. This could be switched to a native generator of {{long}} bits such as the XorShift1024Star which is over 4x as fast. 

Creation from a byte array is approximately 1.25x slower than using a {{Long}} showing the cost of seed conversion is low.

 

KISS (small array seed)

Construction with a native seed shows the relative cost of instantiation using reflection is 3x slower. If the seed must be generated then the construction is 164x slower. This is partly due to the factory method creating a seed array of length 128 when only 16 is used. This could be improved by changing the factory to know the required size for all {{RandomSource}} options. The seed is generated from a single instance of a RNG requiring synchronisation on each call. This can be changed to use a single {{long}} seed from the synchronised seed factory to create a SplitMix to generate the array of seed values without synchronisation. This will reduce the possible randomness of the seeds due to using a shorter period RNG.

The self-seeding time is almost as fast as using a native seed. A large part of construction from the native seed can be attributed to the construction using reflection.

When using a {{byte[]}} the speed is 5x slower, so conversion of the bytes is an overhead. Inspection of the code shows the use of array allocation within a loop to copy the bytes for each number conversion. This can be refactored to use a method to read the bytes from a position in a given byte array and convert them directly to a {{long}} avoiding array allocation overhead. This method is used in {{java.nio.ByteBuffer}} for conversions.

 

WELL_44497_A (large array seed)

Construction with a native seed shows the relative cost of instantiation using reflection is 1.05x slower. The majority of the construction time is not the lookup of the constructor call but the work inside the constructor to copy the input seed array.

Construction with a {{byte[]}} is 21x slower. This demonstrates that the conversion of bytes to numbers is a large overhead. Notes on improvement are as for the KISS generator (better byte conversion without array allocation overhead).

If the seed must be generated then the construction is 6x slower. This could be improved by using a faster generator of seed data.

 

Generating seed data

Currently the seed data is generated using a single long period {{Well44497b}}. This is used with synchronization to create entire array seeds. An alternative would be to use a single generator to produce {{long}} seeds that are then used to seed a SplitMix generator. This can be used without synchronization to generate the seed data. This will improve the construction performance at the cost of reducing the randomness of the seed data produced by the factory methods.

So the question is whether the factory method should try to maximise randomness of the seed data (as current) or minimise construction overhead.

 

Summary:

Non-destructive improvements:
 * Move the use of reflection to create the instance into the {{RandomSourceInternal}} enum. This can be overridden for performance when necessary (i.e. small seeds)
 * Improve construction of {{Object[]}} that is passed to the constructor via reflection
 * Extend the internal knowledge of the factory method to know the size of the required seed arrays
 * Improve the byte array conversion methods

Possibly breaking improvements [1]:
 * Change the long period {{Well44497b}} for a less long {{XorShift1024Star}}. Note that this still has a period of 2^1024. If billions of seeds are required per second (1,000,000,000 ~ 2^30) then at (60*60*24*365=31536000~2^25 seconds/year) it would be 2^969 years before a repeat. This generator is over 4x faster at {{long}} generation and 2.5x faster at {{int}} generation.
 * Change the array seed generation to use a SplitMix generator. This reduces synchronisation to just a single call to get the seed for the SplitMix generator. This limits the seeds to 2^64 possible values. Using the Xor method of the output with a random hash value (this is currently employed for all seeds output by the {{SeedFactory}} will increase this by 2^32 to 2^96 seed values.

[1] These are only "breaking" as they reduce the randomness of the providers that can be created by the factory.


was (Author: alexherbert):
Initial benchmark to create 500 RNGs, seeds are pre-built where appropriate. Test the following construction methods:
 * Create using the native constructor with the {{new}} keyword
 * Create using {{RandomSource}} with a {{null}} seed
 * Create using {{RandomSource}} with the native seed
 * Create using {{RandomSource}} with a truncated native seed forcing self seeding
 * Create using {{RandomSource}} with a {{byte[]}} of the appropriate length

The table shows the average time for the {{new}} method and then the relative time for each of the other methods. Table is sorted by the speed of the native constructor (time in microseconds):
||RandomSource||new||createNullSeed||createNativeSeed||createSelfSeed||createByteArray||
|SPLIT_MIX_64|3.80|21.92|6.07|-|8.04|
|KISS|8.14|164.17|3.06|3.29|5.22|
|WELL_512_A|9.11|145.66|2.84|5.08|11.93|
|JDK|10.31|8.75|2.75|-|3.78|
|WELL_1024_A|14.61|90.08|2.12|5.52|13.65|
|XOR_SHIFT_1024_S|14.81|163.92|2.15|3.03|7.82|
|WELL_19937_A|157.74|10.41|1.26|8.17|22.87|
|MWC_256|161.11|9.07|1.07|3.60|9.25|
|WELL_44497_A|359.31|5.58|1.05|8.03|21.02|
|ISAAC|1073.02|2.21|1.09|1.48|2.36|
|MT_64|3399.62|1.15|1.03|0.43|0.88|
|MT|5460.06|0.90|1.03|0.68|1.27|
|TWO_CMRES|77129.26|1.02|1.07|-|1.03|

For reference here are the native seed types and length (for arrays):
||RandomSource||Type||Length||
|SPLIT_MIX_64|Long|-|
|KISS|int[]|4|
|WELL_512_A|int[]|16|
|JDK|Long|-|
|WELL_1024_A|int[]|32|
|XOR_SHIFT_1024_S|long[]|16|
|WELL_19937_A|int[]|624|
|MWC_256|int[]|257|
|WELL_44497_A|int[]|1391|
|ISAAC|int[]|256|
|MT_64|long[]|312|
|MT|int[]|624|
|TWO_CMRES|Integer|-|

Notes:

The following generators have a self-seeding routine:
 * ISAAC
 * MT
 * MT_64
 * TWO_CMRES

This self-seeding routine is the main overhead during construction. In the case of the TWO_CMRES the self-seeding is very computationally intensive as it involves on average 2^16 calls to advance the sub-cycle generators.

Construction of the SPLIT_MIX_64 and JDK just copies the long value as the seed state.

Construction with a native seed for the other generators is basically an array copy.

 

Analysis:

SPLIT_MIX_64 (single number seed)

Construction with a native seed shows the relative cost of instantiation using reflection is 6x slower. This could be improved by removing the use of ArrayList within the method that builds the {{Object[]}} to pass to the constructor. In most cases the arguments are null and an {{Object[]}} of length 1 can be created directly using just the seed.

If the seed must be generated then the construction is 22x slower. This is due to the use of synchronisation around a provider of seeds. 

This could be improved by changing the provider of seeds to be a faster generator. Currently it is a long period {{Well44497b}}. This could be switched to a native generator of {{long}} bits such as the XorShift1024Star which is over 4x as fast. 

Creation from a byte array is approximately 1.25x slower than using a {{Long}} showing the cost of seed conversion is low.

 

KISS (small array seed)

Construction with a native seed shows the relative cost of instantiation using reflection is 3x slower. If the seed must be generated then the construction is 164x slower. This is partly due to the factory method creating a seed array of length 128 when only 16 is used. This could be improved by changing the factory to know the required size for all {{RandomSource}} options. The seed is generated from a single instance of a RNG requiring synchronisation on each call. This can be changed to use a single {{long}} seed from the synchronised seed factory to create a SplitMix to generate the array of seed values without synchronisation. This will reduce the possible randomness of the seeds due to using a shorter period RNG.

The self-seeding time is almost as fast as using a native seed. A large part of construction from the native seed can be attributed to the construction using reflection.

When using a {{byte[]}} the speed is 5x slower, so conversion of the bytes is an overhead. Inspection of the code shows the use of array allocation within a loop to copy the bytes for each number conversion. This can be refactored to use a method to read the bytes from a position in a given byte array and convert them directly to a {{long}} avoiding array allocation overhead. This method is used in {{java.nio.ByteBuffer}} for conversions.

 

WELL_44497_A (large array seed)

Construction with a native seed shows the relative cost of instantiation using reflection is 1.05x slower. The majority of the construction time is not the lookup of the constructor call but the work inside the constructor to copy the input seed array.

Construction with a {{byte[]}} is 21x slower. This demonstrates that the conversion of bytes to numbers is a large overhead. Notes on improvement are as for the KISS generator (better byte conversion without array allocation overhead).

If the seed must be generated then the construction is 6x slower. This could be improved by using a faster generator of seed data.

 

Generating seed data

Currently the seed data is generated using a single long period {{Well44497b}}. This is used with synchronization to create entire array seeds. An alternative would be to use a single generator to produce {{long}} seeds that are then used to seed a SplitMix generator. This can be used without synchronization to generate the seed data. This will improve the construction performance at the cost of reducing the randomness of the seed data produced by the factory methods.

So the question is whether the factory method should try to maximise randomness of the seed data (as current) or minimise construction overhead.

 

Summary:

Non-destructive improvements:
 * Improve construction of {{Object[]}} that is passed to the constructor via reflection
 * Extend the internal knowledge of the factory method to know the size of the required seed arrays
 * Improve the byte array conversion methods

Possibly breaking improvements [1]:
 * Change the long period {{Well44497b}} for a less long {{XorShift1024Star}}. Note that this still has a period of 2^1024. If billions of seeds are required per second (1,000,000,000 ~ 2^30) then at (60*60*24*365=31536000~2^25 seconds/year) it would be 2^969 years before a repeat. This generator is over 4x faster at {{long}} generation and 2.5x faster at {{int}} generation.
 * Change the array seed generation to use a SplitMix generator. This reduces synchronisation to just a single call to get the seed for the SplitMix generator. This limits the seeds to 2^64 possible values. Using the Xor method of the output with a random hash value (this is currently employed for all seeds output by the {{SeedFactory}} will increase this by 2^32 to 2^96 seed values.

[1] These are only "breaking" as they reduce the randomness of the providers that can be created by the factory.

> Create a RandomSource.create benchmark
> --------------------------------------
>
>                 Key: RNG-72
>                 URL: https://issues.apache.org/jira/browse/RNG-72
>             Project: Commons RNG
>          Issue Type: New Feature
>          Components: simple
>    Affects Versions: 1.3
>            Reporter: Alex D Herbert
>            Priority: Minor
>              Labels: performance-benchmark
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The recommended method to construct a {{UniformRandomProvider}} is to use, e.g.:
> {code:java}
> import org.apache.commons.rng.UniformRandomProvider;
> import org.apache.commons.rng.simple.RandomSource;
> UniformRandomProvider rng = RandomSource.create(RandomSource.MWC_256);
> {code}
> The factory method knows the type of seed required for the constructor and generates one as appropriate.
> This factory method could be made more efficient, in particular:
>  * Reducing synchronisation around the single source of random seed data
>  * Adding knowledge of the required seed size for arrays
>  * Changing internal data structures, e.g. {{Map<Class<?>, SeedConverter<?,?>>}} can be changed to {{Map<SeedType, SeedConverter<?,?>>}} using an {{EnumMap}} if a new enum {{SeedType}} was created for all the supported seeds (currently 4 types).
>  * Add a new interface to replace {{SeedConverter<?,?>.convert()}} with a {{.convert(int outputArraySize)}} method to allow conversions to generate appropriately sized arrays. The parameter can be ignored for non-array conversions but could optimise array conversions.
> This ticket is to add a JMH benchmark to compare the speed of construction of all the providers using:
>  * Their native constructor
>  * {{RandomSource}} using the native seed of the correct size (calls a constructor using reflection)
>  * {{RandomSource}} using a non native seed (requires seed conversion)
>  * {{RandomSource}} using no seed (requires seed generation)
> The report will be posted here. It could be added to the user guide for reference.
> This work is motivated by the new {{XorShiRo}} generators in version 1.3 that have a native array seed size of 2, 4, or 8. The current {{RandomSource}} create method will generate a fixed seed of length 128 for seeding.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)