You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by Gilles <gi...@harfang.homelinux.org> on 2015/12/29 04:08:29 UTC

[Math] About the refactoring of RNGs (Was: [01/18] [math] MATH-1307)

On Mon, 28 Dec 2015 11:08:56 -0700, Phil Steitz wrote:
> The significant refactoring to eliminate the (standard) next(int)
> included in these changes has the possibility of introducing subtle
> bugs or performance issues.  Please run some tests to verify that
> the same sequences are generated by the 3_X code

IIUC our unit tests of the RNGs, this is covered.

> and the refactored
> code and benchmarks to show there is no loss in performance.

Given that there are exactly two operations _less_ (a subtraction
and a shift), it would be surprising.

> It
> would also be good to have some additional review of this code by
> PRNG experts.

The "nextInt()" code is exactly the same as the "next(int)" modulo
the little change above (in the last line of the "nextInt/next" code).

That change in "nextInt/next" implied similarly tiny recodings in
the generic methods "nextDouble()", "nextBoolean()", ... which, apart
from that, were copied from "BitsStreamGenerator".

[However tiny a change, I had made a mistake... and dozens of tests
started to fail. Found the typo and all was quiet again...]

About "next(int)" being standard, it would be interesting to know
what that means.
As I indicated quite clearly in one of my first posts about this
refactoring
1. all the CM implementations generate random bits in batches
    of 32 bits, and
2. before returning, the "next(int bits)" method was truncating
    the generated "int":
      return x >>> (32 - bits);

In all implementations, that was the only place where the "bits"
parameter was used, from which I concluded that the randomness
provider does not care if the request was to create less than 32
random bits.
Taking "nextBoolean()" for example, it looks like a waste of 31
bits (or am I missing something?).

Of course, if some implementation were able to store the bits not
requested by the last call to "next(int)", then I'd understand that
we must really provide access to a "next(int)" method.

Perhaps that the overhead of such bookkeeping is why the practical
algorithms chose to store integers rather than bits (?).

As you dismissed my request about CM being able to care for a RNG
implementation based on a "long", I don't quite understand the
caring for a "next(int)" that serves no more purpose (as of current
CM).


Gilles


> Phil
>
> On 12/28/15 10:23 AM, erans@apache.org wrote:
>> Repository: commons-math
>> Updated Branches:
>>   refs/heads/master 7b62d0155 -> 81585a3c4
>>
>>
>> MATH-1307
>>
>> New base class for RNG implementations.
>> The source of randomness is provided through the "nextInt()" method 
>> (to be defined in subclasses).
>>
>>
>> [...]


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Math] About the refactoring of RNGs

Posted by Luc Maisonobe <lu...@spaceroots.org>.
hi all,

Le 29/12/2015 18:32, Phil Steitz a écrit :
> 
> 
>> On Dec 29, 2015, at 8:41 AM, Gilles <gi...@harfang.homelinux.org>
>> wrote:
>> 
>>> On Mon, 28 Dec 2015 20:33:24 -0700, Phil Steitz wrote:
>>>> On 12/28/15 8:08 PM, Gilles wrote:
>>>>> On Mon, 28 Dec 2015 11:08:56 -0700, Phil Steitz wrote: The
>>>>> significant refactoring to eliminate the (standard)
>>>>> next(int) included in these changes has the possibility of
>>>>> introducing subtle bugs or performance issues.  Please run
>>>>> some tests to verify that the same sequences are generated by
>>>>> the 3_X code
>>>> 
>>>> IIUC our unit tests of the RNGs, this is covered.
>>> 
>>> No.  Not sufficient.  What you have done is changed the internal 
>>> implementation of all of the Bitstream generators.  I am not 
>>> convinced that you have not broken anything.  I will have to do
>>> the testing myself.  I see no point in fiddling with the
>>> internals of this code that has had a lot of eyeballs and testing
>>> on it.  I was not personally looking forward to researching the
>>> algorithms to make sure any invariants may be broken by these
>>> changes; but I am now going to have to do this.  I have to ask
>>> why.  Please at some point read [1], especially the sections on
>>> "Avoid Flexibility Syndrom" and "Value Laziness as a Virtue."
>>> Gratuitous refactoring drains community energy.
>> 
>> It seems that again you don't read what I write.[1] Hence the above
>> paragraph is spreading F -> "I am not convinced that you have not
>> broken anything." U -> "I will have to do the testing myself." D ->
>> "I see no point in fiddling [...]" .
>> 
>> Or maybe I have to rant about email communication. Please reread
>> the thread to fully appreciate that you could have shared your
>> doubts among the opinions which you gave about some of my
>> questions. When the reply answers to only some of the direct
>> questions, the OP is legitimately tempted to assume that the
>> non-comment is akin to "I don't care" (as in "left to the judgment
>> of the one who does the job").
>> 
>> As is often the case, you can dislike my ideas of improvements
>> (ideas that come on the basis of the information provided by what
>> the code does) but I don't appreciate the use of the word
>> "gratuitous", given that I clearly stated the purpose of making
>> explicit what the code actually does (that is, *generating* 32
>> random bits and not the number of bits passed as a parameter to
>> "next(int)"). So, the code is now self-documenting; it is a small
>> change, to be sure, but hardly gratuitous.
> 
> I did not respond to the original question about eliminating
> next(int) because I was not (still am not) expert in the internals of
> how the generators work and how the generated ints-of-bits are
> transformed to make doubles, gaussians, etc.  I was hoping someone
> who was expert would chime in and say either "don't do that" or
> "that's ok".

I intend to look at these changes, but concentrated on the last
things I wanted to be included in 3.6 (another mail will come
about this later this evening).

I am not an expert either on PRNG, but did work on some of the
implementations we have, so maybe I will be able to give an
opinion.

One thing we coud do is set up a branch for such an experimentation.
Branches in Git are simple. The work could then be merged back
to master once it has stabilized. It would also be easy for any
developer to switch back and forth between this feature branch
and master to get convinced there are no hidden problems.

> That did not happen before you committed the changes.
> That obligates us to review them.  The tests cover only nextInt so
> we need to convince ourselves that the transformations (which have
> all changed) are still correct.  That is in my mind just extra work
> generated for the community.  I would rather see our cycles go toward
> getting 3.6 out.

So let the dust settle down for a few days, I would really like
to finalize 3.6 too.

best regards,
Luc

> 
> I stand by my assertion that the rebasing to eliminate next(bits)
> adds nothing. If you can demonstrate via benchmarks this it is
> faster, I will retract that statement.
> 
> I know we disagree fundamentally on the priorities of [math].  To me,
> stability, correctness and ease of use (including ease of upgrades)
> are paramount.  That means you don't make implementation changes to
> hardened code without a good reason and you limit incompatible change
> to what is necessary to deliver practical benefit to users in new
> features or bug fixes.  I was +1 to dropping AbstractRandomGenerator
> and just leaving the bitstream generators in place.  That is a
> simplifying change.
>> 
>> I actually did go some way towards "Avoiding [a] Flexibility
>> Syndrom" that *was* present in a seeming flexibility of
>> "next(int)". This method was indeed letting users assume that it is
>> able to generate _less than 32 bits_ whereas the randomness
>> generators implemented in CM *always* generate exactly 32
>> (hopefully random) bits.
>> 
>> I stand guilty on the last count as I do indeed not always "value 
>> laziness as a virtue": I indeed attribute more value to design (and
>> code) aesthetics than to laziness. IMO, ugly code is often an early
>> hint that the design is broken, even if the functionality may not
>> be (yet?). [But note again that this change was not "just because
>> of aesthetic reasons"; in the wording of your reference, I think
>> that "just because" is important.]
>> 
>> In a real community,[2] you'd value that some people are willing
>> to tackle different tasks than you do. Rather than stifling any
>> change on the sole ground that it is a change, it would be less
>> "draining" on the community if reviewers would only voice concrete
>> concerns about the resulting code, and not just assume that the
>> coder's motivation is pointless.[3]
>> 
>> I understand that something can more likely become broken when it
>> is being touched rather than when left alone.  Of course, I do! But
>> with the help of an extensive and sensitive test suite, I felt I 
>> could give this small refactoring a try, being fairly confident
>> that mistakes would not go unnoticed. Your doubting that the test
>> suite could let this happen should question our assuming that it
>> could assess the correct behaviour of the previous code.
>> Alternately, all of the numerous tests passing should mean that
>> the new code is not buggier; and visual inspection can assess that
>> it cannot be slower.[4]
>> 
>> My last thought about how standard the method "next(int)" is, I let
>> it be conveyed by what is not mentioned in the following
>> unquestionably standard source: 
>> http://docs.oracle.com/javase/8/docs/api/java/util/SplittableRandom.html
>>
>>
>> [I could have made additional comments on how various suggestions in
>> http://www.apachecon.com/eu2007/materials/ac2006.2.pdf are either
>> not applied in the CM project, or not taken "with a grain of salt"
>> whenever it suits you.  But this post has already drained me far 
>> too much.]
>> 
>> Gilles
>> 
>> [1] I can make mistakes (and I did, as told previously) in
>> "fiddling" with the code, but that can be spotted relatively easily
>> by inspecting three one-line changes commits and their few lines
>> consequences on the generic methods where "nextInt()" replaced
>> "next(int)", mutatis mutandis, in a single and quite small class. 
>> That is a far cry from "researching the algorithms". [2] To be
>> contrasted with the "common good" for which your stand against 
>> changes may be the right one for many (conservative) CM users. [3]
>> I can assure you that it is quite draining to have to defend any
>> and every change, especially when they are obviously towards
>> greater "standardness", even when they would occur in a new major
>> version, against an opposition that is only based on a conservatism
>> argument: When the code can be made more standard or more elegant
>> or more flexible or more state-of-the-art, it is deemed not
>> necessary because "we've always done it that way". This is a
>> general remark about other discussions. Here I totally agree that I
>> favoured non-standard Java in trashing "next(int bits)". In other 
>> languages, that signature is _not_ "standard". [4] In order to
>> defend a small and technically innocuous modification, I have to
>> myself do some "researching". Mind you, it's "just" reading from
>> web pages edited by people who AFAICT seem to know what they are
>> talking about (but I may be missing something, again). Among other
>> things which I've shared in this post, it appears that a benchmark
>> including the WELL (1204a) RNG indicates that it is 5 to 10 times 
>> slower than alternative RNGs. In this light, if you are so worried
>> about performance issues induced by a functionally no-op change,
>> then you probably should not use CM for RNG.
>> 
>>>>> and the refactored code and benchmarks to show there is no
>>>>> loss in performance.
>>>> 
>>>> Given that there are exactly two operations _less_ (a
>>>> subtraction and a shift), it would be surprising.
>>>> 
>>>>> It would also be good to have some additional review of this
>>>>> code by PRNG experts.
>>>> 
>>>> The "nextInt()" code is exactly the same as the "next(int)"
>>>> modulo the little change above (in the last line of the
>>>> "nextInt/next" code).
>>>> 
>>>> That change in "nextInt/next" implied similarly tiny recodings
>>>> in the generic methods "nextDouble()", "nextBoolean()", ...
>>>> which, apart from that, were copied from
>>>> "BitsStreamGenerator".
>>>> 
>>>> [However tiny a change, I had made a mistake... and dozens of
>>>> tests started to fail. Found the typo and all was quiet
>>>> again...]
>>>> 
>>>> About "next(int)" being standard, it would be interesting to
>>>> know what that means.
>>> 
>>> Have a look at the source code for the JDK generators, for
>>> example.
>>>> As I indicated quite clearly in one of my first posts about
>>>> this refactoring 1. all the CM implementations generate random
>>>> bits in batches of 32 bits, and 2. before returning, the
>>>> "next(int bits)" method was truncating the generated "int": 
>>>> return x >>> (32 - bits);
>>>> 
>>>> In all implementations, that was the only place where the
>>>> "bits" parameter was used, from which I concluded that the
>>>> randomness provider does not care if the request was to create
>>>> less than 32 random bits. Taking "nextBoolean()" for example,
>>>> it looks like a waste of 31 bits (or am I missing something?).
>>> 
>>> Quite possibly, yes, you are missing something.
>>>> 
>>>> Of course, if some implementation were able to store the bits
>>>> not requested by the last call to "next(int)", then I'd
>>>> understand that we must really provide access to a "next(int)"
>>>> method.
>>>> 
>>>> Perhaps that the overhead of such bookkeeping is why the
>>>> practical algorithms chose to store integers rather than bits
>>>> (?).
>>>> 
>>>> As you dismissed my request about CM being able to care for a
>>>> RNG implementation based on a "long", I don't quite understand
>>>> the caring for a "next(int)" that serves no more purpose (as of
>>>> current CM).
>>> This change is
>>>> 
>>>> Gilles
>>>> 
>>>> 
>>>>> Phil
>>>>> 
>>>>>> On 12/28/15 10:23 AM, erans@apache.org wrote: Repository:
>>>>>> commons-math Updated Branches: refs/heads/master 7b62d0155
>>>>>> -> 81585a3c4
>>>>>> 
>>>>>> 
>>>>>> MATH-1307
>>>>>> 
>>>>>> New base class for RNG implementations. The source of
>>>>>> randomness is provided through the "nextInt()" method (to
>>>>>> be defined in subclasses).
>>>>>> 
>>>>>> 
>>>>>> [...]
>>> [1] http://www.apachecon.com/eu2007/materials/ac2006.2.pdf
>> 
>> 
>> ---------------------------------------------------------------------
>>
>> 
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>> 
> 
> ---------------------------------------------------------------------
>
> 
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


[All] Don't touch that code! (Was: [Math] About the refactoring of RNGs)

Posted by Gilles <gi...@harfang.homelinux.org>.
Hi.

[Preamble: In the following, I do not ask for opinions about the
design of Commons Math; I report an example of a dysfunctional
"community".]

A few days ago, rather than "shut down for holidays", I took upon me to
respond to the request of a newcomer to the CM discussions (MATH-1300).
I'm certainly not anywhere close to being an expert in PRNG.
But it looked easy enough to make that change.

While browsing through the related code in "o.a.c.m.random", I happened 
to
stumble upon one line of code that looked strange (because its sole 
purpose
seemed to clear bits that, anyways, would not be used by the caller).

Then there were the usual suspects when doing such an exercise in CM:
duplicate code ("BitsStreamRandomGenerator" vs 
"AbstractRandomGenerator"),
and unsafe code, that prompted me to raise questions on this ML, and to
open issues MATH-1307, MATH-1308 and MATH-1309 (itself related to the 
old
MATH-734).

My argumentation was crystal-clear.
No technical objection was raised, and I committed the proposed changes 
(as
they were shown in an attachment to the corresponding JIRA page).

The commit elicited a flood of negative comments that made me feel I 
must
have made some horrendous, and obviously stupid, mistake.

Around the same time I was making those little changes (creating 1 new 
file,
deleting 2, and less than 20 lines changed in total) to the development
branch ("master") containing code not supposed to be 
backwards-compatible
and not foreseen to be released any time soon, Luc was adding or 
modifying
dozens of files (largely more than 1000 lines to supposedly review) to 
the
MATH_3_X branch, thus to be included in the 3.6 version of CM to be
imminently released.

There is thus a double-standard reviewing process.[1]

As it turned out, nobody has, at this time, something technically sound 
to
argue against the change which I had made.

Instead, there was a barrage of protests that amounted to creating FUD:

FUD 1: "Quite possibly, yes, you are missing something."
Truth: Strong, but baseless, affirmation that I'm wrong in even 
thinking
        that some code statement could be useless.

FUD 2: "significant refactoring"
Truth: Less than 20 lines were modified (and for many, it's the same 
type
        of change).
        [Moreover, the behaviour and usage of the concrete classes are
        unchanged.]

FUD 3: "[...] rebasing to eliminate next(bits) adds nothing."
Truth: New code exposes what the underlying algorithms actually do 
while
        the previous code indirectly advertised non-existing 
functionality.
        [The change actually proves that the argument of "next(int)" is
        useless (i.e. the implementations always generate more bits than 
the
        argument would have one to believe).]

FUD 4: "[refactoring] to eliminate the (standard) next(int)"
Truth: Unchecked and ambiguous use of the word "standard".
        As far I could see by browsing some source codes in C, 
"next(void)"
        could be the standard in RNG codes, not "next(int bits)".
        The absence of "next(int)" in the newer JDK class 
"SplittableRandom"
        confirms the impression that "next(int)" could have been, 
indeed,
        useless for all practical purposes.

FUD 5: Potential breakage
Truth: 544 unit tests show no sign of regression for any of the RNG
        implementations.

FUD 6: "The tests cover only nextInt"
Truth: The "RandomGeneratorAbstractTest" class contains unit tests for 
*all*
        the generic methods.  In particular, the tests ensure that the
        distribution of the generated "int", "long", "float" and 
"double" is
        uniform.

FUD 7: "[...] implementation changes to hardened code"
Truth: Undefined use of the word "hardened": A specific unit test 
verifies
        that some hard-coded sequences are reproduced by the code. It 
passes
        with the new code, as it did with the old.
        [Other than that, the code comes with no warranty (applies to 
*both*
        the current version and the previous one).]

FUD 8: "[...] invariants may be broken by these changes"
Truth: Visual inspection of 1 line of code per generic method and 1 
line of
        code per RNG can prove whether there is a mistake or not.

FUD 9: "[...] possibility of introducing [...] performance issues."
Truth: Visual inspection of 1 line of code per generic method and 1 
line of
        code per RNG shows a reduction of the number of basic operations 
(one
        shift and one subtraction in each concrete RNG implementation).
        Moreover, IIUC the following results of a micro-benchmark 
comparison,
        no significant change in performance can be detected.
        [Although the change is for the better in almost all cases.]

+----------+
| MATH_3_X |
+----------+
nextInt() (calls per timed block: 2000000, timed blocks: 100, time 
unit: ms)
               name      time/call      std error total time      ratio  
difference
JDKRandomGenerator 1.07711988e-05 3.96937236e-06 2.1542e+03 1.0000e+00  
0.00000000e+00
    MersenneTwister 1.10205274e-05 2.22826722e-06 2.2041e+03 1.0231e+00  
4.98657110e+01
           Well512a 1.33509150e-05 1.81097101e-06 2.6702e+03 1.2395e+00  
5.15943241e+02
          Well1024a 1.30304620e-05 9.68410328e-07 2.6061e+03 1.2098e+00  
4.51852641e+02
         Well19937a 1.56130116e-05 2.14122318e-06 3.1226e+03 1.4495e+00  
9.68362551e+02
         Well19937c 1.47900777e-05 1.21972362e-06 2.9580e+03 1.3731e+00  
8.03775773e+02
         Well44497a 1.73521386e-05 8.06996030e-07 3.4704e+03 1.6110e+00  
1.31618795e+03
         Well44497b 1.76055476e-05 5.14973607e-07 3.5211e+03 1.6345e+00  
1.36686975e+03
              ISAAC 1.20243626e-05 7.88169183e-07 2.4049e+03 1.1163e+00  
2.50632757e+02

nextDouble() (calls per timed block: 2000000, timed blocks: 100, time 
unit: ms)
               name      time/call      std error total time      ratio  
difference
JDKRandomGenerator 2.21792835e-05 2.41114910e-06 4.4359e+03 1.0000e+00  
0.00000000e+00
    MersenneTwister 1.97174235e-05 2.03446115e-06 3.9435e+03 8.8900e-01 
-4.92371995e+02
           Well512a 2.12618083e-05 1.88123228e-06 4.2524e+03 9.5863e-01 
-1.83495042e+02
          Well1024a 2.41596883e-05 2.57029116e-06 4.8319e+03 1.0893e+00  
3.96080966e+02
         Well19937a 2.66921255e-05 1.49574930e-06 5.3384e+03 1.2035e+00  
9.02568405e+02
         Well19937c 2.92679702e-05 1.33918527e-06 5.8536e+03 1.3196e+00  
1.41773734e+03
         Well44497a 3.36010178e-05 2.73487416e-06 6.7202e+03 1.5150e+00  
2.28434687e+03
         Well44497b 3.59767542e-05 1.08315236e-06 7.1954e+03 1.6221e+00  
2.75949415e+03
              ISAAC 2.21894735e-05 1.53691118e-06 4.4379e+03 1.0005e+00  
2.03801200e+00

nextLong() (calls per timed block: 2000000, timed blocks: 100, time 
unit: ms)
               name      time/call      std error total time      ratio  
difference
JDKRandomGenerator 2.34176352e-05 2.20143373e-06 4.6835e+03 1.0000e+00  
0.00000000e+00
    MersenneTwister 1.94922075e-05 1.69613941e-06 3.8984e+03 8.3237e-01 
-7.85085547e+02
           Well512a 2.49888171e-05 1.35979076e-06 4.9978e+03 1.0671e+00  
3.14236383e+02
          Well1024a 2.33254629e-05 9.16091045e-07 4.6651e+03 9.9606e-01 
-1.84344700e+01
         Well19937a 2.61210825e-05 7.35362338e-07 5.2242e+03 1.1154e+00  
5.40689466e+02
         Well19937c 2.77540845e-05 2.52800203e-06 5.5508e+03 1.1852e+00  
8.67289861e+02
         Well44497a 3.29713155e-05 2.55438277e-06 6.5943e+03 1.4080e+00  
1.91073605e+03
         Well44497b 3.42167709e-05 1.27562024e-06 6.8434e+03 1.4612e+00  
2.15982714e+03
              ISAAC 2.15235364e-05 1.57741593e-06 4.3047e+03 9.1912e-01 
-3.78819767e+02

+--------+
| master |
+--------+
nextInt() (calls per timed block: 2000000, timed blocks: 100, time 
unit: ms)
               name      time/call      std error total time      ratio  
difference
JDKRandomGenerator 1.34103050e-05 2.45357474e-06 2.6821e+03 1.0000e+00  
0.00000000e+00
    MersenneTwister 1.14748419e-05 9.90008146e-07 2.2950e+03 8.5567e-01 
-3.87092621e+02
           Well512a 1.34941515e-05 1.69129854e-06 2.6988e+03 1.0063e+00  
1.67693020e+01
          Well1024a 1.50723034e-05 8.03930622e-07 3.0145e+03 1.1239e+00  
3.32399667e+02
         Well19937a 1.38263157e-05 5.69889705e-06 2.7653e+03 1.0310e+00  
8.32021280e+01
         Well19937c 1.59393823e-05 2.35562440e-06 3.1879e+03 1.1886e+00  
5.05815457e+02
         Well44497a 1.93589129e-05 2.64356495e-06 3.8718e+03 1.4436e+00  
1.18972157e+03
         Well44497b 2.03862615e-05 2.14390340e-06 4.0773e+03 1.5202e+00  
1.39519129e+03
              ISAAC 1.32823857e-05 2.02074578e-06 2.6565e+03 9.9046e-01 
-2.55838710e+01

nextDouble() (calls per timed block: 2000000, timed blocks: 100, time 
unit: ms)
               name      time/call      std error total time      ratio  
difference
JDKRandomGenerator 2.32352997e-05 1.76107593e-06 4.6471e+03 1.0000e+00  
0.00000000e+00
    MersenneTwister 1.95680107e-05 5.15377258e-07 3.9136e+03 8.4217e-01 
-7.33457809e+02
           Well512a 2.40500380e-05 2.50013851e-06 4.8100e+03 1.0351e+00  
1.62947648e+02
          Well1024a 2.48173367e-05 2.93429135e-06 4.9635e+03 1.0681e+00  
3.16407385e+02
         Well19937a 2.75334969e-05 2.45923272e-06 5.5067e+03 1.1850e+00  
8.59639442e+02
         Well19937c 2.84004673e-05 1.95285979e-06 5.6801e+03 1.2223e+00  
1.03303351e+03
         Well44497a 3.55292049e-05 2.88860968e-06 7.1058e+03 1.5291e+00  
2.45878103e+03
         Well44497b 3.69400758e-05 1.55845565e-06 7.3880e+03 1.5898e+00  
2.74095521e+03
              ISAAC 2.21769477e-05 2.20944591e-06 4.4354e+03 9.5445e-01 
-2.11670411e+02

nextLong() (calls per timed block: 2000000, timed blocks: 100, time 
unit: ms)
               name      time/call      std error total time      ratio  
difference
JDKRandomGenerator 2.24688238e-05 1.33151962e-06 4.4938e+03 1.0000e+00  
0.00000000e+00
    MersenneTwister 1.81175642e-05 1.63071216e-06 3.6235e+03 8.0634e-01 
-8.70251910e+02
           Well512a 2.27232582e-05 1.06753850e-06 4.5447e+03 1.0113e+00  
5.08868800e+01
          Well1024a 2.27862427e-05 1.59353328e-06 4.5572e+03 1.0141e+00  
6.34837960e+01
         Well19937a 2.50605559e-05 1.00839808e-06 5.0121e+03 1.1153e+00  
5.18346431e+02
         Well19937c 2.68680821e-05 1.11341253e-06 5.3736e+03 1.1958e+00  
8.79851660e+02
         Well44497a 3.29918582e-05 7.18500728e-07 6.5984e+03 1.4683e+00  
2.10460688e+03
         Well44497b 3.47592845e-05 7.83038103e-07 6.9519e+03 1.5470e+00  
2.45809215e+03
              ISAAC 1.99627637e-05 1.40470250e-06 3.9926e+03 8.8847e-01 
-5.01212018e+02


In light of the above, I feel I can call foul on a behaviour that is 
fully focused
on stymieing (even mildly) "revolutionary"[2] endeavours.
A supportive community would have engaged in a constructive discussion 
(if it
felt that bad side-effects could ensue[3]), rather than being 
affirmative that some
parts of the library are out-of-bounds (for me).


Regards,
Gilles

[1] I'd rather not enter in the "meaning of meritocracy" debate; I'm 
not implying that
     I have the same skills as Luc and that I must benefit of the same 
trust.  Here we
     are considering a specific situation where on the one hand massive 
amounts of code
     are being accepted without a single review, while on the other hand 
an easily
     reviewable code (reading 20 lines and/or "mvn test") *must* obtain 
clearance from
     a PRNG expert (who is knowingly absent from these neighbourhoods).
[2] http://www.apachecon.com/eu2007/materials/ac2006.2.pdf (page 186).
[3] How about simply file a JIRA report (like: "Make sure no corners 
were cut in commit
     xxxx") rather than starting a FUD campaign on this list?


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Math] About the refactoring of RNGs

Posted by Phil Steitz <ph...@gmail.com>.

> On Dec 29, 2015, at 8:41 AM, Gilles <gi...@harfang.homelinux.org> wrote:
> 
>> On Mon, 28 Dec 2015 20:33:24 -0700, Phil Steitz wrote:
>>> On 12/28/15 8:08 PM, Gilles wrote:
>>>> On Mon, 28 Dec 2015 11:08:56 -0700, Phil Steitz wrote:
>>>> The significant refactoring to eliminate the (standard) next(int)
>>>> included in these changes has the possibility of introducing subtle
>>>> bugs or performance issues.  Please run some tests to verify that
>>>> the same sequences are generated by the 3_X code
>>> 
>>> IIUC our unit tests of the RNGs, this is covered.
>> 
>> No.  Not sufficient.  What you have done is changed the internal
>> implementation of all of the Bitstream generators.  I am not
>> convinced that you have not broken anything.  I will have to do the
>> testing myself.  I see no point in fiddling with the internals of
>> this code that has had a lot of eyeballs and testing on it.  I was
>> not personally looking forward to researching the algorithms to make
>> sure any invariants may be broken by these changes; but I am now
>> going to have to do this.  I have to ask why.  Please at some point
>> read [1], especially the sections on "Avoid Flexibility Syndrom" and
>> "Value Laziness as a Virtue."  Gratuitous refactoring drains
>> community energy.
> 
> It seems that again you don't read what I write.[1]
> Hence the above paragraph is spreading
> F -> "I am not convinced that you have not broken anything."
> U -> "I will have to do the testing myself."
> D -> "I see no point in fiddling [...]"
> .
> 
> Or maybe I have to rant about email communication.
> Please reread the thread to fully appreciate that you could have shared
> your doubts among the opinions which you gave about some of my questions.
> When the reply answers to only some of the direct questions, the OP is
> legitimately tempted to assume that the non-comment is akin to "I don't
> care" (as in "left to the judgment of the one who does the job").
> 
> As is often the case, you can dislike my ideas of improvements (ideas
> that come on the basis of the information provided by what the code
> does) but I don't appreciate the use of the word "gratuitous", given
> that I clearly stated the purpose of making explicit what the code
> actually does (that is, *generating* 32 random bits and not the number
> of bits passed as a parameter to "next(int)").
> So, the code is now self-documenting; it is a small change, to be sure,
> but hardly gratuitous.

I did not respond to the original question about eliminating next(int) because I was not (still am not) expert in the internals of how the generators work and how the generated ints-of-bits are transformed to make doubles, gaussians, etc.  I was hoping someone who was expert would chime in and say either "don't do that" or "that's ok".  That did not happen before you committed the changes.  That obligates us to review them.  The tests cover only nextInt so  we need to convince ourselves that the transformations (which have all changed) are still correct.  That is in my mind just extra work generated for the community.  I would rather see our cycles go toward getting 3.6 out.

I stand by my assertion that the rebasing to eliminate next(bits) adds nothing. If you can demonstrate via benchmarks this it is faster, I will retract that statement.

I know we disagree fundamentally on the priorities of [math].  To me, stability, correctness and ease of use (including ease of upgrades) are paramount.  That means you don't make implementation changes to hardened code without a good reason and you limit incompatible change to what is necessary to deliver practical benefit to users in new features or bug fixes.  I was +1 to dropping AbstractRandomGenerator and just leaving the bitstream generators in place.  That is a simplifying change.
> 
> I actually did go some way towards "Avoiding [a] Flexibility Syndrom"
> that *was* present in a seeming flexibility of "next(int)".
> This method was indeed letting users assume that it is able to generate
> _less than 32 bits_ whereas the randomness generators implemented in CM
> *always* generate exactly 32 (hopefully random) bits.
> 
> I stand guilty on the last count as I do indeed not always "value
> laziness as a virtue": I indeed attribute more value to design (and code)
> aesthetics than to laziness.
> IMO, ugly code is often an early hint that the design is broken, even
> if the functionality may not be (yet?).
> [But note again that this change was not "just because of aesthetic
> reasons"; in the wording of your reference, I think that "just because"
> is important.]
> 
> In a real community,[2] you'd value that some people are willing to
> tackle different tasks than you do.
> Rather than stifling any change on the sole ground that it is a change,
> it would be less "draining" on the community if reviewers would only
> voice concrete concerns about the resulting code, and not just assume
> that the coder's motivation is pointless.[3]
> 
> I understand that something can more likely become broken when it is
> being touched rather than when left alone.  Of course, I do!
> But with the help of an extensive and sensitive test suite, I felt I
> could give this small refactoring a try, being fairly confident that
> mistakes would not go unnoticed.
> Your doubting that the test suite could let this happen should question
> our assuming that it could assess the correct behaviour of the previous
> code. Alternately, all of the numerous tests passing should mean that the
> new code is not buggier; and visual inspection can assess that it cannot
> be slower.[4]
> 
> My last thought about how standard the method "next(int)" is, I let it
> be conveyed by what is not mentioned in the following unquestionably
> standard source:
>  http://docs.oracle.com/javase/8/docs/api/java/util/SplittableRandom.html
> 
> [I could have made additional comments on how various suggestions in
>  http://www.apachecon.com/eu2007/materials/ac2006.2.pdf
> are either not applied in the CM project, or not taken "with a grain of
> salt" whenever it suits you.  But this post has already drained me far
> too much.]
> 
> Gilles
> 
> [1] I can make mistakes (and I did, as told previously) in "fiddling" with
>    the code, but that can be spotted relatively easily by inspecting three
>    one-line changes commits and their few lines consequences on the generic
>    methods where "nextInt()" replaced "next(int)", mutatis mutandis, in a
>    single and quite small class.
>    That is a far cry from "researching the algorithms".
> [2] To be contrasted with the "common good" for which your stand against
>    changes may be the right one for many (conservative) CM users.
> [3] I can assure you that it is quite draining to have to defend any and
>    every change, especially when they are obviously towards greater
>    "standardness", even when they would occur in a new major version,
>    against an opposition that is only based on a conservatism argument:
>    When the code can be made more standard or more elegant or more flexible
>    or more state-of-the-art, it is deemed not necessary because "we've always
>    done it that way".
>    This is a general remark about other discussions. Here I totally agree
>    that I favoured non-standard Java in trashing "next(int bits)". In other
>    languages, that signature is _not_ "standard".
> [4] In order to defend a small and technically innocuous modification, I have
>    to myself do some "researching". Mind you, it's "just" reading from web
>    pages edited by people who AFAICT seem to know what they are talking about
>    (but I may be missing something, again).
>    Among other things which I've shared in this post, it appears that a
>    benchmark including the WELL (1204a) RNG indicates that it is 5 to 10 times
>    slower than alternative RNGs.
>    In this light, if you are so worried about performance issues induced by a
>    functionally no-op change, then you probably should not use CM for RNG.
> 
>>>> and the refactored
>>>> code and benchmarks to show there is no loss in performance.
>>> 
>>> Given that there are exactly two operations _less_ (a subtraction
>>> and a shift), it would be surprising.
>>> 
>>>> It
>>>> would also be good to have some additional review of this code by
>>>> PRNG experts.
>>> 
>>> The "nextInt()" code is exactly the same as the "next(int)" modulo
>>> the little change above (in the last line of the "nextInt/next"
>>> code).
>>> 
>>> That change in "nextInt/next" implied similarly tiny recodings in
>>> the generic methods "nextDouble()", "nextBoolean()", ... which, apart
>>> from that, were copied from "BitsStreamGenerator".
>>> 
>>> [However tiny a change, I had made a mistake... and dozens of tests
>>> started to fail. Found the typo and all was quiet again...]
>>> 
>>> About "next(int)" being standard, it would be interesting to know
>>> what that means.
>> 
>> Have a look at the source code for the JDK generators, for example.
>>> As I indicated quite clearly in one of my first posts about this
>>> refactoring
>>> 1. all the CM implementations generate random bits in batches
>>>   of 32 bits, and
>>> 2. before returning, the "next(int bits)" method was truncating
>>>   the generated "int":
>>>     return x >>> (32 - bits);
>>> 
>>> In all implementations, that was the only place where the "bits"
>>> parameter was used, from which I concluded that the randomness
>>> provider does not care if the request was to create less than 32
>>> random bits.
>>> Taking "nextBoolean()" for example, it looks like a waste of 31
>>> bits (or am I missing something?).
>> 
>> Quite possibly, yes, you are missing something.
>>> 
>>> Of course, if some implementation were able to store the bits not
>>> requested by the last call to "next(int)", then I'd understand that
>>> we must really provide access to a "next(int)" method.
>>> 
>>> Perhaps that the overhead of such bookkeeping is why the practical
>>> algorithms chose to store integers rather than bits (?).
>>> 
>>> As you dismissed my request about CM being able to care for a RNG
>>> implementation based on a "long", I don't quite understand the
>>> caring for a "next(int)" that serves no more purpose (as of current
>>> CM).
>> This change is
>>> 
>>> Gilles
>>> 
>>> 
>>>> Phil
>>>> 
>>>>> On 12/28/15 10:23 AM, erans@apache.org wrote:
>>>>> Repository: commons-math
>>>>> Updated Branches:
>>>>>  refs/heads/master 7b62d0155 -> 81585a3c4
>>>>> 
>>>>> 
>>>>> MATH-1307
>>>>> 
>>>>> New base class for RNG implementations.
>>>>> The source of randomness is provided through the "nextInt()"
>>>>> method (to be defined in subclasses).
>>>>> 
>>>>> 
>>>>> [...]
>> [1] http://www.apachecon.com/eu2007/materials/ac2006.2.pdf
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Math] About the refactoring of RNGs

Posted by Gilles <gi...@harfang.homelinux.org>.
On Mon, 28 Dec 2015 20:33:24 -0700, Phil Steitz wrote:
> On 12/28/15 8:08 PM, Gilles wrote:
>> On Mon, 28 Dec 2015 11:08:56 -0700, Phil Steitz wrote:
>>> The significant refactoring to eliminate the (standard) next(int)
>>> included in these changes has the possibility of introducing subtle
>>> bugs or performance issues.  Please run some tests to verify that
>>> the same sequences are generated by the 3_X code
>>
>> IIUC our unit tests of the RNGs, this is covered.
>
> No.  Not sufficient.  What you have done is changed the internal
> implementation of all of the Bitstream generators.  I am not
> convinced that you have not broken anything.  I will have to do the
> testing myself.  I see no point in fiddling with the internals of
> this code that has had a lot of eyeballs and testing on it.  I was
> not personally looking forward to researching the algorithms to make
> sure any invariants may be broken by these changes; but I am now
> going to have to do this.  I have to ask why.  Please at some point
> read [1], especially the sections on "Avoid Flexibility Syndrom" and
> "Value Laziness as a Virtue."  Gratuitous refactoring drains
> community energy.

It seems that again you don't read what I write.[1]
Hence the above paragraph is spreading
  F -> "I am not convinced that you have not broken anything."
  U -> "I will have to do the testing myself."
  D -> "I see no point in fiddling [...]"
.

Or maybe I have to rant about email communication.
Please reread the thread to fully appreciate that you could have shared
your doubts among the opinions which you gave about some of my 
questions.
When the reply answers to only some of the direct questions, the OP is
legitimately tempted to assume that the non-comment is akin to "I don't
care" (as in "left to the judgment of the one who does the job").

As is often the case, you can dislike my ideas of improvements (ideas
that come on the basis of the information provided by what the code
does) but I don't appreciate the use of the word "gratuitous", given
that I clearly stated the purpose of making explicit what the code
actually does (that is, *generating* 32 random bits and not the number
of bits passed as a parameter to "next(int)").
So, the code is now self-documenting; it is a small change, to be sure,
but hardly gratuitous.

I actually did go some way towards "Avoiding [a] Flexibility Syndrom"
that *was* present in a seeming flexibility of "next(int)".
This method was indeed letting users assume that it is able to generate
_less than 32 bits_ whereas the randomness generators implemented in CM
*always* generate exactly 32 (hopefully random) bits.

I stand guilty on the last count as I do indeed not always "value
laziness as a virtue": I indeed attribute more value to design (and 
code)
aesthetics than to laziness.
IMO, ugly code is often an early hint that the design is broken, even
if the functionality may not be (yet?).
[But note again that this change was not "just because of aesthetic
reasons"; in the wording of your reference, I think that "just because"
is important.]

In a real community,[2] you'd value that some people are willing to
tackle different tasks than you do.
Rather than stifling any change on the sole ground that it is a change,
it would be less "draining" on the community if reviewers would only
voice concrete concerns about the resulting code, and not just assume
that the coder's motivation is pointless.[3]

I understand that something can more likely become broken when it is
being touched rather than when left alone.  Of course, I do!
But with the help of an extensive and sensitive test suite, I felt I
could give this small refactoring a try, being fairly confident that
mistakes would not go unnoticed.
Your doubting that the test suite could let this happen should question
our assuming that it could assess the correct behaviour of the previous
code. Alternately, all of the numerous tests passing should mean that 
the
new code is not buggier; and visual inspection can assess that it 
cannot
be slower.[4]

My last thought about how standard the method "next(int)" is, I let it
be conveyed by what is not mentioned in the following unquestionably
standard source:
   
http://docs.oracle.com/javase/8/docs/api/java/util/SplittableRandom.html

[I could have made additional comments on how various suggestions in
   http://www.apachecon.com/eu2007/materials/ac2006.2.pdf
are either not applied in the CM project, or not taken "with a grain of
salt" whenever it suits you.  But this post has already drained me far
too much.]

Gilles

[1] I can make mistakes (and I did, as told previously) in "fiddling" 
with
     the code, but that can be spotted relatively easily by inspecting 
three
     one-line changes commits and their few lines consequences on the 
generic
     methods where "nextInt()" replaced "next(int)", mutatis mutandis, 
in a
     single and quite small class.
     That is a far cry from "researching the algorithms".
[2] To be contrasted with the "common good" for which your stand 
against
     changes may be the right one for many (conservative) CM users.
[3] I can assure you that it is quite draining to have to defend any 
and
     every change, especially when they are obviously towards greater
     "standardness", even when they would occur in a new major version,
     against an opposition that is only based on a conservatism 
argument:
     When the code can be made more standard or more elegant or more 
flexible
     or more state-of-the-art, it is deemed not necessary because "we've 
always
     done it that way".
     This is a general remark about other discussions. Here I totally 
agree
     that I favoured non-standard Java in trashing "next(int bits)". In 
other
     languages, that signature is _not_ "standard".
[4] In order to defend a small and technically innocuous modification, 
I have
     to myself do some "researching". Mind you, it's "just" reading from 
web
     pages edited by people who AFAICT seem to know what they are 
talking about
     (but I may be missing something, again).
     Among other things which I've shared in this post, it appears that 
a
     benchmark including the WELL (1204a) RNG indicates that it is 5 to 
10 times
     slower than alternative RNGs.
     In this light, if you are so worried about performance issues 
induced by a
     functionally no-op change, then you probably should not use CM for 
RNG.

>>> and the refactored
>>> code and benchmarks to show there is no loss in performance.
>>
>> Given that there are exactly two operations _less_ (a subtraction
>> and a shift), it would be surprising.
>>
>>> It
>>> would also be good to have some additional review of this code by
>>> PRNG experts.
>>
>> The "nextInt()" code is exactly the same as the "next(int)" modulo
>> the little change above (in the last line of the "nextInt/next"
>> code).
>>
>> That change in "nextInt/next" implied similarly tiny recodings in
>> the generic methods "nextDouble()", "nextBoolean()", ... which, 
>> apart
>> from that, were copied from "BitsStreamGenerator".
>>
>> [However tiny a change, I had made a mistake... and dozens of tests
>> started to fail. Found the typo and all was quiet again...]
>>
>> About "next(int)" being standard, it would be interesting to know
>> what that means.
>
> Have a look at the source code for the JDK generators, for example.
>> As I indicated quite clearly in one of my first posts about this
>> refactoring
>> 1. all the CM implementations generate random bits in batches
>>    of 32 bits, and
>> 2. before returning, the "next(int bits)" method was truncating
>>    the generated "int":
>>      return x >>> (32 - bits);
>>
>> In all implementations, that was the only place where the "bits"
>> parameter was used, from which I concluded that the randomness
>> provider does not care if the request was to create less than 32
>> random bits.
>> Taking "nextBoolean()" for example, it looks like a waste of 31
>> bits (or am I missing something?).
>
> Quite possibly, yes, you are missing something.
>>
>> Of course, if some implementation were able to store the bits not
>> requested by the last call to "next(int)", then I'd understand that
>> we must really provide access to a "next(int)" method.
>>
>> Perhaps that the overhead of such bookkeeping is why the practical
>> algorithms chose to store integers rather than bits (?).
>>
>> As you dismissed my request about CM being able to care for a RNG
>> implementation based on a "long", I don't quite understand the
>> caring for a "next(int)" that serves no more purpose (as of current
>> CM).
>>
> This change is
>>
>> Gilles
>>
>>
>>> Phil
>>>
>>> On 12/28/15 10:23 AM, erans@apache.org wrote:
>>>> Repository: commons-math
>>>> Updated Branches:
>>>>   refs/heads/master 7b62d0155 -> 81585a3c4
>>>>
>>>>
>>>> MATH-1307
>>>>
>>>> New base class for RNG implementations.
>>>> The source of randomness is provided through the "nextInt()"
>>>> method (to be defined in subclasses).
>>>>
>>>>
>>>> [...]
>>
> [1] http://www.apachecon.com/eu2007/materials/ac2006.2.pdf
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Math] About the refactoring of RNGs (Was: [01/18] [math] MATH-1307)

Posted by Phil Steitz <ph...@gmail.com>.
On 12/29/15 2:33 AM, Luc Maisonobe wrote:
> Hi all,
>
> Le 29/12/2015 09:21, Thomas Neidhart a écrit :
>> On 12/29/2015 04:33 AM, Phil Steitz wrote:
>>> On 12/28/15 8:08 PM, Gilles wrote:
>>>> On Mon, 28 Dec 2015 11:08:56 -0700, Phil Steitz wrote:
>>>>> The significant refactoring to eliminate the (standard) next(int)
>>>>> included in these changes has the possibility of introducing subtle
>>>>> bugs or performance issues.  Please run some tests to verify that
>>>>> the same sequences are generated by the 3_X code
>>>> IIUC our unit tests of the RNGs, this is covered.
>>> No.  Not sufficient.  What you have done is changed the internal
>>> implementation of all of the Bitstream generators.  I am not
>>> convinced that you have not broken anything.  I will have to do the
>>> testing myself.  I see no point in fiddling with the internals of
>>> this code that has had a lot of eyeballs and testing on it.  I was
>>> not personally looking forward to researching the algorithms to make
>>> sure any invariants may be broken by these changes; but I am now
>>> going to have to do this.  I have to ask why.  Please at some point
>>> read [1], especially the sections on "Avoid Flexibility Syndrom" and
>>> "Value Laziness as a Virtue."  Gratuitous refactoring drains
>>> community energy. 
>> +1, on top of that I think we should aim to refactor the parts that
>> really need refactoring and try to keep the number of incompatibilities
>> to the 3_X branch as minimal as possible.
>>
>> Thomas
>>
>>>>> and the refactored
>>>>> code and benchmarks to show there is no loss in performance.
>>>> Given that there are exactly two operations _less_ (a subtraction
>>>> and a shift), it would be surprising.
>>>>
>>>>> It
>>>>> would also be good to have some additional review of this code by
>>>>> PRNG experts.
>>>> The "nextInt()" code is exactly the same as the "next(int)" modulo
>>>> the little change above (in the last line of the "nextInt/next"
>>>> code).
>>>>
>>>> That change in "nextInt/next" implied similarly tiny recodings in
>>>> the generic methods "nextDouble()", "nextBoolean()", ... which, apart
>>>> from that, were copied from "BitsStreamGenerator".
>>>>
>>>> [However tiny a change, I had made a mistake... and dozens of tests
>>>> started to fail. Found the typo and all was quiet again...]
>>>>
>>>> About "next(int)" being standard, it would be interesting to know
>>>> what that means.
> In all the papers I have read concerning pseudo random number
> generation, the basic model was based on small chunks of bits,
> much of the time the size of an int because this is what computer
> manages directly (they have no provision to manage chunks of 5 or
> 11 bits for example).

That's what I thought.  Internally, is it correct to assume that the
generation is always in int-sized blocks so Gilles is correct that
the bits parameter is only used for output truncation?
>
> Deriving other primitive types from this (boolean, long, double) is
> really an add-on. I even asked an expert about the (Pierre L'Ecuyer
> if I remember well) about some explanations for converting to double
> (which is simply done by multiplying by a constant representing the
> weight of the least significant bit in order to constrain the range to
> [0; 1]). His answer was that this ensured the theoretical mathematical
> proofs that apply to uniform distribution still apply, as only this
> case (uniformity over a multi-dimensional integral grid) has been
> studied. It seems nothing has been studied about using the exponential
> features of floating point representation in relationship with
> double random number generation directly.

Makes sense.  The - excellent - tests included with the Well,
Mersenne and Isaac generators verify (as Gilles states) that
nextInt() itself is likely unaffected by the changes above.  What I
am worried about is the conversions.  Do those changes look correct
to you?  That is what needs to be carefully reviewed and tested. 
>
> Hence everybody starts from int, and the mathematicians proved us
> this method works and some properties are preserved (multi-dimensional
> independance, long period, ...) that are essential typically for
> Monte-Carlo analyses.
>
> I know nothing about random number generation for secure application
> like cryptograpgy, except that it requires completely different
> properties, often completely opposite to what is needed for
> Monte-Carlo analysis. As an example, it should be impossible to
> reproduce a secure sequence (it cannot be deterministic), whereas in
> Monte-Carlo we *want* it to be reproducible if we reuse the same seed.
>
>>> Have a look at the source code for the JDK generators, for example.
>>>> As I indicated quite clearly in one of my first posts about this
>>>> refactoring
>>>> 1. all the CM implementations generate random bits in batches
>>>>    of 32 bits, and
>>>> 2. before returning, the "next(int bits)" method was truncating
>>>>    the generated "int":
>>>>      return x >>> (32 - bits);
>>>>
>>>> In all implementations, that was the only place where the "bits"
>>>> parameter was used, from which I concluded that the randomness
>>>> provider does not care if the request was to create less than 32
>>>> random bits.
>>>> Taking "nextBoolean()" for example, it looks like a waste of 31
>>>> bits (or am I missing something?).
>>> Quite possibly, yes, you are missing something.
> I would guess it is linked to performance consideration. Pseudo
> random number generation is sometimes put under very heavy stress
> with billions of numbers generated. It should run extremelly fast,
> and the algorithms have been designed to have tremendously long periods
> (things like 2^19937 -1). With such long periods, we can waste 31
> bits each time we produce 1 bit if it saves some overhead.
>
> best regards,
> Luc
>
>>>> Of course, if some implementation were able to store the bits not
>>>> requested by the last call to "next(int)", then I'd understand that
>>>> we must really provide access to a "next(int)" method.
>>>>
>>>> Perhaps that the overhead of such bookkeeping is why the practical
>>>> algorithms chose to store integers rather than bits (?).
>>>>
>>>> As you dismissed my request about CM being able to care for a RNG
>>>> implementation based on a "long", I don't quite understand the
>>>> caring for a "next(int)" that serves no more purpose (as of current
>>>> CM).
>>>>
>>> This change is
>>>> Gilles
>>>>
>>>>
>>>>> Phil
>>>>>
>>>>> On 12/28/15 10:23 AM, erans@apache.org wrote:
>>>>>> Repository: commons-math
>>>>>> Updated Branches:
>>>>>>   refs/heads/master 7b62d0155 -> 81585a3c4
>>>>>>
>>>>>>
>>>>>> MATH-1307
>>>>>>
>>>>>> New base class for RNG implementations.
>>>>>> The source of randomness is provided through the "nextInt()"
>>>>>> method (to be defined in subclasses).
>>>>>>
>>>>>>
>>>>>> [...]
>>> [1] http://www.apachecon.com/eu2007/materials/ac2006.2.pdf
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>
>>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Math] About the refactoring of RNGs

Posted by Thomas Neidhart <th...@gmail.com>.
On 12/29/2015 05:10 PM, Gilles wrote:
> On Tue, 29 Dec 2015 10:33:15 +0100, Luc Maisonobe wrote:
>> Hi all,
>>
>> Le 29/12/2015 09:21, Thomas Neidhart a écrit :
>>> On 12/29/2015 04:33 AM, Phil Steitz wrote:
>>>> On 12/28/15 8:08 PM, Gilles wrote:
>>>>> On Mon, 28 Dec 2015 11:08:56 -0700, Phil Steitz wrote:
>>>>>> The significant refactoring to eliminate the (standard) next(int)
>>>>>> included in these changes has the possibility of introducing subtle
>>>>>> bugs or performance issues.  Please run some tests to verify that
>>>>>> the same sequences are generated by the 3_X code
>>>>>
>>>>> IIUC our unit tests of the RNGs, this is covered.
>>>>
>>>> No.  Not sufficient.  What you have done is changed the internal
>>>> implementation of all of the Bitstream generators.  I am not
>>>> convinced that you have not broken anything.  I will have to do the
>>>> testing myself.  I see no point in fiddling with the internals of
>>>> this code that has had a lot of eyeballs and testing on it.  I was
>>>> not personally looking forward to researching the algorithms to make
>>>> sure any invariants may be broken by these changes; but I am now
>>>> going to have to do this.  I have to ask why.  Please at some point
>>>> read [1], especially the sections on "Avoid Flexibility Syndrom" and
>>>> "Value Laziness as a Virtue."  Gratuitous refactoring drains
>>>> community energy.
>>>
>>> +1, on top of that I think we should aim to refactor the parts that
>>> really need refactoring
> 
> Though I could have liked to say as much on the parts of the library
> were my changes were much criticized because they failed to produce
> a perfect design (which the previous one wasn't either), I would have
> refrained to tell volunteers what they should do or not.

when I have seen the commit I was already thinking of writing a comment
about it but refrained until Phil was doing so. Afaict, your refactoring
moved the bit-shifting logic from one method (next(int)) to others
(nextDouble(), ...) in a slightly different way. On top of that, some
classes have been removed and added and I do not clearly see how this
has improved something, but maybe it was a change in the right direction
as Luc pointed out.

Nobody blames anybody for something, I believe, instead I have the
impression that people are focused on good technical solutions and this
sometimes means that you get criticism (and I got this as well a lot,
which can be quite frustrating).

I just got the impression lately that some people want to completely
refactor CM for the 4.0 release and I wonder if this will do any good to
the project, especially in areas where there is really no need to
further refactor anything.

>>> and try to keep the number of incompatibilities
>>> to the 3_X branch as minimal as possible.
> 
> I clearly and not surprisingly do not subscribe to that goal.
> And the recent discussions about RERO and "experimental" releases
> certainly were getting to a completely different consensus.

I am not sure what this has to do with my comment. RERO just means that
we should release more often, but you can only do this if you are not
introducing incompatibilities.

The discussion about "experimental" releases was just brain-storming
imho. There was no decision or real consensus on how to achieve this,
just some ideas that are at best very non-standard and in some cases
impractical.

I fear that Apache Commons is not the right place for doing such things.

btw. I proposed to use the scheme from guava to explicitly mark new
features with a Beta annotation to allow possible incompatible changes
later on, which I find much more practical, especially considering the
limited man-power and release cycles.

Thomas

>>>>>> and the refactored
>>>>>> code and benchmarks to show there is no loss in performance.
>>>>>
>>>>> Given that there are exactly two operations _less_ (a subtraction
>>>>> and a shift), it would be surprising.
>>>>>
>>>>>> It
>>>>>> would also be good to have some additional review of this code by
>>>>>> PRNG experts.
>>>>>
>>>>> The "nextInt()" code is exactly the same as the "next(int)" modulo
>>>>> the little change above (in the last line of the "nextInt/next"
>>>>> code).
>>>>>
>>>>> That change in "nextInt/next" implied similarly tiny recodings in
>>>>> the generic methods "nextDouble()", "nextBoolean()", ... which, apart
>>>>> from that, were copied from "BitsStreamGenerator".
>>>>>
>>>>> [However tiny a change, I had made a mistake... and dozens of tests
>>>>> started to fail. Found the typo and all was quiet again...]
>>>>>
>>>>> About "next(int)" being standard, it would be interesting to know
>>>>> what that means.
>>
>> In all the papers I have read concerning pseudo random number
>> generation, the basic model was based on small chunks of bits,
>> much of the time the size of an int because this is what computer
>> manages directly (they have no provision to manage chunks of 5 or
>> 11 bits for example).
>>
>> Deriving other primitive types from this (boolean, long, double) is
>> really an add-on. I even asked an expert about the (Pierre L'Ecuyer
>> if I remember well) about some explanations for converting to double
>> (which is simply done by multiplying by a constant representing the
>> weight of the least significant bit in order to constrain the range to
>> [0; 1]). His answer was that this ensured the theoretical mathematical
>> proofs that apply to uniform distribution still apply, as only this
>> case (uniformity over a multi-dimensional integral grid) has been
>> studied. It seems nothing has been studied about using the exponential
>> features of floating point representation in relationship with
>> double random number generation directly.
>>
>> Hence everybody starts from int,
> 
> [Or a "long", as I could observe in some other source codes.]
> 
> Hence, do you agree that my move to "nextInt()" was a sensible one?
> 
>> and the mathematicians proved us
>> this method works and some properties are preserved (multi-dimensional
>> independance, long period, ...) that are essential typically for
>> Monte-Carlo analyses.
>>
>> I know nothing about random number generation for secure application
>> like cryptograpgy, except that it requires completely different
>> properties, often completely opposite to what is needed for
>> Monte-Carlo analysis. As an example, it should be impossible to
>> reproduce a secure sequence (it cannot be deterministic), whereas in
>> Monte-Carlo we *want* it to be reproducible if we reuse the same seed.
>>
>>>>
>>>> Have a look at the source code for the JDK generators, for example.
>>>>> As I indicated quite clearly in one of my first posts about this
>>>>> refactoring
>>>>> 1. all the CM implementations generate random bits in batches
>>>>>    of 32 bits, and
>>>>> 2. before returning, the "next(int bits)" method was truncating
>>>>>    the generated "int":
>>>>>      return x >>> (32 - bits);
>>>>>
>>>>> In all implementations, that was the only place where the "bits"
>>>>> parameter was used, from which I concluded that the randomness
>>>>> provider does not care if the request was to create less than 32
>>>>> random bits.
>>>>> Taking "nextBoolean()" for example, it looks like a waste of 31
>>>>> bits (or am I missing something?).
>>>>
>>>> Quite possibly, yes, you are missing something.
>>
>> I would guess it is linked to performance consideration. Pseudo
>> random number generation is sometimes put under very heavy stress
>> with billions of numbers generated. It should run extremelly fast,
>> and the algorithms have been designed to have tremendously long periods
>> (things like 2^19937 -1). With such long periods, we can waste 31
>> bits each time we produce 1 bit if it saves some overhead.
> 
> Hence I did not misunderstand that bits are wasted (and I guess will
> always be if performance is dependent on the CPU architecture).
> 
> IIUC the CM code, discarding those bits was unnecessary since the
> discarded ones were never used (and I guess could never be, as they
> are replaced by a non-random sequence of zeroes...).
> Hence I suppose that a bug could exist if the caller assumes that the
> most significant bits of the returned "int" were all zero.
> But nothing of the like can be inferred from the documentation of the
> "standard" method:
>  
> http://docs.oracle.com/javase/7/docs/api/java/util/Random.html#next%28int%29
> 
> 
> 
> Thanks, Luc for constructive comments,
> Gilles
> 
>>
>> best regards,
>> Luc
>>
>>>>>
>>>>> Of course, if some implementation were able to store the bits not
>>>>> requested by the last call to "next(int)", then I'd understand that
>>>>> we must really provide access to a "next(int)" method.
>>>>>
>>>>> Perhaps that the overhead of such bookkeeping is why the practical
>>>>> algorithms chose to store integers rather than bits (?).
>>>>>
>>>>> As you dismissed my request about CM being able to care for a RNG
>>>>> implementation based on a "long", I don't quite understand the
>>>>> caring for a "next(int)" that serves no more purpose (as of current
>>>>> CM).
>>>>>
>>>> This change is
>>>>>
>>>>> Gilles
>>>>>
>>>>>
>>>>>> Phil
>>>>>>
>>>>>> On 12/28/15 10:23 AM, erans@apache.org wrote:
>>>>>>> Repository: commons-math
>>>>>>> Updated Branches:
>>>>>>>   refs/heads/master 7b62d0155 -> 81585a3c4
>>>>>>>
>>>>>>>
>>>>>>> MATH-1307
>>>>>>>
>>>>>>> New base class for RNG implementations.
>>>>>>> The source of randomness is provided through the "nextInt()"
>>>>>>> method (to be defined in subclasses).
>>>>>>>
>>>>>>>
>>>>>>> [...]
>>>>>
>>>> [1] http://www.apachecon.com/eu2007/materials/ac2006.2.pdf
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Math] About the refactoring of RNGs

Posted by Gilles <gi...@harfang.homelinux.org>.
On Tue, 29 Dec 2015 10:33:15 +0100, Luc Maisonobe wrote:
> Hi all,
>
> Le 29/12/2015 09:21, Thomas Neidhart a écrit :
>> On 12/29/2015 04:33 AM, Phil Steitz wrote:
>>> On 12/28/15 8:08 PM, Gilles wrote:
>>>> On Mon, 28 Dec 2015 11:08:56 -0700, Phil Steitz wrote:
>>>>> The significant refactoring to eliminate the (standard) next(int)
>>>>> included in these changes has the possibility of introducing 
>>>>> subtle
>>>>> bugs or performance issues.  Please run some tests to verify that
>>>>> the same sequences are generated by the 3_X code
>>>>
>>>> IIUC our unit tests of the RNGs, this is covered.
>>>
>>> No.  Not sufficient.  What you have done is changed the internal
>>> implementation of all of the Bitstream generators.  I am not
>>> convinced that you have not broken anything.  I will have to do the
>>> testing myself.  I see no point in fiddling with the internals of
>>> this code that has had a lot of eyeballs and testing on it.  I was
>>> not personally looking forward to researching the algorithms to 
>>> make
>>> sure any invariants may be broken by these changes; but I am now
>>> going to have to do this.  I have to ask why.  Please at some point
>>> read [1], especially the sections on "Avoid Flexibility Syndrom" 
>>> and
>>> "Value Laziness as a Virtue."  Gratuitous refactoring drains
>>> community energy.
>>
>> +1, on top of that I think we should aim to refactor the parts that
>> really need refactoring

Though I could have liked to say as much on the parts of the library
were my changes were much criticized because they failed to produce
a perfect design (which the previous one wasn't either), I would have
refrained to tell volunteers what they should do or not.

>> and try to keep the number of incompatibilities
>> to the 3_X branch as minimal as possible.

I clearly and not surprisingly do not subscribe to that goal.
And the recent discussions about RERO and "experimental" releases
certainly were getting to a completely different consensus.

>>
>> Thomas
>>


>>>>> and the refactored
>>>>> code and benchmarks to show there is no loss in performance.
>>>>
>>>> Given that there are exactly two operations _less_ (a subtraction
>>>> and a shift), it would be surprising.
>>>>
>>>>> It
>>>>> would also be good to have some additional review of this code by
>>>>> PRNG experts.
>>>>
>>>> The "nextInt()" code is exactly the same as the "next(int)" modulo
>>>> the little change above (in the last line of the "nextInt/next"
>>>> code).
>>>>
>>>> That change in "nextInt/next" implied similarly tiny recodings in
>>>> the generic methods "nextDouble()", "nextBoolean()", ... which, 
>>>> apart
>>>> from that, were copied from "BitsStreamGenerator".
>>>>
>>>> [However tiny a change, I had made a mistake... and dozens of 
>>>> tests
>>>> started to fail. Found the typo and all was quiet again...]
>>>>
>>>> About "next(int)" being standard, it would be interesting to know
>>>> what that means.
>
> In all the papers I have read concerning pseudo random number
> generation, the basic model was based on small chunks of bits,
> much of the time the size of an int because this is what computer
> manages directly (they have no provision to manage chunks of 5 or
> 11 bits for example).
>
> Deriving other primitive types from this (boolean, long, double) is
> really an add-on. I even asked an expert about the (Pierre L'Ecuyer
> if I remember well) about some explanations for converting to double
> (which is simply done by multiplying by a constant representing the
> weight of the least significant bit in order to constrain the range 
> to
> [0; 1]). His answer was that this ensured the theoretical 
> mathematical
> proofs that apply to uniform distribution still apply, as only this
> case (uniformity over a multi-dimensional integral grid) has been
> studied. It seems nothing has been studied about using the 
> exponential
> features of floating point representation in relationship with
> double random number generation directly.
>
> Hence everybody starts from int,

[Or a "long", as I could observe in some other source codes.]

Hence, do you agree that my move to "nextInt()" was a sensible one?

> and the mathematicians proved us
> this method works and some properties are preserved 
> (multi-dimensional
> independance, long period, ...) that are essential typically for
> Monte-Carlo analyses.
>
> I know nothing about random number generation for secure application
> like cryptograpgy, except that it requires completely different
> properties, often completely opposite to what is needed for
> Monte-Carlo analysis. As an example, it should be impossible to
> reproduce a secure sequence (it cannot be deterministic), whereas in
> Monte-Carlo we *want* it to be reproducible if we reuse the same 
> seed.
>
>>>
>>> Have a look at the source code for the JDK generators, for example.
>>>> As I indicated quite clearly in one of my first posts about this
>>>> refactoring
>>>> 1. all the CM implementations generate random bits in batches
>>>>    of 32 bits, and
>>>> 2. before returning, the "next(int bits)" method was truncating
>>>>    the generated "int":
>>>>      return x >>> (32 - bits);
>>>>
>>>> In all implementations, that was the only place where the "bits"
>>>> parameter was used, from which I concluded that the randomness
>>>> provider does not care if the request was to create less than 32
>>>> random bits.
>>>> Taking "nextBoolean()" for example, it looks like a waste of 31
>>>> bits (or am I missing something?).
>>>
>>> Quite possibly, yes, you are missing something.
>
> I would guess it is linked to performance consideration. Pseudo
> random number generation is sometimes put under very heavy stress
> with billions of numbers generated. It should run extremelly fast,
> and the algorithms have been designed to have tremendously long 
> periods
> (things like 2^19937 -1). With such long periods, we can waste 31
> bits each time we produce 1 bit if it saves some overhead.

Hence I did not misunderstand that bits are wasted (and I guess will
always be if performance is dependent on the CPU architecture).

IIUC the CM code, discarding those bits was unnecessary since the
discarded ones were never used (and I guess could never be, as they
are replaced by a non-random sequence of zeroes...).
Hence I suppose that a bug could exist if the caller assumes that the
most significant bits of the returned "int" were all zero.
But nothing of the like can be inferred from the documentation of the
"standard" method:
   
http://docs.oracle.com/javase/7/docs/api/java/util/Random.html#next%28int%29


Thanks, Luc for constructive comments,
Gilles

>
> best regards,
> Luc
>
>>>>
>>>> Of course, if some implementation were able to store the bits not
>>>> requested by the last call to "next(int)", then I'd understand 
>>>> that
>>>> we must really provide access to a "next(int)" method.
>>>>
>>>> Perhaps that the overhead of such bookkeeping is why the practical
>>>> algorithms chose to store integers rather than bits (?).
>>>>
>>>> As you dismissed my request about CM being able to care for a RNG
>>>> implementation based on a "long", I don't quite understand the
>>>> caring for a "next(int)" that serves no more purpose (as of 
>>>> current
>>>> CM).
>>>>
>>> This change is
>>>>
>>>> Gilles
>>>>
>>>>
>>>>> Phil
>>>>>
>>>>> On 12/28/15 10:23 AM, erans@apache.org wrote:
>>>>>> Repository: commons-math
>>>>>> Updated Branches:
>>>>>>   refs/heads/master 7b62d0155 -> 81585a3c4
>>>>>>
>>>>>>
>>>>>> MATH-1307
>>>>>>
>>>>>> New base class for RNG implementations.
>>>>>> The source of randomness is provided through the "nextInt()"
>>>>>> method (to be defined in subclasses).
>>>>>>
>>>>>>
>>>>>> [...]
>>>>
>>> [1] http://www.apachecon.com/eu2007/materials/ac2006.2.pdf


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Math] About the refactoring of RNGs (Was: [01/18] [math] MATH-1307)

Posted by Luc Maisonobe <lu...@spaceroots.org>.
Hi all,

Le 29/12/2015 09:21, Thomas Neidhart a écrit :
> On 12/29/2015 04:33 AM, Phil Steitz wrote:
>> On 12/28/15 8:08 PM, Gilles wrote:
>>> On Mon, 28 Dec 2015 11:08:56 -0700, Phil Steitz wrote:
>>>> The significant refactoring to eliminate the (standard) next(int)
>>>> included in these changes has the possibility of introducing subtle
>>>> bugs or performance issues.  Please run some tests to verify that
>>>> the same sequences are generated by the 3_X code
>>>
>>> IIUC our unit tests of the RNGs, this is covered.
>>
>> No.  Not sufficient.  What you have done is changed the internal
>> implementation of all of the Bitstream generators.  I am not
>> convinced that you have not broken anything.  I will have to do the
>> testing myself.  I see no point in fiddling with the internals of
>> this code that has had a lot of eyeballs and testing on it.  I was
>> not personally looking forward to researching the algorithms to make
>> sure any invariants may be broken by these changes; but I am now
>> going to have to do this.  I have to ask why.  Please at some point
>> read [1], especially the sections on "Avoid Flexibility Syndrom" and
>> "Value Laziness as a Virtue."  Gratuitous refactoring drains
>> community energy. 
> 
> +1, on top of that I think we should aim to refactor the parts that
> really need refactoring and try to keep the number of incompatibilities
> to the 3_X branch as minimal as possible.
> 
> Thomas
> 
>>>> and the refactored
>>>> code and benchmarks to show there is no loss in performance.
>>>
>>> Given that there are exactly two operations _less_ (a subtraction
>>> and a shift), it would be surprising.
>>>
>>>> It
>>>> would also be good to have some additional review of this code by
>>>> PRNG experts.
>>>
>>> The "nextInt()" code is exactly the same as the "next(int)" modulo
>>> the little change above (in the last line of the "nextInt/next"
>>> code).
>>>
>>> That change in "nextInt/next" implied similarly tiny recodings in
>>> the generic methods "nextDouble()", "nextBoolean()", ... which, apart
>>> from that, were copied from "BitsStreamGenerator".
>>>
>>> [However tiny a change, I had made a mistake... and dozens of tests
>>> started to fail. Found the typo and all was quiet again...]
>>>
>>> About "next(int)" being standard, it would be interesting to know
>>> what that means.

In all the papers I have read concerning pseudo random number
generation, the basic model was based on small chunks of bits,
much of the time the size of an int because this is what computer
manages directly (they have no provision to manage chunks of 5 or
11 bits for example).

Deriving other primitive types from this (boolean, long, double) is
really an add-on. I even asked an expert about the (Pierre L'Ecuyer
if I remember well) about some explanations for converting to double
(which is simply done by multiplying by a constant representing the
weight of the least significant bit in order to constrain the range to
[0; 1]). His answer was that this ensured the theoretical mathematical
proofs that apply to uniform distribution still apply, as only this
case (uniformity over a multi-dimensional integral grid) has been
studied. It seems nothing has been studied about using the exponential
features of floating point representation in relationship with
double random number generation directly.

Hence everybody starts from int, and the mathematicians proved us
this method works and some properties are preserved (multi-dimensional
independance, long period, ...) that are essential typically for
Monte-Carlo analyses.

I know nothing about random number generation for secure application
like cryptograpgy, except that it requires completely different
properties, often completely opposite to what is needed for
Monte-Carlo analysis. As an example, it should be impossible to
reproduce a secure sequence (it cannot be deterministic), whereas in
Monte-Carlo we *want* it to be reproducible if we reuse the same seed.

>>
>> Have a look at the source code for the JDK generators, for example.
>>> As I indicated quite clearly in one of my first posts about this
>>> refactoring
>>> 1. all the CM implementations generate random bits in batches
>>>    of 32 bits, and
>>> 2. before returning, the "next(int bits)" method was truncating
>>>    the generated "int":
>>>      return x >>> (32 - bits);
>>>
>>> In all implementations, that was the only place where the "bits"
>>> parameter was used, from which I concluded that the randomness
>>> provider does not care if the request was to create less than 32
>>> random bits.
>>> Taking "nextBoolean()" for example, it looks like a waste of 31
>>> bits (or am I missing something?).
>>
>> Quite possibly, yes, you are missing something.

I would guess it is linked to performance consideration. Pseudo
random number generation is sometimes put under very heavy stress
with billions of numbers generated. It should run extremelly fast,
and the algorithms have been designed to have tremendously long periods
(things like 2^19937 -1). With such long periods, we can waste 31
bits each time we produce 1 bit if it saves some overhead.

best regards,
Luc

>>>
>>> Of course, if some implementation were able to store the bits not
>>> requested by the last call to "next(int)", then I'd understand that
>>> we must really provide access to a "next(int)" method.
>>>
>>> Perhaps that the overhead of such bookkeeping is why the practical
>>> algorithms chose to store integers rather than bits (?).
>>>
>>> As you dismissed my request about CM being able to care for a RNG
>>> implementation based on a "long", I don't quite understand the
>>> caring for a "next(int)" that serves no more purpose (as of current
>>> CM).
>>>
>> This change is
>>>
>>> Gilles
>>>
>>>
>>>> Phil
>>>>
>>>> On 12/28/15 10:23 AM, erans@apache.org wrote:
>>>>> Repository: commons-math
>>>>> Updated Branches:
>>>>>   refs/heads/master 7b62d0155 -> 81585a3c4
>>>>>
>>>>>
>>>>> MATH-1307
>>>>>
>>>>> New base class for RNG implementations.
>>>>> The source of randomness is provided through the "nextInt()"
>>>>> method (to be defined in subclasses).
>>>>>
>>>>>
>>>>> [...]
>>>
>> [1] http://www.apachecon.com/eu2007/materials/ac2006.2.pdf
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>
>>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Math] About the refactoring of RNGs (Was: [01/18] [math] MATH-1307)

Posted by Thomas Neidhart <th...@gmail.com>.
On 12/29/2015 04:33 AM, Phil Steitz wrote:
> On 12/28/15 8:08 PM, Gilles wrote:
>> On Mon, 28 Dec 2015 11:08:56 -0700, Phil Steitz wrote:
>>> The significant refactoring to eliminate the (standard) next(int)
>>> included in these changes has the possibility of introducing subtle
>>> bugs or performance issues.  Please run some tests to verify that
>>> the same sequences are generated by the 3_X code
>>
>> IIUC our unit tests of the RNGs, this is covered.
> 
> No.  Not sufficient.  What you have done is changed the internal
> implementation of all of the Bitstream generators.  I am not
> convinced that you have not broken anything.  I will have to do the
> testing myself.  I see no point in fiddling with the internals of
> this code that has had a lot of eyeballs and testing on it.  I was
> not personally looking forward to researching the algorithms to make
> sure any invariants may be broken by these changes; but I am now
> going to have to do this.  I have to ask why.  Please at some point
> read [1], especially the sections on "Avoid Flexibility Syndrom" and
> "Value Laziness as a Virtue."  Gratuitous refactoring drains
> community energy. 

+1, on top of that I think we should aim to refactor the parts that
really need refactoring and try to keep the number of incompatibilities
to the 3_X branch as minimal as possible.

Thomas

>>> and the refactored
>>> code and benchmarks to show there is no loss in performance.
>>
>> Given that there are exactly two operations _less_ (a subtraction
>> and a shift), it would be surprising.
>>
>>> It
>>> would also be good to have some additional review of this code by
>>> PRNG experts.
>>
>> The "nextInt()" code is exactly the same as the "next(int)" modulo
>> the little change above (in the last line of the "nextInt/next"
>> code).
>>
>> That change in "nextInt/next" implied similarly tiny recodings in
>> the generic methods "nextDouble()", "nextBoolean()", ... which, apart
>> from that, were copied from "BitsStreamGenerator".
>>
>> [However tiny a change, I had made a mistake... and dozens of tests
>> started to fail. Found the typo and all was quiet again...]
>>
>> About "next(int)" being standard, it would be interesting to know
>> what that means.
> 
> Have a look at the source code for the JDK generators, for example.
>> As I indicated quite clearly in one of my first posts about this
>> refactoring
>> 1. all the CM implementations generate random bits in batches
>>    of 32 bits, and
>> 2. before returning, the "next(int bits)" method was truncating
>>    the generated "int":
>>      return x >>> (32 - bits);
>>
>> In all implementations, that was the only place where the "bits"
>> parameter was used, from which I concluded that the randomness
>> provider does not care if the request was to create less than 32
>> random bits.
>> Taking "nextBoolean()" for example, it looks like a waste of 31
>> bits (or am I missing something?).
> 
> Quite possibly, yes, you are missing something.
>>
>> Of course, if some implementation were able to store the bits not
>> requested by the last call to "next(int)", then I'd understand that
>> we must really provide access to a "next(int)" method.
>>
>> Perhaps that the overhead of such bookkeeping is why the practical
>> algorithms chose to store integers rather than bits (?).
>>
>> As you dismissed my request about CM being able to care for a RNG
>> implementation based on a "long", I don't quite understand the
>> caring for a "next(int)" that serves no more purpose (as of current
>> CM).
>>
> This change is
>>
>> Gilles
>>
>>
>>> Phil
>>>
>>> On 12/28/15 10:23 AM, erans@apache.org wrote:
>>>> Repository: commons-math
>>>> Updated Branches:
>>>>   refs/heads/master 7b62d0155 -> 81585a3c4
>>>>
>>>>
>>>> MATH-1307
>>>>
>>>> New base class for RNG implementations.
>>>> The source of randomness is provided through the "nextInt()"
>>>> method (to be defined in subclasses).
>>>>
>>>>
>>>> [...]
>>
> [1] http://www.apachecon.com/eu2007/materials/ac2006.2.pdf
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Math] About the refactoring of RNGs (Was: [01/18] [math] MATH-1307)

Posted by Phil Steitz <ph...@gmail.com>.
On 12/28/15 8:08 PM, Gilles wrote:
> On Mon, 28 Dec 2015 11:08:56 -0700, Phil Steitz wrote:
>> The significant refactoring to eliminate the (standard) next(int)
>> included in these changes has the possibility of introducing subtle
>> bugs or performance issues.  Please run some tests to verify that
>> the same sequences are generated by the 3_X code
>
> IIUC our unit tests of the RNGs, this is covered.

No.  Not sufficient.  What you have done is changed the internal
implementation of all of the Bitstream generators.  I am not
convinced that you have not broken anything.  I will have to do the
testing myself.  I see no point in fiddling with the internals of
this code that has had a lot of eyeballs and testing on it.  I was
not personally looking forward to researching the algorithms to make
sure any invariants may be broken by these changes; but I am now
going to have to do this.  I have to ask why.  Please at some point
read [1], especially the sections on "Avoid Flexibility Syndrom" and
"Value Laziness as a Virtue."  Gratuitous refactoring drains
community energy. 
>
>> and the refactored
>> code and benchmarks to show there is no loss in performance.
>
> Given that there are exactly two operations _less_ (a subtraction
> and a shift), it would be surprising.
>
>> It
>> would also be good to have some additional review of this code by
>> PRNG experts.
>
> The "nextInt()" code is exactly the same as the "next(int)" modulo
> the little change above (in the last line of the "nextInt/next"
> code).
>
> That change in "nextInt/next" implied similarly tiny recodings in
> the generic methods "nextDouble()", "nextBoolean()", ... which, apart
> from that, were copied from "BitsStreamGenerator".
>
> [However tiny a change, I had made a mistake... and dozens of tests
> started to fail. Found the typo and all was quiet again...]
>
> About "next(int)" being standard, it would be interesting to know
> what that means.

Have a look at the source code for the JDK generators, for example.
> As I indicated quite clearly in one of my first posts about this
> refactoring
> 1. all the CM implementations generate random bits in batches
>    of 32 bits, and
> 2. before returning, the "next(int bits)" method was truncating
>    the generated "int":
>      return x >>> (32 - bits);
>
> In all implementations, that was the only place where the "bits"
> parameter was used, from which I concluded that the randomness
> provider does not care if the request was to create less than 32
> random bits.
> Taking "nextBoolean()" for example, it looks like a waste of 31
> bits (or am I missing something?).

Quite possibly, yes, you are missing something.
>
> Of course, if some implementation were able to store the bits not
> requested by the last call to "next(int)", then I'd understand that
> we must really provide access to a "next(int)" method.
>
> Perhaps that the overhead of such bookkeeping is why the practical
> algorithms chose to store integers rather than bits (?).
>
> As you dismissed my request about CM being able to care for a RNG
> implementation based on a "long", I don't quite understand the
> caring for a "next(int)" that serves no more purpose (as of current
> CM).
>
This change is
>
> Gilles
>
>
>> Phil
>>
>> On 12/28/15 10:23 AM, erans@apache.org wrote:
>>> Repository: commons-math
>>> Updated Branches:
>>>   refs/heads/master 7b62d0155 -> 81585a3c4
>>>
>>>
>>> MATH-1307
>>>
>>> New base class for RNG implementations.
>>> The source of randomness is provided through the "nextInt()"
>>> method (to be defined in subclasses).
>>>
>>>
>>> [...]
>
[1] http://www.apachecon.com/eu2007/materials/ac2006.2.pdf
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org