You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by Gilles <gi...@harfang.homelinux.org> on 2016/01/05 02:45:57 UTC

[All] Don't touch that code! (Was: [Math] About the refactoring of RNGs)

Hi.

[Preamble: In the following, I do not ask for opinions about the
design of Commons Math; I report an example of a dysfunctional
"community".]

A few days ago, rather than "shut down for holidays", I took upon me to
respond to the request of a newcomer to the CM discussions (MATH-1300).
I'm certainly not anywhere close to being an expert in PRNG.
But it looked easy enough to make that change.

While browsing through the related code in "o.a.c.m.random", I happened 
to
stumble upon one line of code that looked strange (because its sole 
purpose
seemed to clear bits that, anyways, would not be used by the caller).

Then there were the usual suspects when doing such an exercise in CM:
duplicate code ("BitsStreamRandomGenerator" vs 
"AbstractRandomGenerator"),
and unsafe code, that prompted me to raise questions on this ML, and to
open issues MATH-1307, MATH-1308 and MATH-1309 (itself related to the 
old
MATH-734).

My argumentation was crystal-clear.
No technical objection was raised, and I committed the proposed changes 
(as
they were shown in an attachment to the corresponding JIRA page).

The commit elicited a flood of negative comments that made me feel I 
must
have made some horrendous, and obviously stupid, mistake.

Around the same time I was making those little changes (creating 1 new 
file,
deleting 2, and less than 20 lines changed in total) to the development
branch ("master") containing code not supposed to be 
backwards-compatible
and not foreseen to be released any time soon, Luc was adding or 
modifying
dozens of files (largely more than 1000 lines to supposedly review) to 
the
MATH_3_X branch, thus to be included in the 3.6 version of CM to be
imminently released.

There is thus a double-standard reviewing process.[1]

As it turned out, nobody has, at this time, something technically sound 
to
argue against the change which I had made.

Instead, there was a barrage of protests that amounted to creating FUD:

FUD 1: "Quite possibly, yes, you are missing something."
Truth: Strong, but baseless, affirmation that I'm wrong in even 
thinking
        that some code statement could be useless.

FUD 2: "significant refactoring"
Truth: Less than 20 lines were modified (and for many, it's the same 
type
        of change).
        [Moreover, the behaviour and usage of the concrete classes are
        unchanged.]

FUD 3: "[...] rebasing to eliminate next(bits) adds nothing."
Truth: New code exposes what the underlying algorithms actually do 
while
        the previous code indirectly advertised non-existing 
functionality.
        [The change actually proves that the argument of "next(int)" is
        useless (i.e. the implementations always generate more bits than 
the
        argument would have one to believe).]

FUD 4: "[refactoring] to eliminate the (standard) next(int)"
Truth: Unchecked and ambiguous use of the word "standard".
        As far I could see by browsing some source codes in C, 
"next(void)"
        could be the standard in RNG codes, not "next(int bits)".
        The absence of "next(int)" in the newer JDK class 
"SplittableRandom"
        confirms the impression that "next(int)" could have been, 
indeed,
        useless for all practical purposes.

FUD 5: Potential breakage
Truth: 544 unit tests show no sign of regression for any of the RNG
        implementations.

FUD 6: "The tests cover only nextInt"
Truth: The "RandomGeneratorAbstractTest" class contains unit tests for 
*all*
        the generic methods.  In particular, the tests ensure that the
        distribution of the generated "int", "long", "float" and 
"double" is
        uniform.

FUD 7: "[...] implementation changes to hardened code"
Truth: Undefined use of the word "hardened": A specific unit test 
verifies
        that some hard-coded sequences are reproduced by the code. It 
passes
        with the new code, as it did with the old.
        [Other than that, the code comes with no warranty (applies to 
*both*
        the current version and the previous one).]

FUD 8: "[...] invariants may be broken by these changes"
Truth: Visual inspection of 1 line of code per generic method and 1 
line of
        code per RNG can prove whether there is a mistake or not.

FUD 9: "[...] possibility of introducing [...] performance issues."
Truth: Visual inspection of 1 line of code per generic method and 1 
line of
        code per RNG shows a reduction of the number of basic operations 
(one
        shift and one subtraction in each concrete RNG implementation).
        Moreover, IIUC the following results of a micro-benchmark 
comparison,
        no significant change in performance can be detected.
        [Although the change is for the better in almost all cases.]

+----------+
| MATH_3_X |
+----------+
nextInt() (calls per timed block: 2000000, timed blocks: 100, time 
unit: ms)
               name      time/call      std error total time      ratio  
difference
JDKRandomGenerator 1.07711988e-05 3.96937236e-06 2.1542e+03 1.0000e+00  
0.00000000e+00
    MersenneTwister 1.10205274e-05 2.22826722e-06 2.2041e+03 1.0231e+00  
4.98657110e+01
           Well512a 1.33509150e-05 1.81097101e-06 2.6702e+03 1.2395e+00  
5.15943241e+02
          Well1024a 1.30304620e-05 9.68410328e-07 2.6061e+03 1.2098e+00  
4.51852641e+02
         Well19937a 1.56130116e-05 2.14122318e-06 3.1226e+03 1.4495e+00  
9.68362551e+02
         Well19937c 1.47900777e-05 1.21972362e-06 2.9580e+03 1.3731e+00  
8.03775773e+02
         Well44497a 1.73521386e-05 8.06996030e-07 3.4704e+03 1.6110e+00  
1.31618795e+03
         Well44497b 1.76055476e-05 5.14973607e-07 3.5211e+03 1.6345e+00  
1.36686975e+03
              ISAAC 1.20243626e-05 7.88169183e-07 2.4049e+03 1.1163e+00  
2.50632757e+02

nextDouble() (calls per timed block: 2000000, timed blocks: 100, time 
unit: ms)
               name      time/call      std error total time      ratio  
difference
JDKRandomGenerator 2.21792835e-05 2.41114910e-06 4.4359e+03 1.0000e+00  
0.00000000e+00
    MersenneTwister 1.97174235e-05 2.03446115e-06 3.9435e+03 8.8900e-01 
-4.92371995e+02
           Well512a 2.12618083e-05 1.88123228e-06 4.2524e+03 9.5863e-01 
-1.83495042e+02
          Well1024a 2.41596883e-05 2.57029116e-06 4.8319e+03 1.0893e+00  
3.96080966e+02
         Well19937a 2.66921255e-05 1.49574930e-06 5.3384e+03 1.2035e+00  
9.02568405e+02
         Well19937c 2.92679702e-05 1.33918527e-06 5.8536e+03 1.3196e+00  
1.41773734e+03
         Well44497a 3.36010178e-05 2.73487416e-06 6.7202e+03 1.5150e+00  
2.28434687e+03
         Well44497b 3.59767542e-05 1.08315236e-06 7.1954e+03 1.6221e+00  
2.75949415e+03
              ISAAC 2.21894735e-05 1.53691118e-06 4.4379e+03 1.0005e+00  
2.03801200e+00

nextLong() (calls per timed block: 2000000, timed blocks: 100, time 
unit: ms)
               name      time/call      std error total time      ratio  
difference
JDKRandomGenerator 2.34176352e-05 2.20143373e-06 4.6835e+03 1.0000e+00  
0.00000000e+00
    MersenneTwister 1.94922075e-05 1.69613941e-06 3.8984e+03 8.3237e-01 
-7.85085547e+02
           Well512a 2.49888171e-05 1.35979076e-06 4.9978e+03 1.0671e+00  
3.14236383e+02
          Well1024a 2.33254629e-05 9.16091045e-07 4.6651e+03 9.9606e-01 
-1.84344700e+01
         Well19937a 2.61210825e-05 7.35362338e-07 5.2242e+03 1.1154e+00  
5.40689466e+02
         Well19937c 2.77540845e-05 2.52800203e-06 5.5508e+03 1.1852e+00  
8.67289861e+02
         Well44497a 3.29713155e-05 2.55438277e-06 6.5943e+03 1.4080e+00  
1.91073605e+03
         Well44497b 3.42167709e-05 1.27562024e-06 6.8434e+03 1.4612e+00  
2.15982714e+03
              ISAAC 2.15235364e-05 1.57741593e-06 4.3047e+03 9.1912e-01 
-3.78819767e+02

+--------+
| master |
+--------+
nextInt() (calls per timed block: 2000000, timed blocks: 100, time 
unit: ms)
               name      time/call      std error total time      ratio  
difference
JDKRandomGenerator 1.34103050e-05 2.45357474e-06 2.6821e+03 1.0000e+00  
0.00000000e+00
    MersenneTwister 1.14748419e-05 9.90008146e-07 2.2950e+03 8.5567e-01 
-3.87092621e+02
           Well512a 1.34941515e-05 1.69129854e-06 2.6988e+03 1.0063e+00  
1.67693020e+01
          Well1024a 1.50723034e-05 8.03930622e-07 3.0145e+03 1.1239e+00  
3.32399667e+02
         Well19937a 1.38263157e-05 5.69889705e-06 2.7653e+03 1.0310e+00  
8.32021280e+01
         Well19937c 1.59393823e-05 2.35562440e-06 3.1879e+03 1.1886e+00  
5.05815457e+02
         Well44497a 1.93589129e-05 2.64356495e-06 3.8718e+03 1.4436e+00  
1.18972157e+03
         Well44497b 2.03862615e-05 2.14390340e-06 4.0773e+03 1.5202e+00  
1.39519129e+03
              ISAAC 1.32823857e-05 2.02074578e-06 2.6565e+03 9.9046e-01 
-2.55838710e+01

nextDouble() (calls per timed block: 2000000, timed blocks: 100, time 
unit: ms)
               name      time/call      std error total time      ratio  
difference
JDKRandomGenerator 2.32352997e-05 1.76107593e-06 4.6471e+03 1.0000e+00  
0.00000000e+00
    MersenneTwister 1.95680107e-05 5.15377258e-07 3.9136e+03 8.4217e-01 
-7.33457809e+02
           Well512a 2.40500380e-05 2.50013851e-06 4.8100e+03 1.0351e+00  
1.62947648e+02
          Well1024a 2.48173367e-05 2.93429135e-06 4.9635e+03 1.0681e+00  
3.16407385e+02
         Well19937a 2.75334969e-05 2.45923272e-06 5.5067e+03 1.1850e+00  
8.59639442e+02
         Well19937c 2.84004673e-05 1.95285979e-06 5.6801e+03 1.2223e+00  
1.03303351e+03
         Well44497a 3.55292049e-05 2.88860968e-06 7.1058e+03 1.5291e+00  
2.45878103e+03
         Well44497b 3.69400758e-05 1.55845565e-06 7.3880e+03 1.5898e+00  
2.74095521e+03
              ISAAC 2.21769477e-05 2.20944591e-06 4.4354e+03 9.5445e-01 
-2.11670411e+02

nextLong() (calls per timed block: 2000000, timed blocks: 100, time 
unit: ms)
               name      time/call      std error total time      ratio  
difference
JDKRandomGenerator 2.24688238e-05 1.33151962e-06 4.4938e+03 1.0000e+00  
0.00000000e+00
    MersenneTwister 1.81175642e-05 1.63071216e-06 3.6235e+03 8.0634e-01 
-8.70251910e+02
           Well512a 2.27232582e-05 1.06753850e-06 4.5447e+03 1.0113e+00  
5.08868800e+01
          Well1024a 2.27862427e-05 1.59353328e-06 4.5572e+03 1.0141e+00  
6.34837960e+01
         Well19937a 2.50605559e-05 1.00839808e-06 5.0121e+03 1.1153e+00  
5.18346431e+02
         Well19937c 2.68680821e-05 1.11341253e-06 5.3736e+03 1.1958e+00  
8.79851660e+02
         Well44497a 3.29918582e-05 7.18500728e-07 6.5984e+03 1.4683e+00  
2.10460688e+03
         Well44497b 3.47592845e-05 7.83038103e-07 6.9519e+03 1.5470e+00  
2.45809215e+03
              ISAAC 1.99627637e-05 1.40470250e-06 3.9926e+03 8.8847e-01 
-5.01212018e+02


In light of the above, I feel I can call foul on a behaviour that is 
fully focused
on stymieing (even mildly) "revolutionary"[2] endeavours.
A supportive community would have engaged in a constructive discussion 
(if it
felt that bad side-effects could ensue[3]), rather than being 
affirmative that some
parts of the library are out-of-bounds (for me).


Regards,
Gilles

[1] I'd rather not enter in the "meaning of meritocracy" debate; I'm 
not implying that
     I have the same skills as Luc and that I must benefit of the same 
trust.  Here we
     are considering a specific situation where on the one hand massive 
amounts of code
     are being accepted without a single review, while on the other hand 
an easily
     reviewable code (reading 20 lines and/or "mvn test") *must* obtain 
clearance from
     a PRNG expert (who is knowingly absent from these neighbourhoods).
[2] http://www.apachecon.com/eu2007/materials/ac2006.2.pdf (page 186).
[3] How about simply file a JIRA report (like: "Make sure no corners 
were cut in commit
     xxxx") rather than starting a FUD campaign on this list?


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org