You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@openoffice.apache.org by Pedro Giffuni <pf...@apache.org> on 2012/12/18 16:11:34 UTC

Re: Blog post

Hello;

Just to get the general public to know some of the things there are going on in
the AOO code, Andrea and I have been preparing a blog post about the new
random number generator:

https://blogs.apache.org/preview/OOo/?previewEntry=random_numbers_in_calc_small


Just thought we should give you a chance to complain about it before it goes
live ;).

cheers,

Pedro.

Re: Blog post

Posted by Kay Schenk <ka...@gmail.com>.
On Thu, Dec 20, 2012 at 10:08 AM, Andrea Pescetti <pe...@apache.org>wrote:

> On 18/12/2012 Pedro Giffuni wrote:
>
>> For the record, I am perfectly OK with all the changes and turning this
>> into
>> a community post or whatever the collective feedbacks turns it into ;).
>>
>
> Now live at
> https://blogs.apache.org/OOo/**entry/random_numbers_in_calc_**small<https://blogs.apache.org/OOo/entry/random_numbers_in_calc_small>
>
> Regards,
>   Andrea.
>

Very nice and thank you Pedro!




-- 
----------------------------------------------------------------------------------------
MzK

"No act of kindness, no matter how small, is ever wasted."
                                                                         --
Aesop

Re: Blog post

Posted by Andrea Pescetti <pe...@apache.org>.
On 18/12/2012 Pedro Giffuni wrote:
> For the record, I am perfectly OK with all the changes and turning this into
> a community post or whatever the collective feedbacks turns it into ;).

Now live at
https://blogs.apache.org/OOo/entry/random_numbers_in_calc_small

Regards,
   Andrea.

Re: Blog post

Posted by Pedro Giffuni <pf...@apache.org>.
Tthank you Andrea!


----- Messaggio originale -----
> Da: Andrea Pescetti 
...
> 
> On 18/12/2012 Pedro Giffuni wrote:
>>>  Da: Rob Weir
>>>     So it might be worth
>>>  encouraging some more rigorous testing here.  In fact, maybe your blog
>>>  post can help recruit some volunteers? ...
>>  That is certainly welcome. I hope Andrea is taking notes.
> 
> I am! But first: please note that the blog post was entirely written by Pedro, I 
> merely helped in posting it to Roller (our blogging application). And anyone 
> wishing to access Roller must simply request it on this list.
> 
> The following changes have now been done on

> https://blogs.apache.org/preview/OOo/?previewEntry=random_numbers_in_calc_small
> 
> 1) Minor fixes suggested by Donald
> 2) Removed the reference to this being the first implementation from the last 
> paragraph.
> 3) Clarification that we don't aim at crypto-grade quality
> 4) Call for testing volunteers
> 


For the record, I am perfectly OK with all the changes and turning this into
a community post or whatever the collective feedbacks turns it into ;).
Pedro.

Re: Blog post

Posted by Andrea Pescetti <pe...@apache.org>.
On 18/12/2012 Pedro Giffuni wrote:
>> Da: Rob Weir
>>    So it might be worth
>> encouraging some more rigorous testing here.  In fact, maybe your blog
>> post can help recruit some volunteers? ...
> That is certainly welcome. I hope Andrea is taking notes.

I am! But first: please note that the blog post was entirely written by 
Pedro, I merely helped in posting it to Roller (our blogging 
application). And anyone wishing to access Roller must simply request it 
on this list.

The following changes have now been done on
https://blogs.apache.org/preview/OOo/?previewEntry=random_numbers_in_calc_small

1) Minor fixes suggested by Donald
2) Removed the reference to this being the first implementation from the 
last paragraph.
3) Clarification that we don't aim at crypto-grade quality
4) Call for testing volunteers

Regards,
   Andrea.

Re: Blog post

Posted by Pedro Giffuni <pf...@apache.org>.



----- Messaggio originale -----
> Da: Rob Weir

> 
> 
> But would the earlier implementation also pass that same test?
> 
> Two test suites specifically for pseudo random number generators are:
> 
> Dieharder:  http://www.phy.duke.edu/~rgb/General/dieharder.php
> 
> and
> 
> this test from NIST:  http://csrc.nist.gov/groups/ST/toolkit/rng/index.html
> 
>>  The algorithm is known to pass some very strict statistical tests though.
>> 

Indeed I was referring to those two tests. Our new implementation
should pass them.

My simple understanding of the testing is that you usually don't embed the
test in a speadsheet: most testers will generate a sequence of, let's say
1 million numbers and pass them through the tests.

> 
> And that is an important thing, that we are starting with an algorithm
> that is already well known.  But any given implementation, due to
> subtleties of the implementation details, might not do as well as
> theory.  We see that with crypto all the time.

One thing that must be clear is that we don't generate crypto-grade
random numbers, that's far more complex than what I intended to
do. I was rather more concerned on the "Portable" part of PRNG:
it is critical that the RAND() function in Calc doesn't depend on
libc's rand() and it's a shame on whomever design the original
implementation. The algorithm we are using is not really very fast
or has a longest period it just solves a bug ;).


>   So it might be worth
> encouraging some more rigorous testing here.  In fact, maybe your blog
> post can help recruit some volunteers?   Say that we take our numeric
> algorithms very seriously, we're looking for the best.  This shows in
> our use of COIN-MP solver code, but also in our new PRNG.  We
> encourage anyone who wants to help to put our code through rigorous
> tests and help us find any problems, etc.
>

That is certainly welcome. I hope Andrea is taking notes.


We probably should implement Mersenne twister but with some tweaks
for the seeding, I just haven't seen the need to spend time on it. :)

cheers,

Pedro.

Re: Blog post

Posted by Rob Weir <ro...@apache.org>.
On Tue, Dec 18, 2012 at 1:56 PM, Pedro Giffuni <pf...@apache.org> wrote:
> Hi Rob;
>
>
> ----- Messaggio originale -----
>> Da: Rob Weir
> ...
>>
>> On Dec 18, 2012, at 10:12 AM, Pedro Giffuni wrote:
>>
>>>  Hello;
>>>
>>>  Just to get the general public to know some of the things there are going
>> on in
>>>  the AOO code, Andrea and I have been preparing a blog post about the new
>>>  random number generator:
>>>
>>>
>> https://blogs.apache.org/preview/OOo/?previewEntry=random_numbers_in_calc_small
>>>
>>
>> Nice. Is it worth describing the testing? In particular, do we have
>> any tests that show clear improvements?  Anything that can be shown in
>> a chart?
>>
>
> The test I did were pretty simple: I only generated a graph with 1000 values and
> then regenerated several times a complete row (more that 1 million values) to
> see no values were negative or overflowed.
>
> The problem is that in the platforms I use the period of libc's rand() is already
> long enough that a graphic plot of just 500-1000 values wont show any
> periodicity.
>


But would the earlier implementation also pass that same test?

Two test suites specifically for pseudo random number generators are:

Dieharder:  http://www.phy.duke.edu/~rgb/General/dieharder.php

and

this test from NIST:  http://csrc.nist.gov/groups/ST/toolkit/rng/index.html

> The algorithm is known to pass some very strict statistical tests though.
>

And that is an important thing, that we are starting with an algorithm
that is already well known.  But any given implementation, due to
subtleties of the implementation details, might not do as well as
theory.  We see that with crypto all the time.   So it might be worth
encouraging some more rigorous testing here.  In fact, maybe your blog
post can help recruit some volunteers?   Say that we take our numeric
algorithms very seriously, we're looking for the best.  This shows in
our use of COIN-MP solver code, but also in our new PRNG.  We
encourage anyone who wants to help to put our code through rigorous
tests and help us find any problems, etc.

-Rob

> Pedro.

Re: Blog post

Posted by Pedro Giffuni <pf...@apache.org>.
Hi Rob;


----- Messaggio originale -----
> Da: Rob Weir 
...
> 
> On Dec 18, 2012, at 10:12 AM, Pedro Giffuni wrote:
> 
>>  Hello;
>> 
>>  Just to get the general public to know some of the things there are going 
> on in
>>  the AOO code, Andrea and I have been preparing a blog post about the new
>>  random number generator:
>> 
>> 
> https://blogs.apache.org/preview/OOo/?previewEntry=random_numbers_in_calc_small
>> 
> 
> Nice. Is it worth describing the testing? In particular, do we have
> any tests that show clear improvements?  Anything that can be shown in
> a chart?
> 

The test I did were pretty simple: I only generated a graph with 1000 values and
then regenerated several times a complete row (more that 1 million values) to
see no values were negative or overflowed.

The problem is that in the platforms I use the period of libc's rand() is already
long enough that a graphic plot of just 500-1000 values wont show any
periodicity.

The algorithm is known to pass some very strict statistical tests though.

Pedro.

Re: Blog post

Posted by Pedro Giffuni <pf...@apache.org>.
Hi Dennis;


----- Messaggio originale -----
...
> 
> I would remove the last sentence or just talk about the benefit of having it in 
> ALv2 without claiming any firsts.  
> 

I agree. When I wrote the code I thought I was the first working on it. Afterwards
I found out that hanya had written ported the Mersenne twister algorithm as an
extension .. which should be at least 4 times faster and has a better period.

A was considering cancelling the blog post but it's still a good example about
how OpenOffice is easy to hack and that there are simple improvements that can be done.

Pedro.

RE: Blog post

Posted by "Dennis E. Hamilton" <de...@acm.org>.
I would remove the last sentence or just talk about the benefit of having it in ALv2 without claiming any firsts.  (Also, the code was adapted from another source and I think that should be acknowledged. I don't know the source.  It is not exactly the algorithm published in the Wichmann-Hill paper I have.)

 - Dennis

Concerning performance, etc.

The improved Wichmann-Hill pseudo-random generator paper that is referenced is behind a pay wall.  There is a non-pay-wall version available at 
<http://www.eurometros.org/component_search.php?component_type=algorithms>.

Finally, the new Wichmann-Hill algorithm is slower than the original version and also Mersenne Twister.  It is simple to implement.  The implementation works on any 32-bit-capable (x86 but also including x64) system by running 4 nearly-31-bit pseudo-random generators in parallel and combining their results.  The 4 generators have different maximum periods such that it takes an extremely long period before those maximum periods simultaneously return to their starting values.

In the addition of the improved Wichmann-Hill algorithm for ScRandom() in Calc by Pedro, 
<https://issues.apache.org/ooo/show_bug.cgi?id=121421#c28>, the initial seed is essentially a randomly-generated, roughly 124-bit sequence distributed into seeds for the 4 internal generators.  (128 bits are generated, but since the generators work modulo 31-bit primes, the unique seeds have fewer than 2^124 combinations.)  

I'm not aware of any other analysis that has been done.  The only unexplored area, I think, is whether further initial-seed conditioning is needed to ensure maximum periods and also avoid a few improbable but awful pathological cases.  It would be useful to run one of the well-known pseudo-random-generator stress tests to confirm that the reported qualities of Wichmann-Hill #2 are being delivered.

-----Original Message-----
From: Rob Weir [mailto:rabastus@gmail.com] 
Sent: Tuesday, December 18, 2012 09:29
To: dev@openoffice.apache.org
Subject: Re: Blog post

On Dec 18, 2012, at 10:12 AM, Pedro Giffuni <pf...@apache.org> wrote:

> Hello;
>
> Just to get the general public to know some of the things there are going on in
> the AOO code, Andrea and I have been preparing a blog post about the new
> random number generator:
>
> https://blogs.apache.org/preview/OOo/?previewEntry=random_numbers_in_calc_small
>

Nice. Is it worth describing the testing? In particular, do we have
any tests that show clear improvements?  Anything that can be shown in
a chart?

-Rob


>
> Just thought we should give you a chance to complain about it before it goes
> live ;).
>
> cheers,
>
> Pedro.


Re: Blog post

Posted by Rob Weir <ra...@gmail.com>.
On Dec 18, 2012, at 10:12 AM, Pedro Giffuni <pf...@apache.org> wrote:

> Hello;
>
> Just to get the general public to know some of the things there are going on in
> the AOO code, Andrea and I have been preparing a blog post about the new
> random number generator:
>
> https://blogs.apache.org/preview/OOo/?previewEntry=random_numbers_in_calc_small
>

Nice. Is it worth describing the testing? In particular, do we have
any tests that show clear improvements?  Anything that can be shown in
a chart?

-Rob


>
> Just thought we should give you a chance to complain about it before it goes
> live ;).
>
> cheers,
>
> Pedro.

RE: Blog post

Posted by Manuel del Valle <m....@outlook.com>.

> Date: Tue, 18 Dec 2012 07:33:22 -0800
> From: pfg@apache.org
> Subject: Re: Blog post
> To: dev@openoffice.apache.org
> 
> ...
> ----- Messaggio originale -----
> > 
> >Grammar weenie:
> > 
> > "produce very limited results but" -> "produce very limited 
> > results, but"
> > 
> > "dangerously such random" -> "dangerously such a random"
> > 
> > "well documented algorithms but" -> "well documented 
> > algorithms, but"
> > 
> > "used by Microsoft, after" -> "used by Microsoft; after"
> > 
> > "not complex the implementation" -> "not complex; the 
> > implementation"
> > 
> 
> All these are welcome.
> 
> > "first in Apache OpenOffice than other in Office Suites" -> 
> > "first in
> > Apache OpenOffice earlier than other Office Suites" (was this your
> > intended meaning?)
> > 
> 
> Not really..
> Hmm .. perhaps "earlier in Apache OpenOffice with respect to other Office Suites".
> 
> 
> No pun intended there: MS-Office got it first but it is not prime quality.
> 
> Gnumeric implemented Mersenne Twister which has some better characteristics
> long ago.
> 
> Pedro.


Could it be...?

"is not that we got this code developed first in Apache OpenOffice
than other in Office Suites" ->
"is not that we got this code developed first in Apache OpenOffice
than in any other office suite"

I'm not a native speaker, so I can't guarantee, but it "reads" better to me ;)
Regards,

Manuel
 		 	   		  

Re: Blog post

Posted by Pedro Giffuni <pf...@apache.org>.
...
----- Messaggio originale -----
> 
>Grammar weenie:
> 
> "produce very limited results but" -> "produce very limited 
> results, but"
> 
> "dangerously such random" -> "dangerously such a random"
> 
> "well documented algorithms but" -> "well documented 
> algorithms, but"
> 
> "used by Microsoft, after" -> "used by Microsoft; after"
> 
> "not complex the implementation" -> "not complex; the 
> implementation"
> 

All these are welcome.

> "first in Apache OpenOffice than other in Office Suites" -> 
> "first in
> Apache OpenOffice earlier than other Office Suites" (was this your
> intended meaning?)
> 

Not really..
Hmm .. perhaps "earlier in Apache OpenOffice with respect to other Office Suites".


No pun intended there: MS-Office got it first but it is not prime quality.

Gnumeric implemented Mersenne Twister which has some better characteristics
long ago.

Pedro.

Re: Blog post

Posted by Donald Whytock <dw...@gmail.com>.
Grammar weenie:

"produce very limited results but" -> "produce very limited results, but"

"dangerously such random" -> "dangerously such a random"

"well documented algorithms but" -> "well documented algorithms, but"

"used by Microsoft, after" -> "used by Microsoft; after"

"not complex the implementation" -> "not complex; the implementation"

"first in Apache OpenOffice than other in Office Suites" -> "first in
Apache OpenOffice earlier than other Office Suites" (was this your
intended meaning?)

Don

On Tue, Dec 18, 2012 at 10:11 AM, Pedro Giffuni <pf...@apache.org> wrote:
> Hello;
>
> Just to get the general public to know some of the things there are going on in
> the AOO code, Andrea and I have been preparing a blog post about the new
> random number generator:
>
> https://blogs.apache.org/preview/OOo/?previewEntry=random_numbers_in_calc_small
>
>
> Just thought we should give you a chance to complain about it before it goes
> live ;).
>
> cheers,
>
> Pedro.