Posted to dev@commons.apache.org by Luc Maisonobe <Lu...@free.fr> on 2010/11/16 22:16:10 UTC

[math] inconsistent use of random generators

Hi all,

Some of our algorithms use random number generation. I quickly
reviewed them and found that they obtain their generators in different ways.

The genetic algorithm uses a single static RandomGenerator shared by all
instances. It defaults to JDKRandomGenerator and can be reset by calling
setRandomGenerator.

Multi-start optimizers use a separate RandomGenerator for each instance,
set by a constructor argument.

NaturalRanking uses a RandomDataImpl, which is itself based either on a
RandomGenerator constructor argument or, if the user did not provide a
generator, on a JDKRandomGenerator by default.

Kmeans++ directly uses a java.util.Random instance provided as a
constructor argument (only this class can be used; none of our
generators can be used here).


What about changing this to be more consistent? I would like to have
all our algorithms use the RandomGenerator interface, thus allowing the
user to plug in the generator best suited to their needs (it can be the JDK
one or a better one like Mersenne-Twister or one of the WELL
generators). I would also like to have one generator for each instance,
set up at construction time. I also think JDKRandomGenerator is probably
not a good default and that one of the more modern generators we now have,
such as Well19937c, could be used instead.
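
A minimal sketch of the per-instance, constructor-injected style described
above. It is only an illustration: SimpleRandomGenerator is a hypothetical
stand-in for the RandomGenerator interface (reduced to two methods so the
example compiles on its own), and JdkBackedGenerator and RandomStartPicker
are made-up names, not existing classes.

```java
import java.util.Random;

/** Hypothetical stand-in for RandomGenerator, reduced to what this sketch needs. */
interface SimpleRandomGenerator {
    int nextInt(int n);
    double nextDouble();
}

/** Hypothetical adapter that lets java.util.Random satisfy the interface. */
final class JdkBackedGenerator implements SimpleRandomGenerator {
    private final Random delegate;

    JdkBackedGenerator(long seed) {
        this.delegate = new Random(seed);
    }

    @Override
    public int nextInt(int n) {
        return delegate.nextInt(n);
    }

    @Override
    public double nextDouble() {
        return delegate.nextDouble();
    }
}

/** Hypothetical algorithm: each instance owns the generator it was built with. */
final class RandomStartPicker {
    private final SimpleRandomGenerator generator;

    RandomStartPicker(SimpleRandomGenerator generator) {
        if (generator == null) {
            throw new IllegalArgumentException("generator must not be null");
        }
        this.generator = generator;
    }

    /** Returns a start point drawn uniformly from [lower, upper). */
    double nextStart(double lower, double upper) {
        return lower + (upper - lower) * generator.nextDouble();
    }
}
```

With this shape, any implementation of the interface can be passed in, and
two instances never interfere with each other's random sequence.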

Luc

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] inconsistent use of random generators

Posted by Ted Dunning <te...@gmail.com>.
On Wed, Nov 17, 2010 at 12:19 AM, Luc Maisonobe <Lu...@free.fr> wrote:

> > In Mahout, we did this by having a static method in a utility class for
> > getting a standard generator for either testing or normal operation. This
> > has turned out very well.
>
> It's a very good idea for production code, but I am not sure I
> understand its use for test code. Doesn't that induce problems when new
> tests are included which change the running order of the tests or when
> only one test is run?


No. The seed is constant. This means that the test gets the same results
no matter what.

Moreover, since the generator is obtained in the same way, and it is the
utility method that checks to see if a test is being run, exactly the same
production code paths are exercised in production and test mode.


> In these cases the current state for the
> generator will not be the same in each situation. Do tests reseed the
> generator when they start ?
>

The utility returns a consistently seeded generator if running a test. The
code using the generator just asks the utility class for a generator and
doesn't know whether it is seeded for a test or not.
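
A rough sketch of such a utility, under stated assumptions: the class name
GeneratorFactory, its method names, and the fixed seed value are all
hypothetical, test mode is switched on by an explicit call rather than
detected automatically, and each request returns a freshly seeded generator
so that neither test order nor running a single test changes the sequence.

```java
import java.util.Random;

/** Hypothetical utility: the single place where calling code obtains a generator. */
final class GeneratorFactory {
    /** Arbitrary constant used only in test mode (an assumption of this sketch). */
    private static final long TEST_SEED = 42L;

    private static volatile boolean testMode = false;

    private GeneratorFactory() {
        // static utility, not meant to be instantiated
    }

    /** Called by the test harness before tests run. */
    static void useTestSeed() {
        testMode = true;
    }

    /** Restores normal, unpredictably seeded operation. */
    static void useNormalSeed() {
        testMode = false;
    }

    /** Calling code uses this in production and in tests alike. */
    static Random getGenerator() {
        // A new generator per request: in test mode it always starts from TEST_SEED.
        return testMode ? new Random(TEST_SEED) : new Random();
    }
}
```

The code under test only ever calls getGenerator(), so it follows the same
path whether or not the fixed seed is active.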

Re: [math] inconsistent use of random generators

Posted by Luc Maisonobe <Lu...@free.fr>.
On 17/11/2010 02:14, Ted Dunning wrote:
> It is also desirable to have a way to inject a test generator so that test
> cases can be made deterministic.

Yes, this is what is done in the classes that already have a constructor
argument: the tests use it, with hard-coded funny constants to seed
the generator.

> 
> In Mahout, we did this by having a static method in a utility class for
> getting a standard generator for either testing or normal operation.  This
> has turned out very well.

It's a very good idea for production code, but I am not sure I
understand its use for test code. Doesn't that induce problems when new
tests are included which change the running order of the tests or when
only one test is run? In these cases the current state of the
generator will not be the same in each situation. Do tests reseed the
generator when they start?

Luc

> 
> +1 on using a better default generator.
> 
> On Tue, Nov 16, 2010 at 1:16 PM, Luc Maisonobe <Lu...@free.fr> wrote:
> 
>> I would also like to have one generator for each instance,
>> set up at construction time. I also think JDKRandomGenerator is probably
>> not a good default and that one of the more modern generators we now have,
>> such as Well19937c, could be used instead.
>>
> 




Re: [math] inconsistent use of random generators

Posted by Ted Dunning <te...@gmail.com>.
It is also desirable to have a way to inject a test generator so that test
cases can be made deterministic.

In Mahout, we did this by having a static method in a utility class for
getting a standard generator for either testing or normal operation.  This
has turned out very well.

+1 on using a better default generator.

On Tue, Nov 16, 2010 at 1:16 PM, Luc Maisonobe <Lu...@free.fr> wrote:

> I would also like to have one generator for each instance,
> set up at construction time. I also think JDKRandomGenerator is probably
> not a good default and that one of the more modern generators we now have,
> such as Well19937c, could be used instead.
>

Re: [math] inconsistent use of random generators

Posted by Phil Steitz <ph...@gmail.com>.
On 11/16/10 4:16 PM, Luc Maisonobe wrote:
> Hi all,
>
> Some of our algorithms use random number generation. I quickly
> reviewed them and found that they obtain their generators in different ways.
>
> The genetic algorithm uses a single static RandomGenerator shared by all
> instances. It defaults to JDKRandomGenerator and can be reset by calling
> setRandomGenerator.

The shared static was for reproducibility of results. The setup is a 
little awkward there (as noted in the comments to MATH-207).  The 
sharing needed is really within a single full "instance" of the 
framework.  Different classes instantiated by the GeneticAlgorithm 
need to be able to share a source of randomness if reproducible 
results are desired.  I would see this as an exceptional case and 
welcome suggestions on how to remove the smell here without 
polluting the API.
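
One possible direction, sketched only as an illustration and not as the
existing GeneticAlgorithm design: the top-level object receives one
generator and hands that same instance to every collaborator it creates, so
the sharing is scoped to a single run rather than to a static field. All
class names below are hypothetical, and java.util.Random is used only to
keep the example self-contained.

```java
import java.util.Random;

/** Hypothetical collaborator that needs randomness to pick an individual. */
final class SelectionStep {
    private final Random random;

    SelectionStep(Random random) {
        this.random = random;
    }

    int pickIndex(int populationSize) {
        return random.nextInt(populationSize);
    }
}

/** Hypothetical collaborator that needs randomness to mutate a gene. */
final class MutationStep {
    private final Random random;

    MutationStep(Random random) {
        this.random = random;
    }

    /** Flips one randomly chosen bit among the lowest eight. */
    int mutate(int gene) {
        return gene ^ (1 << random.nextInt(8));
    }
}

/** Hypothetical top-level run: one generator, shared by everything it creates. */
final class TinyGeneticRun {
    private final SelectionStep selection;
    private final MutationStep mutation;

    TinyGeneticRun(Random shared) {
        // The same instance goes to both steps, so a fixed seed fixes the whole run.
        this.selection = new SelectionStep(shared);
        this.mutation = new MutationStep(shared);
    }

    int step(int[] population) {
        int index = selection.pickIndex(population.length);
        return mutation.mutate(population[index]);
    }
}
```

Two runs built from identically seeded generators then produce identical
results, while independent runs remain isolated from one another.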

+1 for ditching JDKRandomGenerator for a better default

+1 for uniformly making RandomGenerator the type of the configured 
entity (as it is in GA)

>
> Multi-start optimizers use a separate RandomGenerator for each instance,
> set by a constructor argument.

In general, I like this the best, though in some cases setters may 
also be useful.

>
> NaturalRanking uses a RandomDataImpl, which is itself based either on a
> RandomGenerator constructor argument or, if the user did not provide a
> generator, on a JDKRandomGenerator by default.

This is fine, IMO, but we should change the default.  The publicly 
exposed thing to configure is a RandomGenerator.

>
> Kmeans++ directly uses a java.util.Random instance provided as a
> constructor argument (only this class can be used; none of our
> generators can be used here).

This should be changed to a RandomGenerator, which will allow our 
generators to be plugged in (the reason that RandomGenerator exists :)
>
>
> What about changing this to be more consistent? I would like to have
> all our algorithms use the RandomGenerator interface, thus allowing the
> user to plug in the generator best suited to their needs (it can be the JDK
> one or a better one like Mersenne-Twister or one of the WELL
> generators). I would also like to have one generator for each instance,
> set up at construction time. I also think JDKRandomGenerator is probably
> not a good default and that one of the more modern generators we now have,
> such as Well19937c, could be used instead.

I like Ted's ideas on making it easier to fix seeds for the tests.

Phil
>
> Luc


Re: [math] inconsistent use of random generators

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.
> [...]
> 
> What about changing this to be more consistent ?

+1
Obviously :-)

> [...]

Regards,
Gilles
