You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by Gilles Sadowski <gi...@harfang.homelinux.org> on 2012/07/16 23:46:14 UTC

[Math] Issues 764 and 823

Hi.

Referring to the discussion here:
  https://issues.apache.org/jira/browse/MATH-764
  https://issues.apache.org/jira/browse/MATH-823

Do we agree:
 * To add new constructors to the distribution classes (in package
   "o.a.c.m.distribution") that take a "RandomGenerator" (in package
   "o.a.c.m.random") as an argument?
 * That this argument (reference) will be stored in the distribution
   object but can be manipulated from outside (e.g. "setSeed") so that
   the class is not strictly immutable? [Immutability is impossible to
   enforce since there is no way to copy such an object.]
 * That the distribution classes do not need a setter for the RNG, nor a
   a method to re-seed the RNG? [I.e. if a user wants a new RNG, he must
   instantiate a new distribution object.]
 * That the ad-hoc code of the sampling methods currently in
   "RandomDataImpl" (in package "o.a.c.m.random") will be moved over to the
   "sample" method in the correpsonding distribution class?
 * That "RandomData" and "RandomDataImpl" will be refactored so that methods
   that duplicate functionality transferred to the distribution will
   deprecated in 3.1 and removed in 4.0?
 * That the interface "RandomData" and its unique implementation
  ("RandomDataImpl") will be merged in 4.0?


Regards,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Math] Issues 764 and 823

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.
On Mon, Jul 16, 2012 at 04:39:41PM -0700, Phil Steitz wrote:
> On 7/16/12 2:46 PM, Gilles Sadowski wrote:
> > Hi.
> >
> > Referring to the discussion here:
> >   https://issues.apache.org/jira/browse/MATH-764
> >   https://issues.apache.org/jira/browse/MATH-823

Those issues have been resolved in revision 1363604.

> >
> > Do we agree:
> >  * To add new constructors to the distribution classes (in package
> >    "o.a.c.m.distribution") that take a "RandomGenerator" (in package
> >    "o.a.c.m.random") as an argument?
> +1

Done.

> >  * That this argument (reference) will be stored in the distribution
> >    object but can be manipulated from outside (e.g. "setSeed") so that
> >    the class is not strictly immutable? [Immutability is impossible to
> >    enforce since there is no way to copy such an object.]
> +1

Done.

> >  * That the distribution classes do not need a setter for the RNG, nor a
> >    a method to re-seed the RNG? [I.e. if a user wants a new RNG, he must
> >    instantiate a new distribution object.]
> +1 for no setter, -1 for no reseed.

The "reseedRandomGenerator" method already existed, so this decision must
anyways be postponed to 4.0.

> >  * That the ad-hoc code of the sampling methods currently in
> >    "RandomDataImpl" (in package "o.a.c.m.random") will be moved over to the
> >    "sample" method in the correpsonding distribution class?
> +1

Done.

> >  * That "RandomData" and "RandomDataImpl" will be refactored so that methods
> >    that duplicate functionality transferred to the distribution will
> >    deprecated in 3.1 and removed in 4.0?
> -0  If we keep these classes, which I am inclined to support
> (combined into one), they should delegate implementations to
> distributions (exactly the reverse of how it is now).

Not done yet. A ticket should be opened.

For 3.1:
 * either we leave the situation as is (i.e. duplicate code in
   "RandomDataImpl" and in the distribution classes),
 * or we delegate to the distribution's "sample" method (in the
   obvious but possibly inefficient way).

Other ideas?

> >  * That the interface "RandomData" and its unique implementation
> >   ("RandomDataImpl") will be merged in 4.0?
> +1

A new JIRA ticket should be opened.


Regards,
Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Math] Issues 764 and 823

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.
On Mon, Jul 16, 2012 at 09:08:51PM -0700, Phil Steitz wrote:
> On 7/16/12 5:17 PM, Gilles Sadowski wrote:
> > On Mon, Jul 16, 2012 at 04:39:41PM -0700, Phil Steitz wrote:
> >> On 7/16/12 2:46 PM, Gilles Sadowski wrote:
> >>> Hi.
> >>>
> >>> Referring to the discussion here:
> >>>   https://issues.apache.org/jira/browse/MATH-764
> >>>   https://issues.apache.org/jira/browse/MATH-823
> >>>
> >>> Do we agree:
> >>>  * To add new constructors to the distribution classes (in package
> >>>    "o.a.c.m.distribution") that take a "RandomGenerator" (in package
> >>>    "o.a.c.m.random") as an argument?
> >> +1
> >>>  * That this argument (reference) will be stored in the distribution
> >>>    object but can be manipulated from outside (e.g. "setSeed") so that
> >>>    the class is not strictly immutable? [Immutability is impossible to
> >>>    enforce since there is no way to copy such an object.]
> >> +1
> >>>  * That the distribution classes do not need a setter for the RNG, nor a
> >>>    a method to re-seed the RNG? [I.e. if a user wants a new RNG, he must
> >>>    instantiate a new distribution object.]
> >> +1 for no setter, -1 for no reseed.
> > Why do you need a "reSeed" method since it can be called on the RNG object
> > that was passed to the constructor of the distribution?
> > I.e. it is the user's responsibility to keep a reference to the RNG object
> > if he wants to later modify it.
> > IMO, it makes it more obvious that the RNG is "owned" by the user's code,
> > not by the distribution instance.
> 
> I get your point, but in my view, any random data generation API
> should expose a reseed method to reseed the underlying source of
> randomness.  So rather than force users to hold a reference to the
> RandomData instance provided at construction time to do this
> indirectly, I think it would be more convenient / more complete to
> provide the method in the same API that does the data generation,
> which is now going to be the distribution.

Currently, I only see bad usage examples of this functionality:

-----
final RandomGenerator rng = new Well512a();
final int num = 5;
final NormalDistribution[] dists = new NormalDistribution[num];

// Initialize the distributions.
for (int i = 0; i < num; i++) {
  final double mean = ...
  final double sigma = ...
  dists[i] = new NormalDistribution(rng, mean, sigma);
}

// Do some work.
// ...

// Reseed the RNG.
for (NormalDistribution g : dists) {
  final long seed = ...
  g.setSeed(seed);
}
-----

The "setSeed" method is very error-prone, as the last block shows:
 1. No loop is necessary since there is a single instance of the RNG,
    shared by all the distributions.
 2. The same seed is used by all the distributions, although the loop make
    it look like each is reseeded differently. 


Another problem is that "RandomGenerator" has 3 "setSeed" methods, and the
distribution interfaces must then be expanded to accomodate all the
variants; and this additional layer of delegation will not save a single
typing stroke of the user.


One situation where a "setSeed" is needed is when there is no more handle to
the RNG:

-----
// Initialize the distributions.
for (int i = 0; i < num; i++) { 
  final double mean = ...
  final double sigma = ...
  dists[i] = new NormalDistribution(new Well512a(), mean, sigma);
}
-----

But I would argue that if a user later needs access to the RNGs, the best
way is indeed to keep the handles:

-----
final RandomGenerator[] rng = new RandomGenerator[num];

// Initialize the RNGs and the distributions.
for (int i = 0; i < num; i++) {
  rng[i] = new Well512a();

  final double mean = ... 
  final double sigma = ...
  dists[i] = new NormalDistribution(rng[i], mean, sigma);
}

// ...

// Reseed the RNGs.
for (RandomGenerator r : rng) {
  final long seed = ...
  r.setSeed(seed);
}
-----

> 
> >
> >>>  * That the ad-hoc code of the sampling methods currently in
> >>>    "RandomDataImpl" (in package "o.a.c.m.random") will be moved over to the
> >>>    "sample" method in the correpsonding distribution class?
> >> +1
> >>>  * That "RandomData" and "RandomDataImpl" will be refactored so that methods
> >>>    that duplicate functionality transferred to the distribution will
> >>>    deprecated in 3.1 and removed in 4.0?
> >> -0  If we keep these classes, which I am inclined to support
> >> (combined into one), they should delegate implementations to
> >> distributions (exactly the reverse of how it is now).
> > No problem. But IIUC, this will require instantiating a distribution object
> > for each call; in some situations, this will be an obvious performance loss
> > (e.g. when sampling from a distribution with same parameters).
> 
> Yeah, we will probably want to cache these.

This will be rather difficult, since there should be one instance for each
combination (the number of which is infinite) of the parameters. The memory
footprint will quickly become unbearable.
Fixed size pools can be useless given a (un)suitable striping of the
parameters in the successive calls to the method.
Fecthing small objects (like distributions) from the cache might not be any
faster than recreating them (the more so that the RNG itself can be reused)
with the appropriate parameters.

Regards,
Gilles

> 
> Phil
> >
> >>>  * That the interface "RandomData" and its unique implementation
> >>>   ("RandomDataImpl") will be merged in 4.0?
> >> +1
> >>
> > Gilles
> >

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Math] Issues 764 and 823

Posted by Phil Steitz <ph...@gmail.com>.
On 7/16/12 5:17 PM, Gilles Sadowski wrote:
> On Mon, Jul 16, 2012 at 04:39:41PM -0700, Phil Steitz wrote:
>> On 7/16/12 2:46 PM, Gilles Sadowski wrote:
>>> Hi.
>>>
>>> Referring to the discussion here:
>>>   https://issues.apache.org/jira/browse/MATH-764
>>>   https://issues.apache.org/jira/browse/MATH-823
>>>
>>> Do we agree:
>>>  * To add new constructors to the distribution classes (in package
>>>    "o.a.c.m.distribution") that take a "RandomGenerator" (in package
>>>    "o.a.c.m.random") as an argument?
>> +1
>>>  * That this argument (reference) will be stored in the distribution
>>>    object but can be manipulated from outside (e.g. "setSeed") so that
>>>    the class is not strictly immutable? [Immutability is impossible to
>>>    enforce since there is no way to copy such an object.]
>> +1
>>>  * That the distribution classes do not need a setter for the RNG, nor a
>>>    a method to re-seed the RNG? [I.e. if a user wants a new RNG, he must
>>>    instantiate a new distribution object.]
>> +1 for no setter, -1 for no reseed.
> Why do you need a "reSeed" method since it can be called on the RNG object
> that was passed to the constructor of the distribution?
> I.e. it is the user's responsibility to keep a reference to the RNG object
> if he wants to later modify it.
> IMO, it makes it more obvious that the RNG is "owned" by the user's code,
> not by the distribution instance.

I get your point, but in my view, any random data generation API
should expose a reseed method to reseed the underlying source of
randomness.  So rather than force users to hold a reference to the
RandomData instance provided at construction time to do this
indirectly, I think it would be more convenient / more complete to
provide the method in the same API that does the data generation,
which is now going to be the distribution.

>
>>>  * That the ad-hoc code of the sampling methods currently in
>>>    "RandomDataImpl" (in package "o.a.c.m.random") will be moved over to the
>>>    "sample" method in the correpsonding distribution class?
>> +1
>>>  * That "RandomData" and "RandomDataImpl" will be refactored so that methods
>>>    that duplicate functionality transferred to the distribution will
>>>    deprecated in 3.1 and removed in 4.0?
>> -0  If we keep these classes, which I am inclined to support
>> (combined into one), they should delegate implementations to
>> distributions (exactly the reverse of how it is now).
> No problem. But IIUC, this will require instantiating a distribution object
> for each call; in some situations, this will be an obvious performance loss
> (e.g. when sampling from a distribution with same parameters).

Yeah, we will probably want to cache these.

Phil
>
>>>  * That the interface "RandomData" and its unique implementation
>>>   ("RandomDataImpl") will be merged in 4.0?
>> +1
>>
> Gilles
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Math] Issues 764 and 823

Posted by Gilles Sadowski <gi...@harfang.homelinux.org>.
On Mon, Jul 16, 2012 at 04:39:41PM -0700, Phil Steitz wrote:
> On 7/16/12 2:46 PM, Gilles Sadowski wrote:
> > Hi.
> >
> > Referring to the discussion here:
> >   https://issues.apache.org/jira/browse/MATH-764
> >   https://issues.apache.org/jira/browse/MATH-823
> >
> > Do we agree:
> >  * To add new constructors to the distribution classes (in package
> >    "o.a.c.m.distribution") that take a "RandomGenerator" (in package
> >    "o.a.c.m.random") as an argument?
> +1
> >  * That this argument (reference) will be stored in the distribution
> >    object but can be manipulated from outside (e.g. "setSeed") so that
> >    the class is not strictly immutable? [Immutability is impossible to
> >    enforce since there is no way to copy such an object.]
> +1
> >  * That the distribution classes do not need a setter for the RNG, nor a
> >    a method to re-seed the RNG? [I.e. if a user wants a new RNG, he must
> >    instantiate a new distribution object.]
> +1 for no setter, -1 for no reseed.

Why do you need a "reSeed" method since it can be called on the RNG object
that was passed to the constructor of the distribution?
I.e. it is the user's responsibility to keep a reference to the RNG object
if he wants to later modify it.
IMO, it makes it more obvious that the RNG is "owned" by the user's code,
not by the distribution instance.

> >  * That the ad-hoc code of the sampling methods currently in
> >    "RandomDataImpl" (in package "o.a.c.m.random") will be moved over to the
> >    "sample" method in the correpsonding distribution class?
> +1
> >  * That "RandomData" and "RandomDataImpl" will be refactored so that methods
> >    that duplicate functionality transferred to the distribution will
> >    deprecated in 3.1 and removed in 4.0?
> -0  If we keep these classes, which I am inclined to support
> (combined into one), they should delegate implementations to
> distributions (exactly the reverse of how it is now).

No problem. But IIUC, this will require instantiating a distribution object
for each call; in some situations, this will be an obvious performance loss
(e.g. when sampling from a distribution with same parameters).

> >  * That the interface "RandomData" and its unique implementation
> >   ("RandomDataImpl") will be merged in 4.0?
> +1
> 

Gilles

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [Math] Issues 764 and 823

Posted by Phil Steitz <ph...@gmail.com>.
On 7/16/12 2:46 PM, Gilles Sadowski wrote:
> Hi.
>
> Referring to the discussion here:
>   https://issues.apache.org/jira/browse/MATH-764
>   https://issues.apache.org/jira/browse/MATH-823
>
> Do we agree:
>  * To add new constructors to the distribution classes (in package
>    "o.a.c.m.distribution") that take a "RandomGenerator" (in package
>    "o.a.c.m.random") as an argument?
+1
>  * That this argument (reference) will be stored in the distribution
>    object but can be manipulated from outside (e.g. "setSeed") so that
>    the class is not strictly immutable? [Immutability is impossible to
>    enforce since there is no way to copy such an object.]
+1
>  * That the distribution classes do not need a setter for the RNG, nor a
>    a method to re-seed the RNG? [I.e. if a user wants a new RNG, he must
>    instantiate a new distribution object.]
+1 for no setter, -1 for no reseed.
>  * That the ad-hoc code of the sampling methods currently in
>    "RandomDataImpl" (in package "o.a.c.m.random") will be moved over to the
>    "sample" method in the correpsonding distribution class?
+1
>  * That "RandomData" and "RandomDataImpl" will be refactored so that methods
>    that duplicate functionality transferred to the distribution will
>    deprecated in 3.1 and removed in 4.0?
-0  If we keep these classes, which I am inclined to support
(combined into one), they should delegate implementations to
distributions (exactly the reverse of how it is now).
>  * That the interface "RandomData" and its unique implementation
>   ("RandomDataImpl") will be merged in 4.0?
+1

Phil
>
>
> Regards,
> Gilles
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>



---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org