You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@commons.apache.org by Ole Ersoy <ol...@gmail.com> on 2014/10/09 15:12:06 UTC

[Math] Separating Sampling from Distributions

Hello,

Just sharing a few more thoughts on this WRT:
https://issues.apache.org/jira/browse/MATH-1124

(1) The issues currently are:
You have to inject an RNG when using the constructor lengthening instantiation time and possibly increasing memory usage without benefit.

(2)
The design of the distribution is heavier than it needs to be.  For example if you subclass AbstractIntegerDistribution the code I pasted below, which I believe is used only in sampling, is included.  As a result of this anything that uses the distributions become heavier and more complicated than need be, including:
- test code
- subclasses
- composites
- etc.

SAMPLING ONLY CODE IN AbstractIntegerDistribution

     /**
      * RandomData instance used to generate samples from the distribution.
      * @deprecated As of 3.1, to be removed in 4.0. Please use the
      * {@link #random} instance variable instead.
      */
     @Deprecated
     protected final RandomDataImpl randomData = new RandomDataImpl();

     /**
      * RNG instance used to generate samples from the distribution.
      * @since 3.1
      */
     protected final RandomGenerator random;

     /**
      * @deprecated As of 3.1, to be removed in 4.0. Please use
      * {@link #AbstractIntegerDistribution(RandomGenerator)} instead.
      */
     @Deprecated
     protected AbstractIntegerDistribution() {
         // Legacy users are only allowed to access the deprecated "randomData".
         // New users are forbidden to use this constructor.
         random = null;
     }
     /**
      * @param rng Random number generator.
      * @since 3.1
      */
     protected AbstractIntegerDistribution(RandomGenerator rng) {
         random = rng;
     }

     /**
      * {@inheritDoc}
      *
      * The default implementation returns
      * <ul>
      * <li>{@link #getSupportLowerBound()} for {@code p = 0},</li>
      * <li>{@link #getSupportUpperBound()} for {@code p = 1}, and</li>
      * <li>{@link #solveInverseCumulativeProbability(double, int, int)} for
      *     {@code 0 < p < 1}.</li>
      * </ul>
      */
     public int inverseCumulativeProbability(final double p) throws OutOfRangeException {
         if (p < 0.0 || p > 1.0) {
             throw new OutOfRangeException(p, 0, 1);
         }

         int lower = getSupportLowerBound();
         if (p == 0.0) {
             return lower;
         }
         if (lower == Integer.MIN_VALUE) {
             if (checkedCumulativeProbability(lower) >= p) {
                 return lower;
             }
         } else {
             lower -= 1; // this ensures cumulativeProbability(lower) < p, which
                         // is important for the solving step
         }

         int upper = getSupportUpperBound();
         if (p == 1.0) {
             return upper;
         }

         // use the one-sided Chebyshev inequality to narrow the bracket
         // cf. AbstractRealDistribution.inverseCumulativeProbability(double)
         final double mu = getNumericalMean();
         final double sigma = FastMath.sqrt(getNumericalVariance());
         final boolean chebyshevApplies = !(Double.isInfinite(mu) || Double.isNaN(mu) ||
                 Double.isInfinite(sigma) || Double.isNaN(sigma) || sigma == 0.0);
         if (chebyshevApplies) {
             double k = FastMath.sqrt((1.0 - p) / p);
             double tmp = mu - k * sigma;
             if (tmp > lower) {
                 lower = ((int) FastMath.ceil(tmp)) - 1;
             }
             k = 1.0 / k;
             tmp = mu + k * sigma;
             if (tmp < upper) {
                 upper = ((int) FastMath.ceil(tmp)) - 1;
             }
         }

         return solveInverseCumulativeProbability(p, lower, upper);
     }

     /**
      * This is a utility function used by {@link
      * #inverseCumulativeProbability(double)}. It assumes {@code 0 < p < 1} and
      * that the inverse cumulative probability lies in the bracket {@code
      * (lower, upper]}. The implementation does simple bisection to find the
      * smallest {@code p}-quantile <code>inf{x in Z | P(X<=x) >= p}</code>.
      *
      * @param p the cumulative probability
      * @param lower a value satisfying {@code cumulativeProbability(lower) < p}
      * @param upper a value satisfying {@code p <= cumulativeProbability(upper)}
      * @return the smallest {@code p}-quantile of this distribution
      */
     protected int solveInverseCumulativeProbability(final double p, int lower, int upper) {
         while (lower + 1 < upper) {
             int xm = (lower + upper) / 2;
             if (xm < lower || xm > upper) {
                 /*
                  * Overflow.
                  * There will never be an overflow in both calculation methods
                  * for xm at the same time
                  */
                 xm = lower + (upper - lower) / 2;
             }

             double pm = checkedCumulativeProbability(xm);
             if (pm >= p) {
                 upper = xm;
             } else {
                 lower = xm;
             }
         }
         return upper;
     }
     /** {@inheritDoc} */
     public void reseedRandomGenerator(long seed) {
         random.setSeed(seed);
         randomData.reSeed(seed);
     }

     /**
      * {@inheritDoc}
      *
      * The default implementation uses the
      * <a href="http://en.wikipedia.org/wiki/Inverse_transform_sampling">
      * inversion method</a>.
      */
     public int sample() {
         return inverseCumulativeProbability(random.nextDouble());
     }

     /**
      * {@inheritDoc}
      *
      * The default implementation generates the sample by calling
      * {@link #sample()} in a loop.
      */
     public int[] sample(int sampleSize) {
         if (sampleSize <= 0) {
             throw new NotStrictlyPositiveException(
                     LocalizedFormats.NUMBER_OF_SAMPLES, sampleSize);
         }
         int[] out = new int[sampleSize];
         for (int i = 0; i < sampleSize; i++) {
             out[i] = sample();
         }
         return out;
     }



POSSIBLE AbstractIntegerDistribution SIZE IF SAMPLING CODE REMOVED

     /**
      * {@inheritDoc}
      *
      * The default implementation uses the identity
      * <p>{@code P(x0 < X <= x1) = P(X <= x1) - P(X <= x0)}</p>
      */
     public double cumulativeProbability(int x0, int x1) throws NumberIsTooLargeException {
         if (x1 < x0) {
             throw new NumberIsTooLargeException(LocalizedFormats.LOWER_ENDPOINT_ABOVE_UPPER_ENDPOINT,
                     x0, x1, true);
         }
         return cumulativeProbability(x1) - cumulativeProbability(x0);
     }

     /**
      * Computes the cumulative probability function and checks for {@code NaN}
      * values returned. Throws {@code MathInternalError} if the value is
      * {@code NaN}. Rethrows any exception encountered evaluating the cumulative
      * probability function. Throws {@code MathInternalError} if the cumulative
      * probability function returns {@code NaN}.
      *
      * @param argument input value
      * @return the cumulative probability
      * @throws MathInternalError if the cumulative probability is {@code NaN}
      */
     private double checkedCumulativeProbability(int argument)
         throws MathInternalError {
         double result = Double.NaN;
         result = cumulativeProbability(argument);
         if (Double.isNaN(result)) {
             throw new MathInternalError(LocalizedFormats
                     .DISCRETE_CUMULATIVE_PROBABILITY_RETURNED_NAN, argument);
         }
         return result;
     }

Side Note:
There is a logProbability method that just computes the log of a probability.  If someone needs to do this can't they just do FastMath.log(probability) directly?

Cheers,
- Ole

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] AbstractIntegerDistribution.public double logProbability(int x) was Re: [Math] Separating Sampling from Distributions

Posted by Ole Ersoy <ol...@gmail.com>.

> We could not yet add it to the interface due to binary / source
> compatibility rules, but it will be done for 4.0, see the ticket:
> https://issues.apache.org/jira/browse/MATH-1039

Just one more question/suggestion (Hopefully better than the earlier 50K one :) ) - would it make sense to have the distributions that have the log functionality implement an interface instead of inheriting the interface / default implementation?

Also, not sure if this is frowned on, but it would be great if there was a link to the issue (Or at least a note like "see MATH-1039") that contained the patch in such an interface, such that it is more self documenting.

Cheers,
- Ole

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] AbstractIntegerDistribution.public double logProbability(int x) was Re: [Math] Separating Sampling from Distributions

Posted by Ole Ersoy <ol...@gmail.com>.

>> We could not yet add it to the interface due to binary / source
>> compatibility rules, but it will be done for 4.0, see the ticket:
>> https://issues.apache.org/jira/browse/MATH-1039
>
> OK - Thanks for the explanation.  Based on the description in the ticket it seems like it would be better to have it as part of a utility, but that's just my 50K foot view.

Ooops - foot in mouth moment - looked at the patch - looks very very good - sorry for the noise.

Cheers,
Ole

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] AbstractIntegerDistribution.public double logProbability(int x) was Re: [Math] Separating Sampling from Distributions

Posted by Ole Ersoy <ol...@gmail.com>.


On 10/09/2014 11:07 AM, Thomas Neidhart wrote:
> On Thu, Oct 9, 2014 at 6:03 PM, Ole Ersoy <ol...@gmail.com> wrote:
>
>> I think that, for some distribution, the computation of log(p) is more
>>>>> accurate. This was a feature request from not long ago.
>>>>>
>>>>>
>>>> Just pasting the method below:
>>>>
>>>>       public double logProbability(int x) {
>>>>           return FastMath.log(probability(x));
>>>>       }
>>>>
>>>> So it retrieves the probability of the index and uses FastMath to compute
>>>> the log.  We could do:
>>>>
>>>> FastMath(distribution.probability(x));
>>>>
>>>> It just seems a bit over engineered to have something this trivial in the
>>>> API, unless I'm missing something.
>>>>
>>>>
>>> This is the default implementation. For some distributions there is a more
>>> efficient way to compute the logProbability directly, and for these
>>> distributions the method is overridden (e.g. GeometricDistribution,
>>> BinomialDistribution, ...).
>>> Otherwise one would have to call probability(x) first, which in many cases
>>> involves calling exp(...) and then compute the log of it, which is just a
>>> waste of cpu.
>>>
>>
>> So I guess my question is "Why is there value in having this as part of
>> the public API of AbstractIntegerDistribution?".  Not really a big deal,
>> but it just seems like it's an implementation detail.
>>
>
> We could not yet add it to the interface due to binary / source
> compatibility rules, but it will be done for 4.0, see the ticket:
> https://issues.apache.org/jira/browse/MATH-1039

OK - Thanks for the explanation.  Based on the description in the ticket it seems like it would be better to have it as part of a utility, but that's just my 50K foot view.

Cheers,
- Ole

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] AbstractIntegerDistribution.inverseCumulativeDistribution Unexpected Result

Posted by Ole Ersoy <ol...@gmail.com>.

> The variance of the DiceDistribution is wrong.
>
> Correct is the following: 2.91 or (70/24).
>
> You can verify the correct results also with a
> UniformIntegerDistribution(1, 6).
>
> Obviously the wrong variance was not noticed as the tests in this class
> do not compute the inverseCumulativeProbability.

I rewrote the tests and now they pass.  Just wanted to share the precision info.  If I make it smaller the tests that use it fail.

     	double precision = 0.000000000000001;
     	Assert.assertEquals(1, diceDistribution.inverseCumulativeProbability(0));
     	Assert.assertEquals(1, diceDistribution.inverseCumulativeProbability((1d-Double.MIN_VALUE)/6d));
     	Assert.assertEquals(2, diceDistribution.inverseCumulativeProbability((1d+precision)/6d));
     	Assert.assertEquals(2, diceDistribution.inverseCumulativeProbability((2d-Double.MIN_VALUE)/6d));
     	Assert.assertEquals(3, diceDistribution.inverseCumulativeProbability((2d+precision)/6d));
     	Assert.assertEquals(3, diceDistribution.inverseCumulativeProbability((3d-Double.MIN_VALUE)/6d));
     	Assert.assertEquals(4, diceDistribution.inverseCumulativeProbability((3d+precision)/6d));
     	Assert.assertEquals(4, diceDistribution.inverseCumulativeProbability((4d-Double.MIN_VALUE)/6d));
     	Assert.assertEquals(5, diceDistribution.inverseCumulativeProbability((4d+precision)/6d));
     	Assert.assertEquals(5, diceDistribution.inverseCumulativeProbability((5d-precision)/6d));//Can't use Double.MIN
     	Assert.assertEquals(6, diceDistribution.inverseCumulativeProbability((5d+precision)/6d));
     	Assert.assertEquals(6, diceDistribution.inverseCumulativeProbability((6d-precision)/6d));//Can't use Double.MIN
     	Assert.assertEquals(6, diceDistribution.inverseCumulativeProbability((6d)/6d));

Cheers,
- Ole

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] AbstractIntegerDistribution.inverseCumulativeDistribution Unexpected Result

Posted by Ole Ersoy <ol...@gmail.com>.

  
> The variance of the DiceDistribution is wrong.
>
> Correct is the following: 2.91 or (70/24).
>
> You can verify the correct results also with a
> UniformIntegerDistribution(1, 6).
>
> Obviously the wrong variance was not noticed as the tests in this class
> do not compute the inverseCumulativeProbability.

Thanks Thomas!  Do you want me to send a pull request with updates to the test?

Cheers,
- Ole

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] AbstractIntegerDistribution.inverseCumulativeDistribution Unexpected Result

Posted by Thomas Neidhart <th...@gmail.com>.

On 10/10/2014 10:24 PM, Ole Ersoy wrote:
> Hi,
> 
> I ran the following assertions on the dice distribution in the
> AbstractIntegerDistributionTest:
> 
>         Assert.assertEquals(3,
> diceDistribution.inverseCumulativeProbability(0.7d/6d));  //Expecting 1
>         Assert.assertEquals(3,
> diceDistribution.inverseCumulativeProbability(0.8d/6d));  //Expecting 1
>         Assert.assertEquals(3,
> diceDistribution.inverseCumulativeProbability(0.9d/6d));  //Expecting 1
>         Assert.assertEquals(3,
> diceDistribution.inverseCumulativeProbability(1d/6d));    //Expecting 2
>         Assert.assertEquals(3,
> diceDistribution.inverseCumulativeProbability(1.5d/6d));  //Expecting 2
>         Assert.assertEquals(3,
> diceDistribution.inverseCumulativeProbability(2d/6d));    //Expecting 2
>        
>         Assert.assertEquals(3,
> diceDistribution.inverseCumulativeProbability(2.9d/6d));  //Expecting 3
>         Assert.assertEquals(3,
> diceDistribution.inverseCumulativeProbability(3.0d/6d));  //Expecting 4
>         Assert.assertEquals(4,
> diceDistribution.inverseCumulativeProbability(3.1d/6d));
>        
>         Assert.assertEquals(4,
> diceDistribution.inverseCumulativeProbability(3.2d/6d));       
>         Assert.assertEquals(4,
> diceDistribution.inverseCumulativeProbability(3.5d/6d));
> 
>         Assert.assertEquals(4,
> diceDistribution.inverseCumulativeProbability(4.0d/6d)); //Expecting 5
>         Assert.assertEquals(4,
> diceDistribution.inverseCumulativeProbability(5.0d/6d)); //Expecting 5
>         Assert.assertEquals(4,
> diceDistribution.inverseCumulativeProbability(5.1d/6d)); //Expecting 5
>         Assert.assertEquals(5,
> diceDistribution.inverseCumulativeProbability(5.5d/6d));
>         Assert.assertEquals(6,
> diceDistribution.inverseCumulativeProbability(6.0d/6d));
> 
> All of them pass.  Thoughts?

The variance of the DiceDistribution is wrong.

Correct is the following: 2.91 or (70/24).

You can verify the correct results also with a
UniformIntegerDistribution(1, 6).

Obviously the wrong variance was not noticed as the tests in this class
do not compute the inverseCumulativeProbability.

Thomas

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

[Math] AbstractIntegerDistribution.inverseCumulativeDistribution Unexpected Result

Posted by Ole Ersoy <ol...@gmail.com>.

Hi,

I ran the following assertions on the dice distribution in the AbstractIntegerDistributionTest:

     	Assert.assertEquals(3, diceDistribution.inverseCumulativeProbability(0.7d/6d));  //Expecting 1
     	Assert.assertEquals(3, diceDistribution.inverseCumulativeProbability(0.8d/6d));  //Expecting 1
     	Assert.assertEquals(3, diceDistribution.inverseCumulativeProbability(0.9d/6d));  //Expecting 1
     	Assert.assertEquals(3, diceDistribution.inverseCumulativeProbability(1d/6d));    //Expecting 2
     	Assert.assertEquals(3, diceDistribution.inverseCumulativeProbability(1.5d/6d));  //Expecting 2
     	Assert.assertEquals(3, diceDistribution.inverseCumulativeProbability(2d/6d));    //Expecting 2
     	
     	Assert.assertEquals(3, diceDistribution.inverseCumulativeProbability(2.9d/6d));  //Expecting 3
     	Assert.assertEquals(3, diceDistribution.inverseCumulativeProbability(3.0d/6d));  //Expecting 4
     	Assert.assertEquals(4, diceDistribution.inverseCumulativeProbability(3.1d/6d));
     	
     	Assert.assertEquals(4, diceDistribution.inverseCumulativeProbability(3.2d/6d));    	
     	Assert.assertEquals(4, diceDistribution.inverseCumulativeProbability(3.5d/6d));

     	Assert.assertEquals(4, diceDistribution.inverseCumulativeProbability(4.0d/6d)); //Expecting 5
     	Assert.assertEquals(4, diceDistribution.inverseCumulativeProbability(5.0d/6d)); //Expecting 5
     	Assert.assertEquals(4, diceDistribution.inverseCumulativeProbability(5.1d/6d)); //Expecting 5
     	Assert.assertEquals(5, diceDistribution.inverseCumulativeProbability(5.5d/6d));
     	Assert.assertEquals(6, diceDistribution.inverseCumulativeProbability(6.0d/6d));

All of them pass.  Thoughts?

Cheers,
- Ole

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] AbstractIntegerDistribution.public double logProbability(int x) was Re: [Math] Separating Sampling from Distributions

Posted by Thomas Neidhart <th...@gmail.com>.

On Thu, Oct 9, 2014 at 6:03 PM, Ole Ersoy <ol...@gmail.com> wrote:

> I think that, for some distribution, the computation of log(p) is more
>>>> accurate. This was a feature request from not long ago.
>>>>
>>>>
>>> Just pasting the method below:
>>>
>>>      public double logProbability(int x) {
>>>          return FastMath.log(probability(x));
>>>      }
>>>
>>> So it retrieves the probability of the index and uses FastMath to compute
>>> the log.  We could do:
>>>
>>> FastMath(distribution.probability(x));
>>>
>>> It just seems a bit over engineered to have something this trivial in the
>>> API, unless I'm missing something.
>>>
>>>
>> This is the default implementation. For some distributions there is a more
>> efficient way to compute the logProbability directly, and for these
>> distributions the method is overridden (e.g. GeometricDistribution,
>> BinomialDistribution, ...).
>> Otherwise one would have to call probability(x) first, which in many cases
>> involves calling exp(...) and then compute the log of it, which is just a
>> waste of cpu.
>>
>
> So I guess my question is "Why is there value in having this as part of
> the public API of AbstractIntegerDistribution?".  Not really a big deal,
> but it just seems like it's an implementation detail.
>

We could not yet add it to the interface due to binary / source
compatibility rules, but it will be done for 4.0, see the ticket:
https://issues.apache.org/jira/browse/MATH-1039

Thomas

[Math] AbstractIntegerDistribution.public double logProbability(int x) was Re: [Math] Separating Sampling from Distributions

Posted by Ole Ersoy <ol...@gmail.com>.

>>> I think that, for some distribution, the computation of log(p) is more
>>> accurate. This was a feature request from not long ago.
>>>
>>
>> Just pasting the method below:
>>
>>      public double logProbability(int x) {
>>          return FastMath.log(probability(x));
>>      }
>>
>> So it retrieves the probability of the index and uses FastMath to compute
>> the log.  We could do:
>>
>> FastMath(distribution.probability(x));
>>
>> It just seems a bit over engineered to have something this trivial in the
>> API, unless I'm missing something.
>>
>
> This is the default implementation. For some distributions there is a more
> efficient way to compute the logProbability directly, and for these
> distributions the method is overridden (e.g. GeometricDistribution,
> BinomialDistribution, ...).
> Otherwise one would have to call probability(x) first, which in many cases
> involves calling exp(...) and then compute the log of it, which is just a
> waste of cpu.

So I guess my question is "Why is there value in having this as part of the public API of AbstractIntegerDistribution?".  Not really a big deal, but it just seems like it's an implementation detail.

Ole

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] Separating Sampling from Distributions

Posted by Thomas Neidhart <th...@gmail.com>.

On Thu, Oct 9, 2014 at 4:43 PM, Ole Ersoy <ol...@gmail.com> wrote:

>
>
> On 10/09/2014 08:43 AM, Gilles wrote:
>
>> On Thu, 09 Oct 2014 08:12:06 -0500, Ole Ersoy wrote:
>>
>>> Hello,
>>>
>>> Just sharing a few more thoughts on this WRT:
>>> https://issues.apache.org/jira/browse/MATH-1124
>>>
>>> (1) The issues currently are:
>>> You have to inject an RNG when using the constructor lengthening
>>> instantiation time and possibly increasing memory usage without
>>> benefit.
>>>
>>> (2)
>>> The design of the distribution is heavier than it needs to be.  For
>>> example if you subclass AbstractIntegerDistribution the code I pasted
>>> below, which I believe is used only in sampling, is included.  As a
>>> result of this anything that uses the distributions become heavier and
>>> more complicated than need be, including:
>>> - test code
>>> - subclasses
>>> - composites
>>> - etc.
>>>
>>> SAMPLING ONLY CODE IN AbstractIntegerDistribution
>>>
>>
>> I'm not sure for "AbstractIntegerDistribution", but
>> "inverseCumulativeProbability"
>> from "AbstractRealDistribution" is used in some classes in package
>> "o.a.c.m.stat".
>>
>
> I have not looked at this in detail, but is it possible that
> "o.a.c.m.stat" could be served in a more "Lightweight" way if this method
> were on a separate class in perhaps a o.a.c.m.montecarlo package?
>
> To me it seems more ideal to have something like
> AbstractIntegerDistributionSampler and AbstractRealDistributionSampler
> that own all the sampling related code.  I also prefer composition over
> inheritance when possible, but would have to look closer at how that might
> work.
>
>
>>  [...]
>>>
>>>
>>> Side Note:
>>> There is a logProbability method that just computes the log of a
>>> probability.  If someone needs to do this can't they just do
>>> FastMath.log(probability) directly?
>>>
>>
>> I think that, for some distribution, the computation of log(p) is more
>> accurate. This was a feature request from not long ago.
>>
>
> Just pasting the method below:
>
>     public double logProbability(int x) {
>         return FastMath.log(probability(x));
>     }
>
> So it retrieves the probability of the index and uses FastMath to compute
> the log.  We could do:
>
> FastMath(distribution.probability(x));
>
> It just seems a bit over engineered to have something this trivial in the
> API, unless I'm missing something.
>

This is the default implementation. For some distributions there is a more
efficient way to compute the logProbability directly, and for these
distributions the method is overridden (e.g. GeometricDistribution,
BinomialDistribution, ...).
Otherwise one would have to call probability(x) first, which in many cases
involves calling exp(...) and then compute the log of it, which is just a
waste of cpu.

Thomas

Re: [Math] Separating Sampling from Distributions

Posted by Ole Ersoy <ol...@gmail.com>.


On 10/09/2014 08:43 AM, Gilles wrote:
> On Thu, 09 Oct 2014 08:12:06 -0500, Ole Ersoy wrote:
>> Hello,
>>
>> Just sharing a few more thoughts on this WRT:
>> https://issues.apache.org/jira/browse/MATH-1124
>>
>> (1) The issues currently are:
>> You have to inject an RNG when using the constructor lengthening
>> instantiation time and possibly increasing memory usage without
>> benefit.
>>
>> (2)
>> The design of the distribution is heavier than it needs to be.  For
>> example if you subclass AbstractIntegerDistribution the code I pasted
>> below, which I believe is used only in sampling, is included.  As a
>> result of this anything that uses the distributions become heavier and
>> more complicated than need be, including:
>> - test code
>> - subclasses
>> - composites
>> - etc.
>>
>> SAMPLING ONLY CODE IN AbstractIntegerDistribution
>
> I'm not sure for "AbstractIntegerDistribution", but "inverseCumulativeProbability"
> from "AbstractRealDistribution" is used in some classes in package "o.a.c.m.stat".

I have not looked at this in detail, but is it possible that "o.a.c.m.stat" could be served in a more "Lightweight" way if this method were on a separate class in perhaps a o.a.c.m.montecarlo package?

To me it seems more ideal to have something like AbstractIntegerDistributionSampler and AbstractRealDistributionSampler that own all the sampling related code.  I also prefer composition over inheritance when possible, but would have to look closer at how that might work.

>
>> [...]
>>
>>
>> Side Note:
>> There is a logProbability method that just computes the log of a
>> probability.  If someone needs to do this can't they just do
>> FastMath.log(probability) directly?
>
> I think that, for some distribution, the computation of log(p) is more
> accurate. This was a feature request from not long ago.

Just pasting the method below:

     public double logProbability(int x) {
         return FastMath.log(probability(x));
     }

So it retrieves the probability of the index and uses FastMath to compute the log.  We could do:

FastMath(distribution.probability(x));

It just seems a bit over engineered to have something this trivial in the API, unless I'm missing something.

Cheers,
- Ole

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org

Re: [Math] Separating Sampling from Distributions

Posted by Gilles <gi...@harfang.homelinux.org>.

On Thu, 09 Oct 2014 08:12:06 -0500, Ole Ersoy wrote:
> Hello,
>
> Just sharing a few more thoughts on this WRT:
> https://issues.apache.org/jira/browse/MATH-1124
>
> (1) The issues currently are:
> You have to inject an RNG when using the constructor lengthening
> instantiation time and possibly increasing memory usage without
> benefit.
>
> (2)
> The design of the distribution is heavier than it needs to be.  For
> example if you subclass AbstractIntegerDistribution the code I pasted
> below, which I believe is used only in sampling, is included.  As a
> result of this anything that uses the distributions become heavier 
> and
> more complicated than need be, including:
> - test code
> - subclasses
> - composites
> - etc.
>
> SAMPLING ONLY CODE IN AbstractIntegerDistribution

I'm not sure for "AbstractIntegerDistribution", but 
"inverseCumulativeProbability"
from "AbstractRealDistribution" is used in some classes in package 
"o.a.c.m.stat".

> [...]
>
>
> Side Note:
> There is a logProbability method that just computes the log of a
> probability.  If someone needs to do this can't they just do
> FastMath.log(probability) directly?

I think that, for some distribution, the computation of log(p) is more
accurate. This was a feature request from not long ago.


Regards,
Gilles


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org