You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by Sébastien Brisard <se...@m4x.org> on 2011/11/28 08:18:32 UTC

[math] Inconsistencies (bugs) in PascalDistribution?

Hello,
while working on MATH-711, I think I stumbled upon some
inconsistencies in the implementation of the Pascal distribution. In
fact, these might well be a bug, but since I'm a bit rusty on
probabilities, I would be grateful for some feedback.
Here is the thing. The header of the Javadoc states that the random
variable (say X) being represented by this class is the number of
*failures*. First of all, I'm not questioning this convention, but I
think the current Javadoc is slightly confusing, because it refers to
the Wikipedia website, where the opposite convention is adopted (X is
the number of *successes*). This would not matter too much, but the
Javadoc would require some alterations -- I think.
More disturbing: the names of the parameters of the distribution are
not consistent with the wikipedia page being referred to:
  - p is the probability of success in both cases: OK,
  - r is the number of failures in Wikipedia, but the number of
successes in CM... I would suggest that we at least rename this
parameter s or something. Except if it is generally accepted by the
"Pascal community" that the first parameter should always be called r,
no matter the convention. In which case I would recommend that the
references to Wikipedia be removed.

Finally, the bug. According to my hand-calcs, I think that with the
notations of CM, the mean of X is given by
mean(X) = r * (1 - p) / p,
while the currently implemented formula is
r * p / (1 - p),
which is actually the formula corresponding to the Wikipedia convention.

Here is a very naive piece of code supporting my calcs
{code}
public class PascalDistributionDemo {
    public static void main(String[] args) {
        final int r = 10;
        final double p = 0.2;
        final int numTerms = 1000;
        final PascalDistribution distribution = new PascalDistribution(r, p);
        double mean = 0.;
        for (int k = numTerms - 1; k >= 0; k--) {
            mean += k * distribution.probability(k);
        }
        // The following prints 40.00000000000012
        System.out.println("Estimate of the mean = " + mean);
        // The following prints 2.5
        System.out.println("CM implementation = " +
                           distribution.getNumericalMean());
        // The following prints 2.5
        System.out.println("r * p / (1 - p) = " + (r * p / (1 - p)));
        // The following prints 40.0
        System.out.println("r * (1 - p) / p = " + (r * (1 - p) / p));
    }
}
{code}

Again, I'm not an expert in this area, so I might be very, very wrong,
and I apologize beforehand if that's the case. If I'm right, though, I
think that
1. the Javadoc needs to be seriously rewritten, and the convention
adopted *must* be stated very clearly,
2. the implementation needs thorough verification.

I'm happy to do that and open a ticket on this issue once you give me the go...

Best,
Sébastien


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] Inconsistencies (bugs) in PascalDistribution?

Posted by Phil Steitz <ph...@gmail.com>.
On 11/28/11 12:18 AM, Sébastien Brisard wrote:
> Hello,
> while working on MATH-711, I think I stumbled upon some
> inconsistencies in the implementation of the Pascal distribution. In
> fact, these might well be a bug, but since I'm a bit rusty on
> probabilities, I would be grateful for some feedback.
> Here is the thing. The header of the Javadoc states that the random
> variable (say X) being represented by this class is the number of
> *failures*. First of all, I'm not questioning this convention, but I
> think the current Javadoc is slightly confusing, because it refers to
> the Wikipedia website, where the opposite convention is adopted (X is
> the number of *successes*). This would not matter too much, but the
> Javadoc would require some alterations -- I think.
> More disturbing: the names of the parameters of the distribution are
> not consistent with the wikipedia page being referred to:
>   - p is the probability of success in both cases: OK,
>   - r is the number of failures in Wikipedia, but the number of
> successes in CM... I would suggest that we at least rename this
> parameter s or something. Except if it is generally accepted by the
> "Pascal community" that the first parameter should always be called r,
> no matter the convention. In which case I would recommend that the
> references to Wikipedia be removed.
>
> Finally, the bug. According to my hand-calcs, I think that with the
> notations of CM, the mean of X is given by
> mean(X) = r * (1 - p) / p,
> while the currently implemented formula is
> r * p / (1 - p),
> which is actually the formula corresponding to the Wikipedia convention.
>
> Here is a very naive piece of code supporting my calcs
> {code}
> public class PascalDistributionDemo {
>     public static void main(String[] args) {
>         final int r = 10;
>         final double p = 0.2;
>         final int numTerms = 1000;
>         final PascalDistribution distribution = new PascalDistribution(r, p);
>         double mean = 0.;
>         for (int k = numTerms - 1; k >= 0; k--) {
>             mean += k * distribution.probability(k);
>         }
>         // The following prints 40.00000000000012
>         System.out.println("Estimate of the mean = " + mean);
>         // The following prints 2.5
>         System.out.println("CM implementation = " +
>                            distribution.getNumericalMean());
>         // The following prints 2.5
>         System.out.println("r * p / (1 - p) = " + (r * p / (1 - p)));
>         // The following prints 40.0
>         System.out.println("r * (1 - p) / p = " + (r * (1 - p) / p));
>     }
> }
> {code}
>
> Again, I'm not an expert in this area, so I might be very, very wrong,
> and I apologize beforehand if that's the case. If I'm right, though, I
> think that
> 1. the Javadoc needs to be seriously rewritten, and the convention
> adopted *must* be stated very clearly,
> 2. the implementation needs thorough verification.
>
> I'm happy to do that and open a ticket on this issue once you give me the go...

Go ahead an open a ticket. 

Phil
>
> Best,
> Sébastien
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] Inconsistencies (bugs) in PascalDistribution?

Posted by Mikkel Meyer Andersen <mi...@mikl.dk>.
2011/12/5 Sébastien Brisard <se...@m4x.org>:
> Hi Mikkel,
>> It seems like only the test was changed in r1210359 (svn diff -r
>> 1210359). This does _not_ correspond to the log (svn log -r 1210359).
>
> Here is an excerpt of the email sent automatically when I committed
> revision 1210359
>
> ==========BEGIN EXCERPT==========
> Author: celestin
> Date: Mon Dec  5 08:15:38 2011
> New Revision: 1210359
>
> URL: http://svn.apache.org/viewvc?rev=1210359&view=rev
> Log:
> - Corrected expressions for mean and variance in
> distribution.PascalDistribution (MATH-715).
> - Made javadoc more explicit
> - Restored SVN properties to various files in package distribution.
>
> ....................
>
> Modified: commons/proper/math/trunk/src/main/java/org/apache/commons/math/distribution/PascalDistribution.java
> URL: http://svn.apache.org/viewvc/commons/proper/math/trunk/src/main/java/org/apache/commons/math/distribution/PascalDistribution.java?rev=1210359&r1=1210358&r2=1210359&view=diff
> ==============================================================================
> --- commons/proper/math/trunk/src/main/java/org/apache/commons/math/distribution/PascalDistribution.java
> (original)
> +++ commons/proper/math/trunk/src/main/java/org/apache/commons/math/distribution/PascalDistribution.java
> Mon Dec  5 08:15:38 2011
> @@ -31,10 +31,25 @@ import org.apache.commons.math.util.Fast
>  * </p>
>  * <p>
>  * There are various ways to express the probability mass and distribution
> - * functions for the Pascal distribution.  The convention employed by the
> - * library is to express these functions in terms of the number of failures in
> - * a Bernoulli experiment
> - * (see <a href="http://en.wikipedia.org/wiki/Negative_binomial_distribution#Waiting_time_in_a_Bernoulli_process">Waiting
> Time in a Bernoulli Process</a>).
> + * functions for the Pascal distribution. The present implementation represents
> + * the distribution of the number of failures before {@code r} successes occur.
> + * This is the convention adopted in e.g.
> + * <a href="http://mathworld.wolfram.com/NegativeBinomialDistribution.html">MathWorld</a>,
> + * but <em>not</em> in
> + * <a href="http://en.wikipedia.org/wiki/Negative_binomial_distribution">Wikipedia</a>.
> + * </p>
> + * <p>
> + * For a random variable {@code X} whose values are distributed
> according to this
> + * distribution, the probability mass function is given by<br/>
> + * {@code P(X = k) = C(k + r - 1, r - 1) * p^r * (1 - p)^k,}<br/>
> + * where {@code r} is the number of successes, {@code p} is the probability of
> + * success, and {@code X} is the total number of failures. {@code C(n, k)} is
> + * the binomial coefficient ({@code n} choose {@code k}). The mean and variance
> + * of {@code X} are<br/>
> + * {@code E(X) = (1 - p) * r / p, var(X) = (1 - p) * r / p^2.}<br/>
> + * Finally, the cumulative distribution function is given by<br/>
> + * {@code P(X <= k) = I(p, r, k + 1)},
> + * where I is the regularized incomplete Beta function.
>  * </p>
>  *
>  * @see <a href="http://en.wikipedia.org/wiki/Negative_binomial_distribution">
> @@ -159,25 +174,24 @@ public class PascalDistribution extends
>     * {@inheritDoc}
>     *
>     * For number of successes {@code r} and probability of success {@code p},
> -     * the mean is {@code (r * p) / (1 - p)}.
> +     * the mean is {@code r * (1 - p) / p}.
>     */
>    public double getNumericalMean() {
>        final double p = getProbabilityOfSuccess();
>        final double r = getNumberOfSuccesses();
> -        return (r * p) / (1 - p);
> +        return (r * (1 - p)) / p;
>    }
>
>    /**
>     * {@inheritDoc}
>     *
>     * For number of successes {@code r} and probability of success {@code p},
> -     * the variance is {@code (r * p) / (1 - p)^2}.
> +     * the variance is {@code r * (1 - p) / p^2}.
>     */
>    public double getNumericalVariance() {
>        final double p = getProbabilityOfSuccess();
>        final double r = getNumberOfSuccesses();
> -        final double pInv = 1 - p;
> -        return (r * p) / (pInv * pInv);
> +        return r * (1 - p) / (p * p);
>    }
>
> ==========END EXCERPT==========
>
> It seems to me that everything (?) is OK?
>
>>
>> It seems like the changes was done in r1210358. Is that true?
>>
> I don't think so (see extract below).
>
> ==========BEGIN EXCERPT==========
>
> Author: sgoeschl
> Date: Mon Dec  5 08:02:46 2011
> New Revision: 1210358
>
> URL: http://svn.apache.org/viewvc?rev=1210358&view=rev
> Log:
> [maven-release-plugin] prepare for next development iteration
>
> Modified:
>   commons/proper/email/trunk/pom.xml
>
> ==========END EXCERPT==========
>
> I don't really understand what is happening. I hope I didn't mess up
> anything this morning. Please keep me posted.
> Sébastien
>
Hi,

Now I looked it through and it looks good to me - and also the SVN
behaves fine again (it could also be mine that acted weird).

Thanks again for fixing this.

Cheers, Mikkel.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] Inconsistencies (bugs) in PascalDistribution?

Posted by Sébastien Brisard <se...@m4x.org>.
Hi Mikkel,
> It seems like only the test was changed in r1210359 (svn diff -r
> 1210359). This does _not_ correspond to the log (svn log -r 1210359).

Here is an excerpt of the email sent automatically when I committed
revision 1210359

==========BEGIN EXCERPT==========
Author: celestin
Date: Mon Dec  5 08:15:38 2011
New Revision: 1210359

URL: http://svn.apache.org/viewvc?rev=1210359&view=rev
Log:
- Corrected expressions for mean and variance in
distribution.PascalDistribution (MATH-715).
- Made javadoc more explicit
- Restored SVN properties to various files in package distribution.

....................

Modified: commons/proper/math/trunk/src/main/java/org/apache/commons/math/distribution/PascalDistribution.java
URL: http://svn.apache.org/viewvc/commons/proper/math/trunk/src/main/java/org/apache/commons/math/distribution/PascalDistribution.java?rev=1210359&r1=1210358&r2=1210359&view=diff
==============================================================================
--- commons/proper/math/trunk/src/main/java/org/apache/commons/math/distribution/PascalDistribution.java
(original)
+++ commons/proper/math/trunk/src/main/java/org/apache/commons/math/distribution/PascalDistribution.java
Mon Dec  5 08:15:38 2011
@@ -31,10 +31,25 @@ import org.apache.commons.math.util.Fast
 * </p>
 * <p>
 * There are various ways to express the probability mass and distribution
- * functions for the Pascal distribution.  The convention employed by the
- * library is to express these functions in terms of the number of failures in
- * a Bernoulli experiment
- * (see <a href="http://en.wikipedia.org/wiki/Negative_binomial_distribution#Waiting_time_in_a_Bernoulli_process">Waiting
Time in a Bernoulli Process</a>).
+ * functions for the Pascal distribution. The present implementation represents
+ * the distribution of the number of failures before {@code r} successes occur.
+ * This is the convention adopted in e.g.
+ * <a href="http://mathworld.wolfram.com/NegativeBinomialDistribution.html">MathWorld</a>,
+ * but <em>not</em> in
+ * <a href="http://en.wikipedia.org/wiki/Negative_binomial_distribution">Wikipedia</a>.
+ * </p>
+ * <p>
+ * For a random variable {@code X} whose values are distributed
according to this
+ * distribution, the probability mass function is given by<br/>
+ * {@code P(X = k) = C(k + r - 1, r - 1) * p^r * (1 - p)^k,}<br/>
+ * where {@code r} is the number of successes, {@code p} is the probability of
+ * success, and {@code X} is the total number of failures. {@code C(n, k)} is
+ * the binomial coefficient ({@code n} choose {@code k}). The mean and variance
+ * of {@code X} are<br/>
+ * {@code E(X) = (1 - p) * r / p, var(X) = (1 - p) * r / p^2.}<br/>
+ * Finally, the cumulative distribution function is given by<br/>
+ * {@code P(X <= k) = I(p, r, k + 1)},
+ * where I is the regularized incomplete Beta function.
 * </p>
 *
 * @see <a href="http://en.wikipedia.org/wiki/Negative_binomial_distribution">
@@ -159,25 +174,24 @@ public class PascalDistribution extends
     * {@inheritDoc}
     *
     * For number of successes {@code r} and probability of success {@code p},
-     * the mean is {@code (r * p) / (1 - p)}.
+     * the mean is {@code r * (1 - p) / p}.
     */
    public double getNumericalMean() {
        final double p = getProbabilityOfSuccess();
        final double r = getNumberOfSuccesses();
-        return (r * p) / (1 - p);
+        return (r * (1 - p)) / p;
    }

    /**
     * {@inheritDoc}
     *
     * For number of successes {@code r} and probability of success {@code p},
-     * the variance is {@code (r * p) / (1 - p)^2}.
+     * the variance is {@code r * (1 - p) / p^2}.
     */
    public double getNumericalVariance() {
        final double p = getProbabilityOfSuccess();
        final double r = getNumberOfSuccesses();
-        final double pInv = 1 - p;
-        return (r * p) / (pInv * pInv);
+        return r * (1 - p) / (p * p);
    }

==========END EXCERPT==========

It seems to me that everything (?) is OK?

>
> It seems like the changes was done in r1210358. Is that true?
>
I don't think so (see extract below).

==========BEGIN EXCERPT==========

Author: sgoeschl
Date: Mon Dec  5 08:02:46 2011
New Revision: 1210358

URL: http://svn.apache.org/viewvc?rev=1210358&view=rev
Log:
[maven-release-plugin] prepare for next development iteration

Modified:
   commons/proper/email/trunk/pom.xml

==========END EXCERPT==========

I don't really understand what is happening. I hope I didn't mess up
anything this morning. Please keep me posted.
Sébastien


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] Inconsistencies (bugs) in PascalDistribution?

Posted by Sébastien Brisard <se...@m4x.org>.
Hi Mikkel,
>
> This is not to bash, but merely to try to
> stress the importance of keeping a stringent svn history.
>
> Cheers, Mikkel.
>
Thanks for pointing that out. I have run into numerous problems with
SVN this morning, and, although I'm sure I've committed with a log
(it's actually still recorded in eclipse), I would not be surprised if
this log had gone. I will correct that immediately. Thanks again,
Sébastien


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] Inconsistencies (bugs) in PascalDistribution?

Posted by Mikkel Meyer Andersen <mi...@mikl.dk>.
2011/12/5 Sébastien Brisard <se...@m4x.org>:
> Hi Mikkel,
>>
>> No, you are more than welcome to do the patch and I'll review it :-). Thanks!
>>
>> Cheers, Mikkel.
>>
> I've committed a correction (r1210359). Could you please review it and
> post your comments on the JIRA ticket (MATH-715).
> Thanks a lot!
> Sébastien
>
Dear Sébastien,

It seems like only the test was changed in r1210359 (svn diff -r
1210359). This does _not_ correspond to the log (svn log -r 1210359).
It seems like the changes was done in r1210358. Is that true? svn log
is empty for r1210358. This is not to bash, but merely to try to
stress the importance of keeping a stringent svn history.

Cheers, Mikkel.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] Inconsistencies (bugs) in PascalDistribution?

Posted by Sébastien Brisard <se...@m4x.org>.
Hi Mikkel,
>
> No, you are more than welcome to do the patch and I'll review it :-). Thanks!
>
> Cheers, Mikkel.
>
I've committed a correction (r1210359). Could you please review it and
post your comments on the JIRA ticket (MATH-715).
Thanks a lot!
Sébastien


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] Inconsistencies (bugs) in PascalDistribution?

Posted by Mikkel Meyer Andersen <mi...@mikl.dk>.
2011/11/29 Sébastien Brisard <se...@m4x.org>:
> Hi Mikkel
>>
>> Thanks for discovering this! I did the implementation, and apparently
>> assumed notation as the referred source. Sorry for this.
>>
> I think there is nothing wrong in the source. Only, you adopted a
> different definition of the random variable represented by the
> implemented PascalDistribution:
> 1. In wikipedia, the r-parameter is the total number of failures,
> 2. In the CM implementation, the Javadoc states that r is the total
> number of successes, and the PMF is (correctly) derived accordingly.
>>
>> We should also check the variance - this might be suffering from wrong
>> notation as well.
>>
> Yes, I'll do that, too!
>>
>> Cheers, Mikkel.
>>
>>
> Now is not a good time to work on o.a.c.m.distribution, since
> Christian is working on MATH-703. Once this is done, I'll commit a
> patch to correct this. You will be very welcome to review the changes
> (except of course if you want to alter your baby yourself ;) ).
No, you are more than welcome to do the patch and I'll review it :-). Thanks!

Cheers, Mikkel.
> Sébastien
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] Inconsistencies (bugs) in PascalDistribution?

Posted by Sébastien Brisard <se...@m4x.org>.
Hi Mikkel
>
> Thanks for discovering this! I did the implementation, and apparently
> assumed notation as the referred source. Sorry for this.
>
I think there is nothing wrong in the source. Only, you adopted a
different definition of the random variable represented by the
implemented PascalDistribution:
1. In wikipedia, the r-parameter is the total number of failures,
2. In the CM implementation, the Javadoc states that r is the total
number of successes, and the PMF is (correctly) derived accordingly.
>
> We should also check the variance - this might be suffering from wrong
> notation as well.
>
Yes, I'll do that, too!
>
> Cheers, Mikkel.
>
>
Now is not a good time to work on o.a.c.m.distribution, since
Christian is working on MATH-703. Once this is done, I'll commit a
patch to correct this. You will be very welcome to review the changes
(except of course if you want to alter your baby yourself ;) ).
Sébastien


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] Inconsistencies (bugs) in PascalDistribution?

Posted by Mikkel Meyer Andersen <mi...@mikl.dk>.
2011/11/28 Phil Steitz <ph...@gmail.com>:
> On 11/28/11 12:18 AM, Sébastien Brisard wrote:
>> Hello,
>> while working on MATH-711, I think I stumbled upon some
>> inconsistencies in the implementation of the Pascal distribution. In
>> fact, these might well be a bug, but since I'm a bit rusty on
>> probabilities, I would be grateful for some feedback.
>> Here is the thing. The header of the Javadoc states that the random
>> variable (say X) being represented by this class is the number of
>> *failures*. First of all, I'm not questioning this convention, but I
>> think the current Javadoc is slightly confusing, because it refers to
>> the Wikipedia website, where the opposite convention is adopted (X is
>> the number of *successes*). This would not matter too much, but the
>> Javadoc would require some alterations -- I think.
>> More disturbing: the names of the parameters of the distribution are
>> not consistent with the wikipedia page being referred to:
>>   - p is the probability of success in both cases: OK,
>>   - r is the number of failures in Wikipedia, but the number of
>> successes in CM... I would suggest that we at least rename this
>> parameter s or something. Except if it is generally accepted by the
>> "Pascal community" that the first parameter should always be called r,
>> no matter the convention. In which case I would recommend that the
>> references to Wikipedia be removed.
>>
>> Finally, the bug. According to my hand-calcs, I think that with the
>> notations of CM, the mean of X is given by
>> mean(X) = r * (1 - p) / p,
>> while the currently implemented formula is
>> r * p / (1 - p),
>> which is actually the formula corresponding to the Wikipedia convention.
>>
>> Here is a very naive piece of code supporting my calcs
>> {code}
>> public class PascalDistributionDemo {
>>     public static void main(String[] args) {
>>         final int r = 10;
>>         final double p = 0.2;
>>         final int numTerms = 1000;
>>         final PascalDistribution distribution = new PascalDistribution(r, p);
>>         double mean = 0.;
>>         for (int k = numTerms - 1; k >= 0; k--) {
>>             mean += k * distribution.probability(k);
>>         }
>>         // The following prints 40.00000000000012
>>         System.out.println("Estimate of the mean = " + mean);
>>         // The following prints 2.5
>>         System.out.println("CM implementation = " +
>>                            distribution.getNumericalMean());
>>         // The following prints 2.5
>>         System.out.println("r * p / (1 - p) = " + (r * p / (1 - p)));
>>         // The following prints 40.0
>>         System.out.println("r * (1 - p) / p = " + (r * (1 - p) / p));
>>     }
>> }
>> {code}
>>
>> Again, I'm not an expert in this area, so I might be very, very wrong,
>> and I apologize beforehand if that's the case. If I'm right, though, I
>> think that
>> 1. the Javadoc needs to be seriously rewritten, and the convention
>> adopted *must* be stated very clearly,
>> 2. the implementation needs thorough verification.
>
> I looked a little more carefully at this.  I think the mean formula
> is incorrect, so this is a bug.  Regarding the convention used, it
> is clearly stated in the class javadoc, but you are right it should
> be made more clear exactly what is being computed.  I am -0 for
> changing the implementation, because that would force current users
> to switch success/failure.  It would be best, I think to add full
> formulas to the javadoc.  I don't think it is necessary to use the
> same letters as the references do.
>
> Phil
>>
>> I'm happy to do that and open a ticket on this issue once you give me the go...
>>
>> Best,
>> Sébastien
>>
>>
Thanks for discovering this! I did the implementation, and apparently
assumed notation as the referred source. Sorry for this.

We should also check the variance - this might be suffering from wrong
notation as well.

Cheers, Mikkel.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] Inconsistencies (bugs) in PascalDistribution?

Posted by Phil Steitz <ph...@gmail.com>.
On 11/28/11 12:18 AM, Sébastien Brisard wrote:
> Hello,
> while working on MATH-711, I think I stumbled upon some
> inconsistencies in the implementation of the Pascal distribution. In
> fact, these might well be a bug, but since I'm a bit rusty on
> probabilities, I would be grateful for some feedback.
> Here is the thing. The header of the Javadoc states that the random
> variable (say X) being represented by this class is the number of
> *failures*. First of all, I'm not questioning this convention, but I
> think the current Javadoc is slightly confusing, because it refers to
> the Wikipedia website, where the opposite convention is adopted (X is
> the number of *successes*). This would not matter too much, but the
> Javadoc would require some alterations -- I think.
> More disturbing: the names of the parameters of the distribution are
> not consistent with the wikipedia page being referred to:
>   - p is the probability of success in both cases: OK,
>   - r is the number of failures in Wikipedia, but the number of
> successes in CM... I would suggest that we at least rename this
> parameter s or something. Except if it is generally accepted by the
> "Pascal community" that the first parameter should always be called r,
> no matter the convention. In which case I would recommend that the
> references to Wikipedia be removed.
>
> Finally, the bug. According to my hand-calcs, I think that with the
> notations of CM, the mean of X is given by
> mean(X) = r * (1 - p) / p,
> while the currently implemented formula is
> r * p / (1 - p),
> which is actually the formula corresponding to the Wikipedia convention.
>
> Here is a very naive piece of code supporting my calcs
> {code}
> public class PascalDistributionDemo {
>     public static void main(String[] args) {
>         final int r = 10;
>         final double p = 0.2;
>         final int numTerms = 1000;
>         final PascalDistribution distribution = new PascalDistribution(r, p);
>         double mean = 0.;
>         for (int k = numTerms - 1; k >= 0; k--) {
>             mean += k * distribution.probability(k);
>         }
>         // The following prints 40.00000000000012
>         System.out.println("Estimate of the mean = " + mean);
>         // The following prints 2.5
>         System.out.println("CM implementation = " +
>                            distribution.getNumericalMean());
>         // The following prints 2.5
>         System.out.println("r * p / (1 - p) = " + (r * p / (1 - p)));
>         // The following prints 40.0
>         System.out.println("r * (1 - p) / p = " + (r * (1 - p) / p));
>     }
> }
> {code}
>
> Again, I'm not an expert in this area, so I might be very, very wrong,
> and I apologize beforehand if that's the case. If I'm right, though, I
> think that
> 1. the Javadoc needs to be seriously rewritten, and the convention
> adopted *must* be stated very clearly,
> 2. the implementation needs thorough verification.

I looked a little more carefully at this.  I think the mean formula
is incorrect, so this is a bug.  Regarding the convention used, it
is clearly stated in the class javadoc, but you are right it should
be made more clear exactly what is being computed.  I am -0 for
changing the implementation, because that would force current users
to switch success/failure.  It would be best, I think to add full
formulas to the javadoc.  I don't think it is necessary to use the
same letters as the references do.

Phil
>
> I'm happy to do that and open a ticket on this issue once you give me the go...
>
> Best,
> Sébastien
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org