Posted to dev@commons.apache.org by Phil Steitz <ph...@steitz.com> on 2004/05/12 07:43:04 UTC

[math] Getting 1.0 out the door -- tasks remaining

I am going to try to complete the missing user guide sections by the end 
of this week.  I will also rename and move the BeanList stuff (which has 
been fixed) and commit the reworked and expanded continuous distribution 
tests (similar to what I committed earlier this week for discrete).  By my 
reckoning, that leaves only the following to do before we can cut a beta / 
RC (I would prefer to start with a beta):

1) Decide what to do about inverse cumulative probabilities where p = 1 
(easy solution is to document and throw)

2) Decide what, if anything, to do about the root-finding interfaces.  I am 
OK releasing as is.

3) Add SummaryStatisticsBean?  Any other improvements / renaming / 
refactoring of the beleaguered univariate statistics classes?

4) Decide what to do about RealMatrix rank.  Only reasonable solution at 
this point appears to be to drop it from the interface.  Alternatively, we 
could drop the interface altogether, possibly replacing with a factory 
pattern.  This last raises the dodgy issue of API consistency throughout 
commons-math.  More consistency would be better, but would lead to more 
work ;-)

5) Get a last round of feedback on APIs, last pass through tests and javadoc.

6) Decide whether or not to add BigDecimalMatrix.

Am I missing anything?

Phil


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [math] Getting 1.0 out the door -- tasks remaining

Posted by Phil Steitz <ph...@steitz.com>.
Phil Steitz wrote:
> J.Pietschmann wrote:
> 
>> Phil Steitz wrote:
>>
>>> 1) Decide what to do about inverse cumulative probabilities where p = 
>>> 1 (easy solution is to document and throw)
>>
>>
>>
>> Nearly +1
>>
> 
> My own "nearly +1" on this just turned to -1.  After looking some more 
> at the code and thinking some more, I think that both p=1 and p=0 should 
> be handled correctly in all cases.  The difficult cases are when the 
> probability density function has unbounded support.  Here is what I 
> propose for the values of inverseCumulativeProbability() at p=0 and p=1 
> for current distributions.  Unless otherwise noted, these values are 
> intended to be independent of distribution parameters.
> 
> Distribution         p=0                     p=1
> ------------------------------------------------------------------
> Binomial               0               Integer.MAX_VALUE
> Chisquare              0               Double.POSITIVE_INFINITY
> Exponential            0               Double.POSITIVE_INFINITY
> F                      0               Double.POSITIVE_INFINITY
> Gamma                  0               Double.POSITIVE_INFINITY
> HyperGeometric         0               finite, parameter-dependent
> Normal       Double.NEGATIVE_INFINITY  Double.POSITIVE_INFINITY
> T            Double.NEGATIVE_INFINITY  Double.POSITIVE_INFINITY
> 
> Other than the value for Chisquare with p=1 (which causes R to hang), 
> these values are consistent with what R returns using the q* functions. 
> It might be more convenient to return Double.MAX_VALUE, 
> -Double.MAX_VALUE in place of the INFINITY's (since then we could just 
> use getDomainLowerBound at 0 and 1) but this would not be correct 
> mathematically.  If there are no objections, I will find a way to get 
> the values above returned.

I have committed changes and tests to ensure that the values in the table 
above are returned, modulo correcting the following mistakes:

Both of the discrete distributions (Binomial and Hypergeometric) should 
return -1 for inverseCumulativeProbability(0).  The definition that we 
are using is that inverseCumulativeProbability(p) = the largest x such that
P(X <= x) <= p.

Since 0 has positive probability for both the Binomial and Hypergeometric 
distributions, and the function is integer-valued, the correct value to 
return in these cases is actually -1, not 0.
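
To make the definition above concrete, here is a minimal sketch for the 
Binomial case.  It is not the committed code: the class and method names 
are made up for illustration, it assumes 0 < prob < 1, and it uses a plain 
linear scan rather than anything efficient.  It only shows how "largest x 
such that P(X <= x) <= p" yields -1 at p = 0 and Integer.MAX_VALUE at p = 1.

```java
// Hypothetical illustration only; not the commons-math implementation.
final class BinomialInverseSketch {

    /** P(X <= x) for X ~ Binomial(n, prob), assuming 0 < prob < 1. */
    static double cumulative(int n, double prob, int x) {
        if (x < 0) {
            return 0.0;
        }
        if (x >= n) {
            return 1.0;
        }
        double term = Math.pow(1.0 - prob, n); // P(X = 0)
        double sum = term;
        for (int k = 1; k <= x; k++) {
            // P(X = k) = P(X = k-1) * (n-k+1)/k * prob/(1-prob)
            term *= (double) (n - k + 1) / k * prob / (1.0 - prob);
            sum += term;
        }
        return sum;
    }

    /** Largest integer x such that P(X <= x) <= p. */
    static int inverseCumulative(int n, double prob, double p) {
        if (!(p >= 0.0 && p <= 1.0)) {
            throw new IllegalArgumentException("p must be in [0, 1]");
        }
        if (p == 1.0) {
            // Every integer satisfies P(X <= x) <= 1, so there is no largest one.
            return Integer.MAX_VALUE;
        }
        int x = -1; // P(X <= -1) = 0 <= p always holds
        while (x + 1 <= n && cumulative(n, prob, x + 1) <= p) {
            x++;
        }
        return x;
    }
}
```

With n = 10 and prob = 0.5, inverseCumulative(10, 0.5, 0.0) gives -1 
because P(X <= 0) = 1/1024 is already greater than 0.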
> 
> Phil


Re: [math] Getting 1.0 out the door -- tasks remaining

Posted by Al Chou <ho...@yahoo.com>.
--- Phil Steitz <ph...@steitz.com> wrote:
> J.Pietschmann wrote:
> > Phil Steitz wrote:
> > 
> >> 1) Decide what to do about inverse cumulative probabilities where p = 
> >> 1 (easy solution is to document and throw)
> > 
> > 
> > Nearly +1
> > 
> 
> My own "nearly +1" on this just turned to -1.  After looking some more at 
> the code and thinking some more, I think that both p=1 and p=0 should be 
> handled correctly in all cases.  The difficult cases are when the 
> probability density function has unbounded support.  Here is what I 
> propose for the values of inverseCumulativeProbability() at p=0 and p=1 
> for current distributions.  Unless otherwise noted, these values are 
> intended to be independent of distribution parameters.
> 
> Distribution         p=0                     p=1
> ------------------------------------------------------------------
> Binomial               0               Integer.MAX_VALUE
> Chisquare              0               Double.POSITIVE_INFINITY
> Exponential            0               Double.POSITIVE_INFINITY
> F                      0               Double.POSITIVE_INFINITY
> Gamma                  0               Double.POSITIVE_INFINITY
> HyperGeometric         0               finite, parameter-dependent
> Normal       Double.NEGATIVE_INFINITY  Double.POSITIVE_INFINITY
> T            Double.NEGATIVE_INFINITY  Double.POSITIVE_INFINITY
> 
> Other than the value for Chisquare with p=1 (which causes R to hang), 
> these values are consistent with what R returns using the q* functions. 
> It might be more convenient to return Double.MAX_VALUE, -Double.MAX_VALUE 
> in place of the INFINITY's (since then we could just use 
> getDomainLowerBound at 0 and 1) but this would not be correct 
> mathematically.  If there are no objections, I will find a way to get the 
> values above returned.

+1 to the values in the table above.  As a user I would prefer to be returned
an infinity rather than MAX_VALUE where possible (it's too bad the integer
types don't provide infinity values), because even though I would often
recognize 1e+308 or thereabouts as Double.POSITIVE_INFINITY, I would still have
to do that conversion mentally, and I would always wonder whether the returned
value was actually MAX_VALUE or just the implementation-dependent
representation of POSITIVE_INFINITY.  Also consider what would happen if the
data type were changed to float.  Then if MAX_VALUE were used, the numeric
value returned for p = 1 would differ depending on the data type.  With the
infinity values, although there's a class difference between
Double.POSITIVE_INFINITY and Float.POSITIVE_INFINITY, the concept is clearly
identical.  It's strange that BigDecimal doesn't provide infinity values,
though.  Maybe that's something Commons should address at some point.
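
The float versus double point can be checked directly.  The following is 
just a small sketch (the class and method names are made up for 
illustration) comparing the two kinds of sentinel after widening float 
to double.

```java
// Hypothetical illustration of the MAX_VALUE vs. infinity argument.
final class InfinityVersusMaxValue {

    /** True when the float sentinel widens to exactly the double sentinel. */
    static boolean sameValue(float f, double d) {
        return (double) f == d;
    }

    public static void main(String[] args) {
        // MAX_VALUE depends on the data type.
        System.out.println(sameValue(Float.MAX_VALUE, Double.MAX_VALUE));
        // Infinity is the same concept (and the same value) in both types.
        System.out.println(sameValue(Float.POSITIVE_INFINITY, Double.POSITIVE_INFINITY));
        // MAX_VALUE is an ordinary finite number, not an infinity marker.
        System.out.println(Double.isInfinite(Double.MAX_VALUE));
    }
}
```

This prints false, true, false: a caller testing for "unbounded" can rely 
on Double.isInfinite() for the infinities, but would need a type-specific 
comparison for MAX_VALUE.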


Al


	
		


Re: [math] Getting 1.0 out the door -- tasks remaining

Posted by Phil Steitz <ph...@steitz.com>.
J.Pietschmann wrote:
> Phil Steitz wrote:
> 
>> 1) Decide what to do about inverse cumulative probabilities where p = 
>> 1 (easy solution is to document and throw)
> 
> 
> Nearly +1
> 

My own "nearly +1" on this just turned to -1.  After looking some more at 
the code and thinking some more, I think that both p=1 and p=0 should be 
handled correctly in all cases.  The difficult cases are when the 
probability density function has unbounded support.  Here is what I 
propose for the values of inverseCumulativeProbability() at p=0 and p=1 
for current distributions.  Unless otherwise noted, these values are 
intended to be independent of distribution parameters.

Distribution         p=0                     p=1
------------------------------------------------------------------
Binomial               0               Integer.MAX_VALUE
Chisquare              0               Double.POSITIVE_INFINITY
Exponential            0               Double.POSITIVE_INFINITY
F                      0               Double.POSITIVE_INFINITY
Gamma                  0               Double.POSITIVE_INFINITY
HyperGeometric         0               finite, parameter-dependent
Normal       Double.NEGATIVE_INFINITY  Double.POSITIVE_INFINITY
T            Double.NEGATIVE_INFINITY  Double.POSITIVE_INFINITY

Other than the value for Chisquare with p=1 (which causes R to hang), 
these values are consistent with what R returns using the q* functions. 
It might be more convenient to return Double.MAX_VALUE, -Double.MAX_VALUE 
in place of the INFINITY's (since then we could just use 
getDomainLowerBound at 0 and 1) but this would not be correct 
mathematically.  If there are no objections, I will find a way to get the 
values above returned.
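
As a rough sketch of what "handled correctly" could look like for one of 
the continuous cases, here is the Exponential row of the table.  The class 
name and signature are assumptions made for illustration, not the actual 
distribution classes; the closed form -mean * ln(1 - p) is the standard 
inverse of the exponential CDF.

```java
// Hypothetical illustration only; not the commons-math implementation.
final class ExponentialInverseSketch {

    /** Inverse CDF of the exponential distribution with the given mean (> 0). */
    static double inverseCumulativeProbability(double mean, double p) {
        if (!(p >= 0.0 && p <= 1.0)) { // also rejects NaN
            throw new IllegalArgumentException("p must be in [0, 1]");
        }
        if (p == 0.0) {
            return 0.0; // lower end of the support
        }
        if (p == 1.0) {
            return Double.POSITIVE_INFINITY; // unbounded support
        }
        return -mean * Math.log(1.0 - p);
    }
}
```

Handling the endpoints explicitly keeps the boundary values independent 
of how the interior case is computed (closed form here, root finding for 
other distributions).  For Normal and T the p = 0 branch would return 
Double.NEGATIVE_INFINITY instead, per the table.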

Phil








Re: [math] Getting 1.0 out the door -- tasks remaining

Posted by "J.Pietschmann" <j3...@yahoo.de>.
Phil Steitz wrote:
>>> root-finding interfaces.  
...
> Does that mean that you think we need to change the interfaces?  If so, 
> how exactly?  Along the lines that I suggested earlier (stateless, value 
> objects returned)?

Actually I don't know how to proceed. It would be nice to have a common
pattern for the interfaces for solving non-linear equations (aka root
finding), solving systems of linear equations, interpolation and
perhaps some functions from the stat area. OTOH, each problem has some
unique aspects and performance tradeoffs (in terms of copying stuff
and perhaps more), and I have no good idea how to get this unified
while keeping it reasonably simple and intuitive. Duh!

J.Pietschmann




Re: [math] Getting 1.0 out the door -- tasks remaining

Posted by Phil Steitz <ph...@steitz.com>.
J.Pietschmann wrote:

> 
>> 2) Decide what, if anything, to do about the root-finding interfaces.  
>> I am OK releasing as is.
> 
> 
> Uh, oh!

Does that mean that you think we need to change the interfaces?  If so, 
how exactly?  Along the lines that I suggested earlier (stateless, value 
objects returned)?
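
For reference, "stateless, value objects returned" could look roughly 
like the sketch below.  Everything here is hypothetical (RootResult, 
BisectionSketch and the solve() signature are invented for illustration 
and are not the current interfaces), and bisection is used only because 
it is short.

```java
import java.util.function.DoubleUnaryOperator;

// Hypothetical immutable value object describing the outcome of one solve.
final class RootResult {
    final double root;
    final double residual;   // f(root)
    final int iterations;

    RootResult(double root, double residual, int iterations) {
        this.root = root;
        this.residual = residual;
        this.iterations = iterations;
    }
}

// Hypothetical stateless solver: all inputs are arguments, nothing is stored.
final class BisectionSketch {

    /** Finds a root of f in [lo, hi], assuming lo < hi and a sign change. */
    static RootResult solve(DoubleUnaryOperator f, double lo, double hi,
                            double absAccuracy, int maxIterations) {
        double flo = f.applyAsDouble(lo);
        double fhi = f.applyAsDouble(hi);
        if (flo == 0.0) {
            return new RootResult(lo, 0.0, 0);
        }
        if (fhi == 0.0) {
            return new RootResult(hi, 0.0, 0);
        }
        if (flo * fhi > 0.0) {
            throw new IllegalArgumentException("root is not bracketed");
        }
        double mid = lo;
        double fmid = flo;
        int i = 0;
        while (i < maxIterations) {
            i++;
            mid = 0.5 * (lo + hi);
            fmid = f.applyAsDouble(mid);
            if (fmid == 0.0 || 0.5 * (hi - lo) <= absAccuracy) {
                break;
            }
            if ((fmid < 0.0) == (flo < 0.0)) {
                lo = mid;      // root lies in the upper half
                flo = fmid;
            } else {
                hi = mid;      // root lies in the lower half
            }
        }
        return new RootResult(mid, fmid, i);
    }
}
```

A call such as BisectionSketch.solve(x -> x * x - 2, 0, 2, 1e-10, 100) 
returns the root, the residual and the iteration count together, so the 
solver itself carries no state between calls.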

Phil


Re: [math] Getting 1.0 out the door -- tasks remaining

Posted by "J.Pietschmann" <j3...@yahoo.de>.
Phil Steitz wrote:
> 1) Decide what to do about inverse cumulative probabilities where p = 1 
> (easy solution is to document and throw)

Nearly +1

> 2) Decide what, if anything, to do about the root-finding interfaces.  I 
> am OK releasing as is.

Uh, oh!

> 4) Decide what to do about RealMatrix rank.  Only reasonable solution at 
> this point appears to be to drop it from the interface.

I'd vote for dropping it. A robust implementation would require
SVD, which is quite complex in itself, and I personally never found
a real usage for a matrix rank unless it dropped out of a related
computation as a side effect anyway.
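
To illustrate why a simple rank is not robust, here is a sketch that 
deliberately does not use SVD: it counts pivots in Gaussian elimination 
with partial pivoting and a caller-supplied tolerance.  The class name, 
method and tolerance parameter are assumptions made for illustration and 
are not part of RealMatrix.

```java
// Hypothetical illustration only; elimination-based, not SVD-based.
final class NaiveRankSketch {

    /** Number of pivots larger than tol found by Gaussian elimination. */
    static int rank(double[][] a, double tol) {
        int rows = a.length;
        int cols = a[0].length;
        double[][] m = new double[rows][];
        for (int i = 0; i < rows; i++) {
            m[i] = a[i].clone(); // work on a copy
        }
        int rank = 0;
        for (int col = 0; col < cols && rank < rows; col++) {
            // Partial pivoting: pick the largest entry in this column.
            int pivot = rank;
            for (int r = rank + 1; r < rows; r++) {
                if (Math.abs(m[r][col]) > Math.abs(m[pivot][col])) {
                    pivot = r;
                }
            }
            if (Math.abs(m[pivot][col]) <= tol) {
                continue; // column treated as dependent
            }
            double[] tmp = m[pivot];
            m[pivot] = m[rank];
            m[rank] = tmp;
            for (int r = rank + 1; r < rows; r++) {
                double factor = m[r][col] / m[rank][col];
                for (int c = col; c < cols; c++) {
                    m[r][c] -= factor * m[rank][c];
                }
            }
            rank++;
        }
        return rank;
    }
}
```

For the nearly singular matrix {{1, 2}, {2, 4 + 1e-12}} this returns 1 
with tol = 1e-9 and 2 with tol = 1e-15, so the answer depends entirely 
on the tolerance chosen.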

> 6) Decide whether or not to add BigDecimalMatrix.

I'm undecided; if the unit tests are up to a decent coverage, I
think it could be included.

J.Pietschmann
