Posted to dev@commons.apache.org by Phil Steitz <ph...@steitz.com> on 2003/05/21 16:10:41 UTC

[math] Priorities, help needed

I am working on getting myself set up with Maven, but I wanted to get 
this list out to anyone who might be willing to a) contribute or b) comment 
on priorities or direction.

The proposal presents the following initial scope:

     * Simple univariate statistics (mean, standard deviation, n,
       confidence intervals)
     * Frequency distributions
     * t-test, chi-square test
     * Random numbers from Gaussian, Exponential, Poisson distributions
     * Random sampling/resampling
     * Bivariate regression, correlation

and mathematical algorithms such as the following:

     * Basic Complex Number representation with algebraic operations
     * Newton's method for finding roots
     * Binomial coefficients
     * Exponential growth and decay (set up for financial applications)
     * Polynomial Interpolation (curve fitting)
     * Basic Matrix representation with algebraic operations

The following items need completion:

* Univariate needs confidence intervals.  I would recommend doing this
   by first defining a t-statistic in TestStatistic and then using it.
   This is very simple. "Nice to haves" (IMHO) for Univariate would be
   quantiles (1,5,10,25,50,75,90,95,99) and bootstrap confidence
   intervals for the versions that store data, and possibly higher-order
   moments (if feasible) for UnivariateImpl. I would prioritize the
   quantiles (most important) and t-based confidence intervals over the
   higher-order moments or bootstrap confidence intervals.

* A t-test statistic needs to be added, and we should probably add the
   capability of actually performing t- and chi-square tests at fixed
   significance levels (.1, .05, .01, .001).  Down the road, numerical
   approximation of the t- and chi-square distributions could be added to
   enable user-supplied significance levels. We should also add more tests.

* The RealMatrixImpl class is missing some key method implementations.
   The critical thing is inversion.  We need to implement a numerically
   sound inversion algorithm.  This will enable solve() and also
   support general linear regression.
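For the Univariate item above, here is a minimal Java sketch of the two pieces I would prioritize. The class and method names are hypothetical (not the existing Univariate API). It assumes the t critical value for n - 1 degrees of freedom is supplied by the caller from a table, and the percentile uses linear interpolation between order statistics, which is only one of several common quantile definitions.

```java
import java.util.Arrays;

/** Hypothetical helper: t-based confidence interval and percentiles for stored data. */
final class UnivariateSketch {

    private UnivariateSketch() {}

    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v;
        return sum / x.length;
    }

    /** Sample standard deviation (n - 1 denominator). */
    static double standardDeviation(double[] x) {
        double m = mean(x);
        double ss = 0.0;
        for (double v : x) ss += (v - m) * (v - m);
        return Math.sqrt(ss / (x.length - 1));
    }

    /**
     * Interval mean +/- tCritical * s / sqrt(n). tCritical (for n - 1
     * degrees of freedom) is assumed to come from a table; no
     * t-distribution is computed here.
     */
    static double[] confidenceInterval(double[] x, double tCritical) {
        double m = mean(x);
        double halfWidth = tCritical * standardDeviation(x) / Math.sqrt(x.length);
        return new double[] {m - halfWidth, m + halfWidth};
    }

    /**
     * p-th percentile (0 <= p <= 100) by linear interpolation between the
     * order statistics around position (p / 100) * (n - 1).
     */
    static double percentile(double[] x, double p) {
        double[] sorted = x.clone();   // leave the caller's data untouched
        Arrays.sort(sorted);
        double pos = (p / 100.0) * (sorted.length - 1);
        int lower = (int) Math.floor(pos);
        int upper = (int) Math.ceil(pos);
        double frac = pos - lower;
        return sorted[lower] + frac * (sorted[upper] - sorted[lower]);
    }
}
```

For data {5, 1, 4, 2, 3}, percentile(data, 50) is 3 and percentile(data, 90) is 4.6.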
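For the t-test / chi-square item above, a minimal sketch of the statistics and a fixed-level decision. It assumes the caller looks up the critical value for the chosen significance level and degrees of freedom in a table; the class and method names are illustrative only, not TestStatistic's actual interface.

```java
/** Hypothetical sketch of test statistics with table-supplied critical values. */
final class TestStatisticSketch {

    private TestStatisticSketch() {}

    /** One-sample t statistic: (mean - mu0) / (s / sqrt(n)). */
    static double tStatistic(double mu0, double[] x) {
        int n = x.length;
        double sum = 0.0;
        for (double v : x) sum += v;
        double mean = sum / n;
        double ss = 0.0;
        for (double v : x) ss += (v - mean) * (v - mean);
        double s = Math.sqrt(ss / (n - 1));   // sample standard deviation
        return (mean - mu0) / (s / Math.sqrt(n));
    }

    /** Pearson chi-square statistic: sum of (observed - expected)^2 / expected. */
    static double chiSquare(double[] expected, long[] observed) {
        double sum = 0.0;
        for (int i = 0; i < expected.length; i++) {
            double diff = observed[i] - expected[i];
            sum += diff * diff / expected[i];
        }
        return sum;
    }

    /** Two-sided t-test: reject when |t| exceeds the tabled critical value. */
    static boolean rejectTwoSided(double t, double tCritical) {
        return Math.abs(t) > tCritical;
    }

    /** Chi-square goodness-of-fit tests reject in the upper tail only. */
    static boolean rejectUpperTail(double chiSq, double chiSqCritical) {
        return chiSq > chiSqCritical;
    }
}
```

For example, a chi-square test with one degree of freedom at the .05 level rejects when the statistic exceeds 3.841.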
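For the RealMatrixImpl item above, one standard numerically sound choice is LU decomposition with partial pivoting. The sketch below illustrates that choice; it is not RealMatrixImpl's code, and the class name is hypothetical. The matrix is factored once; solve() does forward and back substitution, and the inverse comes from solving against each column of the identity. The singularity check only catches an exact zero pivot; a real implementation would want a tolerance.

```java
/** Hypothetical sketch: LU decomposition with partial pivoting (PA = LU, stored in place). */
final class LUSketch {
    private final double[][] lu;   // unit-lower L below the diagonal, U on and above
    private final int[] perm;      // perm[i] = original row now at position i

    LUSketch(double[][] a) {
        int n = a.length;
        lu = new double[n][];
        for (int i = 0; i < n; i++) lu[i] = a[i].clone();
        perm = new int[n];
        for (int i = 0; i < n; i++) perm[i] = i;
        for (int k = 0; k < n; k++) {
            // Partial pivoting: choose the row with the largest |entry| in column k.
            int p = k;
            for (int i = k + 1; i < n; i++) {
                if (Math.abs(lu[i][k]) > Math.abs(lu[p][k])) p = i;
            }
            if (lu[p][k] == 0.0) throw new ArithmeticException("matrix is singular");
            double[] rowTmp = lu[p]; lu[p] = lu[k]; lu[k] = rowTmp;
            int permTmp = perm[p]; perm[p] = perm[k]; perm[k] = permTmp;
            for (int i = k + 1; i < n; i++) {
                lu[i][k] /= lu[k][k];   // multiplier, stored as an entry of L
                for (int j = k + 1; j < n; j++) {
                    lu[i][j] -= lu[i][k] * lu[k][j];
                }
            }
        }
    }

    /** Solves A x = b: forward substitution with L, then back substitution with U. */
    double[] solve(double[] b) {
        int n = lu.length;
        double[] x = new double[n];
        for (int i = 0; i < n; i++) {
            double s = b[perm[i]];   // apply the row permutation to b
            for (int j = 0; j < i; j++) s -= lu[i][j] * x[j];
            x[i] = s;                // L has a unit diagonal
        }
        for (int i = n - 1; i >= 0; i--) {
            double s = x[i];
            for (int j = i + 1; j < n; j++) s -= lu[i][j] * x[j];
            x[i] = s / lu[i][i];
        }
        return x;
    }

    /** Inverse computed column by column by solving A x = e_j. */
    double[][] inverse() {
        int n = lu.length;
        double[][] inv = new double[n][n];
        for (int j = 0; j < n; j++) {
            double[] e = new double[n];
            e[j] = 1.0;
            double[] col = solve(e);
            for (int i = 0; i < n; i++) inv[i][j] = col[i];
        }
        return inv;
    }
}
```

Factoring once and reusing the factors is what makes the same object serve both solve() and inversion.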

The following items have no submitted implementation.  I will continue 
to submit solutions for these things, but obviously we need more, 
better, faster:-)

* ComplexNumber interface and implementation.  The only tricky thing
   here is making division numerically sound and what extended value
   topology to adopt.  If no one else jumps on this, I will submit a
   cleaned up version of what I have, along with some references.

* Bivariate regression, correlation.  This could be done with simple
   formulas manipulating arrays and this is probably what we should aim
   for in an initial release.  Down the road, we should use the
   RealMatrixImpl solve() to support general linear regression.  I have
   an implementation (of simple regression) that I could clean up and
   submit; but again, I would be glad to let someone else submit this.

* Binomial coefficients.  I have an "exact" implementation that is
   limited to what can be stored in a long.  This should be extended to
   use BigIntegers and potentially to support logarithmic
   representations.
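For the ComplexNumber division question above, one well-known option is Smith's algorithm: it divides through by the larger component of the divisor instead of forming c^2 + d^2 directly, which avoids spurious overflow and underflow. A hypothetical minimal sketch, not the proposed ComplexNumber interface; it does not settle the extended-value question, and dividing by zero simply yields NaN components.

```java
/** Hypothetical minimal immutable complex number. */
final class ComplexSketch {
    final double re;
    final double im;

    ComplexSketch(double re, double im) { this.re = re; this.im = im; }

    ComplexSketch multiply(ComplexSketch w) {
        return new ComplexSketch(re * w.re - im * w.im, re * w.im + im * w.re);
    }

    /**
     * (a + bi) / (c + di) via Smith's algorithm: scale by the ratio of the
     * divisor's smaller to larger component so c^2 + d^2 is never formed.
     */
    ComplexSketch divide(ComplexSketch w) {
        double c = w.re, d = w.im;
        if (Math.abs(c) >= Math.abs(d)) {
            double r = d / c;               // |r| <= 1
            double den = c + d * r;
            return new ComplexSketch((re + im * r) / den, (im - re * r) / den);
        } else {
            double r = c / d;               // |r| < 1
            double den = c * r + d;
            return new ComplexSketch((re * r + im) / den, (im * r - re) / den);
        }
    }
}
```

With the textbook formula, (1e300 + 1e300i) / (1e300 + 1e300i) overflows in c^2 + d^2; the sketch returns 1 + 0i.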
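For the bivariate regression item above, a minimal sketch of the simple-formula approach using centered sums of squares and cross-products, computed in two passes over the data (more accurate than the one-pass textbook formulas). Names are hypothetical; it assumes x and y have equal length, at least two points, and that neither x nor y is constant.

```java
/** Hypothetical sketch of simple (one-predictor) least-squares regression. */
final class SimpleRegressionSketch {
    final double slope;
    final double intercept;
    final double r;   // Pearson correlation coefficient

    SimpleRegressionSketch(double[] x, double[] y) {
        int n = x.length;
        double xBar = 0.0, yBar = 0.0;
        for (int i = 0; i < n; i++) { xBar += x[i]; yBar += y[i]; }
        xBar /= n;
        yBar /= n;
        // Second pass: centered sums Sxx, Syy, Sxy.
        double sxx = 0.0, syy = 0.0, sxy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - xBar, dy = y[i] - yBar;
            sxx += dx * dx;
            syy += dy * dy;
            sxy += dx * dy;
        }
        slope = sxy / sxx;
        intercept = yBar - slope * xBar;
        r = sxy / Math.sqrt(sxx * syy);
    }

    double predict(double x) { return intercept + slope * x; }
}
```

For points on y = 2x + 1 the sketch gives slope 2, intercept 1 and r = 1.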
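For the binomial coefficients item above, a minimal sketch of the BigInteger extension plus a logarithmic variant. It assumes 0 <= k <= n (other k return zero, or negative infinity for the log). The multiplicative loop keeps every intermediate value an exact binomial coefficient, so each division is exact. Names are hypothetical.

```java
import java.math.BigInteger;

/** Hypothetical sketch: exact and logarithmic binomial coefficients. */
final class BinomialSketch {

    private BinomialSketch() {}

    /** Exact C(n, k) as a BigInteger, so results are not limited to a long. */
    static BigInteger binomial(int n, int k) {
        if (k < 0 || k > n) return BigInteger.ZERO;
        k = Math.min(k, n - k);   // symmetry: C(n, k) == C(n, n - k)
        BigInteger result = BigInteger.ONE;
        for (int i = 1; i <= k; i++) {
            // After this step result == C(n - k + i, i), so the division is exact.
            result = result.multiply(BigInteger.valueOf(n - k + i))
                           .divide(BigInteger.valueOf(i));
        }
        return result;
    }

    /** Natural log of C(n, k) as a sum of logs, for values too large to store. */
    static double logBinomial(int n, int k) {
        if (k < 0 || k > n) return Double.NEGATIVE_INFINITY;
        k = Math.min(k, n - k);
        double sum = 0.0;
        for (int i = 1; i <= k; i++) {
            sum += Math.log(n - k + i) - Math.log(i);
        }
        return sum;
    }
}
```

C(100, 50) needs more than 63 bits, so it cannot be held in a long, but the BigInteger version returns it exactly.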

The following are items for which I do not have full Java code:

* Newton's method for finding roots
* Exponential growth and decay (set up for financial applications)
* Polynomial Interpolation (curve fitting)
* Sampling from Collections (maybe belongs in Collections???)
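As a starting point for the first item, a minimal Newton's method sketch. It assumes the caller supplies the derivative, stops when the step size falls below a tolerance, caps the number of iterations, and uses java.util.function.DoubleUnaryOperator (Java 8 or later). Names are hypothetical; a production version would also need safeguards, such as bracketing, for when Newton steps diverge.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch of Newton's method for f(x) = 0. */
final class NewtonSketch {

    private NewtonSketch() {}

    /**
     * Iterates x_{k+1} = x_k - f(x_k) / f'(x_k) until |step| <= tol.
     * Throws if the derivative is exactly zero or maxIter is reached.
     */
    static double solve(DoubleUnaryOperator f, DoubleUnaryOperator fPrime,
                        double x0, double tol, int maxIter) {
        double x = x0;
        for (int iter = 0; iter < maxIter; iter++) {
            double d = fPrime.applyAsDouble(x);
            if (d == 0.0) {
                throw new ArithmeticException("zero derivative at x = " + x);
            }
            double step = f.applyAsDouble(x) / d;
            x -= step;
            if (Math.abs(step) <= tol) {
                return x;
            }
        }
        throw new ArithmeticException("no convergence after " + maxIter + " iterations");
    }
}
```

Usage: NewtonSketch.solve(x -> x * x - 2, x -> 2 * x, 1.0, 1e-12, 50) approximates the square root of 2.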

It would be a good idea for us to agree on priorities.  Personally, I 
would list things more or less in the order presented above.

Obviously, one more thing that we need help on is documentation. My 
personal top priority is to get some basic material submitted for the 
maven site.  Finally, there is *lots* of cleanup to do in the existing 
code and javadoc and more test cases to add (esp. tests for the 
"rolling" capability in UnivariateImpl).

Regards,

Phil



---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [math] Priorities, help needed

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.
Phil Steitz wrote:
> Obviously, one more thing that we need help on is documentation. My 
> personal top priority is to get some basic material submitted for the 
> maven site.  
>

Here's another source to reference in the javadoc for particular 
implementations:

http://mathworld.wolfram.com/

This is where I quickly grabbed Geometric Mean:
http://mathworld.wolfram.com/GeometricMean.html
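For example, the geometric mean from that page, (x_1 x_2 ... x_n)^(1/n), can be computed as the exponential of the mean of the logs, which avoids overflow in the running product. A minimal sketch assuming all values are strictly positive; the class name is hypothetical.

```java
/** Hypothetical sketch of a geometric mean computed through logarithms. */
final class GeometricMeanSketch {

    private GeometricMeanSketch() {}

    /** exp(mean(log x_i)); equivalent to the n-th root of the product. */
    static double geometricMean(double[] x) {
        double sumLog = 0.0;
        for (double v : x) {
            if (v <= 0.0) throw new IllegalArgumentException("values must be positive");
            sumLog += Math.log(v);
        }
        return Math.exp(sumLog / x.length);
    }
}
```

For {1e200, 1e200} the direct product overflows to infinity, while this version returns approximately 1e200.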

They have excellent info, often with great details. I just got done 
using Figurate Numbers to quickly determine different sized/shaped 
neighborhoods in cellular automata.
http://mathworld.wolfram.com/FigurateNumber.html

-Mark

