Posted to dev@commons.apache.org by "Mark R. Diggory" <md...@latte.harvard.edu> on 2003/06/21 05:07:59 UTC

[math] Univariate / StoreUnivariate interface cleanup

I think it's always wise now to consult before making interface changes.
I'd like to make a couple of cleanup changes in the Univariate and
StoreUnivariate interfaces:

(1) I don't think we really need getProduct in the Univariate interface 
(unless we want it to be public and available to users like getSum and 
getSumSq, which is another questionable method).

     /**
      * Returns the product of the available values
      * @return The product or Double.NaN if no values have been added.
      */
     abstract double getProduct();

getProduct is no longer used for the geometric mean; sumLog is used
instead, and it might be a more appropriate method to place in the
interface.
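
To illustrate why a running sum of logs is attractive for the geometric
mean, here is a minimal sketch. The class and method names
(GeoMeanSketch, viaProduct, viaSumLog) are illustrative assumptions, not
the actual UnivariateImpl code, and both methods assume strictly
positive inputs.

```java
/** Illustrative only: two ways to compute a geometric mean. */
final class GeoMeanSketch {

    private GeoMeanSketch() { }

    /** Geometric mean from a running product; the product can overflow. */
    static double viaProduct(double[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        double product = 1.0;
        for (double v : values) {
            product *= v;
        }
        return Math.pow(product, 1.0 / values.length);
    }

    /** Geometric mean from a running sum of natural logs. */
    static double viaSumLog(double[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        double sumLog = 0.0;
        for (double v : values) {
            sumLog += Math.log(v);
        }
        return Math.exp(sumLog / values.length);
    }

    public static void main(String[] args) {
        // 200 copies of 1e10: the raw product exceeds Double.MAX_VALUE.
        double[] big = new double[200];
        java.util.Arrays.fill(big, 1e10);
        System.out.println("via product: " + viaProduct(big));
        System.out.println("via sumLog:  " + viaSumLog(big));
    }
}
```

With the large sample in main, the product overflows to infinity while
the sum of logs stays well within range, which is the practical reason
to accumulate logs internally.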

(2) getKurtosis and getSkewness are now fully implemented in
UnivariateImpl. A while ago I added them to the Univariate interface, so
they do not need to be declared in the StoreUnivariate interface (they
are inherited). I'd like to remove them from there as they are
repetitive.
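
A minimal sketch of the inheritance point, using simplified stand-in
interfaces rather than the real ones. The names UnivariateSketch,
StoreUnivariateSketch and CannedStoreUnivariate are hypothetical, and
getValues is only assumed here as an example of a store-only method.

```java
/** Simplified stand-in for the base interface; only two methods shown. */
interface UnivariateSketch {
    double getSkewness();
    double getKurtosis();
}

/**
 * Sub-interface: getSkewness and getKurtosis are inherited, so they do
 * not need to be declared again here.
 */
interface StoreUnivariateSketch extends UnivariateSketch {
    /** Hypothetical store-only accessor. */
    double[] getValues();
}

/** Stub that returns caller-supplied (canned) moments; it computes nothing. */
final class CannedStoreUnivariate implements StoreUnivariateSketch {
    private final double[] values;
    private final double skewness;
    private final double kurtosis;

    CannedStoreUnivariate(double[] values, double skewness, double kurtosis) {
        this.values = values.clone();
        this.skewness = skewness;
        this.kurtosis = kurtosis;
    }

    public double getSkewness() {
        return skewness;
    }

    public double getKurtosis() {
        return kurtosis;
    }

    public double[] getValues() {
        return values.clone();
    }
}

final class InheritanceDemo {
    public static void main(String[] args) {
        StoreUnivariateSketch stats =
            new CannedStoreUnivariate(new double[] {1, 2, 3, 4, 5}, 0.0, -1.2);
        // Both calls compile through the sub-interface type without
        // any redeclaration in StoreUnivariateSketch.
        System.out.println(stats.getSkewness());
        System.out.println(stats.getKurtosis());
    }
}
```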

(3) I'd like to move getKurtosisClass and the static constants up to 
Univariate, as getKurtosis is available there now.

-Mark



---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [math] Univariate / StoreUnivariate interface cleanup

Posted by Phil Steitz <ph...@steitz.com>.
Mark R. Diggory wrote:
> 
> 
> Phil Steitz wrote:
> 
> 
>>
>> I would not add these to Univariate. I understand your desire to keep 
>> them implemented in UnivariateImpl, but I do not want to force 
>> implementation of these statistics in the Univariate interface.  
> 
> 
> I also understand your interest in being able to provide a "subset" of 
> statistical functions via Univariate. The problem is that I want to be 
> able to provide the same sort of capabilities in the Storageless 
> approaches that we're providing in the Storage based approaches.
> 
> I feel we have implemented these methods throughout all the versions of
> Univariate; they are common to all implementations, so I believe it is
> wise to place them into the interface that spans all implementations.

-1  I would prefer to remove them from UnivariateImpl

> 
> I also am starting to feel that as long as the storageless approach is 
> based solely on the Univariate Interface, neither of us will be happy, 
> or one of us will end up having to compromise our position.

Implementation classes implement methods defined in interfaces.  The 
Univariate interface defines the methods that all Univariate 
implementations must support.  As such, it should be limited to core 
functions. Skewness and Kurtosis are non-core, IMHO.

> 
> I really want to find a way to provide an extensible means to add other 
> statistical methods to the library and make them available via 
> interfaces. I understand the importance of keeping the interfaces 
> controlled or I wouldn't have opened up the discussion. If you like "x 
> method", and I like "y method", we should be able to find room for both 
> of these in the library *and* in the interfaces. There are a lot of
> useful statistics out there that would benefit from implementation over 
> a non-storage or storage based strategy.
> 
> One consideration is that delineating the interfaces on "Univariate" vs.
> "StoreUnivariate" is way too *implementation specific*. I don't think
> implementation-specific interfaces are very useful. I understand it's
> logical to base them on implementation initially, but I'm starting to
> feel it would be far less restrictive to draw lines in terms of
> functionality: better to categorize on "application" or "usage".
> Something like:
> 
> RankStatistics
> 
> MomentStatistics
> 
> NonParametricStatistics
> 
> etc.
> 
> (not suggesting these as the ultimate solution, just an example)
> 
> Then, if someone writes a particular statistic and wants to donate it
> to the package, there's room for expansion.
> 
> I'm going to spend some of my own time looking into Aspect Oriented
> Design further. I think there has to be a means of separating the
> statistical approach from the underlying storage/storageless
> implementation.

I suggest that our time is better spent completing the remaining tasks 
for initial release.  As stated in the proposal

"Commons-Math is a library of lightweight, self-contained mathematics 
and statistics components addressing the most common practical problems 
not immediately available in the Java programming language or 
commons-lang. The guiding principles for commons-math are:

    1. Real-world application use cases determine priority
    2. Emphasis on small, easily integrated components rather than large 
libraries with complex dependencies
    3. All algorithms are fully documented and follow generally accepted 
best practices
    4. In situations where multiple standard algorithms exist, use the 
Strategy pattern to support multiple implementations
    5. Limited dependencies. No external dependencies beyond Commons 
components and the JDK "

Commons-math is not intended to be a general-purpose statistics package 
any more than it is intended to be a general-purpose numerics package. 
  If this is your area of interest, it might be better to start a 
separate project to build a general-purpose Java statistics library.

Phil

> 
> -Mark






Re: [math] Univariate / StoreUnivariate interface cleanup

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.

Phil Steitz wrote:
> Mark R. Diggory wrote:
> 
>> I think it's always wise now to consult before making interface
>> changes.  I'd like to make a couple of cleanup changes in the
>> Univariate and StoreUnivariate interfaces:
>>
>> (1) I don't think we really need getProduct in the Univariate 
>> interface (unless we want it to be public and available to users like 
>> getSum and getSumSq, which is another questionable method).
> 
> 
> I would like to leave getSum and getSumSq in, since these are 
> generically useful.  I would be fine with dropping getProduct.
> 
>>
>>     /**
>>      * Returns the product of the available values
>>      * @return The product or Double.NaN if no values have been added.
>>      */
>>     abstract double getProduct();
>>
>> getProduct is no longer used for the geometric mean; sumLog is used
>> instead, and it might be a more appropriate method to place in the
>> interface.
> 
> 
> I would not add sumLog

I wasn't really that thrilled about it myself.
> 
>>
>> (2) getKurtosis and getSkewness are now fully implemented in
>> UnivariateImpl. A while ago I added them to the Univariate interface,
>> so they do not need to be declared in the StoreUnivariate interface
>> (they are inherited). I'd like to remove them from there as they are
>> repetitive.
>>
>> (3) I'd like to move getKurtosisClass and the static constants up to 
>> Univariate, as getKurtosis is available there now.
> 
> 
> I would not add these to Univariate. I understand your desire to keep 
> them implemented in UnivariateImpl, but I do not want to force 
> implementation of these statistics in the Univariate interface.  

I also understand your interest in being able to provide a "subset" of 
statistical functions via Univariate. The problem is that I want to be 
able to provide the same sort of capabilities in the Storageless 
approaches that we're providing in the Storage based approaches.

I feel we have implemented these methods throughout all the versions of
Univariate; they are common to all implementations, so I believe it is
wise to place them into the interface that spans all implementations.

I also am starting to feel that as long as the storageless approach is 
based solely on the Univariate Interface, neither of us will be happy, 
or one of us will end up having to compromise our position.

I really want to find a way to provide an extensible means to add other 
statistical methods to the library and make them available via 
interfaces. I understand the importance of keeping the interfaces 
controlled or I wouldn't have opened up the discussion. If you like "x 
method", and I like "y method", we should be able to find room for both 
of these in the library *and* in the interfaces. There are a lot of
useful statistics out there that would benefit from implementation over 
a non-storage or storage based strategy.

One consideration is that delineating the interfaces on "Univariate" vs.
"StoreUnivariate" is way too *implementation specific*. I don't think
implementation-specific interfaces are very useful. I understand it's
logical to base them on implementation initially, but I'm starting to
feel it would be far less restrictive to draw lines in terms of
functionality: better to categorize on "application" or "usage".
Something like:

RankStatistics

MomentStatistics

NonParametricStatistics

etc.

(not suggesting these as the ultimate solution, just an example)

Then, if someone writes a particular statistic and wants to donate it
to the package, there's room for expansion.
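
A rough sketch of what function-based interfaces could look like. The
interface names come from the list above; the method names (getMean,
getVariance, getMedian) and the two implementing classes are
assumptions made purely for illustration, not a proposed final design.

```java
/** Moment-based statistics; computable without storing the data. */
interface MomentStatistics {
    double getMean();
    double getVariance();
}

/** Rank-based statistics; these need access to the stored values. */
interface RankStatistics {
    double getMedian();
}

/** Storageless implementation: running updates, moments only. */
final class RunningMoments implements MomentStatistics {
    private long n;
    private double mean;
    private double m2; // running sum of squared deviations from the mean

    void addValue(double v) {
        n++;
        double delta = v - mean;
        mean += delta / n;
        m2 += delta * (v - mean);
    }

    public double getMean() {
        return n == 0 ? Double.NaN : mean;
    }

    public double getVariance() {
        return n < 2 ? Double.NaN : m2 / (n - 1);
    }
}

/** Storage-based implementation: can offer both families. */
final class StoredSample implements MomentStatistics, RankStatistics {
    private final double[] sorted;

    StoredSample(double[] values) {
        sorted = values.clone();
        java.util.Arrays.sort(sorted);
    }

    public double getMean() {
        if (sorted.length == 0) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (double v : sorted) {
            sum += v;
        }
        return sum / sorted.length;
    }

    public double getVariance() {
        if (sorted.length < 2) {
            return Double.NaN;
        }
        double mean = getMean();
        double ss = 0.0;
        for (double v : sorted) {
            ss += (v - mean) * (v - mean);
        }
        return ss / (sorted.length - 1);
    }

    public double getMedian() {
        int n = sorted.length;
        if (n == 0) {
            return Double.NaN;
        }
        int mid = n / 2;
        return (n % 2 == 1) ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

Client code that only needs moments can depend on MomentStatistics and
work with either class; code that needs ranks asks for RankStatistics
and so cannot be handed the storageless implementation by mistake.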

I'm going to spend some of my own time looking into Aspect Oriented
Design further. I think there has to be a means of separating the
statistical approach from the underlying storage/storageless
implementation.

-Mark







Re: [math] Univariate / StoreUnivariate interface cleanup

Posted by Phil Steitz <ph...@steitz.com>.
Mark R. Diggory wrote:
> I think it's always wise now to consult before making interface changes.
> I'd like to make a couple of cleanup changes in the Univariate and
> StoreUnivariate interfaces:
> 
> (1) I don't think we really need getProduct in the Univariate interface 
> (unless we want it to be public and available to users like getSum and 
> getSumSq, which is another questionable method).

I would like to leave getSum and getSumSq in, since these are 
generically useful.  I would be fine with dropping getProduct.

> 
>     /**
>      * Returns the product of the available values
>      * @return The product or Double.NaN if no values have been added.
>      */
>     abstract double getProduct();
> 
> getProduct is no longer used for the geometric mean; sumLog is used
> instead, and it might be a more appropriate method to place in the
> interface.

I would not add sumLog

> 
> (2) getKurtosis and getSkewness are now fully implemented in
> UnivariateImpl. A while ago I added them to the Univariate interface, so
> they do not need to be declared in the StoreUnivariate interface (they
> are inherited). I'd like to remove them from there as they are
> repetitive.
> 
> (3) I'd like to move getKurtosisClass and the static constants up to 
> Univariate, as getKurtosis is available there now.

I would not add these to Univariate. I understand your desire to keep 
them implemented in UnivariateImpl, but I do not want to force 
implementation of these statistics in the Univariate interface.  I agree 
with Al that these things have very limited practical use. Nowadays, 
graphical analysis and/or direct characterization of sample 
distributions, such as what EmpiricalDistribution provides, have pretty 
much obsoleted these summary measures for practical purposes.  Many 
contemporary introductory statistics texts omit them.

I would also suggest making skewness and kurtosis package-scoped in 
StatUtils, to keep the publicly exposed static methods to a minimum.
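
As a sketch of what package scoping would mean in practice: a static
method with no access modifier is visible only inside its own package.
StatUtilsSketch below is a hypothetical stand-in, not the real
StatUtils, and the sample-skewness formula used is one common
bias-adjusted definition that may differ from the library's.

```java
/**
 * Hypothetical stand-in for a static utility class. In a real library
 * the class itself would be public; it is left non-public here so the
 * snippet compiles as a single file.
 */
final class StatUtilsSketch {

    private StatUtilsSketch() { }

    /** Public: part of the exposed API in this sketch. */
    public static double mean(double[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /**
     * Package-private (no modifier): usable by classes in the same
     * package, hidden from library users.
     * Returns NaN for fewer than three values; a zero-variance sample
     * also yields NaN through the 0/0 division.
     */
    static double skewness(double[] values) {
        int n = values.length;
        if (n < 3) {
            return Double.NaN;
        }
        double mean = mean(values);
        double sumSq = 0.0;
        double sumCube = 0.0;
        for (double v : values) {
            double d = v - mean;
            sumSq += d * d;
            sumCube += d * d * d;
        }
        double sd = Math.sqrt(sumSq / (n - 1));
        return (n / ((n - 1.0) * (n - 2.0))) * (sumCube / (sd * sd * sd));
    }
}
```

An implementation class in the same package could still delegate to
skewness, while callers outside the package would see only mean.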

Phil

> 
> -Mark






Re: [math] Univariate / StoreUnivariate interface cleanup

Posted by Al Chou <ho...@yahoo.com>.
--- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> I think it's always wise now to consult before making interface changes.
> I'd like to make a couple of cleanup changes in the Univariate and
> StoreUnivariate interfaces:

Thanks for consulting!


> (1) I don't think we really need getProduct in the Univariate interface 
> (unless we want it to be public and available to users like getSum and 
> getSumSq, which is another questionable method).
> 
>      /**
>       * Returns the product of the available values
>       * @return The product or Double.NaN if no values have been added.
>       */
>      abstract double getProduct();
> 
> getProduct is no longer used for the geometric mean; sumLog is used
> instead, and it might be a more appropriate method to place in the
> interface.

+1 for removing getProduct
-1 for providing getSumLog
Let's let users ask us for a product getter.  I can't easily think of a common
use case.  Not sure about getSumSq, either.  It'll be interesting and
educational to see what users want to and will try to do with this library.


Al

=====
Albert Davidson Chou

    Get answers to Mac questions at http://www.Mac-Mgrs.org/ .
