Posted to dev@commons.apache.org by "Mark R. Diggory" <md...@latte.harvard.edu> on 2003/07/08 06:26:23 UTC

[math] Main Univariate Facade Implementations that work with UnivariateStatistics

Here is a patch with new versions of the Univariate Facades.
Included in this patch are:

1.) One new univariate, "MixedListUnivariate", that accepts a 
TransformerMap to transform objects to primitive doubles.

2.) One new AbstractUnivariate implementation.

3.) Many revisions to the current implementations so that they 
work with both NumberTransformers and the individual UnivariateStatistics.

4.) All moment-based stats (such as skewness and kurtosis) have been 
moved up into the Univariate interface.

5.) The StorelessUnivariateStatistics have been reorganized to move 
calculations that do not need to be performed on "increment" into 
"getValue". This reduces the amount of calculation done at the 
addValue stage (the variance, skewness and kurtosis calculations are no 
longer performed on the moments at this stage).

6.) All moment-based statistics have been modified to support sharing a 
common moment. This way the internal calculations for m1, m2, m3 and m4 
do not need to be replicated within the individual stats; they can all 
share the same object.
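Items 5 and 6 describe two related ideas: keep the per-value (increment) work minimal, and let several statistics share one moment object. The minimal Java sketch below illustrates both under stated assumptions: SharedMoment, VarianceStat and SkewnessStat are hypothetical names, not the classes in the patch; the moment uses the standard one-pass central-moment update recurrences; and the fourth moment (m4) is omitted for brevity.

```java
/**
 * Hypothetical shared moment: the only object that does work per value.
 * Not the class from the patch; names are illustrative.
 */
final class SharedMoment {
    long n;      // number of values seen
    double m1;   // running mean
    double m2;   // sum of squared deviations from the mean
    double m3;   // sum of cubed deviations from the mean (m4 omitted for brevity)

    /** O(1) update per value using one-pass central-moment recurrences. */
    void increment(double x) {
        long n0 = n;
        n++;
        double delta = x - m1;
        double deltaN = delta / n;
        double term1 = delta * deltaN * n0;
        m1 += deltaN;
        // m3 must be updated before m2 because its recurrence uses the old m2.
        m3 += term1 * deltaN * (n - 2) - 3.0 * deltaN * m2;
        m2 += term1;
    }
}

/** Hypothetical variance view: the division is deferred to getValue(). */
final class VarianceStat {
    private final SharedMoment moment;

    VarianceStat(SharedMoment moment) { this.moment = moment; }

    double getValue() {
        if (moment.n == 0) return Double.NaN;
        if (moment.n == 1) return 0.0;
        return moment.m2 / (moment.n - 1);   // bias-corrected sample variance
    }
}

/** Hypothetical skewness view reading the same shared sums. */
final class SkewnessStat {
    private final SharedMoment moment;

    SkewnessStat(SharedMoment moment) { this.moment = moment; }

    /** Bias-adjusted sample skewness: n / ((n-1)(n-2)) * m3 / s^3. */
    double getValue() {
        long n = moment.n;
        if (n < 3 || moment.m2 == 0.0) return Double.NaN;
        double variance = moment.m2 / (n - 1);
        double sd = Math.sqrt(variance);
        return (n * moment.m3) / ((n - 1.0) * (n - 2.0) * variance * sd);
    }
}
```

In this sketch a container creates one SharedMoment, calls increment() on it once per value, and hands the same instance to every statistic, so variance and skewness cost nothing until getValue() is called. Having the container (rather than each statistic) own the increment call is a design choice that guarantees the shared moment is never updated twice for the same value.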
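Item 1 relies on mapping arbitrary object types to primitive doubles. A minimal Java sketch of that idea follows, assuming a per-class registry of java.util.function.ToDoubleFunction instances; TransformerMapSketch and MixedListSketch are hypothetical names, and this is not the TransformerMap or NumberTransformer API from the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

/**
 * Hypothetical per-class transformer registry (not the patch's TransformerMap):
 * each registered class gets a function turning an instance into a double.
 */
final class TransformerMapSketch {
    private final Map<Class<?>, ToDoubleFunction<Object>> transformers = new HashMap<>();

    void put(Class<?> type, ToDoubleFunction<Object> transformer) {
        transformers.put(type, transformer);
    }

    /** Looks up by exact runtime class; unregistered types are rejected. */
    double transform(Object o) {
        ToDoubleFunction<Object> t = transformers.get(o.getClass());
        if (t == null) {
            throw new IllegalArgumentException("No transformer for " + o.getClass().getName());
        }
        return t.applyAsDouble(o);
    }
}

/** Hypothetical list-backed univariate that stores the original objects. */
final class MixedListSketch {
    private final List<Object> values = new ArrayList<>();
    private final TransformerMapSketch map;

    MixedListSketch(TransformerMapSketch map) { this.map = map; }

    void addObject(Object o) { values.add(o); }

    /** Objects are transformed only when a statistic is requested. */
    double getMean() {
        if (values.isEmpty()) return Double.NaN;
        double sum = 0.0;
        for (Object o : values) {
            sum += map.transform(o);
        }
        return sum / values.size();
    }
}
```

Because lookup here uses the exact runtime class, a subclass needs its own entry; walking the superclass chain would be an alternative design.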

I would really like some input from the group on these, as they 
represent a rather large change to others' work in the stat directory.

Lastly, I do have a version of StatUtils that works with 
UnivariateStatistics, but I'm now convinced that we no longer need 
StatUtils.

-Mark

Re: [math] Main Univariate Facade Implementations that work with UnivariateStatistics

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.

I'm getting ready to commit these changes today, I thought I'd give a 
heads up before I did this and give one last opportunity for 
interaction. I plan to commit sometime this evening.

Just to note again, these changes are to the Univariate Implementations 
to get them working with the UnivariateStatistic library. If we do 
decide to move away from using the Univariate Interfaces, this is a 
stepping stone in that direction. I would welcome others to explore 
alternate strategies for UnivariateStatistic "containers/facades".

-Mark



-- 
Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu



---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org

Re: [math] Main Univariate Facade Implementations that work with UnivariateStatistics

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.

Phil Steitz wrote:

>Given the consensus to move in the direction of disaggregated statistics, I
>would agree that there is no internal need for StatUtils.
>
>As a final comment on this, I would like to point out that my opposition to
>this approach was based on what I now see was a naive view that we could
>actually agree on a set of commonly used univariate statistics and limit our
>support to these. I never envisioned Univariate as a "large, monolithic
>interface." I see now that this is an inherently limiting perspective and I
>should not have proposed it. I was relying too much on my biased practical
>experience/observation that once you get past the basic stuff, practical
>applications drop off quickly. I was also overly concerned about performance
>and overhead, again largely due to my own experience and application needs.
>  
>
>The one thing that I don't understand about the new approach and I would
>suggest reconsidering is why you want to retain the Univariate interfaces at
>all.  As long as you have these and people depend on them, I don't think that
>you will really have the full extensibility that you want and you will have
>added complexity and overhead to deal with. Sort of the worst of both worlds.
>The only thing that you *need* is a way to aggregate data (actually you have
>this already -- just need shared aggregation).  Why not just move to a model
>where a Univariate has a dynamic List of Statistics and do away with the getXXX
>methods in the Univariate interfaces altogether? 
>
>Phil
>
Phil, I really value your input and work; you help keep us on 
track and keep adventurers like me from going "too far overboard". I 
am approaching the contents of this patch as an attempt to show how the 
usage of the individual UnivariateStatistics initially relates back to 
what we have already implemented.

Do you think that Store/Univariate still provides a good example of how 
stats can be aggregated together under a "bean-like" interface? If so, 
they may serve as good initial examples of library usage. I still believe 
you are right that there is a logical subset of statistics that could be 
categorized as "Descriptive Statistics", and that we could place such a 
set within Univariate and still keep it simple and lightweight.

I also recognize there's always going to be an interest in "expanding" 
capabilities, and having this modular UnivariateStatistic strategy at the 
core of the implementations makes various aggregations of statistics 
much more flexible and dynamic. I think the Store/Univariate 
interfaces and implementations show us an initial strategy for aggregating 
the individual statistics under a bean-like interface. As such, they are 
still very useful for immediate use of a "subset" of statistics. 
Maybe we should keep them but document them as "front end tools" for 
users, and also begin work on something along the lines of the Aggregation 
Container that Brent recommended, as a separate set of interfaces and 
implementations for now. What do you think?
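The bean-like facade idea can be sketched as follows, assuming a hypothetical two-method statistic contract (BeanStat) and illustrative class names that are not part of the patch: the facade exposes a fixed set of getXxx() accessors, each delegating to an individual statistic, and derives the mean from the running sum and count.

```java
/** Hypothetical minimal storeless statistic contract (illustrative only). */
interface BeanStat {
    void increment(double x);
    double getValue();
}

/** Running sum. */
final class SumBeanStat implements BeanStat {
    private double sum;
    public void increment(double x) { sum += x; }
    public double getValue() { return sum; }
}

/** Running maximum; NaN until the first value arrives. */
final class MaxBeanStat implements BeanStat {
    private double max = Double.NaN;
    public void increment(double x) { max = Double.isNaN(max) ? x : Math.max(max, x); }
    public double getValue() { return max; }
}

/**
 * Hypothetical bean-like facade over a fixed "descriptive" subset:
 * one addValue(), fixed getters, each delegating to an individual statistic.
 */
final class DescriptiveFacadeSketch {
    private final BeanStat sum = new SumBeanStat();
    private final BeanStat max = new MaxBeanStat();
    private long n;

    void addValue(double x) {
        n++;
        sum.increment(x);
        max.increment(x);
    }

    long getN() { return n; }
    double getSum() { return sum.getValue(); }
    double getMax() { return max.getValue(); }

    /** Derived statistic: computed from the sum and count on demand. */
    double getMean() { return n == 0 ? Double.NaN : sum.getValue() / n; }
}
```

In a facade like this, adding a statistic means adding a field and a getter, which is the extensibility limit Phil raises; the trade-off is a simple, discoverable API for the common subset.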

-Mark


Re: [math] Main Univariate Facade Implementations that work with UnivariateStatistics

Posted by Phil Steitz <st...@yahoo.com>.

Given the consensus to move in the direction of disaggregated statistics, I
would agree that there is no internal need for StatUtils.

As a final comment on this, I would like to point out that my opposition to
this approach was based on what I now see was a naive view that we could
actually agree on a set of commonly used univariate statistics and limit our
support to these. I never envisioned Univariate as a "large, monolithic
interface." I see now that this is an inherently limiting perspective and I
should not have proposed it. I was relying too much on my biased practical
experience/observation that once you get past the basic stuff, practical
applications drop off quickly. I was also overly concerned about performance
and overhead, again largely due to my own experience and application needs.

The one thing that I don't understand about the new approach and I would
suggest reconsidering is why you want to retain the Univariate interfaces at
all.  As long as you have these and people depend on them, I don't think that
you will really have the full extensibility that you want and you will have
added complexity and overhead to deal with. Sort of the worst of both worlds.
The only thing that you *need* is a way to aggregate data (actually you have
this already -- just need shared aggregation).  Why not just move to a model
where a Univariate has a dynamic List of Statistics and do away with the getXXX
methods in the Univariate interfaces altogether? 
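The alternative proposed here, a container holding a dynamic list of statistics with no fixed getXxx() methods, might look roughly like this minimal Java sketch. ListStat, SumOfSquaresStat and DynamicUnivariateSketch are hypothetical names, and looking statistics up by a string name is an assumption made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical contract for a statistic fed one value at a time (illustrative only). */
interface ListStat {
    void increment(double x);
    double getValue();
}

/** Example statistic: running sum of squares. */
final class SumOfSquaresStat implements ListStat {
    private double sumSq;
    public void increment(double x) { sumSq += x * x; }
    public double getValue() { return sumSq; }
}

/**
 * Hypothetical container with a dynamic set of named statistics and no fixed
 * getXxx() accessors: each added value is pushed to every registered statistic.
 */
final class DynamicUnivariateSketch {
    // LinkedHashMap keeps statistics in registration order.
    private final Map<String, ListStat> stats = new LinkedHashMap<>();

    /** Registers a statistic; in this sketch it only sees values added afterwards. */
    void addStatistic(String name, ListStat stat) {
        stats.put(name, stat);
    }

    /** Feeds one value to every registered statistic. */
    void addValue(double x) {
        for (ListStat s : stats.values()) {
            s.increment(x);
        }
    }

    /** Returns a statistic's current value by name. */
    double getValue(String name) {
        ListStat s = stats.get(name);
        if (s == null) {
            throw new IllegalArgumentException("Unknown statistic: " + name);
        }
        return s.getValue();
    }
}
```

New statistics are added by registration rather than by changing an interface, at the cost of losing compile-time accessors such as getMean().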

Phil
