Posted to dev@commons.apache.org by Phil Steitz <st...@yahoo.com> on 2003/06/17 09:33:01 UTC

Re: [math] UnivariateImpl statistical computation strategies

--- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> I've got a design decision to make that I'd like to get others' opinions 
> on. Currently, the strategy in UnivariateImpl is to calculate the 
> rudimentary building blocks of the statistics and then calculate the 
> statistics in the "getters" (getVariance, getSkewness, getKurtosis, 
> etc.). In some cases it's done in the getter, in some cases it's done in 
> the addValue method itself. Often it's based on the implementor's opinion 
> of where to put it, not on any hard logic.
> 
> This presents a debate with the following arguments:
> 
> (1) Bean etiquette suggests "getters" are for bean properties; it's 
> usually recommended that they do nothing more than return the value 
> of a property. 

This is certainly not specified anywhere in the JavaBeans spec.  In fact, the
spec explicitly states (sect 7.1) "So properties need not just be simple data
fields, they can actually be computed values. Updates may have various
programmatic side effects."  If the "etiquette" above were in fact standard,
entity EJBs, for example, would be impossible.  The power of the JavaBeans
specification is that it is an interface specification, not an
implementation specification.  Beans can and should manage their internal state
and the mapping between their internals and their publicly exposed properties
in the most convenient and efficient way possible.  
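
To make the point about computed properties concrete, here is a minimal
sketch of a bean whose read-only property has no backing field.  The class
and property names (RectangleBean, area) are hypothetical and chosen only
for illustration; nothing like this exists in [math].

```java
// Hypothetical bean, for illustration only.
class RectangleBean {
    private double width;
    private double height;

    // Plain stored properties.
    public double getWidth() { return width; }
    public void setWidth(double width) { this.width = width; }
    public double getHeight() { return height; }
    public void setHeight(double height) { this.height = height; }

    // Read-only property "area": computed on every call, no field behind it.
    public double getArea() { return width * height; }
}
```

Callers see "area" as an ordinary property; whether it is stored or
computed stays an implementation detail of the bean.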

> This is beneficial in our Univariate 
> case when calling a getter many times without adding a new value (let's 
> say you use "getKurtosis" a lot in a calculation before adding another 
> value); then it's more logical to have the kurtosis calculated only once 
> and put the code for calculating it in the addValue method.
> 
Huh?  Kurtosis is only defined for the versions that store all values.  If and
when we implement the corrected two-pass formulas, these may benefit from some
running sum computations; but for now, all computations should be performed on
demand, using the vector of stored values.  There is no reason to keep updating
as the values are added for the stored case.
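
As a rough sketch of what "on demand, using the vector of stored values"
could look like: addValue only appends, and the getter does all the work.
The class name StoredValuesSketch is hypothetical, and the kurtosis
definition used here (population excess kurtosis, m4 / m2^2 - 3) is an
assumption; the thread does not say which estimator the real code uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stored-values holder, for illustration only.
class StoredValuesSketch {
    private final List<Double> values = new ArrayList<>();

    // Stored case: adding a value does nothing except store it.
    public void addValue(double v) {
        values.add(v);
    }

    public double getMean() {
        if (values.isEmpty()) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    // Population excess kurtosis, computed from the stored values on each call.
    // Returns NaN for an empty sample or when all values are equal (0/0).
    public double getKurtosis() {
        int n = values.size();
        if (n == 0) {
            return Double.NaN;
        }
        double mean = getMean();
        double m2 = 0.0;
        double m4 = 0.0;
        for (double v : values) {
            double d2 = (v - mean) * (v - mean);
            m2 += d2;
            m4 += d2 * d2;
        }
        m2 /= n;
        m4 /= n;
        return m4 / (m2 * m2) - 3.0;
    }
}
```

The cost is one or two passes over the data per getter call, paid only
when a statistic is actually requested.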

> (2) However, if calling addValue many times (more likely the case) with 
> only the interest of getting the "getMean" back, it's wasted 
> computational time to calculate all the other stats (like kurtosis) in 
> addValue when you just want the result of "getMean" back after each 
> "addValue".

Yes.  The stored versions should use array-based computations, computing
statistics on demand in the getters.
> 
> I suspect this debate leads to a compromise similar to what I've done in 
> skew and kurt where all the rudimentary building blocks for all the 
> stats are built in addValue, and the detailed calculation specific to 
> that stat is done in the getter.
> 
I see no reason to do anything in addValue (other than add the value) for the
stored case.  Computations should be vector based -- unless the modified
two-pass stuff can have reduced computational overhead by keeping lagged
statistics.
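
For reference, a minimal sketch of the corrected two-pass variance on an
array of stored values.  The class and method names are hypothetical; the
formula is the standard one, where the second pass accumulates both the
squared deviations and the plain deviations, and the latter (zero in exact
arithmetic) is used to cancel rounding error in the computed mean.

```java
// Hypothetical helper, for illustration only.
class TwoPassVarianceSketch {

    // Sample variance (n - 1 denominator) via the corrected two-pass formula:
    //   ( sum((x - mean)^2) - (sum(x - mean))^2 / n ) / (n - 1)
    static double variance(double[] x) {
        int n = x.length;
        if (n < 2) {
            return Double.NaN;
        }
        // First pass: the mean.
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        double mean = sum / n;
        // Second pass: squared deviations plus the correction term.
        double sumSq = 0.0;
        double sumDev = 0.0;
        for (double v : x) {
            double d = v - mean;
            sumSq += d * d;
            sumDev += d;
        }
        return (sumSq - sumDev * sumDev / n) / (n - 1);
    }
}
```

Nothing here needs to be maintained in addValue; both passes run only
when the variance is requested.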

> thoughts?
> Mark
> 
> p.s. In a more complex approach the user might be able to tune the 
> calculations given their specific need. But this would require the 
> creation of a delegation framework and boolean switching to control the 
> behavior of the implementation: a lot of added complexity that would 
> need to be maintained. It could create more work than it's worth.

-1
> 


p.s.  Let's try to keep subjects indicative of the content. (Note change above.)
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: commons-dev-help@jakarta.apache.org
> 


Re: [math] UnivariateImpl statistical computation strategies

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.

Phil Steitz wrote:

>--- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
>>Tim O'Brien wrote:
>>>On Tue, 17 Jun 2003, Mark R. Diggory wrote:
>>>>Phil Steitz wrote:
>>>>>--- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
>>>>
>>>>I'm sorry, I'm not really talking about the spec, just a general trend 
>>>>in design of Java Beans that I've observed and kinda been "trained" to 
>>>>do. So, if it's against the spec even, I suspect I should change my 
>>>>view-point.
>>>
>>>I also try to adhere to this practice, but let's all agree not to call
>>>Univariate implementations "JavaBeans" (even though we are going to derive 
>>>benefits from using the getXXX() syntax).
>>>
>>>Mark, let me know when you've come to a stopping point, and I'll move 
>>>source to the proposed packages. 
>>
>>I want to retain the content/examples for my proposed changes. Is it 
>>acceptable to start another directory, or should I tag and create a cvs 
>>branch with the proposed changes in it?
>
>Why not just retain them locally?
>
>Is there a Jakarta policy on this kind of thing?  

Because if one is a committer, one can use the cvs branching mechanism for 
prototyping. If a prototype implementation is ever approved, it 
can be merged into the trunk. Providing it in cvs opens it up easily 
to review by other developers, simply by checking out that particular 
branch. Yes, I also wonder if there is a Jakarta policy on such a thing.

-Mark



Re: [math] UnivariateImpl statistical computation strategies

Posted by Tim O'Brien <to...@discursive.com>.

On Tue, 17 Jun 2003, Phil Steitz wrote:
> --- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> > I want to retain the content/examples for my proposed changes. Is it 
> > acceptable to start another directory, or should I tag and create a cvs 
> > branch with the proposed changes in it?
> > 
> Why not just retain them locally?
> 
> Is there a Jakarta policy on this kind of thing?  

Huh?  Please clarify: how does Phil's proposed package structure conflict 
with Mark's content/examples?  I don't see a conflict.

Tim


-- 
----------------------
Tim O'Brien
Evanston, IL
(847) 863-7045
tobrien@discursive.com



Re: [math] UnivariateImpl statistical computation strategies

Posted by Phil Steitz <st...@yahoo.com>.

--- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> Tim O'Brien wrote:
> >On Tue, 17 Jun 2003, Mark R. Diggory wrote:
> >>Phil Steitz wrote:
> >>>--- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> >>
> >>I'm sorry, I'm not really talking about the spec, just a general trend 
> >>in design of Java Beans that I've observed and kinda been "trained" to 
> >>do. So, if it's against the spec even, I suspect I should change my 
> >>view-point.
> >
> >I also try to adhere to this practice, but let's all agree not to call
> >Univariate implementations "JavaBeans" (even though we are going to derive 
> >benefits from using the getXXX() syntax).
> >
> >Mark, let me know when you've come to a stopping point, and I'll move 
> >source to the proposed packages. 
> 
> I want to retain the content/examples for my proposed changes. Is it 
> acceptable to start another directory, or should I tag and create a cvs 
> branch with the proposed changes in it?
> 
> 
Why not just retain them locally?

Is there a Jakarta policy on this kind of thing?  

Phil

Re: [math] UnivariateImpl statistical computation strategies

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.

Tim O'Brien wrote:

>On Tue, 17 Jun 2003, Mark R. Diggory wrote:
>>Phil Steitz wrote:
>>>--- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
>>
>>I'm sorry, I'm not really talking about the spec, just a general trend 
>>in design of Java Beans that I've observed and kinda been "trained" to 
>>do. So, if it's against the spec even, I suspect I should change my 
>>view-point.
>
>I also try to adhere to this practice, but let's all agree not to call
>Univariate implementations "JavaBeans" (even though we are going to derive 
>benefits from using the getXXX() syntax).
>
>Mark, let me know when you've come to a stopping point, and I'll move 
>source to the proposed packages. 
>
I want to retain the content/examples for my proposed changes. Is it 
acceptable to start another directory, or should I tag and create a cvs 
branch with the proposed changes in it?




Re: [math] UnivariateImpl statistical computation strategies

Posted by Tim O'Brien <to...@discursive.com>.

On Tue, 17 Jun 2003, Mark R. Diggory wrote:

> Phil Steitz wrote:
> > --- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> I'm sorry, I'm not really talking about the spec, just a general trend 
> in design of Java Beans that I've observed and kinda been "trained" to 
> do. So, if it's against the spec even, I suspect I should change my 
> view-point.

I also try to adhere to this practice, but let's all agree not to call
Univariate implementations "JavaBeans" (even though we are going to derive 
benefits from using the getXXX() syntax).

Mark, let me know when you've come to a stopping point, and I'll move 
source to the proposed packages. 

----------------------
Tim O'Brien
Evanston, IL
(847) 863-7045
tobrien@discursive.com

Re: [math] UnivariateImpl statistical computation strategies

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.

Phil Steitz wrote:
> --- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
>>(1) Bean etiquette suggests "getters" are for bean properties; it's 
>>usually recommended that they do nothing more than return the value 
>>of a property. 
> 
> 
> This is certainly not specified anywhere in the JavaBeans spec.  In fact, the
> spec explicitly states (sect 7.1) "So properties need not just be simple data
> fields, they can actually be computed values. Updates may have various
> programmatic side effects."  If the "etiquette" above were in fact standard,
> entity EJBs, for example, would be impossible.  The power of the JavaBeans
> specification is that it is an interface specification, not an
> implementation specification.  Beans can and should manage their internal state
> and the mapping between their internals and their publicly exposed properties
> in the most convenient and efficient way possible.  
> 
>>This is beneficial in our Univariate 
> 

I'm sorry, I'm not really talking about the spec, just a general trend 
in design of Java Beans that I've observed and kinda been "trained" to 
do. So, if it's against the spec even, I suspect I should change my 
view-point.


>>case when calling a getter many times without adding a new value (let's 
>>say you use "getKurtosis" a lot in a calculation before adding another 
>>value); then it's more logical to have the kurtosis calculated only once 
>>and put the code for calculating it in the addValue method.
>>
> 
> Huh?  Kurtosis is only defined for the versions that store all values.  If and
> when we implement the corrected two-pass formulas, these may benefit from some
> running sum computations; but for now, all computations should be performed on
> demand, using the vector of stored values.  There is no reason to keep updating
> as the values are added for the stored case.
> 

You should really review UnivariateImpl. I implemented memory-free 
versions of kurtosis and skew quite some time ago. Now I'm working on 
improving their accuracy by applying something similar to West's algorithm 
to them. These are just "moments"; there is no reason that they can't 
benefit from the same approach as variance.
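
To show the kind of thing I mean, here is a minimal sketch of a storage-free
accumulator that updates the second, third and fourth central moment sums
as each value arrives.  It uses the commonly published one-pass update
formulas for higher moments (in the spirit of the Welford/West updating
approach for variance); it is not the actual UnivariateImpl code, the class
name RunningMomentsSketch is hypothetical, and the skewness and kurtosis
returned are the uncorrected population versions, which is an assumption.

```java
// Hypothetical storage-free accumulator, for illustration only.
class RunningMomentsSketch {
    private long n;
    private double mean; // running mean
    private double m2;   // sum of (x - mean)^2
    private double m3;   // sum of (x - mean)^3
    private double m4;   // sum of (x - mean)^4

    // Updates all moment sums in O(1) without storing the value.
    public void addValue(double x) {
        long n1 = n;
        n++;
        double delta = x - mean;
        double deltaN = delta / n;
        double deltaN2 = deltaN * deltaN;
        double term1 = delta * deltaN * n1;
        mean += deltaN;
        // Order matters: m4 uses the old m3 and m2, m3 uses the old m2.
        m4 += term1 * deltaN2 * (n * n - 3 * n + 3) + 6 * deltaN2 * m2 - 4 * deltaN * m3;
        m3 += term1 * deltaN * (n - 2) - 3 * deltaN * m2;
        m2 += term1;
    }

    public double getMean() {
        return n == 0 ? Double.NaN : mean;
    }

    // Sample variance (n - 1 denominator).
    public double getVariance() {
        return n < 2 ? Double.NaN : m2 / (n - 1);
    }

    // Population skewness; NaN when there are no values or no spread.
    public double getSkewness() {
        return Math.sqrt(n) * m3 / Math.pow(m2, 1.5);
    }

    // Population excess kurtosis; NaN when there are no values or no spread.
    public double getKurtosis() {
        return n * m4 / (m2 * m2) - 3.0;
    }
}
```

The getters stay cheap, and addValue does a fixed amount of arithmetic
per value regardless of how many values have been seen.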

> 
>>(2) However, if calling addValue many times (more likely the case) with 
>>only the interest of getting the "getMean" back, it's wasted 
>>computational time to calculate all the other stats (like kurtosis) in 
>>addValue when you just want the result of "getMean" back after each 
>>"addValue".
> 
> 
> Yes.  The stored versions should use array-based computations, computing
> statistics on demand in the getters.
> 

+1 and I've made these changes.

>>
>>p.s. In a more complex approach the user might be able to tune the 
>>calculations given their specific need. But this would require the 
>>creation of a delegation framework and boolean switching to control the 
>>behavior of the implementation: a lot of added complexity that would 
>>need to be maintained. It could create more work than it's worth.
> 
> 
> -1

