Posted to dev@commons.apache.org by "Mark R. Diggory" <md...@latte.harvard.edu> on 2003/11/07 22:54:23 UTC

[math] Proposal for Package restructuring and Class renaming

I have several modifications I'm planning to make, but in the spirit of 
consensus I want to propose them and attempt to get some agreement. So 
math developer opinions on the subject would be good.

1.) o.a.c.math.stat.distributions --> o.a.c.math.distributions

Gives this package a more "generic" position to hold more than just 
"stat" distributions.

2.) As in my last emails concerning "Univariate", I would like to make 
the following class changes (I have already done so successfully in my 
checkout):

interface o.a.c.m.stat.StoreUnivariate -->
            abstract class o.a.c.m.stat.DescriptiveStatistics

this actually becomes a factory class and uses Discovery to instantiate 
new instances of the following implementations

*default implementation*
o.a.c.m.stat.StoreUnivariateImpl -->
           o.a.c.m.stat.univariate.StatisticsImpl

*alternate implementations*
o.a.c.m.stat.UnivariateImpl -->
           o.a.c.m.stat.univariate.StorelessStatisticsImpl

o.a.c.m.stat.ListUnivariateImpl -->
           o.a.c.m.stat.univariate.ListStatisticsImpl

o.a.c.m.stat.BeanListUnivariateImpl -->
           o.a.c.m.stat.univariate.BeanListStatisticsImpl

The benefit of this is that the alternate implementations can all be 
instantiated from the o.a.c.m.stat.DescriptiveStatistics factory's 
newInstance(...) methods. Thus alternate implementations of 
DescriptiveStatistics can be written as Service Providers and set in the 
environment/JVM configuration. We can now write SPs for other tools 
like Matlab, Mathematica, JLink, C++ libraries, R, Omegahat ... the list 
goes on and on...
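
For illustration only, here is a minimal sketch of the factory idea 
described above. It is not the code from the checkout mentioned in this 
post. It assumes a plain system property plus reflection as a stand-in 
for Commons Discovery, and every name in it (DescriptiveStatisticsSketch, 
StoredImplSketch, StorelessImplSketch, the property key) is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the proposed abstract factory class. */
abstract class DescriptiveStatisticsSketch {

    /** Assumed property key; the thread does not specify one. */
    static final String IMPL_PROPERTY = "sketch.DescriptiveStatistics";

    public abstract void addValue(double v);

    public abstract double getMean();

    /** Uses the configured implementation if the property is set, else the storing default. */
    public static DescriptiveStatisticsSketch newInstance() {
        String name = System.getProperty(IMPL_PROPERTY);
        return name == null ? new StoredImplSketch() : newInstance(name);
    }

    /** Loads an implementation by its fully qualified class name. */
    public static DescriptiveStatisticsSketch newInstance(String className) {
        try {
            return newInstance(
                Class.forName(className).asSubclass(DescriptiveStatisticsSketch.class));
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Unknown implementation: " + className, e);
        }
    }

    /** Instantiates an implementation through its no-argument constructor. */
    public static DescriptiveStatisticsSketch newInstance(
            Class<? extends DescriptiveStatisticsSketch> cls) {
        try {
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + cls.getName(), e);
        }
    }
}

/** Default sketch implementation: keeps every value. */
class StoredImplSketch extends DescriptiveStatisticsSketch {
    private final List<Double> values = new ArrayList<>();

    @Override
    public void addValue(double v) {
        values.add(v);
    }

    @Override
    public double getMean() {
        if (values.isEmpty()) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }
}

/** Alternate sketch implementation: keeps only a count and a running sum. */
class StorelessImplSketch extends DescriptiveStatisticsSketch {
    private long n;
    private double sum;

    @Override
    public void addValue(double v) {
        n++;
        sum += v;
    }

    @Override
    public double getMean() {
        return n == 0 ? Double.NaN : sum / n;
    }
}
```

Callers depend only on the abstract type, so swapping the implementation 
is a configuration change rather than a code change, which is the point 
of the Service Provider approach.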

Someday, I'd like to see this design extended for Bivariate Statistics 
and Regression Classes. Eventually for Random Number generation as well.

-Mark

-- 
Mark Diggory
Software Developer
Harvard MIT Data Center
http://osprey.hmdc.harvard.edu

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org

Re: [math] Proposal for Package restructuring and Class renaming

Posted by Al Chou <ho...@yahoo.com>.

--- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> Al Chou wrote:
> > 
> > OK, I see.  The one thing I notice is that the names are getting awfully long,
> > especially for the non-default case.  I guess that's a price we pay for having
> > descriptive (no play on words intended) names like DescriptiveStatistics....
> 
> Maybe the Implementations could be abbreviated somewhat
> 
> o.a.c.math.stat.DescriptiveStatistics
> 
> o.a.c.math.stat.StorelessDscrStatsImpl
> o.a.c.math.stat.DscrStatsImpl
> 
> We could also consider pushing the actual implementation off into its 
> own packages
> 
> o.a.c.math.stat.impl.StorelessDscrStatsImpl
> o.a.c.math.stat.impl.DscrStatsImpl
> 
> This would even push all the univariate stat providers off into this 
> hierarchy as well
> 
> o.a.c.math.stat.impl.univar.StorelessUnivariateStatistic
> o.a.c.math.stat.impl.univar.UnivariateStatistic

Too much renaming and reorganization.  I didn't mean to complain too loudly,
and if the result is to use abbreviations, I retract my comments.  I probably
should have given more than half a second's thought to what alternative names
might be shorter, but in the absence of well-thought-out shorter names, I much
prefer the current proposal of DescriptiveStatistics.  Never use abbreviations
unless everyone already knows them (e.g., sin for sine), I say.

Al


Re: [math] Proposal for Package restructuring and Class renaming

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.

Al Chou wrote:
> 
> OK, I see.  The one thing I notice is that the names are getting awfully long,
> especially for the non-default case.  I guess that's a price we pay for having
> descriptive (no play on words intended) names like DescriptiveStatistics....

Maybe the Implementations could be abbreviated somewhat

o.a.c.math.stat.DescriptiveStatistics

o.a.c.math.stat.StorelessDscrStatsImpl
o.a.c.math.stat.DscrStatsImpl

We could also consider pushing the actual implementation off into its 
own packages

o.a.c.math.stat.impl.StorelessDscrStatsImpl
o.a.c.math.stat.impl.DscrStatsImpl

This would even push all the univariate stat providers off into this 
hierarchy as well

o.a.c.math.stat.impl.univar.StorelessUnivariateStatistic
o.a.c.math.stat.impl.univar.UnivariateStatistic

-M.
-- 
Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu



Re: [math] Proposal for Package restructuring and Class renaming

Posted by Al Chou <ho...@yahoo.com>.

--- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> Al Chou wrote:
> > --- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
...
> >>2.) Like in my last emails concerning "Univariate" I would like to, (and 
> >>have done so in my checkout successfully) Make the following Class changes:
> >>
> >>interface o.a.c.m.stat.StoreUnivariate -->
> >>            abstract class o.a.c.m.stat.DescriptiveStatistics
> >>
> >>this actually becomes a factory class and uses Discovery to instantiate 
> >>new instances of the following implementations
> >>
> >>*default implementation*
> >>o.a.c.m.stat.StoreUnivariateImpl -->
> >>           o.a.c.m.stat.univariate.StatisticsImpl
> > 
> > 
> > Forgive me for not refamiliarizing myself with the code first, but should the
> > storeless version perhaps be the default implementation instead?  What do we
> > lose by going that way?  I'm thinking it would be nice to keep memory usage
> > lower if possible.
> 
> The Storeless version (UnivariateImpl) doesn't support rank statistics 
> because of its storeless nature. The more fully featured implementation 
> is StoreUnivariateImpl: it does everything, but has the limitation of 
> requiring storage of the values. These are two different implementations 
> with different internal storage configurations. I chose 
> StoreUnivariateImpl because I think the default should have full 
> capabilities.
> 
> The storeless version is more of an optimized solution. It is probably wise 
> to suggest that one use it only if one needs that functionality (i.e., 
> trying to get moments across huge datasets or real-time value streams of 
> sorts).

That sounds reasonable.  Thanks for the refresher (I looked at the current code
based on your remarks, too).


> > Before we go overboard, can you give a quick example of instantiating one of
> > the implementations?  Or perhaps, both the default and one alternative
...
> Yes, like that
> 
> For the default Discovery configured implementation:
> 
> DescriptiveStatistics stats = DescriptiveStatistics.newInstance();
> 
> stats.addValue(5.0);
> ...
> 
> double mean = stats.getMean();
> 
> 
> For any alternate Implementations:
> 
> DescriptiveStatistics stats = 
> DescriptiveStatistics.newInstance(StorelessDescriptiveStatisticsImpl.class);
> 
> stats.addValue(5.0);
> ...
> 
> double mean = stats.getMean();
> 
> and/or
> 
> DescriptiveStatistics stats = 
> DescriptiveStatistics.newInstance("o.a.c.math.stat.impl.StorelessDescriptiveStatisticsImpl");
> 
> stats.addValue(5.0);
> ...
> 
> double mean = stats.getMean();
> 
> depending on which people like more

OK, I see.  The one thing I notice is that the names are getting awfully long,
especially for the non-default case.  I guess that's a price we pay for having
descriptive (no play on words intended) names like DescriptiveStatistics....



Al


Re: [math] Proposal for Package restructuring and Class renaming

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.


Al Chou wrote:

> --- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> 
>>I have several modifications I'm planning to make, but in the spirit of 
>>consensus I want to propose them and attempt to get some agreement. So 
>>math developer opinions on the subject would be good.
>>
>>1.) o.a.c.math.stat.distributions --> o.a.c.math.distributions
>>
>>Gives this package a more "generic" position to hold more than just 
>>"stat" distributions.
> 
> 
> What other kinds of distributions did you have in mind?  I'm asking out of
> complete ignorance.
> 
> 
> 
>>2.) Like in my last emails concerning "Univariate" I would like to, (and 
>>have done so in my checkout successfully) Make the following Class changes:
>>
>>interface o.a.c.m.stat.StoreUnivariate -->
>>            abstract class o.a.c.m.stat.DescriptiveStatistics
>>
>>this actually becomes a factory class and uses Discovery to instantiate 
>>new instances of the following implementations
>>
>>*default implementation*
>>o.a.c.m.stat.StoreUnivariateImpl -->
>>           o.a.c.m.stat.univariate.StatisticsImpl
> 
> 
> Forgive me for not refamiliarizing myself with the code first, but should the
> storeless version perhaps be the default implementation instead?  What do we
> lose by going that way?  I'm thinking it would be nice to keep memory usage
> lower if possible.
> 

The Storeless version (UnivariateImpl) doesn't support rank statistics 
because of its storeless nature. The more fully featured implementation 
is StoreUnivariateImpl: it does everything, but has the limitation of 
requiring storage of the values. These are two different implementations 
with different internal storage configurations. I chose 
StoreUnivariateImpl because I think the default should have full 
capabilities.

The storeless version is more of an optimized solution. It is probably wise 
to suggest that one use it only if one needs that functionality (i.e., 
trying to get moments across huge datasets or real-time value streams of 
sorts).
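
The trade-off described above can be sketched as follows. This is an 
illustration under stated assumptions, not the UnivariateImpl or 
StoreUnivariateImpl code: the class names RunningMomentsSketch and 
StoredValuesSketch are hypothetical, and the storeless variant uses 
Welford's running update as one possible way to get moments without 
keeping the data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Storeless sketch: constant memory, moments only, no rank statistics. */
class RunningMomentsSketch {
    private long n;
    private double mean;
    private double m2; // running sum of squared deviations from the mean

    void addValue(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    long getN() {
        return n;
    }

    double getMean() {
        return n == 0 ? Double.NaN : mean;
    }

    /** Sample variance (divides by n - 1). */
    double getVariance() {
        return n < 2 ? Double.NaN : m2 / (n - 1);
    }
}

/** Storing sketch: memory grows with the data, but rank statistics are possible. */
class StoredValuesSketch {
    private final List<Double> values = new ArrayList<>();

    void addValue(double x) {
        values.add(x);
    }

    double getMean() {
        if (values.isEmpty()) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }

    /** The median needs the sorted values, which is why a storeless version cannot offer it exactly. */
    double getMedian() {
        if (values.isEmpty()) {
            return Double.NaN;
        }
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int mid = sorted.size() / 2;
        return sorted.size() % 2 == 1
            ? sorted.get(mid)
            : (sorted.get(mid - 1) + sorted.get(mid)) / 2.0;
    }
}
```

The storeless class holds three fields no matter how many values it 
sees, which suits huge datasets or streams. The storing class pays for 
its extra capability with memory proportional to the input.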

> 
> 
>>*alternate implementations*
>>o.a.c.m.stat.UnivariateImpl -->
>>           o.a.c.m.stat.univariate.StorelessStatisticsImpl
>>
>>o.a.c.m.stat.ListUnivariateImpl -->
>>           o.a.c.m.stat.univariate.ListStatisticsImpl
>>
>>o.a.c.m.stat.BeanListUnivariateImpl -->
>>           o.a.c.m.stat.univariate.BeanListStatisticsImpl
>>
>>The benefit of this is that the Alternate Implementations can all be 
>>instantiated from the o.a.c.m.stat.DescriptiveStatistics factories 
>>newInstance(...) methods. Thus alternate implementations of 
>>DescriptiveStatistics can be written as Service Providers and set in the 
>>environment/JVM configuration. We can now write SP's for other tools 
>>like Matlab, Mathematica, JLink, C++ libraries, R, Omegahat ... the list 
>>goes on and on...
>>
>>Someday, I'd like to see this design extended for Bivariate Statistics 
>>and Regression Classes. Eventually for Random Number generation as well.
> 
> 
> Before we go overboard, can you give a quick example of instantiating one of
> the implementations?  Or perhaps, both the default and one alternative
> implementation?  Is it:
> 
> import org.apache.commons.math.stat.*;
> 

> ...
> 
> StoreUnivariateImpl defaultImplementation = DescriptiveStatistics.newInstance()
> ;
> StoreUnivariateImpl storagelessImplementation =
> DescriptiveStatistics.newInstance( StorelessStatisticsImpl ) ;
> 

Yes, like that

For the default Discovery configured implementation:

DescriptiveStatistics stats = DescriptiveStatistics.newInstance();

stats.addValue(5.0);
...

double mean = stats.getMean();


For any alternate Implementations:

DescriptiveStatistics stats = 
DescriptiveStatistics.newInstance(StorelessDescriptiveStatisticsImpl.class);

stats.addValue(5.0);
...

double mean = stats.getMean();

and/or

DescriptiveStatistics stats = 
DescriptiveStatistics.newInstance("o.a.c.math.stat.impl.StorelessDescriptiveStatisticsImpl");

stats.addValue(5.0);
...

double mean = stats.getMean();

depending on which people like more


-- 
Mark Diggory
Software Developer
Harvard MIT Data Center
http://osprey.hmdc.harvard.edu


Re: [math] Proposal for Package restructuring and Class renaming

Posted by Al Chou <ho...@yahoo.com>.

--- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> Al Chou wrote:
> > 
> > Would you move the existing ones into
> > org.apache.commons.math.distributions.statistical or something so that the
> > probability distributions could be organized together under *.probability? 
> > Also, I noticed that the current package uses the singular "distribution"
> > rather than "distributions".
> 
> I suspect it's unclear where this boundary would be drawn. I think all 
> the distributions would be beneficial for both random number 
> distributions and statistical usage. I guess if it became clear that 
> there was a strong separation between the two then separate packages 
> would be warranted, but I'm not convinced of a difference. You and 
> others may have more informed opinions.
> 
> -Mark

I don't have an informed opinion, so I'll fall back to the default opinion of
"lump everything together until/unless it's clear how to split it up".


Al


Re: [math] Proposal for Package restructuring and Class renaming

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.

Al Chou wrote:
> 
> Would you move the existing ones into
> org.apache.commons.math.distributions.statistical or something so that the
> probability distributions could be organized together under *.probability? 
> Also, I noticed that the current package uses the singular "distribution"
> rather than "distributions".

I suspect it's unclear where this boundary would be drawn. I think all 
the distributions would be beneficial for both random number 
distributions and statistical usage. I guess if it became clear that 
there was a strong separation between the two then separate packages 
would be warranted, but I'm not convinced of a difference. You and 
others may have more informed opinions.

-Mark

-- 
Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu


Re: [math] Proposal for Package restructuring and Class renaming

Posted by Al Chou <ho...@yahoo.com>.

--- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> Al Chou wrote:
> > --- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> > 
> >>I have several modifications I'm planning to make, but in the spirit of 
> >>consensus I want to propose them and attempt to get some agreement. So 
> >>math developer opinions on the subject would be good.
> >>
> >>1.) o.a.c.math.stat.distributions --> o.a.c.math.distributions
> >>
> >>Gives this package a more "generic" position to hold more than just 
> >>"stat" distributions.
> > 
> > 
> > What other kinds of distributions did you have in mind?  I'm asking out of
> > complete ignorance.
> > 
> 
> Probability Distributions (Gamma, Beta, Poisson, Exponential, 
> Logarithmic, Hyperbolic ...) great examples of these are in Colt's
> 
> cern.jet.stat and cern.jet.random packages.
> 
> ... but are bound up as implementations of RandomNumberGeneration 
> classes...not that that's a bad thing.
> 
> Eventually ours could be used in random number generation, I think they 
> should be a more dominant package.
> -Mark

Would you move the existing ones into
org.apache.commons.math.distributions.statistical or something so that the
probability distributions could be organized together under *.probability? 
Also, I noticed that the current package uses the singular "distribution"
rather than "distributions".


Al

=====
Albert Davidson Chou

    Get answers to Mac questions at http://www.Mac-Mgrs.org/ .


Re: [math] Proposal for Package restructuring and Class renaming

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.


Al Chou wrote:

> --- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> 
>>I have several modifications I'm planning to make, but in the spirit of 
>>consensus I want to propose them and attempt to get some agreement. So 
>>math developer opinions on the subject would be good.
>>
>>1.) o.a.c.math.stat.distributions --> o.a.c.math.distributions
>>
>>Gives this package a more "generic" position to hold more than just 
>>"stat" distributions.
> 
> 
> What other kinds of distributions did you have in mind?  I'm asking out of
> complete ignorance.
> 

Probability Distributions (Gamma, Beta, Poisson, Exponential, 
Logarithmic, Hyperbolic ...) great examples of these are in Colt's

cern.jet.stat and cern.jet.random packages.

... but are bound up as implementations of RandomNumberGeneration 
classes...not that that's a bad thing.

Eventually ours could be used in random number generation, I think they 
should be a more dominant package.
-Mark

-- 
Mark Diggory
Software Developer
Harvard MIT Data Center
http://osprey.hmdc.harvard.edu


Re: [math] Proposal for Package restructuring and Class renaming

Posted by Al Chou <ho...@yahoo.com>.

--- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> I have several modifications I'm planning to make, but in the spirit of 
> consensus I want to propose them and attempt to get some agreement. So 
> math developer opinions on the subject would be good.
> 
> 1.) o.a.c.math.stat.distributions --> o.a.c.math.distributions
> 
> Gives this package a more "generic" position to hold more than just 
> "stat" distributions.

What other kinds of distributions did you have in mind?  I'm asking out of
complete ignorance.


> 2.) Like in my last emails concerning "Univariate" I would like to, (and 
> have done so in my checkout successfully) Make the following Class changes:
> 
> interface o.a.c.m.stat.StoreUnivariate -->
>             abstract class o.a.c.m.stat.DescriptiveStatistics
> 
> this actually becomes a factory class and uses Discovery to instantiate 
> new instances of the following implementations
> 
> *default implementation*
> o.a.c.m.stat.StoreUnivariateImpl -->
>            o.a.c.m.stat.univariate.StatisticsImpl

Forgive me for not refamiliarizing myself with the code first, but should the
storeless version perhaps be the default implementation instead?  What do we
lose by going that way?  I'm thinking it would be nice to keep memory usage
lower if possible.


> *alternate implementations*
> o.a.c.m.stat.UnivariateImpl -->
>            o.a.c.m.stat.univariate.StorelessStatisticsImpl
> 
> o.a.c.m.stat.ListUnivariateImpl -->
>            o.a.c.m.stat.univariate.ListStatisticsImpl
> 
> o.a.c.m.stat.BeanListUnivariateImpl -->
>            o.a.c.m.stat.univariate.BeanListStatisticsImpl
> 
> The benefit of this is that the Alternate Implementations can all be 
> instantiated from the o.a.c.m.stat.DescriptiveStatistics factories 
> newInstance(...) methods. Thus alternate implementations of 
> DescriptiveStatistics can be written as Service Providers and set in the 
> environment/JVM configuration. We can now write SP's for other tools 
> like Matlab, Mathematica, JLink, C++ libraries, R, Omegahat ... the list 
> goes on and on...
> 
> Someday, I'd like to see this design extended for Bivariate Statistics 
> and Regression Classes. Eventually for Random Number generation as well.

Before we go overboard, can you give a quick example of instantiating one of
the implementations?  Or perhaps, both the default and one alternative
implementation?  Is it:

import org.apache.commons.math.stat.*;

...

StoreUnivariateImpl defaultImplementation = DescriptiveStatistics.newInstance()
;
StoreUnivariateImpl storagelessImplementation =
DescriptiveStatistics.newInstance( StorelessStatisticsImpl ) ;



Al

=====
Albert Davidson Chou

    Get answers to Mac questions at http://www.Mac-Mgrs.org/ .


Re: [math] Proposal for Package restructuring and Class renaming

Posted by Matt Cliff <ma...@mattcliff.com>.

I agree

On Fri, 7 Nov 2003, Mark R. Diggory wrote:

> I have several modifications I'm planning to make, but in the spirit of 
> consensus I want to propose them and attempt to get some agreement. So 
> math developer opinions on the subject would be good.
> 
> 1.) o.a.c.math.stat.distributions --> o.a.c.math.distributions
> 
> Gives this package a more "generic" position to hold more than just 
> "stat" distributions.
> 
> 2.) Like in my last emails concerning "Univariate" I would like to, (and 
> have done so in my checkout successfully) Make the following Class changes:
> 
> interface o.a.c.m.stat.StoreUnivariate -->
>             abstract class o.a.c.m.stat.DescriptiveStatistics
> 
> this actually becomes a factory class and uses Discovery to instantiate 
> new instances of the following implementations
> 
> *default implementation*
> o.a.c.m.stat.StoreUnivariateImpl -->
>            o.a.c.m.stat.univariate.StatisticsImpl
> 
> *alternate implementations*
> o.a.c.m.stat.UnivariateImpl -->
>            o.a.c.m.stat.univariate.StorelessStatisticsImpl
> 
> o.a.c.m.stat.ListUnivariateImpl -->
>            o.a.c.m.stat.univariate.ListStatisticsImpl
> 
> o.a.c.m.stat.BeanListUnivariateImpl -->
>            o.a.c.m.stat.univariate.BeanListStatisticsImpl
> 
> The benefit of this is that the Alternate Implementations can all be 
> instantiated from the o.a.c.m.stat.DescriptiveStatistics factories 
> newInstance(...) methods. Thus alternate implementations of 
> DescriptiveStatistics can be written as Service Providers and set in the 
> environment/JVM configuration. We can now write SP's for other tools 
> like Matlab, Mathematica, JLink, C++ libraries, R, Omegahat ... the list 
> goes on and on...
> 
> Someday, I'd like to see this design extended for Bivariate Statistics 
> and Regression Classes. Eventually for Random Number generation as well.
> 
> -Mark
> 
> 

-- 
      Matt Cliff            
      Cliff Consulting
      303.757.4912
      720.280.6324 (c)


      The label said install Windows 98 or better so I installed Linux.

