Posted to dev@commons.apache.org by "Mark R. Diggory" <md...@latte.harvard.edu> on 2003/11/07 22:54:23 UTC
[math] Proposal for Package restructuring and Class renaming
I have several modifications I'm planning to make, but in the spirit of
consensus I want to propose them and attempt to get some agreement. So
math developer opinions on the subject would be good.
1.) o.a.c.math.stat.distributions --> o.a.c.math.distributions
Gives this package a more "generic" position to hold more than just
"stat" distributions.
2.) As in my last emails concerning "Univariate", I would like to make the
following class changes (and have already done so successfully in my checkout):
interface o.a.c.m.stat.StoreUnivariate -->
abstract class o.a.c.m.stat.DescriptiveStatistics
This actually becomes a factory class and uses Discovery to instantiate
the following implementations:
*default implementation*
o.a.c.m.stat.StoreUnivariateImpl -->
o.a.c.m.stat.univariate.StatisticsImpl
*alternate implementations*
o.a.c.m.stat.UnivariateImpl -->
o.a.c.m.stat.univariate.StorelessStatisticsImpl
o.a.c.m.stat.ListUnivariateImpl -->
o.a.c.m.stat.univariate.ListStatisticsImpl
o.a.c.m.stat.BeanListUnivariateImpl -->
o.a.c.m.stat.univariate.BeanListStatisticsImpl
The benefit of this is that the alternate implementations can all be
instantiated from the o.a.c.m.stat.DescriptiveStatistics factory's
newInstance(...) methods. Thus alternate implementations of
DescriptiveStatistics can be written as Service Providers and set in the
environment/JVM configuration. We can now write SPs for other tools
like Matlab, Mathematica, JLink, C++ libraries, R, Omegahat ... the list
goes on and on...
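A minimal sketch of the factory idea described above, under stated assumptions: the class names follow the proposal, but the bodies are illustrative only, and a plain system property (the hypothetical IMPL_PROPERTY) stands in for the Discovery lookup. The real method signatures and lookup rules are not settled by this proposal.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo; shows the three ways of obtaining an instance.
class FactoryDemo {
    public static void main(String[] args) {
        DescriptiveStatistics stats = DescriptiveStatistics.newInstance();
        stats.addValue(1.0);
        stats.addValue(3.0);
        System.out.println(stats.getClass().getSimpleName() + " mean = " + stats.getMean());

        DescriptiveStatistics storeless =
            DescriptiveStatistics.newInstance(StorelessStatisticsImpl.class);
        storeless.addValue(1.0);
        storeless.addValue(3.0);
        System.out.println(storeless.getClass().getSimpleName() + " mean = " + storeless.getMean());
    }
}

abstract class DescriptiveStatistics {
    // Assumed property name; a stand-in for a Discovery-based lookup.
    static final String IMPL_PROPERTY = "org.apache.commons.math.stat.DescriptiveStatistics";

    /** Uses the class named by IMPL_PROPERTY if set, otherwise the storing default. */
    static DescriptiveStatistics newInstance() {
        String name = System.getProperty(IMPL_PROPERTY);
        return name == null ? new StatisticsImpl() : newInstance(name);
    }

    /** Instantiates the given implementation through its no-argument constructor. */
    static DescriptiveStatistics newInstance(Class<? extends DescriptiveStatistics> type) {
        try {
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + type.getName(), e);
        }
    }

    /** Instantiates an implementation by fully qualified class name. */
    static DescriptiveStatistics newInstance(String className) {
        try {
            return newInstance(Class.forName(className).asSubclass(DescriptiveStatistics.class));
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("No such implementation: " + className, e);
        }
    }

    abstract void addValue(double v);

    abstract double getMean();
}

// Default implementation: keeps every value.
class StatisticsImpl extends DescriptiveStatistics {
    private final List<Double> values = new ArrayList<>();

    void addValue(double v) { values.add(v); }

    double getMean() {
        if (values.isEmpty()) return Double.NaN;
        double sum = 0.0;
        for (double v : values) sum += v;
        return sum / values.size();
    }
}

// Alternate implementation: keeps only a running sum and count.
class StorelessStatisticsImpl extends DescriptiveStatistics {
    private double sum;
    private long n;

    void addValue(double v) { sum += v; n++; }

    double getMean() { return n == 0 ? Double.NaN : sum / n; }
}
```

With this shape, callers depend only on the abstract class, so a provider backed by another tool could be swapped in through configuration without touching calling code.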
Someday, I'd like to see this design extended for Bivariate Statistics
and Regression Classes. Eventually for Random Number generation as well.
-Mark
--
Mark Diggory
Software Developer
Harvard MIT Data Center
http://osprey.hmdc.harvard.edu
---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org
Re: [math] Proposal for Package restructuring and Class renaming
Posted by Al Chou <ho...@yahoo.com>.
--- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> Al Chou wrote:
> >
> > OK, I see. The one thing I notice is that the names are getting awfully long,
> > especially for the non-default case. I guess that's a price we pay for having
> > descriptive (no play on words intended) names like DescriptiveStatistics....
>
> Maybe the Implementations could be abbreviated somewhat
>
> o.a.c.math.stat.DescriptiveStatistics
>
> o.a.c.math.stat.StorelessDscrStatsImpl
> o.a.c.math.stat.DscrStatsImpl
>
> We could also consider pushing the actual implementation off into its
> own packages
>
> o.a.c.math.stat.impl.StorelessDscrStatsImpl
> o.a.c.math.stat.impl.DscrStatsImpl
>
> This would even push all the univariate stat providers off into this
> hierarchy as well
>
> o.a.c.math.stat.impl.univar.StorelessUnivariateStatistic
> o.a.c.math.stat.impl.univar.UnivariateStatistic
Too much renaming and reorganization. I didn't mean to complain too loudly,
and if the result is to use abbreviations, I retract my comments. I probably
should have given more than half a second's thought to what alternative names
might be shorter, but in the absence of well-thought-out shorter names, I much
prefer the current proposal of DescriptiveStatistics. Never use abbreviations
unless everyone already knows them (e.g., sin for sine), I say.
Al
__________________________________
Do you Yahoo!?
Protect your identity with Yahoo! Mail AddressGuard
http://antispam.yahoo.com/whatsnewfree
Re: [math] Proposal for Package restructuring and Class renaming
Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.
Al Chou wrote:
>
> OK, I see. The one thing I notice is that the names are getting awfully long,
> especially for the non-default case. I guess that's a price we pay for having
> descriptive (no play on words intended) names like DescriptiveStatistics....
Maybe the Implementations could be abbreviated somewhat
o.a.c.math.stat.DescriptiveStatistics
o.a.c.math.stat.StorelessDscrStatsImpl
o.a.c.math.stat.DscrStatsImpl
We could also consider pushing the actual implementation off into its
own packages
o.a.c.math.stat.impl.StorelessDscrStatsImpl
o.a.c.math.stat.impl.DscrStatsImpl
This would even push all the univariate stat providers off into this
hierarchy as well
o.a.c.math.stat.impl.univar.StorelessUnivariateStatistic
o.a.c.math.stat.impl.univar.UnivariateStatistic
-M.
--
Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu
Re: [math] Proposal for Package restructuring and Class renaming
Posted by Al Chou <ho...@yahoo.com>.
--- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> Al Chou wrote:
> > --- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
...
> >>2.) Like in my last emails concerning "Univariate" I would like to, (and
> >>have done so in my checkout successfully) Make the following Class changes:
> >>
> >>interface o.a.c.m.stat.StoreUnivariate -->
> >> abstract class o.a.c.m.stat.DescriptiveStatistics
> >>
> >>this actually becomes a factory class and uses Discovery to instantiate
> >>new instances of the following implementations
> >>
> >>*default implementation*
> >>o.a.c.m.stat.StoreUnivariateImpl -->
> >> o.a.c.m.stat.univariate.StatisticsImpl
> >
> >
> > Forgive me for not refamiliarizing myself with the code first, but should the
> > storeless version perhaps be the default implementation instead? What do we
> > lose by going that way? I'm thinking it would be nice to keep memory usage
> > lower if possible.
>
> The storeless version (UnivariateImpl) doesn't support rank statistics
> because of its storeless nature. The more fully featured implementation
> is StoreUnivariateImpl: it does everything, but has the limitation of
> requiring storage of the values. These are two different implementations
> with different internal storage configurations. I chose
> StoreUnivariateImpl because I think the default should have full
> capabilities.
>
> The storeless version is more of an optimized solution. It's probably wise
> to suggest that one use it only if one needs that functionality (i.e.,
> trying to get moments across huge datasets or real-time value streams of
> sorts).
That sounds reasonable. Thanks for the refresher (I looked at the current code
based on your remarks, too).
> > Before we go overboard, can you give a quick example of instantiating one of
> > the implementations? Or perhaps, both the default and one alternative
...
> Yes, like that
>
> For the default Discovery configured implementation:
>
> DescriptiveStatistics stats = DescriptiveStatistics.newInstance();
>
> stats.addValue(5.0);
> ...
>
> double mean = stats.getMean();
>
>
> For any alternate Implementations:
>
> DescriptiveStatistics stats =
> DescriptiveStatistics.newInstance(StorelessDescriptiveStatisticsImpl.class);
>
> stats.addValue(5.0);
> ...
>
> double mean = stats.getMean();
>
> and/or
>
> DescriptiveStatistics stats =
> DescriptiveStatistics.newInstance("o.a.c.math.stat.impl.StorelessDescriptiveStatisticsImpl");
>
> stats.addValue(5.0);
> ...
>
> double mean = stats.getMean();
>
> depending on which people like more
OK, I see. The one thing I notice is that the names are getting awfully long,
especially for the non-default case. I guess that's a price we pay for having
descriptive (no play on words intended) names like DescriptiveStatistics....
Al
Re: [math] Proposal for Package restructuring and Class renaming
Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.
Al Chou wrote:
> --- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
>
>>I have several modifications I'm planning to make, but in the spirit of
>>consensus I want to propose them and attempt to get some agreement. So
>>math developer opinions on the subject would be good.
>>
>>1.) o.a.c.math.stat.distributions --> o.a.c.math.distributions
>>
>>Gives this package a more "generic" position to hold more than just
>>"stat" distributions.
>
>
> What other kinds of distributions did you have in mind? I'm asking out of
> complete ignorance.
>
>
>
>>2.) Like in my last emails concerning "Univariate" I would like to, (and
>>have done so in my checkout successfully) Make the following Class changes:
>>
>>interface o.a.c.m.stat.StoreUnivariate -->
>> abstract class o.a.c.m.stat.DescriptiveStatistics
>>
>>this actually becomes a factory class and uses Discovery to instantiate
>>new instances of the following implementations
>>
>>*default implementation*
>>o.a.c.m.stat.StoreUnivariateImpl -->
>> o.a.c.m.stat.univariate.StatisticsImpl
>
>
> Forgive me for not refamiliarizing myself with the code first, but should the
> storeless version perhaps be the default implementation instead? What do we
> lose by going that way? I'm thinking it would be nice to keep memory usage
> lower if possible.
>
The storeless version (UnivariateImpl) doesn't support rank statistics
because of its storeless nature. The more fully featured implementation
is StoreUnivariateImpl: it does everything, but has the limitation of
requiring storage of the values. These are two different implementations
with different internal storage configurations. I chose
StoreUnivariateImpl because I think the default should have full
capabilities.
The storeless version is more of an optimized solution. It's probably wise
to suggest that one use it only if one needs that functionality (i.e.,
trying to get moments across huge datasets or real-time value streams of
sorts).
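To make the trade-off concrete, here is a small sketch under stated assumptions. It is not the code of UnivariateImpl or StoreUnivariateImpl: the class names StorelessMoments and StoredValues are hypothetical, and the storeless update uses Welford's running-moments method, chosen here only for illustration. The point is that moments need constant memory, while a rank statistic such as the median needs the values themselves.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo of the two storage strategies.
class StoredVsStorelessDemo {
    public static void main(String[] args) {
        StorelessMoments moments = new StorelessMoments();
        StoredValues stored = new StoredValues();
        for (double x : new double[] {4.0, 1.0, 3.0, 2.0}) {
            moments.addValue(x);
            stored.addValue(x);
        }
        System.out.println("storeless mean     = " + moments.getMean());
        System.out.println("storeless variance = " + moments.getVariance());
        System.out.println("stored median      = " + stored.getMedian());
    }
}

// Constant memory: only a count and two running moments are kept,
// so rank statistics (median, percentiles) cannot be answered.
class StorelessMoments {
    private long n;
    private double mean;
    private double m2; // sum of squared deviations from the running mean

    void addValue(double x) {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean);
    }

    double getMean() { return n == 0 ? Double.NaN : mean; }

    /** Sample variance (divides by n - 1). */
    double getVariance() { return n < 2 ? Double.NaN : m2 / (n - 1); }
}

// Memory grows with the data, but rank statistics become possible.
class StoredValues {
    private final List<Double> values = new ArrayList<>();

    void addValue(double x) { values.add(x); }

    double getMedian() {
        if (values.isEmpty()) return Double.NaN;
        List<Double> sorted = new ArrayList<>(values);
        Collections.sort(sorted);
        int mid = sorted.size() / 2;
        return sorted.size() % 2 == 1
            ? sorted.get(mid)
            : (sorted.get(mid - 1) + sorted.get(mid)) / 2.0;
    }
}
```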
>
>
>>*alternate implementations*
>>o.a.c.m.stat.UnivariateImpl -->
>> o.a.c.m.stat.univariate.StorelessStatisticsImpl
>>
>>o.a.c.m.stat.ListUnivariateImpl -->
>> o.a.c.m.stat.univariate.ListStatisticsImpl
>>
>>o.a.c.m.stat.BeanListUnivariateImpl -->
>> o.a.c.m.stat.univariate.BeanListStatisticsImpl
>>
>>The benefit of this is that the Alternate Implementations can all be
>>instantiated from the o.a.c.m.stat.DescriptiveStatistics factories
>>newInstance(...) methods. Thus alternate implementations of
>>DescriptiveStatistics can be written as Service Providers and set in the
>>environment/JVM configuration. We can now write SP's for other tools
>>like Matlab, Mathematica, JLink, C++ libraries, R, Omegahat ... the list
>>goes on and on...
>>
>>Someday, I'd like to see this design extended for Bivariate Statistics
>>and Regression Classes. Eventually for Random Number generation as well.
>
>
> Before we go overboard, can you give a quick example of instantiating one of
> the implementations? Or perhaps, both the default and one alternative
> implementation? Is it:
>
> import org.apache.commons.math.stat.*;
>
> ...
>
> StoreUnivariateImpl defaultImplementation =
>     DescriptiveStatistics.newInstance();
> StoreUnivariateImpl storagelessImplementation =
>     DescriptiveStatistics.newInstance( StorelessStatisticsImpl ) ;
>
Yes, like that
For the default Discovery configured implementation:
DescriptiveStatistics stats = DescriptiveStatistics.newInstance();
stats.addValue(5.0);
...
double mean = stats.getMean();
For any alternate Implementations:
DescriptiveStatistics stats =
DescriptiveStatistics.newInstance(StorelessDescriptiveStatisticsImpl.class);
stats.addValue(5.0);
...
double mean = stats.getMean();
and/or
DescriptiveStatistics stats =
DescriptiveStatistics.newInstance("o.a.c.math.stat.impl.StorelessDescriptiveStatisticsImpl");
stats.addValue(5.0);
...
double mean = stats.getMean();
depending on which people like more
--
Mark Diggory
Software Developer
Harvard MIT Data Center
http://osprey.hmdc.harvard.edu
---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org
Re: [math] Proposal for Package restructuring and Class renaming
Posted by Al Chou <ho...@yahoo.com>.
--- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> Al Chou wrote:
> >
> > Would you move the existing ones into
> > org.apache.commons.math.distributions.statistical or something so that the
> > probability distributions could be organized together under *.probability?
> > Also, I noticed that the current package uses the singular "distribution"
> > rather than "distributions".
>
> I suspect it's unclear where this boundary would be drawn. I think all
> the distributions would be beneficial for both random number
> distributions and statistical usage. I guess if it became clear that
> there was a strong separation between the two then separate packages
> would be warranted, but I'm not convinced of a difference. You and
> others may have more informed opinions.
>
> -Mark
I don't have an informed opinion, so I'll fall back to the default opinion of
"lump everything together until/unless it's clear how to split it up".
Al
Re: [math] Proposal for Package restructuring and Class renaming
Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.
Al Chou wrote:
>
> Would you move the existing ones into
> org.apache.commons.math.distributions.statistical or something so that the
> probability distributions could be organized together under *.probability?
> Also, I noticed that the current package uses the singular "distribution"
> rather than "distributions".
I suspect it's unclear where this boundary would be drawn. I think all
the distributions would be beneficial for both random number
distributions and statistical usage. I guess if it became clear that
there was a strong separation between the two then separate packages
would be warranted, but I'm not convinced of a difference. You and
others may have more informed opinions.
-Mark
--
Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu
Re: [math] Proposal for Package restructuring and Class renaming
Posted by Al Chou <ho...@yahoo.com>.
--- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> Al Chou wrote:
> > --- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> >
> >>I have several modifications I'm planning to make, but in the spirit of
> >>consensus I want to propose them and attempt to get some agreement. So
> >>math developer opinions on the subject would be good.
> >>
> >>1.) o.a.c.math.stat.distributions --> o.a.c.math.distributions
> >>
> >>Gives this package a more "generic" position to hold more than just
> >>"stat" distributions.
> >
> >
> > What other kinds of distributions did you have in mind? I'm asking out of
> > complete ignorance.
> >
>
> Probability distributions (Gamma, Beta, Poisson, Exponential,
> Logarithmic, Hyperbolic ...). Great examples of these are in Colt's
>
> cern.jet.stat and cern.jet.random packages.
>
> ... but they are bound up as implementations of RandomNumberGeneration
> classes ... not that that's a bad thing.
>
> Eventually ours could be used in random number generation, so I think they
> should be in a more prominent package.
> -Mark
Would you move the existing ones into
org.apache.commons.math.distributions.statistical or something so that the
probability distributions could be organized together under *.probability?
Also, I noticed that the current package uses the singular "distribution"
rather than "distributions".
Al
=====
Albert Davidson Chou
Get answers to Mac questions at http://www.Mac-Mgrs.org/ .
Re: [math] Proposal for Package restructuring and Class renaming
Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.
Al Chou wrote:
> --- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
>
>>I have several modifications I'm planning to make, but in the spirit of
>>consensus I want to propose them and attempt to get some agreement. So
>>math developer opinions on the subject would be good.
>>
>>1.) o.a.c.math.stat.distributions --> o.a.c.math.distributions
>>
>>Gives this package a more "generic" position to hold more than just
>>"stat" distributions.
>
>
> What other kinds of distributions did you have in mind? I'm asking out of
> complete ignorance.
>
Probability distributions (Gamma, Beta, Poisson, Exponential,
Logarithmic, Hyperbolic ...). Great examples of these are in Colt's
cern.jet.stat and cern.jet.random packages.
... but they are bound up as implementations of RandomNumberGeneration
classes ... not that that's a bad thing.
Eventually ours could be used in random number generation, so I think they
should be in a more prominent package.
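As a rough illustration of why one distribution class can serve both statistical and random-number uses, here is a sketch under stated assumptions. ExponentialSketch and its method names are hypothetical, not an existing commons-math or Colt API. The same inverse cumulative probability function answers quantile queries and, fed a uniform deviate, produces random samples (inverse transform sampling).

```java
import java.util.Random;

// Hypothetical demo: one distribution object, two kinds of use.
class DistributionDemo {
    public static void main(String[] args) {
        ExponentialSketch dist = new ExponentialSketch(2.0);
        // Statistical use: probability that a value is at most 1.0.
        System.out.println("P(X <= 1.0) = " + dist.cumulativeProbability(1.0));
        // Random-number use: draw a sample from the same distribution.
        System.out.println("sample      = " + dist.sample(new Random(42L)));
    }
}

// Exponential distribution parameterized by its mean (illustrative only).
class ExponentialSketch {
    private final double mean;

    ExponentialSketch(double mean) {
        if (!(mean > 0.0)) {
            throw new IllegalArgumentException("mean must be positive");
        }
        this.mean = mean;
    }

    /** P(X <= x) = 1 - exp(-x / mean) for x > 0, else 0. */
    double cumulativeProbability(double x) {
        return x <= 0.0 ? 0.0 : -Math.expm1(-x / mean);
    }

    /** Quantile function: the x for which P(X <= x) = p, with 0 <= p < 1. */
    double inverseCumulativeProbability(double p) {
        if (p < 0.0 || p >= 1.0) {
            throw new IllegalArgumentException("p must be in [0, 1)");
        }
        return -mean * Math.log1p(-p);
    }

    /** Inverse transform sampling: nextDouble() is uniform on [0, 1). */
    double sample(Random rng) {
        return inverseCumulativeProbability(rng.nextDouble());
    }
}
```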
-Mark
--
Mark Diggory
Software Developer
Harvard MIT Data Center
http://osprey.hmdc.harvard.edu
Re: [math] Proposal for Package restructuring and Class renaming
Posted by Al Chou <ho...@yahoo.com>.
--- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> I have several modifications I'm planning to make, but in the spirit of
> consensus I want to propose them and attempt to get some agreement. So
> math developer opinions on the subject would be good.
>
> 1.) o.a.c.math.stat.distributions --> o.a.c.math.distributions
>
> Gives this package a more "generic" position to hold more than just
> "stat" distributions.
What other kinds of distributions did you have in mind? I'm asking out of
complete ignorance.
> 2.) Like in my last emails concerning "Univariate" I would like to, (and
> have done so in my checkout successfully) Make the following Class changes:
>
> interface o.a.c.m.stat.StoreUnivariate -->
> abstract class o.a.c.m.stat.DescriptiveStatistics
>
> this actually becomes a factory class and uses Discovery to instantiate
> new instances of the following implementations
>
> *default implementation*
> o.a.c.m.stat.StoreUnivariateImpl -->
> o.a.c.m.stat.univariate.StatisticsImpl
Forgive me for not refamiliarizing myself with the code first, but should the
storeless version perhaps be the default implementation instead? What do we
lose by going that way? I'm thinking it would be nice to keep memory usage
lower if possible.
> *alternate implementations*
> o.a.c.m.stat.UnivariateImpl -->
> o.a.c.m.stat.univariate.StorelessStatisticsImpl
>
> o.a.c.m.stat.ListUnivariateImpl -->
> o.a.c.m.stat.univariate.ListStatisticsImpl
>
> o.a.c.m.stat.BeanListUnivariateImpl -->
> o.a.c.m.stat.univariate.BeanListStatisticsImpl
>
> The benefit of this is that the Alternate Implementations can all be
> instantiated from the o.a.c.m.stat.DescriptiveStatistics factories
> newInstance(...) methods. Thus alternate implementations of
> DescriptiveStatistics can be written as Service Providers and set in the
> environment/JVM configuration. We can now write SP's for other tools
> like Matlab, Mathematica, JLink, C++ libraries, R, Omegahat ... the list
> goes on and on...
>
> Someday, I'd like to see this design extended for Bivariate Statistics
> and Regression Classes. Eventually for Random Number generation as well.
Before we go overboard, can you give a quick example of instantiating one of
the implementations? Or perhaps, both the default and one alternative
implementation? Is it:
import org.apache.commons.math.stat.*;
...
StoreUnivariateImpl defaultImplementation =
    DescriptiveStatistics.newInstance();
StoreUnivariateImpl storagelessImplementation =
    DescriptiveStatistics.newInstance( StorelessStatisticsImpl ) ;
Al
=====
Albert Davidson Chou
Get answers to Mac questions at http://www.Mac-Mgrs.org/ .
Re: [math] Proposal for Package restructuring and Class renaming
Posted by Matt Cliff <ma...@mattcliff.com>.
I agree
On Fri, 7 Nov 2003, Mark R. Diggory wrote:
> I have several modifications I'm planning to make, but in the spirit of
> consensus I want to propose them and attempt to get some agreement. So
> math developer opinions on the subject would be good.
>
> 1.) o.a.c.math.stat.distributions --> o.a.c.math.distributions
>
> Gives this package a more "generic" position to hold more than just
> "stat" distributions.
>
> 2.) Like in my last emails concerning "Univariate" I would like to, (and
> have done so in my checkout successfully) Make the following Class changes:
>
> interface o.a.c.m.stat.StoreUnivariate -->
> abstract class o.a.c.m.stat.DescriptiveStatistics
>
> this actually becomes a factory class and uses Discovery to instantiate
> new instances of the following implementations
>
> *default implementation*
> o.a.c.m.stat.StoreUnivariateImpl -->
> o.a.c.m.stat.univariate.StatisticsImpl
>
> *alternate implementations*
> o.a.c.m.stat.UnivariateImpl -->
> o.a.c.m.stat.univariate.StorelessStatisticsImpl
>
> o.a.c.m.stat.ListUnivariateImpl -->
> o.a.c.m.stat.univariate.ListStatisticsImpl
>
> o.a.c.m.stat.BeanListUnivariateImpl -->
> o.a.c.m.stat.univariate.BeanListStatisticsImpl
>
> The benefit of this is that the Alternate Implementations can all be
> instantiated from the o.a.c.m.stat.DescriptiveStatistics factories
> newInstance(...) methods. Thus alternate implementations of
> DescriptiveStatistics can be written as Service Providers and set in the
> environment/JVM configuration. We can now write SP's for other tools
> like Matlab, Mathematica, JLink, C++ libraries, R, Omegahat ... the list
> goes on and on...
>
> Someday, I'd like to see this design extended for Bivariate Statistics
> and Regression Classes. Eventually for Random Number generation as well.
>
> -Mark
>
>
--
Matt Cliff
Cliff Consulting
303.757.4912
720.280.6324 (c)
The label said install Windows 98 or better so I installed Linux.