Posted to dev@commons.apache.org by Piotr Kochański <pi...@uw.edu.pl> on 2004/01/31 18:21:18 UTC

[math] Re: "Straw man" release plan

Hello,

<cut/>
> 2. Use Mark's magical maven release-generator to cut a 1.0-B1 release 
> including everything currently in CVS other than the /experimental tree.
> (Confidence intervals, the Bootstrap and Multiple Regression will have to 
> wait until 1.1.)

As I understand it, math is now frozen and it is better not
to submit any new code (except, obviously, what is needed to get
1.0 ready)?

I have some bootstrap and standard error code ready, but I guess
it's better to hold off on submitting it until 1.0 is out.

Piotr

---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org

Re: "Straw man" release plan

Posted by Phil Steitz <ph...@steitz.com>.

> 
> As I understand it, load(double[][]) would compute the empirical distribution
> function for every bootstrapped sample (provided from some other source).
> Then, instead of having 
> 
> SummaryStatistics sampleStats
> 
> we should provide 
> 
> SummaryStatistics[] sampleStats
> 
> where this array would contain SummaryStatistics calculated
> for every sample.  SummaryStatistics getSampleStats() would
> be changed as well.
> 
> Similarly other methods/objects in EmpiricalDistribution  
> would have to be modified (e.g. binStats would have to be 
> an array of ArrayLists, etc.).
> 
> Do I get your intentions right?

No, I was thinking that EmpiricalDistribution could be used to model the 
bootstrap distribution of a statistic directly.  So, e.g., to compute 
bootstrap confidence intervals for a statistic S, the method would be:
1. Compute N values of S based on bootstrap samples.  Call the resulting 
N-length double array sHat[].
2. Load an EmpiricalDistribution using sHat[].
3. Use the percentiles of the EmpiricalDistribution to compute confidence 
intervals (ideally including bias correction. Cf. Efron & Tibshirani, 
_Intro to the Bootstrap_, chs. 13-14).
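
The three steps above can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use EmpiricalDistribution at all, the class and method names (BootstrapPercentileSketch, bootstrapMeans, percentileInterval) are hypothetical, the statistic S is taken to be the sample mean, percentiles use a simple nearest-rank rule, and no bias correction is applied.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch; not part of commons-math.
class BootstrapPercentileSketch {

    /** The statistic S. The sample mean is an assumption made for illustration. */
    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    /** Step 1: compute n values of S, each from a resample drawn with replacement. */
    static double[] bootstrapMeans(double[] data, int n, Random rng) {
        double[] sHat = new double[n];
        double[] resample = new double[data.length];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < data.length; j++) {
                resample[j] = data[rng.nextInt(data.length)];
            }
            sHat[i] = mean(resample);
        }
        return sHat;
    }

    /** Nearest-rank percentile of an already sorted array, for p in (0, 1]. */
    static double percentile(double[] sorted, double p) {
        int rank = (int) Math.ceil(p * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    /**
     * Steps 2 and 3: treat sHat as the empirical bootstrap distribution and
     * read off the alpha/2 and 1 - alpha/2 percentiles (uncorrected interval).
     */
    static double[] percentileInterval(double[] sHat, double alpha) {
        double[] sorted = sHat.clone();
        Arrays.sort(sorted);
        return new double[] {
            percentile(sorted, alpha / 2),
            percentile(sorted, 1 - alpha / 2)
        };
    }
}
```

A caller would build sHat with bootstrapMeans(data, 1000, new Random(seed)) and pass it to percentileInterval(sHat, 0.05) for a nominal 95% interval. A bias-corrected interval would need an additional adjustment that this sketch leaves out.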

<snip/>

> 
> Two comments concerning EmpiricalDistribution 
> 1. Probably it would be nice to have load(double[]) method

Yes.  Needed for above.  Patches welcome :-)

> 2. Instead of
>    ArrayList getBinStats();
> there could be 
>    List getBinStats();

Yes.  It should return a List.
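
A minimal sketch of what returning the interface type buys. The class below is hypothetical and is not EmpiricalDistributionImpl; each bin is stood in for by a double[] holding its bounds, and generics are used only for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a class that exposes per-bin data.
class BinStatsSketch {

    // The field can stay an ArrayList internally.
    private final List<double[]> binStats = new ArrayList<>();

    void addBin(double lower, double upper) {
        binStats.add(new double[] {lower, upper});
    }

    /**
     * Declared as List, so callers depend only on the interface and the
     * implementation is free to return a different List, here a read-only view.
     */
    List<double[]> getBinStats() {
        return Collections.unmodifiableList(binStats);
    }
}
```

With an ArrayList return type, handing back the unmodifiable view would not compile, so the narrower signature leaves the implementation more room to change later.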

Thanks for the feedback.

Phil


Re: "Straw man" release plan

Posted by Piotr Kochański <pi...@uw.edu.pl>.

Phil Steitz wrote:

> Thinking about how this will eventually work, it has occurred to me that 
> EmpiricalDistribution could be used to digest / represent bootstrap 
> distributions.  Since we want the interface for EmpiricalDistribution to 
> be complete for 1.0, we need to make sure that bootstrap data can be 
> loaded into EmpiricalDistribution conveniently (if this makes sense), so I 
> have been thinking about adding load() methods to EmpiricalDistribution 
> that take double[] arrays and streams as values, as well as an addValue() 
> method.  Does this make sense?  I would also appreciate any comments / 
> patches on how to improve the EmpiricalDistribution interface or 
> EmpiricalDistributionImpl.  If refactoring or even holding this from the 
> release are in order, I want to make sure that we do it.

As I understand it, load(double[][]) would compute the empirical distribution
function for every bootstrapped sample (provided from some other source).
Then, instead of having 

SummaryStatistics sampleStats

we should provide 

SummaryStatistics[] sampleStats

where this array would contain SummaryStatistics calculated
for every sample.  SummaryStatistics getSampleStats() would
be changed as well.

Similarly other methods/objects in EmpiricalDistribution  
would have to be modified (e.g. binStats would have to be 
an array of ArrayLists, etc.).

Do I get your intentions right?

The zeroth row of every matrix could be reserved for the original
sample and the rest for bootstrapped results (if they can be
calculated, i.e. samples are given). This can be achieved, but
some effort is needed to keep it simple for those who do not
care about the bootstrap and want results based only on the
original sample.

The other thing is that such an extension would be very
useful only as long as we work with bootstrap algorithms
that use statistics which are members of SummaryStatistics.

Often this is not the case (classic examples are the median and the
trimmed mean, which are not among SummaryStatistics). Sometimes it is
also necessary (or more convenient) to operate on the raw bootstrap
samples rather than on the EDF calculated from those samples. In these
two cases a bootstrap embedded in EmpiricalDistribution would not
be that useful.
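
To illustrate the point about raw samples, here is a rough sketch of a bootstrap that hands each raw resample to an arbitrary statistic, so the median or a trimmed mean works as well as anything else. All names here (RawBootstrapSketch, bootstrap, trimmedMean) are hypothetical and nothing in it comes from commons-math.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.ToDoubleFunction;

// Hypothetical sketch; the statistic sees the raw resample, not an EDF.
class RawBootstrapSketch {

    /** Median of x (average of the two middle values when the length is even). */
    static double median(double[] x) {
        double[] s = x.clone();
        Arrays.sort(s);
        int n = s.length;
        return (n % 2 == 1) ? s[n / 2] : 0.5 * (s[n / 2 - 1] + s[n / 2]);
    }

    /**
     * Mean after dropping floor(fraction * n) values from each tail.
     * Assumes 0 <= fraction < 0.5 so that at least one value remains.
     */
    static double trimmedMean(double[] x, double fraction) {
        double[] s = x.clone();
        Arrays.sort(s);
        int k = (int) Math.floor(fraction * s.length);
        double sum = 0.0;
        for (int i = k; i < s.length - k; i++) {
            sum += s[i];
        }
        return sum / (s.length - 2 * k);
    }

    /** Draws resamples with replacement and applies the statistic to each one. */
    static double[] bootstrap(double[] data, int replicates,
                              ToDoubleFunction<double[]> statistic, Random rng) {
        double[] out = new double[replicates];
        for (int r = 0; r < replicates; r++) {
            double[] resample = new double[data.length];
            for (int j = 0; j < data.length; j++) {
                resample[j] = data[rng.nextInt(data.length)];
            }
            out[r] = statistic.applyAsDouble(resample);
        }
        return out;
    }
}
```

For example, bootstrap(data, 500, RawBootstrapSketch::median, rng) gives the bootstrap replicates of the median, and x -> RawBootstrapSketch.trimmedMean(x, 0.1) can be passed in the same way.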

Two comments concerning EmpiricalDistribution 
1. Probably it would be nice to have load(double[]) method
2. Instead of
   ArrayList getBinStats();
there could be 
   List getBinStats();

although I can't imagine a practical situation where a List other than
ArrayList would be better.

Piotr

Re: [math] Re: "Straw man" release plan

Posted by Phil Steitz <ph...@steitz.com>.

Piotr Kochański wrote:
> Hello,
> 
> <cut/>
> 
>>2. Use Mark's magical maven release-generator to cut a 1.0-B1 release 
>>including everything currently in CVS other than the /experimental tree.
>>(Confidence intervals, the Bootstrap and Multiple Regression will have to 
>>wait until 1.1.)
> 
> 
> As I understand it, math is now frozen and it is better not
> to submit any new code (except, obviously, what is needed to get
> 1.0 ready)?

"Frozen" is a bit strong.  My preference would be to hold off on 
confidence intervals and the bootstrap until we get 1.0 out.  I suppose we 
could add the bootstrap by itself, but I think it would be better to add 
it as part of a more general solution for resampling-based inference (incl 
non-parametric conf. intervals) and that will require some discussion.

Thinking about how this will eventually work, it has occurred to me that 
EmpiricalDistribution could be used to digest / represent bootstrap 
distributions.  Since we want the interface for EmpiricalDistribution to 
be complete for 1.0, we need to make sure that bootstrap data can be 
loaded into EmpiricalDistribution conveniently (if this makes sense), so I 
have been thinking about adding load() methods to EmpiricalDistribution 
that take double[] arrays and streams as values, as well as an addValue() 
method.  Does this make sense?  I would also appreciate any comments / 
patches on how to improve the EmpiricalDistribution interface or 
EmpiricalDistributionImpl.  If refactoring or even holding this from the 
release are in order, I want to make sure that we do it.
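
To make the proposal concrete, the load() variants and addValue() might look roughly like the sketch below. This is a hypothetical stand-in, not the EmpiricalDistribution interface: the class name, the choice of a Reader for the stream variant, the one-value-per-line format, and the size() and mean() accessors are all assumptions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed loading methods.
class LoadableSampleSketch {

    private final List<Double> values = new ArrayList<>();

    /** Adds a single observation to the current sample. */
    void addValue(double v) {
        values.add(v);
    }

    /** Replaces the current sample with the contents of the array. */
    void load(double[] data) {
        values.clear();
        for (double v : data) {
            addValue(v);
        }
    }

    /** Replaces the current sample with one value per non-blank line of the stream. */
    void load(Reader in) throws IOException {
        values.clear();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (!line.isEmpty()) {
                addValue(Double.parseDouble(line));
            }
        }
    }

    int size() {
        return values.size();
    }

    /** Sample mean, or NaN when nothing has been loaded. */
    double mean() {
        if (values.isEmpty()) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.size();
    }
}
```

In this sketch both load() variants funnel through addValue(), so bootstrap output held in a double[] and data read from a stream end up in the same state.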

Phil