Posted to dev@commons.apache.org by Brent Worden <br...@worden.org> on 2004/09/01 08:54:51 UTC

RE: [MATH] Summary proposed changes

> 1) Change the RealMatrix getEntry, getRow, getColumn methods to use
> 0-based indexing.

Looking at the implementation, I believe the current indexing is
satisfactory and I can't think of where using it with native arrays would be
overly burdensome or confusing.

As for letting the language dictate the indexing, I think this is a bad
practice for developing an API.  APIs are supposed to be language agnostic
and should exhibit the same behavior no matter the implementing language.
If we allow the language to dictate the behavior of an API method, it's
possible the behavior will be different for other languages.  I feel these
situations should be avoided so the API is portable to a wide array of
languages, which I feel is a long-term goal of some of our developers.

> 2) Change the name of "BivariateRegression" to "UnivariateRegression" (or
> something else)

If we're bothering to change its name to make it less confusing, let's call
it what it is, SimpleLeastSquaresRegression.  If that is too long, then
SimpleRegression, since least squares is the method implied when one
mentions regression.
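
To make concrete what "simple least squares regression" refers to, here is a
minimal sketch of a one-predictor least-squares fit.  It is not the Commons
Math class under discussion; the class name SimpleRegressionSketch and its
members are hypothetical and chosen only for illustration.

```java
// Hypothetical sketch of simple (one-predictor) least-squares regression.
// Not the Commons Math implementation; names are illustrative only.
final class SimpleRegressionSketch {
    final double slope;
    final double intercept;

    SimpleRegressionSketch(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs");
        }
        int n = x.length;

        // Means of the predictor and the response.
        double xBar = 0.0;
        double yBar = 0.0;
        for (int i = 0; i < n; i++) {
            xBar += x[i];
            yBar += y[i];
        }
        xBar /= n;
        yBar /= n;

        // Sums of squares and cross-products about the means.
        double sxx = 0.0;
        double sxy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - xBar;
            sxx += dx * dx;
            sxy += dx * (y[i] - yBar);
        }

        // Least-squares estimates; slope is NaN if all x values are equal.
        slope = sxy / sxx;
        intercept = yBar - slope * xBar;
    }

    // Predicted response for a given predictor value.
    double predict(double x) {
        return intercept + slope * x;
    }
}
```

For the points (1, 3), (2, 5), (3, 7), (4, 9), which lie exactly on
y = 1 + 2x, the fit recovers a slope of 2 and an intercept of 1.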

> 3) Change Variance to be configurable to generate the population
> statistic.

Since population variance and sample variance are different statistics, they
should be different classes as that is the design we have chosen.

As for the static methods on the variance and standard deviation classes,
the javadoc should be changed to better explain the source of the mean
argument.  The comments should indicate the mean is pre-computed using the
same values that are going to be used to compute the variation estimate.
Passing in any other mean will make the variation computation
unreliable.
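
The difference between the two statistics, and the role of the pre-computed
mean, can be sketched as follows.  This is an illustration under stated
assumptions rather than the actual Variance class: VarianceSketch and its
method names are hypothetical.

```java
// Hypothetical helper illustrating sample vs. population variance.
// Not the Commons Math Variance class; names are illustrative only.
final class VarianceSketch {
    private VarianceSketch() {}

    // Arithmetic mean of the values.
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Sum of squared deviations about the supplied mean.
    private static double sumSquaredDeviations(double[] values, double mean) {
        double sum = 0.0;
        for (double v : values) {
            double d = v - mean;
            sum += d * d;
        }
        return sum;
    }

    // Sample variance: divides by n - 1.
    // The mean argument must have been computed from these same values.
    static double sampleVariance(double[] values, double mean) {
        return sumSquaredDeviations(values, mean) / (values.length - 1);
    }

    // Population variance: divides by n.
    // The mean argument must have been computed from these same values.
    static double populationVariance(double[] values, double mean) {
        return sumSquaredDeviations(values, mean) / values.length;
    }
}
```

For the values {2, 4, 4, 4, 5, 5, 7, 9} the mean is 5, the population
variance is 32/8 = 4 and the sample variance is 32/7.  Passing a mean of 0
instead gives a "population variance" of 232/8 = 29, which is the kind of
unreliable result the javadoc should warn about.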

> 4) Combine the univariate and multivariate packages, since it is confusing
> to separate statistics that focus on one variable and sometimes the word
> "univariate" is used in the context of multivariate techniques (e.g.
> "Univariate Anova").

"Regression is used to study relationships between measurable variables."
[Weisberg, 1985]

"Regression analysis is a statistical tool that utilizes the relations
between two or more quantitative variables..." [Neter, et al., 1985]

Both these statements indicate regression is a technique that involves more
than one variable.  Therefore, regression in general is a multivariate
technique.  The case where there is only one predictor is immaterial as
there are two variable quantities.  Would one call a model with one
predictor variable and two response variables a univariate technique?  I
wouldn't and I doubt if anyone else would.  The path we have chosen, placing
procedures dealing with one variable in the univariate package and all
procedures dealing with more than one variable in the multivariate package,
is satisfactory and makes for a good discriminant.

Brent Worden


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [MATH] Summary proposed changes

Posted by Kim van der Linde <ki...@kimvdlinde.com>.
Hi,

Brent Worden wrote:
>>1) Change the RealMatrix getEntry, getRow, getColumn methods to use
>>0-based indexing.
> 
> 
> Looking at the implementation, I believe the current indexing is
> satisfactory and I can't think of where using it with native arrays would be
> overly burdensome or confusing.

Well, so you think I requested this for purely philosophical reasons. I
am running into problems with it; that's why I asked. But maybe I should
just do it differently, and make a derived class from it and distribute
that with the classes I am making...
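
A derived class of the kind mentioned above might look roughly like the
sketch below.  It assumes, as the proposal implies, that the existing
accessors are 1-based.  OneBasedMatrix is a hypothetical stand-in for the
library class, and ZeroBasedMatrix and getEntryZeroBased are likewise
made-up names, not part of any existing API.

```java
// Hypothetical stand-in for a matrix whose accessors are 1-based,
// as the proposal implies the current getEntry is.
class OneBasedMatrix {
    protected final double[][] data;

    OneBasedMatrix(double[][] data) {
        this.data = data;
    }

    // 1-based access: (1, 1) is the top-left entry.
    double getEntry(int row, int column) {
        return data[row - 1][column - 1];
    }
}

// Derived class adding a 0-based accessor that lines up with native arrays.
class ZeroBasedMatrix extends OneBasedMatrix {
    ZeroBasedMatrix(double[][] data) {
        super(data);
    }

    // 0-based access: delegates to the 1-based method, shifting both indices.
    double getEntryZeroBased(int row, int column) {
        return getEntry(row + 1, column + 1);
    }
}
```

With this, getEntryZeroBased(i, j) returns the same value as data[i][j] in
the backing array, while the inherited getEntry keeps its 1-based behavior
for existing callers.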

> APIs are supposed to be language agnostic

I think APIs should be logical, and designed such that they minimise
errors.

> let's call
> it what it is, SimpleLeastSquaresRegression.  If that is too long, then
> SimpleRegression

Fine with me.

>>3) Change Variance to be configurable to generate the population
>> statistic.
> 
> Since population variance and sample variance are different statistics, they
> should be different classes as that is the design we have chosen.

I disagree, but in that case I will follow the same way on these classes 
as mentioned for the Matrix classes.

>>4) Combine the univariate and multivariate packages, since it is confusing
>>to separate statistics that focus on one variable and sometimes the word
>>"univariate" is used in the context of multivariate techniques (e.g.
>>"Univariate Anova").

> Both these statements indicate regression is a technique that involves more
> than one variable.  Therefore, regression in general is a multivariate
> technique.  The case where there is only one predictor is immaterial as
> there are two variable quantities.  Would one call a model with one
> predictor variable and two response variables a univariate technique?  I
> wouldn't and I doubt if anyone else would.  The path we have chosen, placing
> procedures dealing with one variable in the univariate package and all
> procedures dealing with more than one variable in the multivariate package,
> is satisfactory and makes for a good discriminant.

See my response; this is not what I proposed. Anyway, the common
interpretation (even among my colleagues who do nothing other than complex
multivariate analyses) is that regressions with one independent and one
dependent variable are univariate regressions, although they can see the
logic of calling them multivariate, as there are two variables.

But in that sense, the TTest should be within the multivariate package 
too. Both simple regression and t-tests are in the end simplified 
versions of the GLM using only one dependent and one independent variable.

Anyway, I thank you all for the help and I will just make derived
classes where I need a different implementation from the one provided by
this package.

Cheers,

Kim
-- 
http://www.kimvdlinde.com

