Posted to dev@commons.apache.org by Kim van der Linde <ki...@kimvdlinde.com> on 2004/08/27 00:55:23 UTC

[MATH] Matrix indices

Hi All,

I ran into a problem with the RealMatrixImpl class. The class is 
designed such that it uses the default 1 to n counting for the rows and 
columns. However, Java uses the 0 to n-1 system by default. I now run 
into the problem that some standard methods return an array of n cells (0 to 
n-1) while the matrix methods require n+1 cells. I could of course write a 
method to create an n+1 array, but I do not see many problems with using 
the 0 to n-1 system of Java for the matrices. However, I could be 
completely wrong. BTW, I saw that the JAMA package uses the underlying 
Java indexes, not the 1 to n system.

I have no problem changing all the methods if you all agree that we 
should change it....

Cheers,

Kim
-- 
http://www.kimvdlinde.com
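
A minimal sketch of the mismatch described in the message above. The class OneBasedMatrix is hypothetical and is not the actual RealMatrixImpl; it only assumes that the element accessor counts from 1 while the backing double[][] counts from 0, so every index that comes from a plain Java loop has to be shifted by one.

```java
// Hypothetical illustration only; not the commons-math RealMatrixImpl.
final class OneBasedMatrix {
    private final double[][] data; // backing storage, 0-based as in any Java array

    OneBasedMatrix(double[][] data) {
        this.data = data;
    }

    /** Returns the entry at the 1-based position (row, column). */
    double getEntry(int row, int column) {
        if (row < 1 || row > data.length || column < 1 || column > data[0].length) {
            throw new IndexOutOfBoundsException("row=" + row + ", column=" + column);
        }
        return data[row - 1][column - 1];
    }

    public static void main(String[] args) {
        double[][] raw = {{1.0, 2.0}, {3.0, 4.0}};
        OneBasedMatrix m = new OneBasedMatrix(raw);
        for (int i = 0; i < raw.length; i++) {
            // i is a 0-based loop index; the accessor needs i + 1.
            System.out.println(raw[i][i] + " == " + m.getEntry(i + 1, i + 1));
        }
    }
}
```

With 0-based accessors the `+ 1` in the loop would disappear, which is the change being proposed.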



Re: [MATH] Matrix indices

Posted by Al Chou <ho...@yahoo.com>.

--- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> Al Chou wrote:
> > My personal preference would originally have been to use 1-based indexing
> > (actually, I really prefer Fortran's ability to let the user define the lower
> > bound index value in each array dimension if they so choose, even though that
> > facility is not that often used), but that was based on my assumption that
> > commons-math is a library for enabling people who know some math to more easily
> > do math in Java.  However, the few use cases we've heard about seem to be split
> > between that kind of user and Java programmers who have some need for math
> > capabilities.  The former class of user would more likely want/expect 1-based
> > indexing, the latter would expect 0-based indexing.  Quoting from our own home
> > page:
> > 
> > "Guiding principles:
> > 
> > 1. Real-world application use cases will determine development priority.
> > ...
> > 4. In situations where multiple standard algorithms exist, a  Strategy pattern
> > will be used to support multiple implementations."
> > 
> > Unfortunately, nowhere in our charter do I know of a statement that would help
> > us decide the priority between the two classes of users I described above (I
> > would be happy if someone pointed out some passage of our charter that I don't
> > know about that would resolve this decision).
> > 
> > I think ideally we would provide facilities to handle both 0- and 1-based
> > indexing (and even arbitrary lower-bound-based indexing, given that I'm
> > pipe-dreaming here).
> > 
> > The _Numerical Recipes_ solution to this problem is to stick to 1-based
> > indexing throughout, as all algorithms are described in standard mathematical
> > notation where the first index in each dimension is 1 (probably also because it
> > made for easier [possibly at least partially automated] translation from the
> > original 1-based Fortran source code into other languages).  For 0-based
> > languages (viz., C and presumably C++) they provided array data types that
> > automatically translated between the mathematical 1-based notation and the
> > underlying 0-based array data structure, at the expense of having one extra
> > element in each dimension of each array, I believe.  Perhaps we could do
> > something similar and even provide a user setting that would choose between
> > 0-based and 1-based behavior -- probably a global setting, to prevent the kind
> > of confusion that would result from mixing the two representations
> > unintentionally.
> > 
> > 
> > Al
> > 
> 
> At first this sounds like a concession to both sides of the issue, but 
> my fear is that this will introduce more complexity and confuse the 
> user. I would say one or the other, not both. I think there's too much 
> ridiculous partisanship when it comes to subjects like "what's the best 
> language". It's amazing what people will fixate on and defend as 
> important (myself included). That's why I suggest using the indexing of the 
> implementation language. If I were working with Fortran, I'd argue to 
> use 1 <= x <= n.
> 
> I'm sure it doesn't matter very much either way...
> 
> -Mark

I think switching back and forth between 1-based mathematical notation and
0-based programming language notation would be horrible.  My preference would
be to stay in mathematical notation throughout my mathematical code.  I wonder
if it would work to return (when asked) larger-by-one double arrays that only
contain actual data in elements (1,1) through (n,n).  I guess there's a problem
of what to put in the zeroth elements, given that these are double arrays (not
object arrays), so that the values put in those elements would have to be valid
double values.  One could mistakenly access and use the zeroth elements without
necessarily recognizing that their very existence should be ignored.

In any case, I feel that whatever we choose to do should feel as natural as
possible for the majority of users.  I don't know who makes up the majority of
this library's users, but I suspect it's Java programmers who just need a bit
of math in their code.  If so, that _may_ mean 0-based indexing is what's going
to seem more natural to the majority of commons-math users (I do worry about
such users getting confused when they read 1-based mathematical explanations of
what our routines do, assuming they are not familiar with the mathematics and
are having to reconcile a new concept in an unfamiliar notation with what
they're familiar with as the programming language's natural notation).


Al
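
A rough sketch of the "larger-by-one" idea discussed above, under stated assumptions: the helper name PaddedArrays.toOneBased is hypothetical, the input is assumed to be rectangular, and filling the unused zeroth row and column with Double.NaN is only one possible answer to the question of what to put there. NaN propagates through arithmetic, so an accidental use of a zeroth element tends to show up in the result rather than pass silently.

```java
import java.util.Arrays;

// Hypothetical helper; not part of commons-math.
final class PaddedArrays {
    /**
     * Copies a rectangular 0-based array into an array that is larger by one in
     * each dimension, so that the data lives in elements (1,1) through (n,m).
     * The zeroth row and column are filled with NaN (an assumption of this sketch).
     */
    static double[][] toOneBased(double[][] zeroBased) {
        int rows = zeroBased.length;
        int cols = rows == 0 ? 0 : zeroBased[0].length;
        double[][] padded = new double[rows + 1][cols + 1];
        for (double[] row : padded) {
            Arrays.fill(row, Double.NaN); // mark every cell as unused first
        }
        for (int i = 0; i < rows; i++) {
            // shift each data row down by one and right by one
            System.arraycopy(zeroBased[i], 0, padded[i + 1], 1, cols);
        }
        return padded;
    }

    public static void main(String[] args) {
        double[][] padded = toOneBased(new double[][]{{1.0, 2.0}, {3.0, 4.0}});
        System.out.println(Arrays.deepToString(padded));
    }
}
```

The cost is the one Al mentions: one extra element per dimension, plus a copy whenever a plain Java array crosses the boundary.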


Re: [MATH] Matrix indices

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.


Al Chou wrote:
> My personal preference would originally have been to use 1-based indexing
> (actually, I really prefer Fortran's ability to let the user define the lower
> bound index value in each array dimension if they so choose, even though that
> facility is not that often used), but that was based on my assumption that
> commons-math is a library for enabling people who know some math to more easily
> do math in Java.  However, the few use cases we've heard about seem to be split
> between that kind of user and Java programmers who have some need for math
> capabilities.  The former class of user would more likely want/expect 1-based
> indexing, the latter would expect 0-based indexing.  Quoting from our own home
> page:
> 
> "Guiding principles:
> 
> 1. Real-world application use cases will determine development priority.
> ...
> 4. In situations where multiple standard algorithms exist, a  Strategy pattern
> will be used to support multiple implementations."
> 
> Unfortunately, nowhere in our charter do I know of a statement that would help
> us decide the priority between the two classes of users I described above (I
> would be happy if someone pointed out some passage of our charter that I don't
> know about that would resolve this decision).
> 
> I think ideally we would provide facilities to handle both 0- and 1-based
> indexing (and even arbitrary lower-bound-based indexing, given that I'm
> pipe-dreaming here).
> 
> The _Numerical Recipes_ solution to this problem is to stick to 1-based
> indexing throughout, as all algorithms are described in standard mathematical
> notation where the first index in each dimension is 1 (probably also because it
> made for easier [possibly at least partially automated] translation from the
> original 1-based Fortran source code into other languages).  For 0-based
> languages (viz., C and presumably C++) they provided array data types that
> automatically translated between the mathematical 1-based notation and the
> underlying 0-based array data structure, at the expense of having one extra
> element in each dimension of each array, I believe.  Perhaps we could do
> something similar and even provide a user setting that would choose between
> 0-based and 1-based behavior -- probably a global setting, to prevent the kind
> of confusion that would result from mixing the two representations
> unintentionally.
> 
> 
> Al
> 

At first this sounds like a concession to both sides of the issue, but 
my fear is that this will introduce more complexity and confuse the 
user. I would say one or the other, not both. I think there's too much 
ridiculous partisanship when it comes to subjects like "what's the best 
language". It's amazing what people will fixate on and defend as 
important (myself included). That's why I suggest using the indexing of the 
implementation language. If I were working with Fortran, I'd argue to 
use 1 <= x <= n.

I'm sure it doesn't matter very much either way...

-Mark

-- 
Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu


Re: [MATH] Matrix indices

Posted by Al Chou <ho...@yahoo.com>.

--- "Mark R. Diggory" <md...@latte.harvard.edu> wrote:
> Phil Steitz wrote:
> 
> > Mark R. Diggory wrote:
> > 
> >> I can give a couple other examples of matrices in java using 0 to n-1.
> >>
> >> Colt:
> >> http://dsd.lbl.gov/~hoschek/colt/api/cern/colt/matrix/DoubleMatrix2D.html
> >>
> >>> A matrix has a number of rows and columns, which are assigned upon 
> >>> instance construction - The matrix's size is then rows()*columns(). 
> >>> Elements are accessed via [row,column] coordinates. Legal coordinates
> >>>  range from [0,0] to [rows()-1,columns()-1]. Any attempt to access an
> >>>  element at a coordinate column<0 || column>=columns() || row<0 || 
> >>> row>=rows() will throw an IndexOutOfBoundsException.
> >>
> >>
> >>
> >> Ninja:
> >>
> >>> 0 <= index[i] < size(i)
> >>
> >>
> >>
> >> Most importantly, the JSR 83 for Multiarray is indexed in this manner:
> >>
> >> JSR 83:
> >> http://jcp.org/en/jsr/detail?id=83
> >>
> >>> Elements of a multiarray are identified by their indices along each
> >>> axis. Let a d-dimensional array A of elemental type T have extent nj
> >>> along its j-th axis, j = 0,...,d-1. Then, a valid index ij along the
> >>> j-th axis must be greater than or equal to zero and less than nj. An
> >>> attempt to reference an element A[i0,i1,...,id-1] with any invalid
> >>> index ij causes an ArrayIndexOutOfBoundsException to be thrown.
> >>
> >>
> >>
> >> I would agree with Kim because I feel that the implementation language
> >> should dictate the indexing strategy, not mathematical notation. It
> >> makes integrating the package with other APIs more consistent and easier for
> >> the user to implement.
> > 
> > 
> > I disagree, precisely for the reason that a matrix is NOT a MultiArray.
> 
> True, it's an interface which may one day exist on a multiarray 
> implementation.
> 
> > I also disagree strongly that implementation details should determine 
> > API definition.  
> 
> This is a grey area for me because I do agree with your point about 
> Interfaces/Implementations, but this is a Java interface in a Java 
> environment, where users are encouraged to reference arrays using 0 <= i 
> < n. So I guess from my perspective, it's an issue that is actually 
> outside the whole Interface/Implementation subject.
> 
> > The fact that Colt, Jama, and JADE all use 0-based 
> > indexing is troubling, however.  I would prefer to stick to the standard 
> > math notation, but if other [math] committers feel strongly about this, 
> > I could be talked into making this change.  Since it is an interface 
> > change, this would need to be done before 1.0.  I was about to close the 
> > 1.0-RC1 vote and send an announcement.  I will hold off until I hear 
> > others weigh in on this.
> > 
> > Phil
> > 
> 
> True, let's give it a little room for discussion. Will others in the 
> group please voice their opinion on this issue?

My personal preference would originally have been to use 1-based indexing
(actually, I really prefer Fortran's ability to let the user define the lower
bound index value in each array dimension if they so choose, even though that
facility is not that often used), but that was based on my assumption that
commons-math is a library for enabling people who know some math to more easily
do math in Java.  However, the few use cases we've heard about seem to be split
between that kind of user and Java programmers who have some need for math
capabilities.  The former class of user would more likely want/expect 1-based
indexing, the latter would expect 0-based indexing.  Quoting from our own home
page:

"Guiding principles:

1. Real-world application use cases will determine development priority.
...
4. In situations where multiple standard algorithms exist, a  Strategy pattern
will be used to support multiple implementations."

Unfortunately, nowhere in our charter do I know of a statement that would help
us decide the priority between the two classes of users I described above (I
would be happy if someone pointed out some passage of our charter that I don't
know about that would resolve this decision).

I think ideally we would provide facilities to handle both 0- and 1-based
indexing (and even arbitrary lower-bound-based indexing, given that I'm
pipe-dreaming here).

The _Numerical Recipes_ solution to this problem is to stick to 1-based
indexing throughout, as all algorithms are described in standard mathematical
notation where the first index in each dimension is 1 (probably also because it
made for easier [possibly at least partially automated] translation from the
original 1-based Fortran source code into other languages).  For 0-based
languages (viz., C and presumably C++) they provided array data types that
automatically translated between the mathematical 1-based notation and the
underlying 0-based array data structure, at the expense of having one extra
element in each dimension of each array, I believe.  Perhaps we could do
something similar and even provide a user setting that would choose between
0-based and 1-based behavior -- probably a global setting, to prevent the kind
of confusion that would result from mixing the two representations
unintentionally.

Al

> thnx,
> Mark
> 
> >>
> >> -Mark
> >>
> >> Kim van der Linde wrote:
> >>
> >>> Hi Phil,
> >>>
> >>> With the 1-based system, I keep converting back and forth between the 
> >>> 0-based underlying Java system and the 1-based matrix system. But only 
> >>> in selected cases, not as a general rule. It also requires me to write
> >>> specific methods just to increase the row number by one, as the 
> >>> method returns them in the 0-based system. This always happens as 
> >>> soon as you use a matrix or array by itself and combined with matrices, 
> >>> as I do. The fact that Java uses the "0-based" notation 
> >>> anyway conflicts with the mathematical notation.... Why not stay 
> >>> with that....???
> >>>
> >>> As most methods in RealMatrixImpl return a RealMatrix and NOT a 
> >>> RealMatrixImpl, the methods need to be in RealMatrix itself as well.
> >>> There is no option to create a RealMatrixImpl directly from a 
> >>> RealMatrix; it requires casting around with new 
> >>> RealMatrixImpl(RealMatrix.getArray()); MatrixUtil depends on how... 
> >>> But that results in a lot of casting around too, unless implemented as 
> >>> statics....
> >>>
> >>> Kim
> >>>
> >>> Phil Steitz wrote:
> >>>
> >>>> I need to understand better exactly what the problem is with the 
> >>>> indexing.  I don't see anything wrong with the current 
> >>>> implementation.  The element accessor methods use standard matrix 
> >>>> notation, which starts with index = 1 in each case.  The methods 
> >>>> that provide access to copies of the underlying double[][] arrays 
> >>>> are for efficiency and return arrays which are correctly sized to 
> >>>> hold the matrix data.  I would be -1 to changing the matrix accessor 
> >>>> methods (getEntry, setEntry, getRow, getColumn) to be "0-based" as 
> >>>> this conflicts with standard mathematical notation.
> >>>>
> >>>> I am OK with adding the additional methods above to (post 1.0) 
> >>>> RealMatrixImpl or a MatrixUtils class.
> >>>>
> >>>> Phil


Re: [MATH] Summary proposed changes

Posted by Kim van der Linde <ki...@kimvdlinde.com>.

Hi,

Brent Worden wrote:
>>1) Change the RealMatrix getEntry, getRow, getColumn methods to use
>>0-based indexing.
> 
> 
> Looking at the implementation, I believe the current indexing is
> satisfactory and I can't think of where using it with native arrays would be
> overly burdensome or confusing.

Well, so you think I requested this for purely philosophical reasons. I 
am running into problems with it, that's why. But maybe I should just do 
it differently, and make a derived class from it and distribute that 
with the classes I am making.....

> APIs are supposed to be language agnostic

I think APIs should be logical, and designed such that they minimise 
errors.

> let's call
> it what it is, SimpleLeastSquaresRegression.  If that is too long, then
> SimpleRegression

Fine with me.

>>3) Change Variance to be configurable to generate the population
> statistic.
> 
> Since population variance and sample variance are different statistics, they
> should be different classes as that is the design we have chosen.

I disagree, but in that case I will follow the same way on these classes 
as mentioned for the Matrix classes.

>>4) Combine the univariate and multivariate packages, since it is confusing
>>to separate statistics that focus on one variable and sometimes the word
>>"univariate" is used in the context of multivariate techniques (e.g.
>>"Univariate Anova").

> Both these statements indicate regression is a technique that involves more
> than one variable.  Therefore, regression in general is a multivariate
> technique.  The case where there is only one predictor is immaterial as
> there are two variable quantities.  Would one call a model with one
> predictor variable and two response variables a univariate technique?  I
> wouldn't and I doubt if anyone else would.  The path we have chosen,
> placing procedures dealing with one variable in the univariate package and
> all other procedures, dealing with more than one variable, in the multivariate
> package, is satisfactory and makes for a good discriminant.

See my response, this is not what I proposed. Anyway, the common 
interpretation (even among my colleagues who do nothing other than complex 
multivariate analyses) is that regressions with one independent and one 
dependent variable are univariate regressions, although they can see the 
logic, as there are two variables.

But in that sense, the TTest should be within the multivariate package 
too. Both simple regression and t-tests are in the end simplified 
versions of the GLM using only one dependent and one independent variable.

Anyway, I thank you all for the help, and I will just make derived 
classes where I need a different implementation than the one provided by 
this package.

Cheers,

Kim
-- 
http://www.kimvdlinde.com


RE: [MATH] Summary proposed changes

Posted by Brent Worden <br...@worden.org>.

> 1) Change the RealMatrix getEntry, getRow, getColumn methods to use
> 0-based indexing.

Looking at the implementation, I believe the current indexing is
satisfactory and I can't think of where using it with native arrays would be
overly burdensome or confusing.

As for letting the language dictate the indexing, I think this is a bad
practice for developing an API.  APIs are supposed to be language agnostic
and should exhibit the same behavior no matter the implementing language.
If we allow the language to dictate the behavior of an API method, it's
possible the behavior will be different for other languages.  I feel these
situations should be avoided so the API is portable to a wide array of
languages, which I feel is a long-term goal of some of our developers.

> 2) Change the name of "BivariateRegression" to "UnivariateRegression" (or
> something else)

If we're bothering to change its name to make it less confusing, let's call
it what it is, SimpleLeastSquaresRegression.  If that is too long, then
SimpleRegression as least squares is the inferred method when one mentions
regression.

> 3) Change Variance to be configurable to generate the population
statistic.

Since population variance and sample variance are different statistics, they
should be different classes as that is the design we have chosen.

As for the static methods on the variance and standard deviation classes,
the javadoc should be changed to better explain the source of the mean
argument.  The comments should indicate the mean is pre-computed using the
same values that are going to be used to compute the variation estimate.
Any other mean passed in will make the variation computation
unreliable.

> 4) Combine the univariate and multivariate packages, since it is confusing
> to separate statistics that focus on one variable and sometimes the word
> "univariate" is used in the context of multivariate techniques (e.g.
> "Univariate Anova").

"Regression is used to study relationships between measurable variables."
[Weisberg, 1985]

"Regression analysis is a statistical tool that utilizes the relations
between two or more quantitative variables..." [Neter, et al., 1985]

Both these statements indicate regression is a technique that involves more
than one variable.  Therefore, regression in general is a multivariate
technique.  The case where there is only one predictor is immaterial as
there are two variable quantities.  Would one call a model with one
predictor variable and two response variables a univariate technique?  I
wouldn't and I doubt if anyone else would.  The path we have chosen,
placing procedures dealing with one variable in the univariate package and
all other procedures, dealing with more than one variable, in the multivariate
package, is satisfactory and makes for a good discriminant.

Brent Worden
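
To make the "two variable quantities" point above concrete, here is a small sketch of simple least-squares regression with one predictor x and one response y. The class and method names (SimpleRegressionSketch, fit) are hypothetical and do not describe the BivariateRegression class under discussion; the sketch only uses the textbook formulas slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x).

```java
// Hypothetical illustration; not the commons-math regression class.
final class SimpleRegressionSketch {
    /** Returns {intercept, slope} of the least-squares line y = intercept + slope * x. */
    static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxx = 0.0; // sum of squared deviations of x
        double sxy = 0.0; // sum of cross products of deviations of x and y
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxx += dx * dx;
            sxy += dx * (y[i] - meanY);
        }
        double slope = sxy / sxx; // NaN if all x values are identical
        double intercept = meanY - slope * meanX;
        return new double[] {intercept, slope};
    }

    public static void main(String[] args) {
        double[] line = fit(new double[] {1, 2, 3}, new double[] {3, 5, 7});
        System.out.println("intercept=" + line[0] + ", slope=" + line[1]);
    }
}
```

Even in this simplest case the computation needs both data arrays, which is the sense in which Brent calls regression a technique involving more than one variable.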



Re: [MATH] Summary proposed changes

Posted by Kim van der Linde <ki...@kimvdlinde.com>.


Phil Steitz wrote:

> Kim van der Linde wrote:
> 
>> Well, I had a discussion with several colleagues (type science users, 
>> we went snorkeling) on several of these issues. The score of the day 
>> was the idea that the simple linear LS regression was considered a 
>> multivariate statistic.
> 
> 
> So it should stay where it is.

Huh, excuse me, they all disagreed completely with you, they all say it 
is univariate!

Kim
-- 
http://www.kimvdlinde.com



Re: [MATH] Summary proposed changes

Posted by Phil Steitz <ph...@steitz.com>.

Kim van der Linde wrote:
> Well, I had a discussion with several colleagues (type science users, we 
> went snorkeling) on several of these issues. The score of the day was 
> the idea that the simple linear LS regression was considered a 
> multivariate statistic.

So it should stay where it is.
> 
> Phil Steitz wrote:
> 
>> Nothing has been "put aside."  We make decisions by consensus.  You 
>> have provided input and we are considering it.  To make sure I have it 
>> all right, you have proposed four changes:


>> 2) Change the name of "BivariateRegression" to "UnivariateRegression" 
>> (or something else)
> 
> 
> Put it in univariate, name it LSRegression. (or better, 
> SimpleRegression, and build in the option for RMA and MA regressions).

The placement in .univariate contradicts what you say both above and 
below. Even with just one independent variable, regression is a 
multivariate technique.
> 
>> 3) Change Variance to be configurable to generate the population 
>> statistic.
> 
> 
> Yup, or even better, configurable bias reduction (n = N-a, default a = 1, 
> but settable by constructor and specific methods to maintain the option of 
> getting both statistics from the same dataset without doing things 
> twice). The current situation actually introduces fundamental errors.

Huh?  The formula provides unbiased estimates -- "fundamental error" would 
be to use the biased estimator for sample statistics. As I stated in an 
earlier post, the statistics in the univariate package are all designed to 
produce unbiased estimates for (unknown) population parameters based on 
sample data. The "population variance" that you want to add is either a 
biased (therefore inappropriate) estimator for the population variance 
based on a sample, or an exact expression of the population variance of 
the discrete distribution whose mass points are the data (i.e., assuming 
that the data values *are* the population and not a sample from it -- 
which is why it is called "population variance").  In either case it is a 
different statistic and to keep our design consistent, we should not use 
the same univariate to compute different statistics.
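
The distinction drawn above can be sketched in a few lines. This is an illustration only, with hypothetical names (VarianceSketch, sampleVariance, populationVariance) that are not the package's API: the sample variance divides the sum of squared deviations by n - 1, the population variance divides the same sum by n.

```java
// Hypothetical illustration; not the commons-math Variance class.
final class VarianceSketch {
    static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    /** Sum of squared deviations of the values from the given mean. */
    static double sumOfSquaredDeviations(double[] values, double mean) {
        double sum = 0.0;
        for (double v : values) {
            double d = v - mean;
            sum += d * d;
        }
        return sum;
    }

    /** Unbiased estimator of the variance of the population the sample came from. */
    static double sampleVariance(double[] values) {
        return sumOfSquaredDeviations(values, mean(values)) / (values.length - 1);
    }

    /** Variance of the data treated as the whole population. */
    static double populationVariance(double[] values) {
        return sumOfSquaredDeviations(values, mean(values)) / values.length;
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println("sample:     " + sampleVariance(data));
        System.out.println("population: " + populationVariance(data));
    }
}
```

Keeping the two as separate methods here mirrors Phil's argument that they are different statistics; whether they belong in separate classes is the design question being debated in the thread.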

>  From the JavaDoc for Variance and SD class:
> 
> - double evaluate(double[] values, double mean, int begin, int length)
>     Returns the variance of the entries in the specified portion of the 
> input array, using the precomputed mean value.
> 
> And in Variance only:
> - double evaluate(double[] values, double mean)
>     Returns the variance of the entries in the input array, using the 
> precomputed mean value.
> 
> If you compute the variance based on an already existing mean obtained 
> from a source other than the sample you establish the variance on, the 
> population variance should be used, as there is no loss of "degrees of 
> freedom" from first establishing the mean of the sample. If the mean is 
> based on the same sample, then it is correct.

These methods, like Variance itself, assume that the mean and variance are 
being computed based on sample data.  This is why it says "precomputed" 
rather than "known population parameter". The methods are provided to save 
computation when the sample mean has already been computed.
> 
>> 4) Combine the univariate and multivariate packages, since it is 
>> confusing to separate statistics that focus on one variable and 
>> sometimes the word "univariate" is used in the context of multivariate 
>> techniques (e.g. "Univariate Anova").
> 
> 
> No, keep them separate, but just locate things where they belong and not 
> reinvent that simple LS regressions should be within the multivariate 
> package.

Contradicts above -- assuming you mean that regression belongs in 
.multivariate, which it does.
> 
> I have question for you. Where would you locate a Covariance class....?

I am not sure that we would define a covariance class; but if we did, it 
would certainly belong in .multivariate, since covariance is a property of 
the joint distribution of two variables rather than just one.  The basic 
idea is very simple: univariate is for statistics that characterize the 
distribution of just one random variable, multivariate is for analyses 
that involve joint distributions of multiple random variables.

Phil
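
As a small illustration of why covariance is a property of two variables jointly, here is a sketch of the sample covariance. The class CovarianceSketch is hypothetical (the thread itself leaves open whether such a class would exist), and the n - 1 divisor is an assumption chosen to match the unbiased sample statistics described above.

```java
// Hypothetical illustration; no such class is confirmed by the thread.
final class CovarianceSketch {
    /** Sample covariance of paired observations (x[i], y[i]), using the n - 1 divisor. */
    static double sampleCovariance(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            // each term needs the paired observation from both variables
            sum += (x[i] - meanX) * (y[i] - meanY);
        }
        return sum / (n - 1);
    }

    public static void main(String[] args) {
        System.out.println(sampleCovariance(new double[] {1, 2, 3}, new double[] {2, 4, 6}));
    }
}
```

The method cannot be written against a single array of values, which is the practical form of Phil's univariate/multivariate criterion.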
> 
> Kim
> 
> 
> 



[MATH] Summary proposed changes (was: Matrix Indices)

Posted by Kim van der Linde <ki...@kimvdlinde.com>.

Well, I had a discussion with several colleagues (type science users, we 
went snorkeling) on several of these issues. The score of the day was 
the idea that the simple linear LS regression was considered a 
multivariate statistic.

Phil Steitz wrote:
> Nothing has been "put aside."  We make decisions by consensus.  You have 
> provided input and we are considering it.  To make sure I have it all 
> right, you have proposed four changes:
> 
> 1) Change the RealMatrix getEntry, getRow, getColumn methods to use 
> 0-based indexing.

Make it consistent with the underlying indexing to avoid confusion and 
programming overhead, and to use the CPU more efficiently.

> 2) Change the name of "BivariateRegression" to "UnivariateRegression" 
> (or something else)

Put it in univariate, name it LSRegression. (or better, 
SimpleRegression, and build in the option for RMA and MA regressions).

> 3) Change Variance to be configurable to generate the population statistic.

Yup, or even better, configurable bias reduction (n = N-a, default a = 1, 
but settable by constructor and specific methods to maintain the option of 
getting both statistics from the same dataset without doing things 
twice). The current situation actually introduces fundamental errors. 
From the JavaDoc for the Variance and SD classes:

- double evaluate(double[] values, double mean, int begin, int length)
     Returns the variance of the entries in the specified portion of the 
input array, using the precomputed mean value.

And in Variance only:
- double evaluate(double[] values, double mean)
     Returns the variance of the entries in the input array, using the 
precomputed mean value.

If you compute the variance based on an already existing mean obtained 
from a source other than the sample you establish the variance on, the 
population variance should be used, as there is no loss of "degrees of 
freedom" from first establishing the mean of the sample. If the mean is 
based on the same sample, then it is correct.
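
The proposal above (divide by N - a, with a = 1 by default) could look roughly like the sketch below. Everything in it is hypothetical: the class name, the method signature, and the parameter `correction` standing in for Kim's "a" are illustrative and are not the evaluate(double[], double) method quoted from the JavaDoc.

```java
// Hypothetical sketch of the proposal; not the existing commons-math API.
final class ConfigurableVariance {
    /**
     * Variance of the values around a mean supplied by the caller, dividing the
     * sum of squared deviations by (N - correction).
     * correction = 1: the mean was computed from these same values.
     * correction = 0: the mean is known independently of these values.
     */
    static double evaluate(double[] values, double mean, int correction) {
        int n = values.length;
        if (correction < 0 || correction >= n) {
            throw new IllegalArgumentException("correction must be in [0, N)");
        }
        double sum = 0.0;
        for (double v : values) {
            double d = v - mean;
            sum += d * d;
        }
        return sum / (n - correction);
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3};
        System.out.println(evaluate(data, 2.0, 1)); // mean taken from the sample
        System.out.println(evaluate(data, 2.0, 0)); // mean treated as known
    }
}
```

A single pass over the data yields the sum of squared deviations, so both divisors can be served from one computation, which appears to be the "without doing things twice" part of the argument.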

> 4) Combine the univariate and multivariate packages, since it is 
> confusing to separate statistics that focus on one variable and 
> sometimes the word "univariate" is used in the context of multivariate 
> techniques (e.g. "Univariate Anova").

No, keep them separate, but just locate things where they belong and not 
reinvent that simple LS regressions should be within the multivariate 
package.

I have a question for you. Where would you locate a Covariance class....?

Kim

-- 
http://www.kimvdlinde.com


Re: [MATH] Matrix indices

Posted by Stephen Colebourne <sc...@btopenworld.com>.

From: "Al Chou" <ho...@yahoo.com>
> > 1) Change the RealMatrix getEntry, getRow, getColumn methods to use
> > 0-based indexing.

FWIW, if I were using this class, I would expect it to be 0-based.

I can't comment on the other points as I don't have the maths background. (If
I needed one, I would probably search for it or ask, thus the name is less
interesting to me)

Sorry for the late input.

Stephen



Re: [MATH] Matrix indices

Posted by Al Chou <ho...@yahoo.com>.

--- Phil Steitz <ph...@steitz.com> wrote:
> >> If we have not succeeded in keeping things simple, we are certainly
> >> open to improving documentation and / or providing wrappers or
> >> simplified interfaces.  If you have specific examples / suggestions
> >> for improvement, please share these.  We want to make the package as
> >> easy to use as possible, while still maintaining extensibility.
> > 
> > 
> > Well, I have indicated several things already, but all of them are put 
> > aside as "standard notation" or related arguments....
> 
> Nothing has been "put aside."  We make decisions by consensus.  You have 
> provided input and we are considering it.  To make sure I have it all 
> right, you have proposed four changes:
> 
> 1) Change the RealMatrix getEntry, getRow, getColumn methods to use 
> 0-based indexing.
> 
> 2) Change the name of "BivariateRegression" to "UnivariateRegression" (or 
> something else)
> 
> 3) Change Variance to be configurable to generate the population statistic.
> 
> 4) Combine the univariate and multivariate packages, since it is confusing 
> to separate statistics that focus on one variable and sometimes the word 
> "univariate" is used in the context of multivariate techniques (e.g. 
> "Univariate Anova").
> 
> My personal opinion is that none of these changes should be implemented, 
> but if consensus is that we should stop the release and make these 
> changes, then we will do that.  In the case of 3), I would strongly 
> suggest that if we really see this as necessary, we add a new statistic 
> (as we will with remedian) instead of trying to force one univariate to 
> compute two statistics (which runs counter to the design of the package).
> 
> Did I capture your suggestions correctly?  Is there anything else that you 
> find confusing or hard to use?
> 
> Thanks again for your feedback.
> 
> It would be great if other [math] committers could weigh in with simple 
> yes / no on each of the proposed changes above so that we can move forward 
> with the release.

Here are my votes on the above:

1) -1, keeping in mind that that's my personal bias; if real-world usage of
commons-math is mostly done by Java programmers who expect 0-based indexing and
they want it, then I am +1 for this change.
2) -0
3) +0
4) +0


Al


Re: [MATH] Matrix indices

Posted by Phil Steitz <ph...@steitz.com>.

> 
>> If we have not succeeded in keeping things simple, we are certainly
>> open to improving documentation and / or providing wrappers or
>> simplified interfaces.  If you have specific examples / suggestions
>> for improvement, please share these.  We want to make the package as
>> easy to use as possible, while still maintaining extensibility.
> 
> 
> Well, I have indicated several things already, but all of them are put 
> aside as "standard notation" or related arguments....

Nothing has been "put aside."  We make decisions by consensus.  You have 
provided input and we are considering it.  To make sure I have it all 
right, you have proposed four changes:

1) Change the RealMatrix getEntry, getRow, getColumn methods to use 
0-based indexing.

2) Change the name of "BivariateRegression" to "UnivariateRegression" (or 
something else)

3) Change Variance to be configurable to generate the population statistic.

4) Combine the univariate and multivariate packages, since it is confusing 
to separate statistics that focus on one variable and sometimes the word 
"univariate" is used in the context of multivariate techniques (e.g. 
"Univariate Anova").

My personal opinion is that none of these changes should be implemented, 
but if consensus is that we should stop the release and make these 
changes, then we will do that.  In the case of 3), I would strongly 
suggest that if we really see this as necessary, we add a new statistic 
(as we will with remedian) instead of trying to force one univariate to 
compute two statistics (which runs counter to the design of the package).

Did I capture your suggestions correctly?  Is there anything else that you 
find confusing or hard to use?

Thanks again for your feedback.

It would be great if other [math] committers could weigh in with simple 
yes / no on each of the proposed changes above so that we can move forward 
with the release.

Phil

> 
> Kim
> 



Re: [MATH] Matrix indices

Posted by Kim van der Linde <ki...@kimvdlinde.com>.

Phil Steitz wrote:

> No, an array is not a mathematical object.

If you are a purist, yes, but for many people, they are roughly equivalent.

> Here again, the point is that the math object should expose
> properties consistent with its definition -- like any other Java
> object.  What is actually wrong here is to expose the double[][] and
> double[] properties. This was done for efficiency and is what is
> causing the confusion.

Well, in that case, eliminate all those and make a double[][] based
equivalent for easy calculations, but the current situation is that the
user continuously has to track where (s)he has to use the 1-based system
and where the 0-based system, and cast back and forth while doing so.

> Yes, this is the point above.  My experience working with matrices in
>  Java and other languages, however, is that there are times when you
> need to efficiently get at the data.  That is why the
> double[][]-valued methods were provided. These should be rarely used.
> 
Well, for the classes that I build, I need them all the time. And so, it
is highly inefficient for me to use this package as currently 
implemented. I think I would be better off using JAMA or the GMatrix in 
the 3D API from Sun, because I can work with those, contrary to this 
package.

> We can easily add a PopulationVariance statistic, which is the right
> way to handle this, as it is a different statistic.  The
> UnivariateStatistic interface and framework was designed to support
> this kind of thing.  I do not see the need to hold the release for
> this.

I do see the benefit of having it integrated into one single class, in
which the default is the sample (co)variances, and the others can be
invoked specifically.

>> - BivariateRegression (Should be in univariate or in a new package
>>  bivariate, and called LSRegression.  I only today realized that is
>> was the Least square regression because of its confusing name and
>> location.
> 
> Did you look at the javadoc?

Yes.

> The term "bivariate regression" is more or less standard in
> (elementary) statistics and is used to distinguish the 2-variable
> case from the case where there are multiple independent variables,
> which is usually called "multiple regression."

Well, my experience is that "bivariate" is often used to distinguish a 
regression with two independent variables from the ones with one 
independent variable.

> Since bivariate regression involves 2 variables, it belongs in the 
> multivariate package.  The univariate package is for statistics 
> involving just one variable. This is also consistent with standard 
> statistical terminology.

Well, I probably had other textbooks than you.....

And I guess the SAS guys have it wrong too, then....:

http://v8doc.sas.com/sashtml/stat/chap65/sect39.htm

(Oh, Google gives back over 4000 hits for "univariate regression",
http://www.google.com/search?q=%22Univariate+regression%22 )

And I guess that the covariance class should be included in the 
multivariate package?

> There are indeed lots of different kinds of bivariate regression
> models that can be fit.  BivariateRegression estimates the most
> common among these, ordinary least squares regression. I suppose we
> could call it "LeastSquaresBivariateRegression", but that is a bit
> long and since OLS is the most common model, I think it is fine to
> keep the name as it is. I would expect us to do the same thing, btw,
> when we add support for Multiple Regression (use the
> "MultipleRegression" name for the OLS version) All of this is
> specified in the javadoc and the user guide.

I am in favor of self-explaining names, and "bivariate" is used both for
regression with one and with two independent variables, so it is not
clear what is meant by it.

>> - Generally, I find the whole package rather user unfriendly, and 
>> feels for me as designed for hard-core mathematicians /programmers.
>> 
> 
> Did you look at the class javadoc and the user guide?

Yes.

> While we do expect users to be Java programmers, we certainly do not
> expect them to be "hard-core mathematicians."

Well, the longer that I am here, the more it looks like it....

> This is one reason that we need to stick to standard and elementary
> definitions and notation.

Well, to me, this package does not look like that at all. Actually, the
more I get to know the package, the less I like it because of what I
experience as deviations from the ordinary.

> If we have not succeeded in keeping things simple, we are certainly
> open to improving documentation and / or providing wrappers or
> simplified interfaces.  If you have specific examples / suggestions
> for improvement, please share these.  We want to make the package as
> easy to use as possible, while still maintaining extensibility.

Well, I have indicated several things already, but all of them are put 
aside as "standard notation" or related arguments....

Kim

-- 
http://www.kimvdlinde.com



Re: [MATH] Matrix indices

Posted by Phil Steitz <ph...@steitz.com>.

Kim van der Linde wrote:
> Phil,
> 
> Honestly, I think that if you make the argument that you want to use 
> Mathematical correct notations, you should start any array at 1.

No, an array is not a mathematical object.

  If
> everything (commons-math and JAVA itself) would be like that, fine, but 
> reality is that java is 0 based.

Here again, the point is that the math object should expose properties 
consistent with its definition -- like any other Java object.  What is 
actually wrong here is to expose the double[][] and double[] properties. 
This was done for efficiency and is what is causing the confusion.
> 
> As long as you use only one system (Matrices and Vectors or 1d and 2d 
> arrays) there is no issue, it is when things come together.

Yes, this is the point above.  My experience working with matrices in Java 
and other languages, however, is that there are times when you need to 
efficiently get at the data.  That is why the double[][]-valued methods 
were provided. These should be rarely used.
> 
> On a more general point, I changing my vote for release to -1 because 
> the following reasons:
> 
> - Matrix indexing
> - Default sample (co-)variances (should be changeable)

We can easily add a PopulationVariance statistic, which is the right way 
to handle this, as it is a different statistic.  The UnivariateStatistic 
interface and framework was designed to support this kind of thing.  I do 
not see the need to hold the release for this.
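
A minimal sketch of what a separate statistic could look like, assuming 
a one-method interface. StatisticSketch and every class name below are 
hypothetical stand-ins for illustration; they are not the actual 
UnivariateStatistic API.

```java
/** Hypothetical one-method statistic interface (a stand-in, not the commons-math type). */
interface StatisticSketch {
    double evaluate(double[] values);
}

/** Shared helper: sum of squared deviations from the sample mean. */
final class SquaredDeviations {

    private SquaredDeviations() {
    }

    static double aroundSampleMean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        double mean = sum / values.length;
        double sumSq = 0.0;
        for (double v : values) {
            double d = v - mean;
            sumSq += d * d;
        }
        return sumSq;
    }
}

/** Bias-corrected sample variance: divides by n - 1. */
final class SampleVarianceStat implements StatisticSketch {
    public double evaluate(double[] values) {
        return SquaredDeviations.aroundSampleMean(values) / (values.length - 1);
    }
}

/** Population variance as its own statistic: divides by n. */
final class PopulationVarianceStat implements StatisticSketch {
    public double evaluate(double[] values) {
        return SquaredDeviations.aroundSampleMean(values) / values.length;
    }
}
```

With this shape, each class computes exactly one statistic, and callers 
choose between them by choosing the class rather than by setting a flag.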

> - BivariateRegression (Should be in univariate or in a new package 
> bivariate, and called LSRegression.  I only today realized that is was 
> the Least square regression because of its confusing name and location. 

Did you look at the javadoc?  The term "bivariate regression" is more or 
less standard in (elementary) statistics and is used to distinguish the 
2-variable case from the case where there are multiple independent 
variables, which is usually called "multiple regression."   Since 
bivariate regression involves 2 variables, it belongs in the multivariate 
package.  The univariate package is for statistics involving just one 
variable. This is also consistent with standard statistical terminology.

> Furthermore, there are at least four types of bivariate linear 
> regression so Bivariate is indicating a group rather than a specific 
> module.)

There are indeed lots of different kinds of bivariate regression models 
that can be fit.  BivariateRegression estimates the most common among 
these, ordinary least squares regression. I suppose we could call it 
"LeastSquaresBivariateRegression", but that is a bit long and since OLS is 
the most common model, I think it is fine to keep the name as it is. I 
would expect us to do the same thing, btw, when we add support for 
Multiple Regression (use the "MultipleRegression" name for the OLS 
version) All of this is specified in the javadoc and the user guide.

> - Generally, I find the whole package rather user unfriendly, and feels 
> for me as designed for hard-core mathematicians /programmers.

Did you look at the class javadoc and the user guide?  While we do expect 
users to be Java programmers, we certainly do not expect them to be 
"hard-core mathematicians."  This is one reason that we need to stick to 
standard and elementary definitions and notation.  If we have not 
succeeded in keeping things simple, we are certainly open to improving 
documentation and / or providing wrappers or simplified interfaces.  If 
you have specific examples / suggestions for improvement, please share 
these.  We want to make the package as easy to use as possible, while still 
maintaining extensibility.
> 
> BTW, I am sorry if I sound blunt, but Eclips deleted 10000 files 
> yesterday after inslaling it, including the work of the last days, so I 
> am not happy at all.

Thanks for the feedback.

Phil
> 
> Kim
> 
> 
> 


Re: [MATH] Matrix indices

Posted by Kim van der Linde <ki...@kimvdlinde.com>.

Phil,

Honestly, I think that if you make the argument that you want to use 
mathematically correct notation, you should start any array at 1. If 
everything (commons-math and Java itself) were like that, fine, but 
the reality is that Java is 0-based.

As long as you use only one system (Matrices and Vectors or 1D and 2D 
arrays) there is no issue; the issue arises when things come together.

Another even simpler solution is to have a Real2DArray which implements 
the same RealMatrix interface but gives back only 2D arrays all the time. 
That one would then be 0-based, while the matrix itself is 1-based. 
The MatrixUtil could then deal with casting between them when necessary.
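
As a rough sketch of that idea, a 0-based view can simply delegate to a 
1-based matrix and shift the indices in one place. Everything below is 
hypothetical: OneBasedMatrix is a stand-in interface, not the real 
RealMatrix, and ZeroBasedView only illustrates the proposed 
Real2DArray, assuming a non-empty rectangular array.

```java
/** Hypothetical stand-in for a 1-based matrix API (not the commons-math interface). */
interface OneBasedMatrix {
    int rows();

    int columns();

    /** Valid indices run from 1 to rows() and from 1 to columns(). */
    double getEntry(int row, int column);
}

/** Minimal 1-based matrix backed by an ordinary 0-based Java array. */
final class ArrayBackedMatrix implements OneBasedMatrix {
    private final double[][] data;

    ArrayBackedMatrix(double[][] data) {
        this.data = data;
    }

    public int rows() {
        return data.length;
    }

    public int columns() {
        return data[0].length;
    }

    public double getEntry(int row, int column) {
        if (row < 1 || row > rows() || column < 1 || column > columns()) {
            throw new IndexOutOfBoundsException(
                    "1-based index out of range: (" + row + ", " + column + ")");
        }
        return data[row - 1][column - 1];
    }
}

/** 0-based view over a 1-based matrix; the index shift lives only here. */
final class ZeroBasedView {
    private final OneBasedMatrix matrix;

    ZeroBasedView(OneBasedMatrix matrix) {
        this.matrix = matrix;
    }

    double get(int i, int j) {
        return matrix.getEntry(i + 1, j + 1);
    }
}
```

Code that works with plain arrays would talk to the view, code that 
follows the mathematical notation would talk to the matrix, and neither 
would need to add or subtract one by hand.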

Anyway, seeing the huge overhead of casting around that I need to do for 
the MVE (Minimum Volume Ellipsoid) module with the RealMatrix classes, I 
am not going to use them in their current form.

On a more general point, I am changing my vote for release to -1 for 
the following reasons:

- Matrix indexing
- Default sample (co-)variances (should be changeable)
- BivariateRegression (should be in univariate or in a new package 
bivariate, and called LSRegression.  I only realized today that it was 
the least squares regression, because of its confusing name and location. 
Furthermore, there are at least four types of bivariate linear 
regression, so "Bivariate" indicates a group rather than a specific 
module.)
- Generally, I find the whole package rather user-unfriendly, and it 
feels to me as if it was designed for hard-core mathematicians/programmers.

BTW, I am sorry if I sound blunt, but Eclipse deleted 10000 files 
yesterday after I installed it, including the work of the last days, so I 
am not happy at all.

Kim



-- 
http://www.kimvdlinde.com



Re: [MATH] Matrix indices

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.


Phil Steitz wrote:

> Mark R. Diggory wrote:
> 
>> I can give a couple other examples of matrices in java using 0 to n-1.
>>
>> Colt:
>> http://dsd.lbl.gov/~hoschek/colt/api/cern/colt/matrix/DoubleMatrix2D.html
>>
>>> A matrix has a number of rows and columns, which are assigned upon 
>>> instance construction - The matrix's size is then rows()*columns(). 
>>> Elements are accessed via [row,column] coordinates. Legal coordinates
>>>  range from [0,0] to [rows()-1,columns()-1]. Any attempt to access an
>>>  element at a coordinate column<0 || column>=columns() || row<0 || 
>>> row>=rows() will throw an IndexOutOfBoundsException.
>>
>>
>>
>> Ninja:
>>
>>> 0 <= index[i] < size(i)
>>
>>
>>
>> Most importantly, the JSR 83 for Multiarray is indexed in this manner:
>>
>> JSR 83:
>> http://jcp.org/en/jsr/detail?id=83
>>
>>> Elements of a multiarray are identified by their indices along each
>>> axis. Let a d-dimensional array A of elemental type T have extent nj
>>> along its j-th axis, j = 0,...,d-1. Then, a valid index ij along the
>>> j-th axis must be greater than or equal to zero and less than nj. An
>>> attempt to reference an element A[i0,i1,...,id-1] with any invalid
>>> index ij causes an ArrayIndexOutOfBoundsException to be thrown.
>>
>>
>>
>> I would agree with Kim because I feel that the implementation language
>> should dictate the indexing strategy, not mathematical notation. It
>> makes integrating the package other API's more consistent and easier for
>> the user to implement.
> 
> 
> I disagree, precisely for the reason that a matrix is NOT a MultiArray.

True, it's an interface which may one day exist on a multiarray 
implementation.

> I also disagree strongly that implementation details should determine 
> API definition.  

This is a grey area for me because I do agree with your point about 
Interfaces/Implementations, but this is a Java interface in a Java 
environment, where users are encouraged to reference arrays using 0 <= i 
< n. So I guess from my perspective, it's an issue that is actually 
outside the whole Interface/Implementation subject.

> The fact that Colt, Jama, and JADE all use 0-based 
> indexing is troubling, however.  I would prefer to stick to the standard 
> math notation, but if other [math] committers feel strongly about this, 
> I could be talked into making this change.  Since it is an interface 
> change, this would need to be done before 1.0.  I was about to close the 
> 1.0-RC1 vote and send an annnouncment.  I will hold off until I hear 
> others weigh in on this.
> 
> Phil
> 

True, let's give it a little room for discussion. Will others in the 
group please voice their opinion on this issue?

thnx,
Mark

>>
>> -Mark
>>
>> Kim van der Linde wrote:
>>
>>> Hi Phil,
>>>
>>> With the 1 base system, I keep casting back and forth between the 0 
>>> based underlying java system and the 1 based matrix system. But only 
>>> in selected cases, not as a general rule. It also requires me to make
>>>  specific methods just to increase the row number by one, as the 
>>> method returns them at the 0 based system. This happens always as 
>>> soon as you use a matrix or array by itsself and with matrcies 
>>> combined, as I do. The fact that java uses the "0-based" notation 
>>> anyways conflicts with the mathematical notation.... Why not stay 
>>> with that....???
>>>
>>> As most methods in the RealMatrixImpl return a RealMatrix and NOT a 
>>> RealMatrixImpl, the methods need to be in the RealMAtrix itself also.
>>>  There is no option to create a RealMatrixImpl directly from a 
>>> RealMatrix, and requires casting around with new 
>>> RealMatrixImpl(RealMAtrix.getArray()); MatrixUtil depends on how... 
>>> But results in a lot of cating around also unless implemented as 
>>> static's....
>>>
>>> Kim
>>>
>>> Phil Steitz wrote:
>>>
>>>> I need to understand better exactly what the problem is with the 
>>>> indexing.  I don't see anything wrong with the current 
>>>> implementation.  The element accessor methods use standard matrix 
>>>> notation, which starts with index = 1 in each case.  The methods 
>>>> that provide access to copies of the underlying double[][] arrays 
>>>> are for efficiency and return arrays which are correctly sized to 
>>>> hold the matrix data.  I would be -1 to changing the matrix accessor 
>>>> methods (getEntry, setEntry, getRow, getColumn) to be "0-based" as 
>>>> this conflicts with standard mathematical notation.
>>>>
>>>> I am OK with adding the additional methods above to (post 1.0) 
>>>> RealMatrixImpl or a MatrixUtils class.
>>>>
>>>> Phil
>>>>
>>>>
>>>
>>
> 
> 
> 

-- 
Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu


Re: [MATH] Matrix indices

Posted by Phil Steitz <ph...@steitz.com>.

Mark R. Diggory wrote:
> I can give a couple other examples of matrices in java using 0 to n-1.
> 
> Colt:
> http://dsd.lbl.gov/~hoschek/colt/api/cern/colt/matrix/DoubleMatrix2D.html
> 
>> A matrix has a number of rows and columns, which are assigned upon 
>> instance construction - The matrix's size is then rows()*columns(). 
>> Elements are accessed via [row,column] coordinates. Legal coordinates
>>  range from [0,0] to [rows()-1,columns()-1]. Any attempt to access an
>>  element at a coordinate column<0 || column>=columns() || row<0 || 
>> row>=rows() will throw an IndexOutOfBoundsException.
> 
> 
> Ninja:
> 
>> 0 <= index[i] < size(i)
> 
> 
> Most importantly, the JSR 83 for Multiarray is indexed in this manner:
> 
> JSR 83:
> http://jcp.org/en/jsr/detail?id=83
> 
>> Elements of a multiarray are identified by their indices along each
>> axis. Let a d-dimensional array A of elemental type T have extent nj
>> along its j-th axis, j = 0,...,d-1. Then, a valid index ij along the
>> j-th axis must be greater than or equal to zero and less than nj. An
>> attempt to reference an element A[i0,i1,...,id-1] with any invalid
>> index ij causes an ArrayIndexOutOfBoundsException to be thrown.
> 
> 
> I would agree with Kim because I feel that the implementation language
> should dictate the indexing strategy, not mathematical notation. It
> makes integrating the package other API's more consistent and easier for
> the user to implement.

I disagree, precisely for the reason that a matrix is NOT a MultiArray.  I 
also disagree strongly that implementation details should determine API 
definition.  The fact that Colt, Jama, and JADE all use 0-based indexing 
is troubling, however.  I would prefer to stick to the standard math 
notation, but if other [math] committers feel strongly about this, I could 
be talked into making this change.  Since it is an interface change, this 
would need to be done before 1.0.  I was about to close the 1.0-RC1 vote 
and send an announcement.  I will hold off until I hear others weigh in on 
this.

Phil

> 
> -Mark
> 
> Kim van der Linde wrote:
> 
>> Hi Phil,
>>
>> With the 1 base system, I keep casting back and forth between the 0 
>> based underlying java system and the 1 based matrix system. But only 
>> in selected cases, not as a general rule. It also requires me to make
>>  specific methods just to increase the row number by one, as the 
>> method returns them at the 0 based system. This happens always as soon 
>> as you use a matrix or array by itsself and with matrcies combined, as 
>> I do. The fact that java uses the "0-based" notation anyways conflicts 
>> with the mathematical notation.... Why not stay with that....???
>>
>> As most methods in the RealMatrixImpl return a RealMatrix and NOT a 
>> RealMatrixImpl, the methods need to be in the RealMAtrix itself also.
>>  There is no option to create a RealMatrixImpl directly from a 
>> RealMatrix, and requires casting around with new 
>> RealMatrixImpl(RealMAtrix.getArray()); MatrixUtil depends on how... 
>> But results in a lot of cating around also unless implemented as 
>> static's....
>>
>> Kim
>>
>> Phil Steitz wrote:
>>
>>> I need to understand better exactly what the problem is with the 
>>> indexing.  I don't see anything wrong with the current 
>>> implementation.  The element accessor methods use standard matrix 
>>> notation, which starts with index = 1 in each case.  The methods that 
>>> provide access to copies of the underlying double[][] arrays are for 
>>> efficiency and return arrays which are correctly sized to hold the 
>>> matrix data.  I would be -1 to changing the matrix accessor methods 
>>> (getEntry, setEntry, getRow, getColumn) to be "0-based" as this 
>>> conflicts with standard mathematical notation.
>>>
>>> I am OK with adding the additional methods above to (post 1.0) 
>>> RealMatrixImpl or a MatrixUtils class.
>>>
>>> Phil
>>>
>>>
>>
> 



Re: [MATH] Matrix indices

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.

I can give a couple other examples of matrices in java using 0 to n-1.

Colt:
http://dsd.lbl.gov/~hoschek/colt/api/cern/colt/matrix/DoubleMatrix2D.html
> A matrix has a number of rows and columns, which are assigned upon 
> instance construction - The matrix's size is then rows()*columns(). 
> Elements are accessed via [row,column] coordinates. Legal coordinates
>  range from [0,0] to [rows()-1,columns()-1]. Any attempt to access an
>  element at a coordinate column<0 || column>=columns() || row<0 || 
> row>=rows() will throw an IndexOutOfBoundsException.

Ninja:
> 0 <= index[i] < size(i)

Most importantly, the JSR 83 for Multiarray is indexed in this manner:

JSR 83:
http://jcp.org/en/jsr/detail?id=83
> Elements of a multiarray are identified by their indices along each
> axis. Let a d-dimensional array A of elemental type T have extent nj
> along its j-th axis, j = 0,...,d-1. Then, a valid index ij along the
> j-th axis must be greater than or equal to zero and less than nj. An
> attempt to reference an element A[i0,i1,...,id-1] with any invalid
> index ij causes an ArrayIndexOutOfBoundsException to be thrown.

I would agree with Kim because I feel that the implementation language
should dictate the indexing strategy, not mathematical notation. It
makes integrating the package with other APIs more consistent and easier
for the user to implement.
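
The 0-based contract quoted from the Colt documentation above can be 
sketched roughly as follows. ZeroBasedMatrixSketch is a hypothetical 
class written only for this illustration; it is not Colt's 
DoubleMatrix2D and makes no claim about how Colt is implemented.

```java
/** Illustrative 0-based matrix with Colt-style bounds checking (hypothetical class). */
final class ZeroBasedMatrixSketch {
    private final int rows;
    private final int columns;
    private final double[] cells; // row-major storage

    ZeroBasedMatrixSketch(int rows, int columns) {
        this.rows = rows;
        this.columns = columns;
        this.cells = new double[rows * columns];
    }

    int rows() {
        return rows;
    }

    int columns() {
        return columns;
    }

    double get(int row, int column) {
        check(row, column);
        return cells[row * columns + column];
    }

    void set(int row, int column, double value) {
        check(row, column);
        cells[row * columns + column] = value;
    }

    /** Legal coordinates range from [0,0] to [rows()-1,columns()-1]. */
    private void check(int row, int column) {
        if (column < 0 || column >= columns || row < 0 || row >= rows) {
            throw new IndexOutOfBoundsException(
                    "0-based index out of range: (" + row + ", " + column + ")");
        }
    }
}
```

The valid index range here is exactly the one a Java programmer already 
uses for arrays, which is the consistency argument made above.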

-Mark

Kim van der Linde wrote:

> Hi Phil,
> 
> With the 1 base system, I keep casting back and forth between the 0 
> based underlying java system and the 1 based matrix system. But only 
> in selected cases, not as a general rule. It also requires me to make
>  specific methods just to increase the row number by one, as the 
> method returns them at the 0 based system. This happens always as 
> soon as you use a matrix or array by itsself and with matrcies 
> combined, as I do. The fact that java uses the "0-based" notation 
> anyways conflicts with the mathematical notation.... Why not stay 
> with that....???
> 
> As most methods in the RealMatrixImpl return a RealMatrix and NOT a 
> RealMatrixImpl, the methods need to be in the RealMAtrix itself also.
>  There is no option to create a RealMatrixImpl directly from a 
> RealMatrix, and requires casting around with new 
> RealMatrixImpl(RealMAtrix.getArray()); MatrixUtil depends on how... 
> But results in a lot of cating around also unless implemented as 
> static's....
> 
> Kim
> 
> Phil Steitz wrote:
> 
>> I need to understand better exactly what the problem is with the 
>> indexing.  I don't see anything wrong with the current 
>> implementation.  The element accessor methods use standard matrix 
>> notation, which starts with index = 1 in each case.  The methods 
>> that provide access to copies of the underlying double[][] arrays 
>> are for efficiency and return arrays which are correctly sized to 
>> hold the matrix data.  I would be -1 to changing the matrix 
>> accessor methods (getEntry, setEntry, getRow, getColumn) to be 
>> "0-based" as this conflicts with standard mathematical notation.
>> 
>> I am OK with adding the additional methods above to (post 1.0) 
>> RealMatrixImpl or a MatrixUtils class.
>> 
>> Phil
>> 
>> 
> 

-- 
Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu


Re: [MATH] Matrix indices

Posted by Phil Steitz <ph...@steitz.com>.

Kim van der Linde wrote:
> Hi Phil,
> 
> With the 1 base system, I keep casting back and forth between the 0 
> based underlying java system and the 1 based matrix system. But only in 
> selected cases, not as a general rule. It also requires me to make 
> specific methods just to increase the row number by one, as the method 
> returns them at the 0 based system. This happens always as soon as you 
> use a matrix or array by itsself and with matrcies combined, as I do. 
> The fact that java uses the "0-based" notation anyways conflicts with 
> the mathematical notation.... Why not stay with that....???
> 
> As most methods in the RealMatrixImpl return a RealMatrix and NOT a 
> RealMatrixImpl, the methods need to be in the RealMAtrix itself also. 
> There is no option to create a RealMatrixImpl directly from a 
> RealMatrix, and requires casting around with new 
> RealMatrixImpl(RealMAtrix.getArray()); MatrixUtil depends on how... But 
> results in a lot of cating around also unless implemented as static's....

I think that the right place for the submatrix accessors is in a 
MatrixUtils class, which can grow to include lots of additional methods 
for manipulating matrices.  I generally view it as a bad sign when 
implementation details bleed into API design, and changing indexing as you 
suggest would be an example of that, IMHO. RealMatrices should model real 
matrices, not Java double[][] arrays.  By adding the necessary methods to 
MatrixUtils, or classes specific to your application, you can limit the 
amount of back-and-forth required between direct array manipulation and 
RealMatrix methods.
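
A minimal sketch of what such a utility method might look like, using 
1-based, inclusive row and column bounds to match the matrix notation. 
MatrixUtilsSketch and subMatrix are hypothetical names chosen for this 
illustration and are not existing commons-math methods; the input is 
assumed to be a rectangular array.

```java
/** Illustrative static helper in the spirit of a MatrixUtils class (hypothetical). */
final class MatrixUtilsSketch {

    private MatrixUtilsSketch() {
    }

    /**
     * Copies rows startRow..endRow and columns startCol..endCol,
     * where all four bounds are 1-based and inclusive.
     */
    static double[][] subMatrix(double[][] data, int startRow, int endRow,
                                int startCol, int endCol) {
        if (startRow < 1 || endRow > data.length || startRow > endRow
                || startCol < 1 || endCol > data[0].length || startCol > endCol) {
            throw new IndexOutOfBoundsException("invalid 1-based submatrix range");
        }
        int width = endCol - startCol + 1;
        double[][] out = new double[endRow - startRow + 1][width];
        for (int i = startRow; i <= endRow; i++) {
            // Shift from 1-based matrix indices to 0-based array offsets.
            System.arraycopy(data[i - 1], startCol - 1, out[i - startRow], 0, width);
        }
        return out;
    }
}
```

Keeping the index shift inside one static helper is what limits the 
back-and-forth between array manipulation and matrix methods.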

Phil
> 
> Kim
> 
> Phil Steitz wrote:
> 
>> I need to understand better exactly what the problem is with the 
>> indexing.  I don't see anything wrong with the current 
>> implementation.  The element accessor methods use standard matrix 
>> notation, which starts with index = 1 in each case.  The methods that 
>> provide access to copies of the underlying double[][] arrays are for 
>> efficiency and return arrays which are correctly sized to hold the 
>> matrix data.  I would be -1 to changing the matrix accessor methods 
>> (getEntry, setEntry, getRow, getColumn) to be "0-based" as this 
>> conflicts with standard mathematical notation.
>>
>> I am OK with adding the additional methods above to (post 1.0) 
>> RealMatrixImpl or a MatrixUtils class.
>>
>> Phil
>>
> 



Re: [MATH] Matrix indices

Posted by Kim van der Linde <ki...@kimvdlinde.com>.

Hi Phil,

With the 1-based system, I keep converting back and forth between the 
0-based underlying Java arrays and the 1-based matrix indices, though only 
in selected cases, not as a general rule. It also requires me to write 
specific methods just to increase the row number by one, as the method 
returns them in the 0-based system. This happens as soon as you 
use a matrix or array by itself together with matrices, as I do. 
Java uses the "0-based" notation anyway, which conflicts with 
the mathematical notation. Why not stay with the 0-based system?

As most methods in RealMatrixImpl return a RealMatrix and NOT a 
RealMatrixImpl, the methods need to be in RealMatrix itself as well. 
There is no option to create a RealMatrixImpl directly from a 
RealMatrix; it requires casting around with new 
RealMatrixImpl(RealMatrix.getArray()). Whether MatrixUtils helps depends 
on how it is done, but it also results in a lot of casting around unless 
the methods are implemented as statics.
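
The index conversion described above can be illustrated with a minimal 
sketch. OneBasedMatrix is a made-up stand-in, not the real RealMatrixImpl; it 
only assumes, as discussed in this thread, that the entry accessor counts 
from 1 while a row copy comes back as an ordinary 0-based Java array.

```java
/** Made-up stand-in for a matrix with 1-based accessors; not RealMatrixImpl. */
final class OneBasedMatrix {
    private final double[][] data;

    OneBasedMatrix(double[][] data) {
        this.data = data;
    }

    /** Entry at (row, col), both counted from 1. */
    double getEntry(int row, int col) {
        return data[row - 1][col - 1];
    }

    /** Copy of the given row (counted from 1) as a plain 0-based array. */
    double[] getRow(int row) {
        return data[row - 1].clone();
    }

    /** Returns the 1-based column index of the largest entry in the row. */
    int columnOfRowMax(int row) {
        double[] values = getRow(row);
        int best = 0;
        for (int j = 1; j < values.length; j++) {
            if (values[j] > values[best]) {
                best = j;
            }
        }
        return best + 1; // shift from array index to matrix index
    }
}
```

The "+ 1" at the end is the kind of extra step that is needed whenever a 
result computed on the array has to be fed back into the 1-based accessors.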

Kim

Phil Steitz wrote:
> I need to understand better exactly what the problem is with the 
> indexing.  I don't see anything wrong with the current implementation.  
> The element accessor methods use standard matrix notation, which starts 
> with index = 1 in each case.  The methods that provide access to copies 
> of the underlying double[][] arrays are for efficiency and return arrays 
> which are correctly sized to hold the matrix data.  I would be -1 to 
> changing the matrix accessor methods (getEntry, setEntry, getRow, 
> getColumn) to be "0-based" as this conflicts with standard mathematical 
> notation.
> 
> I am OK with adding the additional methods above to (post 1.0) 
> RealMatrixImpl or a MatrixUtils class.
> 
> Phil
> 

-- 
http://www.kimvdlinde.com


Re: [MATH] Matrix indices

Posted by Phil Steitz <ph...@steitz.com>.

Mark R. Diggory wrote:
> I would recommend submitting a patch to the java files into bugzilla 
> with the changes in it.
> 
> Review the contributing documentation on the developers page:
> http://jakarta.apache.org/commons/math/developers.html
> 
> Post any questions you may have about contributing a patch.
> 
> Cheers,
> -Mark
> 
> Kim van der Linde wrote:
> 
>>
>>
>> Mark R. Diggory wrote:
>>
>>> I agree entirely with your argument. I would feel more comfortable 
>>> with 0 to n-1.
>>
>>
>>
>> OK, how do I submit the improved class? I already made that change 
>> yesterday evening. I would also like to add several new methods:
>>
>> RealMatrix getSubMatrix (int startRow, int endRow, int startColumn, 
>> int endColumn)  throws MatrixIndexException;
>>
>> RealMatrix getSubMatrix (int[] rows, int[] columns)  throws 
>> MatrixIndexException;
>>
>> RealMatrix getSubMatrix (int startRow, int endRow, int[] columns) 
>> throws MatrixIndexException;
>>
>> RealMatrix getSubMatrix (int[] rows, int startColumn, int endColumn) 
>> throws MatrixIndexException;
>>
>> RealMatrix getRowMatrix(int row) throws MatrixIndexException;
>>
>> RealMatrix getColumnMatrix(int col) throws MatrixIndexException;
>>
>> double[] columnMeans();
>>
>> double[] rowMeans();
>>
>> getRowMatrix and getColumnMatrix could be excluded as they are special 
>> cases of the getSubMatrix methods.
>>
>> Objections against these?
>>
>> Cheers,
>>
>> Kim
>>
> 
> 
I need to understand better exactly what the problem is with the indexing. 
I don't see anything wrong with the current implementation.  The element 
accessor methods use standard matrix notation, which starts with index = 1 
in each case.  The methods that provide access to copies of the underlying 
double[][] arrays are for efficiency and return arrays which are correctly 
sized to hold the matrix data.  I would be -1 to changing the matrix 
accessor methods (getEntry, setEntry, getRow, getColumn) to be "0-based" 
as this conflicts with standard mathematical notation.

I am OK with adding the additional methods above to (post 1.0) 
RealMatrixImpl or a MatrixUtils class.

Phil


Re: [MATH] Matrix indices

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.

I would recommend submitting a patch to the java files into bugzilla 
with the changes in it.

Review the contributing documentation on the developers page:
http://jakarta.apache.org/commons/math/developers.html

Post any questions you may have about contributing a patch.

Cheers,
-Mark

Kim van der Linde wrote:

>
>
> Mark R. Diggory wrote:
>
>> I agree entirely with your argument. I would feel more comfortable 
>> with 0 to n-1.
>
>
> OK, how do I submit the improved class? I already made that change 
> yesterday evening. I would also like to add several new methods:
>
> RealMatrix getSubMatrix (int startRow, int endRow, int startColumn, 
> int endColumn)  throws MatrixIndexException;
>
> RealMatrix getSubMatrix (int[] rows, int[] columns)  throws 
> MatrixIndexException;
>
> RealMatrix getSubMatrix (int startRow, int endRow, int[] columns) 
> throws MatrixIndexException;
>
> RealMatrix getSubMatrix (int[] rows, int startColumn, int endColumn) 
> throws MatrixIndexException;
>
> RealMatrix getRowMatrix(int row) throws MatrixIndexException;
>
> RealMatrix getColumnMatrix(int col) throws MatrixIndexException;
>
> double[] columnMeans();
>
> double[] rowMeans();
>
> getRowMatrix and getColumnMatrix could be excluded as they are special 
> cases of the getSubMatrix methods.
>
> Objections against these?
>
> Cheers,
>
> Kim
>


-- 
Mark R. Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu



Re: [MATH] Matrix indices

Posted by Kim van der Linde <ki...@kimvdlinde.com>.


Mark R. Diggory wrote:
> I agree entirely with your argument. I would feel more comfortable with 
> 0 to n-1.

OK, how do I submit the improved class? I already made that change 
yesterday evening. I would also like to add several new methods:

RealMatrix getSubMatrix(int startRow, int endRow, int startColumn,
        int endColumn) throws MatrixIndexException;

RealMatrix getSubMatrix(int[] rows, int[] columns)
        throws MatrixIndexException;

RealMatrix getSubMatrix(int startRow, int endRow, int[] columns)
        throws MatrixIndexException;

RealMatrix getSubMatrix(int[] rows, int startColumn, int endColumn)
        throws MatrixIndexException;

RealMatrix getRowMatrix(int row) throws MatrixIndexException;

RealMatrix getColumnMatrix(int col) throws MatrixIndexException;

double[] columnMeans();

double[] rowMeans();

getRowMatrix and getColumnMatrix could be excluded as they are special 
cases of the getSubMatrix methods.

Objections against these?
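
To give an idea of the intended behaviour, here is a minimal sketch of a few 
of these methods. It is only an illustration under stated assumptions: it 
works on a plain, non-empty rectangular double[][] instead of RealMatrix, it 
uses 0-based indices (the convention argued for in this thread), and the 
class name ProposedMethodsSketch is made up.

```java
/** Hypothetical sketch of some of the proposed methods; not commons-math code. */
final class ProposedMethodsSketch {

    private ProposedMethodsSketch() {
    }

    /** Mean of each row. */
    static double[] rowMeans(double[][] data) {
        double[] means = new double[data.length];
        for (int i = 0; i < data.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < data[i].length; j++) {
                sum += data[i][j];
            }
            means[i] = sum / data[i].length;
        }
        return means;
    }

    /** Mean of each column. */
    static double[] columnMeans(double[][] data) {
        double[] means = new double[data[0].length];
        for (int i = 0; i < data.length; i++) {
            for (int j = 0; j < means.length; j++) {
                means[j] += data[i][j];
            }
        }
        for (int j = 0; j < means.length; j++) {
            means[j] /= data.length;
        }
        return means;
    }

    /** Submatrix built from the selected rows and columns (0-based). */
    static double[][] getSubMatrix(double[][] data, int[] rows, int[] columns) {
        double[][] out = new double[rows.length][columns.length];
        for (int i = 0; i < rows.length; i++) {
            for (int j = 0; j < columns.length; j++) {
                out[i][j] = data[rows[i]][columns[j]];
            }
        }
        return out;
    }

    /** A single row as a 1 x n matrix: a special case of getSubMatrix. */
    static double[][] getRowMatrix(double[][] data, int row) {
        int[] allColumns = new int[data[0].length];
        for (int j = 0; j < allColumns.length; j++) {
            allColumns[j] = j;
        }
        return getSubMatrix(data, new int[] {row}, allColumns);
    }
}
```

The last method shows why getRowMatrix and getColumnMatrix could be left out: 
each is one call to getSubMatrix with a full index list.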

Cheers,

Kim

-- 
http://www.kimvdlinde.com



Re: [MATH] Matrix indices

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.

I agree entirely with your argument. I would feel more comfortable with 
0 to n-1.

-Mark

Kim van der Linde wrote:
> Hi All,
> 
> I ran into a problem with the RealMatrixImpl class. The class is 
> designed such that it uses the default 1 to n counting for the rows and 
> columns. However, Java uses the 0 to n-1 system by default. I now run 
> into the problem that some standard methods return an array of n (0 to 
> n-1) while the methods require n+1 cells. I could of course write a 
> method to create an n+1 array, but I do not see many problems with using 
> the 0 to n-1 system of Java for the matrices. However, I could be 
> completely wrong. BTW, I saw that the JAMA package uses the underlying 
> Java indexes, not the 1 to n system.
> 
> I have no problem changing all the methods if you all agree that we 
> should change it....
> 
> Cheers,
> 
> Kim

-- 
Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu
