Posted to dev@commons.apache.org by "Mark R. Diggory" <md...@latte.harvard.edu> on 2003/06/22 21:01:26 UTC

[math] Some issues with DoubleArrays

Hey Tim,

I've got three big concerns right now with DoubleArrays:

(1) To take advantage of any of the StatUtils methods on any of the 
DoubleArray objects, one has to use "getElements()" to get a double[] 
from the DoubleArray object to pass on to the StatUtils method. 
Unfortunately, DoubleArray.getElements() needs to generate a "copy" of 
the internal storage, so this often isn't efficient to do every time one 
is calling a statistic. I have a couple of proposals that can resolve 
this issue.

(a) Have methods in StatUtils that accept both double[] and DoubleArray 
as input parameters, and have the StatUtils.xxx(double[]) methods 
actually only wrap the double[] in a DoubleArray wrapper and then 
delegate to the StatUtils.xxx(DoubleArray) methods.

(b) Then add a constructor to FixedDoubleArray so it can be easily used 
to wrap the double[], or write a thin wrapper implementation of 
DoubleArray for this specific case.

(2) Some of the methods in DoubleArray are questionable, as they are 
statistical in nature and replicated in the Univariate interface, 
specifically DoubleArray.getMin() and DoubleArray.getMax(). I can't 
find anywhere that these actually get used, so I recommend we remove 
these methods from the interface and implementations.

(3) Following our same philosophy of not having methods in the interface 
that can't be supported across all implementations, 
DoubleArray.discardFrontElements seems problematic as not all 
DoubleArrays may support it. I do understand the usage, requirement and 
need for this method. I wonder if there is some way to internalize the 
discarding or provide a more generic sort of DoubleArray.trim() method. 
Discarding really only comes into play when working with 
ContractableDoubleArrays, so maybe it should be exposed at that level 
instead of in the interface. Any thoughts?
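
To make points (1)(b) and (3) concrete, here is a rough sketch of how the 
interface could be split. All type and method names below are 
hypothetical stand-ins chosen for illustration, not the existing classes: 
the base interface keeps only operations every implementation can 
support, a sub-interface adds discardFrontElements, and a thin wrapper 
adapts a plain double[] without copying it on construction.

```java
// Hypothetical sketch; these are not the actual commons-math types.
interface DoubleArraySketch {
    int getNumElements();
    double getElement(int index);
    double[] getElements(); // copy semantics
}

// Only implementations that can contract expose discarding.
interface ContractableDoubleArraySketch extends DoubleArraySketch {
    void discardFrontElements(int count);
}

/** Thin read-only wrapper around an existing double[] (no copy on construction). */
final class WrappedDoubleArray implements DoubleArraySketch {
    private final double[] values;

    WrappedDoubleArray(double[] values) { this.values = values; }

    public int getNumElements() { return values.length; }
    public double getElement(int index) { return values[index]; }
    public double[] getElements() { return values.clone(); }
}

/** Minimal contractable implementation: discarding just advances a start offset. */
final class WindowedDoubleArray implements ContractableDoubleArraySketch {
    private final double[] values;
    private int start = 0;

    WindowedDoubleArray(double[] values) { this.values = values.clone(); }

    public int getNumElements() { return values.length - start; }

    public double getElement(int index) {
        if (index < 0 || index >= getNumElements()) {
            throw new ArrayIndexOutOfBoundsException(index);
        }
        return values[start + index];
    }

    public double[] getElements() {
        double[] copy = new double[getNumElements()];
        System.arraycopy(values, start, copy, 0, copy.length);
        return copy;
    }

    public void discardFrontElements(int count) {
        if (count < 0 || count > getNumElements()) {
            throw new IllegalArgumentException("bad count");
        }
        start += count;
    }
}
```

With this shape, code that only reads values can depend on the base 
interface, and only code that needs a rolling window has to know about 
the contractable one.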

-Mark

-- 
Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu



---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [math] Some issues with DoubleArrays

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.
Another thought to consider is that there is a plan to implement 
primitive-based Collections in the Commons Collections subproject. While 
our DoubleArray wrappers are currently unique implementations for the 
Statistics package and do not implement java.util.Collection, they are 
somewhat Collection-like in nature and could be easily adapted. There 
would be some interesting crossover between Collections and Math in 
terms of the Collections/Primitive Collections that the Math package 
could provide capabilities for.

-Mark





Re: [math] Some issues with DoubleArrays

Posted by Phil Steitz <ph...@steitz.com>.
Mark R. Diggory wrote:
> Phil Steitz wrote:
> 
>> Tim O'Brien wrote:
>>
>>> What about this possibility: we could easily have DoubleArray return 
>>> a reference to the internalStorageArray.  I know this would violate 
>>> encapsulation, but if we expose the internal array and the start and 
>>> end indices, then there is no need to copy the contents of the array.  
>>> Instead we pass a reference to an existing array - aka, no need to 
>>> copy our element array.
>>
>>
>>
>> +1 -- it *is* after all an array and if this is not exposed, you are 
>> always going to be stuck with using ArrayCopy to get at the underlying 
>> data, which makes efficient computation using large arrays impossible. 
>> I agonized over this same decision vis a vis RealMatrixImpl, where I 
>> ended up "breaking encapsulation" (similarly to other double[][]-based 
>> implementations) and exposing a getDataRef method that returns a 
>> reference to the underlying double[][] array.
> 
> 
> I like it too. Since I've been looking at/messing with these classes, 
> I'd be glad to make the changes for us and add the static methods to 
> StatUtils. One note: I think we should retain a method that copies 
> the array as well as create one that exposes it. The copy version can 
> provide us with an array that is trimmed down to the size of the 
> actual content. Because the internal store increases "incrementally" 
> in the windowless case, there can be uninitialized/unused sections at 
> the end of the array (likewise, in the windowed case, if the array 
> isn't filled yet, there are unused sections). Providing an interface 
> to retrieve a "cleaned" array is a useful option if one wants to 
> retrieve the data to manipulate it elsewhere. This would be useful in 
> both Fixed and Exp/Cont DoubleArrays.

Yes.  I would certainly not recommend dropping the existing 
getElements() or replacing it with reference semantics.  What I did in 
RealMatrix was to provide both getData and getDataRef, with the latter 
returning a reference.  I would reserve getElements() for copy semantics 
and call the reference version something else.
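
As a rough illustration of keeping both semantics side by side, the 
sketch below pairs a copying getElements() with a reference accessor. The 
class name and the accessors getValuesRef(), getStartIndex() and 
getNumElements() are assumptions made up for this example, not the 
existing DoubleArray API, and the growth policy is arbitrary.

```java
// Hypothetical sketch; names and growth policy are illustrative assumptions.
final class GrowableDoubleArraySketch {
    private double[] internalArray = new double[4];
    private int startIndex = 0;  // index of the first live element
    private int numElements = 0; // number of live elements

    void addElement(double value) {
        if (startIndex + numElements == internalArray.length) {
            // Grow and move the live elements to the front of the new array.
            double[] grown = new double[Math.max(4, numElements * 2)];
            System.arraycopy(internalArray, startIndex, grown, 0, numElements);
            internalArray = grown;
            startIndex = 0;
        }
        internalArray[startIndex + numElements] = value;
        numElements++;
    }

    void discardFrontElements(int count) {
        if (count < 0 || count > numElements) {
            throw new IllegalArgumentException("bad count");
        }
        startIndex += count;
        numElements -= count;
    }

    /** Copy semantics: a new array trimmed to exactly the live elements. */
    double[] getElements() {
        double[] copy = new double[numElements];
        System.arraycopy(internalArray, startIndex, copy, 0, numElements);
        return copy;
    }

    /**
     * Reference semantics: the backing array itself, which may contain unused
     * slots before getStartIndex() and after the live range. Callers must not
     * modify it.
     */
    double[] getValuesRef() { return internalArray; }

    int getStartIndex() { return startIndex; }

    int getNumElements() { return numElements; }
}
```

A caller wanting to avoid the copy would pass getValuesRef(), 
getStartIndex() and getNumElements() to a range-based method, while 
getElements() stays available for anyone who wants a clean, trimmed array 
that is safe to modify.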

> 
>>
>>
>>>
>>> Now, every method that takes a double[] in StatUtil, would be altered 
>>> to take a (double[], int start, int length).   So,
>>>
>>> public static double sum(double[] values);
>>>
>>> would delegate to a more "generic"
>>>
>>> public static double sum(double[] values, int startIndex, int length);
>>
>>
>>
>> I agree -- I think that Brent suggested this improvement already.
> 
> 
> On the topic of StatUtils, what are the opinions about adding the 
> following methods from my discussion with the lang group to provide 
> alternate primitive implementations? These would be for short, long, 
> int, float for now.

I don't see any harm in adding these; but I would not put a high 
priority on implementing them and I agree with Stephen that there is no 
harm in lang including the min/max functions directly in lang.math as 
well.  Some duplication across packages is OK, IMHO.  Also, I would not 
want lang -- or any other component -- to depend on anything in math 
until we have successfully emerged from the sandbox with a release. 
What may actually make more sense is for lang.math to add the min, max 
stuff and us to use their implementations of these in place of our own. 
  But, once again, these are trivial functions and I see nothing wrong 
with implementing them in both places.  Note that in any case, we will 
want to implement these with array offset arguments, which lang may not 
be interested in.

One more note on the min-max stuff: the implementation in StatUtils 
calls Math.min/max each time through the comparison loop. The loop 
should probably be rewritten to just keep track of the min/max and do a 
straight compare each time through (similar to what UnivariateImpl does) 
to avoid the unnecessary function call within the loop.
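
Something along these lines is what the straight-compare loop might look 
like. This is only a sketch: the class name is hypothetical, the offset 
arguments follow the (array, start, length) pattern discussed in this 
thread, and rejecting an empty range is an assumption. One caveat: unlike 
Math.min/Math.max, a plain comparison does not propagate NaN (a NaN after 
the first element is simply skipped) and does not distinguish -0.0 from 
0.0, so the two approaches are not strictly equivalent.

```java
// Hypothetical sketch; not the actual StatUtils or UnivariateImpl source.
final class MinMaxSketch {
    private MinMaxSketch() {}

    /** Smallest value in values[begin] .. values[begin + length - 1]. */
    static double min(double[] values, int begin, int length) {
        checkRange(values, begin, length);
        double min = values[begin];
        for (int i = begin + 1; i < begin + length; i++) {
            if (values[i] < min) { // straight compare, no Math.min call
                min = values[i];
            }
        }
        return min;
    }

    /** Largest value in values[begin] .. values[begin + length - 1]. */
    static double max(double[] values, int begin, int length) {
        checkRange(values, begin, length);
        double max = values[begin];
        for (int i = begin + 1; i < begin + length; i++) {
            if (values[i] > max) { // straight compare, no Math.max call
                max = values[i];
            }
        }
        return max;
    }

    private static void checkRange(double[] values, int begin, int length) {
        // Assumption: an empty range is rejected rather than returning NaN.
        if (begin < 0 || length <= 0 || begin > values.length - length) {
            throw new IllegalArgumentException("empty or out-of-bounds range");
        }
    }
}
```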

> 
> primitive <-- min(primitive[])
> primitive <-- max(primitive[])
> primitive <-- sum(primitive[])
> primitive <-- sumSq(primitive[])
> 
> in terms of other stat methods the theme would be more like:
> 
> double <-- mean(primitive[])
> double <-- var(primitive[])
> double <-- std(primitive[])
> 
> possibly similar methods for other stat methods, these all would involve 
> casting the elements to double prior to calculating?

Yes, you would have to cast before computation, which sort of blows away 
the value of the array-based implementation.  It may be better to add 
addValue(primitive[]) to Univariate.  I have been meaning to suggest 
addValue(double[]) for a while now.
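
A minimal sketch of what array-accepting addValue overloads could look 
like, assuming a simple accumulator. The class below is a hypothetical 
stand-in, not the actual Univariate interface or UnivariateImpl, and it 
tracks only a few statistics to keep the example short.

```java
// Hypothetical stand-in for a Univariate-style accumulator.
final class UnivariateSketch {
    private int n;
    private double sum;
    private double min;
    private double max;

    void addValue(double v) {
        if (n == 0 || v < min) min = v;
        if (n == 0 || v > max) max = v;
        sum += v;
        n++;
    }

    /** Bulk add for double[]; simply feeds each element to addValue(double). */
    void addValue(double[] values) {
        for (double v : values) {
            addValue(v);
        }
    }

    /** Bulk add for int[]; each element is widened to double as it is added. */
    void addValue(int[] values) {
        for (int v : values) {
            addValue((double) v);
        }
    }

    int getN() { return n; }

    double getMean() { return n == 0 ? Double.NaN : sum / n; }

    double getMin() { return n == 0 ? Double.NaN : min; }

    double getMax() { return n == 0 ? Double.NaN : max; }
}
```

The widening happens one element at a time inside the accumulator, so no 
intermediate double[] is allocated for the primitive input.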

Phil

> 






Re: [math] Some issues with DoubleArrays

Posted by Tim O'Brien <to...@discursive.com>.
On Mon, 23 Jun 2003, Mark R. Diggory wrote:

> Phil Steitz wrote:
> 
> > Tim O'Brien wrote:
> >
<snip/>
> >
> > +1 -- it *is* after all an array and if this is not exposed, you are 
> > always going to be stuck with using ArrayCopy to get at the underlying 
> > data, which makes efficient computation using large arrays impossible. 
<snip/>
> 
> I like it too. Since I've been looking at/messing with these classes, 
> I'd be glad to make the changes for us and add the static methods to 
> StatUtils. 

Sounds good, I'm mostly offline this week.  One note: I like the idea of 
keeping the "array copy" function which returns the "trimmed" element 
array. If we expose the internal array, the Javadoc should have some 
stern warnings for end users - "Warning, this is a reference to the 
internal storage array, please use with care... do not modify the 
contents of this array..."

> 
> On the topic of StatUtils, what are the opinions about adding the 
> following methods from my discussion with the lang group to provide 
> alternate primitive implementations? These would be for short, long, 
> int, float for now.
> 

I'm +0 on this, I'd be happy with just providing functions for double 
primitives, but if you can think of a compelling reason, go for it.


----------------------
Tim O'Brien
Evanston, IL
(847) 863-7045
tobrien@discursive.com





Re: [math] Some issues with DoubleArrays

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.
Phil Steitz wrote:

> Tim O'Brien wrote:
>
>> What about this possibility: we could easily have DoubleArray return 
>> a reference to the internalStorageArray.  I know this would violate 
>> encapsulation, but if we expose the internal array and the start and 
>> end indices, then there is no need to copy the contents of the array.  
>> Instead we pass a reference to an existing array - aka, no need to 
>> copy our element array.
>
>
> +1 -- it *is* after all an array and if this is not exposed, you are 
> always going to be stuck with using ArrayCopy to get at the underlying 
> data, which makes efficient computation using large arrays impossible. 
> I agonized over this same decision vis a vis RealMatrixImpl, where I 
> ended up "breaking encapsulation" (similarly to other double[][]-based 
> implementations) and exposing a getDataRef method that returns a 
> reference to the underlying double[][] array.

I like it too. Since I've been looking at/messing with these classes, 
I'd be glad to make the changes for us and add the static methods to 
StatUtils. One note: I think we should retain a method that copies 
the array as well as create one that exposes it. The copy version can 
provide us with an array that is trimmed down to the size of the 
actual content. Because the internal store increases "incrementally" 
in the windowless case, there can be uninitialized/unused sections at 
the end of the array (likewise, in the windowed case, if the array 
isn't filled yet, there are unused sections). Providing an interface 
to retrieve a "cleaned" array is a useful option if one wants to 
retrieve the data to manipulate it elsewhere. This would be useful in 
both Fixed and Exp/Cont DoubleArrays.

>
>
>>
>> Now, every method that takes a double[] in StatUtil, would be altered 
>> to take a (double[], int start, int length).   So,
>>
>> public static double sum(double[] values);
>>
>> would delegate to a more "generic"
>>
>> public static double sum(double[] values, int startIndex, int length);
>
>
> I agree -- I think that Brent suggested this improvement already.

On the topic of StatUtils, what are the opinions about adding the 
following methods from my discussion with the lang group to provide 
alternate primitive implementations? These would be for short, long, 
int, float for now.

primitive <-- min(primitive[])
primitive <-- max(primitive[])
primitive <-- sum(primitive[])
primitive <-- sumSq(primitive[])

in terms of other stat methods the theme would be more like:

double <-- mean(primitive[])
double <-- var(primitive[])
double <-- std(primitive[])

possibly similar methods for other stat methods, these all would involve 
casting the elements to double prior to calculating?
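
To illustrate the two themes, here is a sketch for int[] only. The class 
name is hypothetical, and throwing on an empty array is an assumption. 
Note that a sum returning the same primitive type can overflow silently 
for int input, which is one thing these signatures would need to 
document.

```java
// Hypothetical sketch of the proposed primitive overloads, shown for int[] only.
final class PrimitiveStatsSketch {
    private PrimitiveStatsSketch() {}

    /** primitive <-- min(primitive[]) */
    static int min(int[] values) {
        requireNonEmpty(values);
        int min = values[0];
        for (int i = 1; i < values.length; i++) {
            if (values[i] < min) {
                min = values[i];
            }
        }
        return min;
    }

    /** primitive <-- sum(primitive[]); may overflow for large inputs. */
    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    /** double <-- mean(primitive[]); elements are widened to double before summing. */
    static double mean(int[] values) {
        requireNonEmpty(values);
        double total = 0.0;
        for (int v : values) {
            total += (double) v;
        }
        return total / values.length;
    }

    private static void requireNonEmpty(int[] values) {
        // Assumption: empty input is rejected; the thread does not specify this.
        if (values.length == 0) {
            throw new IllegalArgumentException("empty array");
        }
    }
}
```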

-- 
Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu





Re: [math] Some issues with DoubleArrays

Posted by Phil Steitz <ph...@steitz.com>.
Tim O'Brien wrote:

> What about this possibility: we could easily have DoubleArray return a 
> reference to the internalStorageArray.  I know this would violate 
> encapsulation, but if we expose the internal array and the start and end 
> indices, then there is no need to copy the contents of the array.  Instead 
> we pass a reference to an existing array - aka, no need to copy our 
> element array.

+1 -- it *is* after all an array and if this is not exposed, you are 
always going to be stuck with using ArrayCopy to get at the underlying 
data, which makes efficient computation using large arrays impossible. 
I agonized over this same decision vis a vis RealMatrixImpl, where I 
ended up "breaking encapsulation" (similarly to other double[][]-based 
implementations) and exposing a getDataRef method that returns a 
reference to the underlying double[][] array.

> 
> Now, every method that takes a double[] in StatUtil, would be altered to 
> take a (double[], int start, int length).   So,
> 
> public static double sum(double[] values);
> 
> would delegate to a more "generic"
> 
> public static double sum(double[] values, int startIndex, int length);

I agree -- I think that Brent suggested this improvement already.

>
>>
>>(2) Some of the methods in DoubleArray are questionable as they are 
>>statistical in nature and replicated in the Univariate Interface, 
>>specifically DoubleArray.getMin() and DoubleArray.getMax(), and I can't 
>>find anywhere that these ever actually get used, I recommend we remove 
>>these methods from the Interface and Implementations.
> 
> 
> 100% agreed.  There is really no need to calculate min and max in these 
> classes.  It seems very redundant.

I agree here as well.

> 
> 
>>(3) Following our same philosophy of not having methods in the interface 
>>that can't be supported across all implementations, 
>>DoubleArray.discardFrontElements seems problematic as not all 
>>DoubleArrays may support it. I do understand the usage, requirement and 
>>need for this method. I wonder if there is some way to internalize the 
>>discarding or provide a more generic sort of DoubleArray.trim() method. 
>>Discarding really only comes into play when working with 
>>ContractableDoubleArrays, maybe it should be exposed at that level 
>>instead of in the interface. Any thoughts?
> 
> 
> I noticed this as well.  It would make sense to remove the method from the 
> interface completely.

+1

Phil
> 
> 
>>-Mark
>>
>>
> 
> 






Re: [math] Some issues with DoubleArrays

Posted by Tim O'Brien <to...@discursive.com>.
On Sun, 22 Jun 2003, Mark R. Diggory wrote:

> Hey Tim,
> 
> I've got three big concerns right now with DoubleArrays:
> 
> (1) To take advantage of any of the StatUtil methods on any of the 
> DoubleArray objects, one has to use "getElements()" to get a double[] 
> from the DoubleArray object to pass on to the StatUtils method, 
> unfortunately DoubleArray.getElements() needs to generate a "copy" of 
> the internal storage, so this often isn't efficient to do every time one 
> is calling a statistic. I have a couple proposals that can resolve this 
> issue.

Agreed 100%.  Calling System.arraycopy() is fast, but requiring it for 
every calculation seems misguided.

> (a) Have methods in StatUtils that accept both double[] and DoubleArray 
> as input parameters, and have the StatUtils.xxx(double[]) methods 
> actually only wrap the double[] in a DoubleArray wrapper and then 
> delegate to the StatUtils.xxx(DoubleArray) methods.
> 

Mmmm.....I'm not sure why, but that seems like a code smell to me.  No 
reason to involve DoubleArray if you just have a double[].

What about this possibility: we could easily have DoubleArray return a 
reference to the internalStorageArray.  I know this would violate 
encapsulation, but if we expose the internal array and the start and end 
indices, then there is no need to copy the contents of the array.  Instead 
we pass a reference to an existing array - aka, no need to copy our 
element array.

Now, every method that takes a double[] in StatUtils would be altered to 
take a (double[], int start, int length).   So,

public static double sum(double[] values);

would delegate to a more "generic"

public static double sum(double[] values, int startIndex, int length);

Mark, do you see the value here?
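
For illustration, the delegation described above might be sketched as 
follows. The class name StatUtilsSketch is hypothetical and the bounds 
check is an assumption; only the two sum signatures come from the 
proposal itself.

```java
// Hypothetical sketch; not the actual StatUtils source.
final class StatUtilsSketch {
    private StatUtilsSketch() {}

    /** Sums the whole array by delegating to the range-based overload. */
    static double sum(double[] values) {
        return sum(values, 0, values.length);
    }

    /** Sums values[startIndex] .. values[startIndex + length - 1]. */
    static double sum(double[] values, int startIndex, int length) {
        // Written this way to avoid int overflow in startIndex + length.
        if (startIndex < 0 || length < 0 || startIndex > values.length - length) {
            throw new IllegalArgumentException("range out of bounds");
        }
        double total = 0.0;
        for (int i = startIndex; i < startIndex + length; i++) {
            total += values[i];
        }
        return total;
    }
}
```

A DoubleArray that exposes its internal array, start index and element 
count could then call the three-argument form directly, with no copy.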

> (b) Then add a constructor to FixedDoubleArray so it can be easily used 
> to wrap the double[], or write a thin wrapper implementation of 
> DoubleArray for this specific case.
> 
> (2) Some of the methods in DoubleArray are questionable as they are 
> statistical in nature and replicated in the Univariate Interface, 
> specifically DoubleArray.getMin() and DoubleArray.getMax(), and I can't 
> find anywhere that these ever actually get used, I recommend we remove 
> these methods from the Interface and Implementations.

100% agreed.  There is really no need to calculate min and max in these 
classes.  It seems very redundant.

> 
> (3) Following our same philosophy of not having methods in the interface 
> that can't be supported across all implementations, 
> DoubleArray.discardFrontElements seems problematic as not all 
> DoubleArrays may support it. I do understand the usage, requirement and 
> need for this method. I wonder if there is some way to internalize the 
> discarding or provide a more generic sort of DoubleArray.trim() method. 
> Discarding really only comes into play when working with 
> ContractableDoubleArrays, maybe it should be exposed at that level 
> instead of in the interface. Any thoughts?

I noticed this as well.  It would make sense to remove the method from the 
interface completely.


> 
> -Mark
> 
> 

-- 
----------------------
Tim O'Brien
Evanston, IL
(847) 863-7045
tobrien@discursive.com


