Posted to dev@commons.apache.org by "Mark R. Diggory" <md...@latte.harvard.edu> on 2003/06/22 21:01:26 UTC
[math] Some issues with DoubleArrays
Hey Tim,
I've got three concerns right now with DoubleArrays:
(1) To take advantage of any of the StatUtils methods on a DoubleArray
object, one has to call getElements() to get a double[] to pass to the
StatUtils method. Unfortunately, DoubleArray.getElements() has to
generate a copy of the internal storage, so doing this every time one
calls a statistic is inefficient. I have a couple of proposals that
could resolve this issue.
(a) Have methods in StatUtils that accept both double[] and DoubleArray
as input parameters, and have the StatUtils.xxx(double[]) methods
simply wrap the double[] in a DoubleArray wrapper and then
delegate to the StatUtils.xxx(DoubleArray) methods.
(b) Then add a constructor to FixedDoubleArray so it can be easily used
to wrap the double[], or write a thin wrapper implementation of
DoubleArray for this specific case.
(2) Some of the methods in DoubleArray are questionable because they are
statistical in nature and replicated in the Univariate interface,
specifically DoubleArray.getMin() and DoubleArray.getMax(). I can't
find anywhere that these actually get used, so I recommend we remove
these methods from the interface and implementations.
(3) Following our same philosophy of not having methods in the interface
that can't be supported across all implementations,
DoubleArray.discardFrontElements seems problematic as not all
DoubleArrays may support it. I do understand the usage, requirement and
need for this method. I wonder if there is some way to internalize the
discarding or provide a more generic sort of DoubleArray.trim() method.
Discarding really only comes into play when working with
ContractableDoubleArrays, so maybe it should be exposed at that level
instead of in the interface. Any thoughts?
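One way to read point (3) is sketched below, under assumptions: the general interface keeps only operations every implementation can support, and discarding moves to a sub-interface. The names BasicDoubleArray, DiscardableDoubleArray and WindowSketch are hypothetical and are not the actual commons-math types.

```java
// Hypothetical sketch: only operations every implementation can support.
interface BasicDoubleArray {
    void addElement(double value);
    double getElement(int index);
    int getNumElements();
}

// Hypothetical sub-interface: discarding is exposed only where it is supported.
interface DiscardableDoubleArray extends BasicDoubleArray {
    void discardFrontElements(int count);
}

// Illustrative implementation that tracks a start index into its storage.
final class WindowSketch implements DiscardableDoubleArray {
    private double[] storage = new double[8];
    private int startIndex;
    private int numElements;

    public void addElement(double value) {
        if (startIndex + numElements == storage.length) {
            // Move live elements to the front; double capacity only if truly full.
            int capacity = (numElements == storage.length)
                    ? storage.length * 2 : storage.length;
            double[] target = new double[capacity];
            System.arraycopy(storage, startIndex, target, 0, numElements);
            storage = target;
            startIndex = 0;
        }
        storage[startIndex + numElements] = value;
        numElements++;
    }

    public double getElement(int index) {
        if (index < 0 || index >= numElements) {
            throw new ArrayIndexOutOfBoundsException(index);
        }
        return storage[startIndex + index];
    }

    public int getNumElements() {
        return numElements;
    }

    public void discardFrontElements(int count) {
        if (count < 0 || count > numElements) {
            throw new IllegalArgumentException("cannot discard " + count);
        }
        // No copying: the discarded slots are simply skipped.
        startIndex += count;
        numElements -= count;
    }
}
```

Code that only needs to read values can depend on the narrower interface, while rolling-window code asks for the sub-interface explicitly.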
-Mark
--
Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu
---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org
Re: [math] Some issues with DoubleArrays
Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.
Another thought to consider is that there is a plan to implement
primitive-based Collections in the Commons Collections subproject. While
our DoubleArray wrappers are currently unique implementations for the
Statistics package and do not implement java.util.Collection, they are
somewhat Collection-like in nature and could be easily adapted. There
would be some interesting crossover between Collections and Math in
terms of the Collections/Primitive Collections that the Math package
could provide capabilities for.
-Mark
Re: [math] Some issues with DoubleArrays
Posted by Phil Steitz <ph...@steitz.com>.
Mark R. Diggory wrote:
> Phil Steitz wrote:
>
>> Tim O'Brien wrote:
>>
>>> What about this possibility. we could easily have DoubleArray return
>>> a reference to the internalStorageArray. I know this would violate
>>> encapsulation, but if we expose the internal array, the start and end
>>> index then there is no need to copy the contents of the array.
>>> Instead we pass a reference to an existing array - aka, no need to
>>> copy our element array.
>>
>>
>>
>> +1 -- it *is* after all an array and if this is not exposed, you are
>> always going to be stuck with using ArrayCopy to get at the underlying
>> data, which makes efficient computation using large arrays impossible.
>> I agonized over this same decision vis a vis RealMatrixImpl, where I
>> ended up "breaking encapsulation" (similarly to other double[][]-based
>> implementations) and exposing a getDataRef method that returns a
>> reference to the underlying double[][] array.
>
>
> I like it too. Since I've been looking at/messing with these classes,
> I'd be glad to make the changes for us and add the static methods to
> StatUtils. One note: I think we should retain a method that copies
> the array as well as create one that exposes it. The copy version can
> provide us with an array that is trimmed down to the size of the
> actual content. Because the internal store increases incrementally in
> the windowless case, there can be uninitialized/unused sections at the
> end of the array (likewise, in the windowed case, if the array isn't
> filled yet, there are unused sections). Providing an interface to
> retrieve a "cleaned" array is a useful option if one wants to retrieve
> the data to manipulate it elsewhere. This would be useful in both
> Fixed and Exp/Cont DoubleArrays.
Yes. I would certainly not recommend dropping the existing
getElements() or replacing it with reference semantics. What I did in
RealMatrix was to provide both getData and getDataRef, with the latter
returning a reference. I would reserve getElements() for copy semantics
and call the reference version something else.
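A minimal sketch of the two accessors side by side, assuming a simple growable store. The class name ExpandableSketch and the method names getInternalValues() and getStartIndex() are made up for illustration; only getElements() is a name taken from this thread.

```java
// Hypothetical sketch of copy semantics next to reference semantics.
final class ExpandableSketch {
    private double[] internalArray = new double[4];
    private int startIndex = 0;
    private int numElements = 0;

    void addElement(double value) {
        if (startIndex + numElements == internalArray.length) {
            // Grow the store; unused slots remain at the end afterwards.
            double[] bigger = new double[internalArray.length * 2];
            System.arraycopy(internalArray, startIndex, bigger, 0, numElements);
            internalArray = bigger;
            startIndex = 0;
        }
        internalArray[startIndex + numElements] = value;
        numElements++;
    }

    /** Copy semantics: a fresh array trimmed to the live elements. */
    double[] getElements() {
        double[] copy = new double[numElements];
        System.arraycopy(internalArray, startIndex, copy, 0, numElements);
        return copy;
    }

    /**
     * Reference semantics: returns the internal storage itself.
     * WARNING: do not modify the returned array. Only the range
     * [getStartIndex(), getStartIndex() + getNumElements()) holds data.
     */
    double[] getInternalValues() {
        return internalArray;
    }

    int getStartIndex() {
        return startIndex;
    }

    int getNumElements() {
        return numElements;
    }
}
```

A caller that only needs to read can pass getInternalValues(), getStartIndex() and getNumElements() to an offset-aware method and skip the copy; a caller that wants to keep or change the data should use getElements().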
>
>>
>>
>>>
>>> Now, every method that takes a double[] in StatUtil, would be altered
>>> to take a (double[], int start, int length). So,
>>>
>>> public static double sum(double[] values);
>>>
>>> would delegate to a more "generic"
>>>
>>> public static double sum(double[] values, int startIndex, int length);
>>
>>
>>
>> I agree -- I think that Brent suggested this improvement already.
>
>
> On the topic of StatUtils, what are the opinions about adding the
> following methods from my discussion with the lang group to provide
> alternate primitive implementations? These would be for short, long,
> int, float for now.
I don't see any harm in adding these; but I would not put a high
priority on implementing them and I agree with Stephen that there is no
harm in lang including the min/max functions directly in lang.math as
well. Some duplication across packages is OK, IMHO. Also, I would not
want lang -- or any other component -- to depend on anything in math
until we have successfully emerged from the sandbox with a release.
What may actually make more sense is for lang.math to add the min, max
stuff and us to use their implementations of these in place of our own.
But, once again, these are trivial functions and I see nothing wrong
with implementing them in both places. Note that in any case, we will
want to implement these with array offset arguments, which lang may not
be interested in.
One more note on the min-max stuff: the implementation in StatUtils
calls Math.min/max each time through the comparison loop. The loop
should probably be rewritten to just keep track of the min/max and do a
straight compare each time through (similar to what UnivariateImpl does)
to avoid the unnecessary function call within the loop.
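As a rough sketch of that rewrite, here is a min/max pair that tracks the running extreme with a plain comparison and takes array offset arguments. MinMaxSketch is a hypothetical class, not the StatUtils or UnivariateImpl code.

```java
// Hypothetical sketch: running min/max with a straight compare.
final class MinMaxSketch {
    private MinMaxSketch() {}

    static double min(double[] values, int startIndex, int length) {
        checkRange(values, startIndex, length);
        double min = values[startIndex];
        for (int i = startIndex + 1; i < startIndex + length; i++) {
            if (values[i] < min) {
                min = values[i];
            }
        }
        return min;
    }

    static double max(double[] values, int startIndex, int length) {
        checkRange(values, startIndex, length);
        double max = values[startIndex];
        for (int i = startIndex + 1; i < startIndex + length; i++) {
            if (values[i] > max) {
                max = values[i];
            }
        }
        return max;
    }

    // Requires a non-empty range that lies inside the array.
    private static void checkRange(double[] values, int startIndex, int length) {
        if (length <= 0 || startIndex < 0 || startIndex > values.length - length) {
            throw new IllegalArgumentException("invalid range");
        }
    }

    public static void main(String[] args) {
        double[] data = {5.0, 4.0, -1.0, 7.0};
        System.out.println(min(data, 1, 2));
        System.out.println(max(data, 0, data.length));
    }
}
```

One behavioural difference to keep in mind: Math.min and Math.max return NaN if either argument is NaN, whereas a plain comparison skips NaN values that appear after the first element.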
>
> primitive <-- min(primitive[])
> primitive <-- max(primitive[])
> primitive <-- sum(primitive[])
> primitive <-- sumSq(primitive[])
>
> in terms of other stat methods the theme would be more like:
>
> double <-- mean(primitive[])
> double <-- var(primitive[])
> double <-- std(primitive[])
>
> possibly similar methods for other stat methods, these all would involve
> casting the elements to double prior to calculating?
Yes, you would have to cast before computation, which sort of blows away
the value of the array-based implementation. It may be better to add
addValue(primitive[]) to Univariate. I have been meaning to suggest
addValue(double[]) for a while now.
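A small sketch of the addValue idea, assuming a running accumulator; RunningStatsSketch is a made-up stand-in and does not reflect the real Univariate interface. Each int is widened to double as it is added, so no temporary double[] is allocated.

```java
// Hypothetical sketch of array-accepting addValue overloads.
final class RunningStatsSketch {
    private int n;
    private double sum;

    void addValue(double value) {
        n++;
        sum += value;
    }

    /** Adds every element of a double[] in one call. */
    void addValue(double[] values) {
        for (double v : values) {
            addValue(v);
        }
    }

    /** Adds every element of an int[]; each int widens to double on the way in. */
    void addValue(int[] values) {
        for (int v : values) {
            addValue(v);
        }
    }

    int getN() {
        return n;
    }

    /** Mean of the values added so far (NaN when nothing has been added). */
    double getMean() {
        return sum / n;
    }
}
```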
Phil
>
Re: [math] Some issues with DoubleArrays
Posted by Tim O'Brien <to...@discursive.com>.
On Mon, 23 Jun 2003, Mark R. Diggory wrote:
> Phil Steitz wrote:
>
> > Tim O'Brien wrote:
> >
<snip/>
> >
> > +1 -- it *is* after all an array and if this is not exposed, you are
> > always going to be stuck with using ArrayCopy to get at the underlying
> > data, which makes efficient computation using large arrays impossible.
<snip/>
>
> I like it too. Since I've been looking at/messing with these classes,
> I'd be glad to make the changes for us and add the static methods to
> StatUtils.
Sounds good. I'm mostly offline this week. One note: I like the idea of
keeping the "array copy" function, which returns the "trimmed" element
array. If we expose the internal array, the Javadoc should carry some
stern warnings for end users: "Warning, this is a reference to the
internal storage array, please use with care... do not modify the
contents of this array..."
>
> On the topic of StatUtils, what are the opinions about adding the
> following methods from my discussion with the lang group to provide
> alternate primitive implementations? These would be for short, long,
> int, float for now.
>
I'm +0 on this, I'd be happy with just providing functions for double
primitives, but if you can think of a compelling reason, go for it.
----------------------
Tim O'Brien
Evanston, IL
(847) 863-7045
tobrien@discursive.com
Re: [math] Some issues with DoubleArrays
Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.
Phil Steitz wrote:
> Tim O'Brien wrote:
>
>> What about this possibility. we could easily have DoubleArray return
>> a reference to the internalStorageArray. I know this would violate
>> encapsulation, but if we expose the internal array, the start and end
>> index then there is no need to copy the contents of the array.
>> Instead we pass a reference to an existing array - aka, no need to
>> copy our element array.
>
>
> +1 -- it *is* after all an array and if this is not exposed, you are
> always going to be stuck with using ArrayCopy to get at the underlying
> data, which makes efficient computation using large arrays impossible.
> I agonized over this same decision vis a vis RealMatrixImpl, where I
> ended up "breaking encapsulation" (similarly to other double[][]-based
> implementations) and exposing a getDataRef method that returns a
> reference to the underlying double[][] array.
I like it too. Since I've been looking at/messing with these classes,
I'd be glad to make the changes for us and add the static methods to
StatUtils. One note: I think we should retain a method that copies
the array as well as create one that exposes it. The copy version can
provide us with an array that is trimmed down to the size of the
actual content. Because the internal store increases incrementally in
the windowless case, there can be uninitialized/unused sections at the
end of the array (likewise, in the windowed case, if the array isn't
filled yet, there are unused sections). Providing an interface to
retrieve a "cleaned" array is a useful option if one wants to retrieve
the data to manipulate it elsewhere. This would be useful in both
Fixed and Exp/Cont DoubleArrays.
>
>
>>
>> Now, every method that takes a double[] in StatUtil, would be altered
>> to take a (double[], int start, int length). So,
>>
>> public static double sum(double[] values);
>>
>> would delegate to a more "generic"
>>
>> public static double sum(double[] values, int startIndex, int length);
>
>
> I agree -- I think that Brent suggested this improvement already.
On the topic of StatUtils, what are the opinions about adding the
following methods from my discussion with the lang group to provide
alternate primitive implementations? These would be for short, long,
int, float for now.
primitive <-- min(primitive[])
primitive <-- max(primitive[])
primitive <-- sum(primitive[])
primitive <-- sumSq(primitive[])
in terms of other stat methods the theme would be more like:
double <-- mean(primitive[])
double <-- var(primitive[])
double <-- std(primitive[])
Possibly similar methods for the other statistics as well. Would these
all involve casting the elements to double prior to calculating?
--
Mark Diggory
Software Developer
Harvard MIT Data Center
http://www.hmdc.harvard.edu
Re: [math] Some issues with DoubleArrays
Posted by Phil Steitz <ph...@steitz.com>.
Tim O'Brien wrote:
> What about this possibility. we could easily have DoubleArray return a
> reference to the internalStorageArray. I know this would violate
> encapsulation, but if we expose the internal array, the start and end index
> then there is no need to copy the contents of the array. Instead we pass
> a reference to an existing array - aka, no need to copy our element array.
+1 -- it *is* after all an array and if this is not exposed, you are
always going to be stuck with using System.arraycopy to get at the underlying
data, which makes efficient computation using large arrays impossible.
I agonized over this same decision vis a vis RealMatrixImpl, where I
ended up "breaking encapsulation" (similarly to other double[][]-based
implementations) and exposing a getDataRef method that returns a
reference to the underlying double[][] array.
>
> Now, every method that takes a double[] in StatUtil, would be altered to
> take a (double[], int start, int length). So,
>
> public static double sum(double[] values);
>
> would delegate to a more "generic"
>
> public static double sum(double[] values, int startIndex, int length);
I agree -- I think that Brent suggested this improvement already.
>
>>
>>(2) Some of the methods in DoubleArray are questionable as they are
>>statistical in nature and replicated in the Univariate Interface,
>>specifically DoubleArray.getMin() and DoubleArray.getMax(), and I can't
>>find anywhere that these ever actually get used, I recommend we remove
>>these methods from the Interface and Implementations.
>
>
> 100% agreed. There is really no need to calculate min and max in these
> classes. It seems very redundant.
I agree here as well.
>
>
>>(3) Following our same philosophy of not having methods in the interface
>>that can't be supported across all implementations,
>>DoubleArray.discardFrontElements seems problematic as not all
>>DoubleArrays may support it. I do understand the usage, requirement and
>>need for this method. I wonder if there is some way to internalize the
>>discarding or provide a more generic sort of DoubleArray.trim() method.
>>Discarding really only comes into play when working with
>>ContractableDoubleArrays, maybe it should be exposed at that level
>>instead of in the interface. Any thoughts?
>
>
> I noticed this as well. It would make sense to remove the method from the
> interface completely.
+1
Phil
>
>
>>-Mark
>>
>>
>
>
Re: [math] Some issues with DoubleArrays
Posted by Tim O'Brien <to...@discursive.com>.
On Sun, 22 Jun 2003, Mark R. Diggory wrote:
> Hey Tim,
>
> I've got three concerns right now with DoubleArrays:
>
> (1) To take advantage of any of the StatUtil methods on any of the
> DoubleArray objects, one has to use "getElements()" to get a double[]
> from the DoubleArray object to pass on to the StatUtils method,
> unfortunately DoubleArray.getElements() needs to generate a "copy" of
> the internal storage, so this often isn't efficient to do every time one
> is calling a statistic. I have a couple proposals that can resolve this
> issue.
Agreed 100%. Calling System.arraycopy() is fast, but requiring it for
every calculation seems misguided.
> (a) Have methods in StatUtils that accept both double[] and DoubleArray
> as input parameters, and have the StatUtils.xxx(double[]) methods
> actually only wrap the double[] in a DoubleArray wrapper and then
> delegate to the StatUtil.xxx(DoubleArray) methods.
>
Mmmm.....I'm not sure why, but that seems like a code smell to me. No
reason to involve DoubleArray if you just have a double[].
What about this possibility: we could easily have DoubleArray return a
reference to the internalStorageArray. I know this would violate
encapsulation, but if we expose the internal array along with the start
and end index, then there is no need to copy the contents of the array.
Instead we pass a reference to the existing array, so no copy of our
element array is needed.
Now, every method that takes a double[] in StatUtils would be altered to
take a (double[], int start, int length). So,
public static double sum(double[] values);
would delegate to a more "generic"
public static double sum(double[] values, int startIndex, int length);
Mark, do you see the value here?
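A minimal sketch of that delegation, using a hypothetical StatUtilsSketch class rather than the actual StatUtils source; the bounds check is an added assumption.

```java
// Hypothetical sketch: whole-array overload delegating to a range overload.
final class StatUtilsSketch {
    private StatUtilsSketch() {}

    /** Sums the whole array by delegating to the range-based overload. */
    static double sum(double[] values) {
        return sum(values, 0, values.length);
    }

    /** Sums values[startIndex] through values[startIndex + length - 1]. */
    static double sum(double[] values, int startIndex, int length) {
        if (startIndex < 0 || length < 0 || startIndex > values.length - length) {
            throw new IllegalArgumentException("range out of bounds");
        }
        double total = 0.0;
        for (int i = startIndex; i < startIndex + length; i++) {
            total += values[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // Pretend this is internal storage whose live elements sit at indices 1..3.
        double[] storage = {9.0, 1.0, 2.0, 3.0, 0.0, 0.0};
        System.out.println(sum(storage, 1, 3));
        System.out.println(sum(storage));
    }
}
```

With this shape, a DoubleArray that exposes its storage, start index and element count can be summed without copying, while callers holding a plain double[] keep the one-argument form.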
> (b) Then add a constructor to FixedDoubleArray so it can be easily used
> to wrap the double[], or write a thin wrapper implementation of
> DoubleArray for this specific case.
>
> (2) Some of the methods in DoubleArray are questionable as they are
> statistical in nature and replicated in the Univariate Interface,
> specifically DoubleArray.getMin() and DoubleArray.getMax(), and I can't
> find anywhere that these ever actually get used, I recommend we remove
> these methods from the Interface and Implementations.
100% agreed. There is really no need to calculate min and max in these
classes. It seems very redundant.
>
> (3) Following our same philosophy of not having methods in the interface
> that can't be supported across all implementations,
> DoubleArray.discardFrontElements seems problematic as not all
> DoubleArrays may support it. I do understand the usage, requirement and
> need for this method. I wonder if there is some way to internalize the
> discarding or provide a more generic sort of DoubleArray.trim() method.
> Discarding really only comes into play when working with
> ContractableDoubleArrays, maybe it should be exposed at that level
> instead of in the interface. Any thoughts?
I noticed this as well. It would make sense to remove the method from the
interface completely.
>
> -Mark
>
>
--
----------------------
Tim O'Brien
Evanston, IL
(847) 863-7045
tobrien@discursive.com