You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by "Mark R. Diggory" <md...@latte.harvard.edu> on 2003/05/14 16:24:47 UTC

[math][PATCH] was Re: [math] exceptions or NaN from Univariate

Thought I'd try creating a patch for this, let me know what you think.

-Mark

Re: [math][PATCH] was Re: [math] exceptions or NaN from Univariate

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.
Just to let you know, I noticed a number of errors in my posted patch, 
I'm cooking up a better one.

-Mark

p.s. If you do decide to provide separate implementations of Univar, 
thats ok, I'm not for it, but I'll still work with it. ;-)


O'brien, Tim wrote:
> I can see why someone might want to use the Univariate implementation as
> implemented currently, it is fast and efficient and requires no
> storage.  If I'm trying to get Univariate stats for a group of 1000
> longs in J2ME I might be interested in a storage-less implementation of
> this. 
> 
> I do see that if window == Integer.MAX_VALUE no storage is used, but I'm
> wondering if we might want to put this into another implementation -
> this implementation should also provide Mode.
> 
> I'd like to get a sense from [math] of whether we should modify
> Univariate in place or make Univariate an interface and provide multiple
> implementations. 
> 
> Also, using Integer.MAX_VALUE makes practical sense, but it might be
> better to choose a more "meaningless" default value that signifies
> infinity.  Double has the concept of POSITIVE_INFINITY, but integers do
> not.  "-1" is a common signal that a process has no positive upper
> limit.  I know this is a little bit of hair splitting, but I'd like to
> see what people think about this one.  I cannot forsee anyone needing to
> collect Univariate statistics on more than 2^31 - 1 elements, but I
> don't want to get in the business of introducing an arbitrary constant
> that causes some catastrophic failure.
> 
> 
> On Wed, 2003-05-14 at 09:24, Mark R. Diggory wrote:
> 
>>Thought I'd try creating a patch for this, let me know what you think.
>>
>>-Mark
>>----
>>
> 
> 
>>Index: Univariate.java
>>===================================================================
>>RCS file: /home/cvspublic/jakarta-commons-sandbox/math/src/java/org/apache/commons/math/Univariate.java,v
>>retrieving revision 1.1
>>diff -u -r1.1 Univariate.java
>>--- Univariate.java	12 May 2003 19:04:10 -0000	1.1
>>+++ Univariate.java	14 May 2003 14:22:37 -0000
>>@@ -85,6 +85,12 @@
>>     /** display name */
>>     private String name = "";
>> 
>>+	/** Array of values for rolling */ 
>>+	private double[] values = null;
>>+	
>>+	/** Array of values for rolling */ 
>>+	private int window = Integer.MAX_VALUE;
>>+		
>>     /** Creates new univariate */
>>     public Univariate() {
>>         clear();
>>@@ -96,6 +102,18 @@
>>         clear();
>>     }
>> 
>>+	/** Creates new univariate */
>>+	public Univariate(int window) {
>>+		this();
>>+		this.window = window;
>>+	}
>>+		
>>+	/** Creates a new univariate with the given name */
>>+	public Univariate(java.lang.String name, int window) {
>>+		this(name);
>>+		this.window = window;
>>+	}
>>+	
>>     /**
>>      * Adds the value, updating running sums.<br>
>>      * Converts value to a double before adding.
>>@@ -167,11 +185,24 @@
>>      * @param v the value to be added 
>>      */
>>     private void insertValue(double v) {
>>-        n += 1.0;
>>+        
>>         if (v < min) min = v;
>>         if (v > max) max = v;
>>         sum += v;
>>         sumsq += v*v;
>>+        
>>+        if(window != Integer.MAX_VALUE){
>>+			n = Math.min(n+=1.0, values.length );
>>+			sum -= values[window];
>>+			sumsq -= values[window]*values[window];
>>+			for(int i = window; i > 0 ;i--){
>>+				values[i] = values[i-1];
>>+			}
>>+			values[0] = v;
>>+        }else{
>>+			n += 1.0;
>>+        }
>>+		
>>     }
>> 
>>     /** Getter for property max.
>>
>>----
>>
> 
> 
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: commons-dev-help@jakarta.apache.org
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: commons-dev-help@jakarta.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [math][PATCH] was Re: [math] exceptions or NaN from Univariate

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.

O'brien, Tim wrote:
> On Wed, 2003-05-14 at 10:31, Mark R. Diggory wrote:
> <snip/> 
> 
>>>I do see that if window == Integer.MAX_VALUE no storage is used, but I'm
>>>wondering if we might want to put this into another implementation -
>>>this implementation should also provide Mode.
>>>
>>
>>Possibly even higher order moments like kurtosis and skew.
> 
> 
> I guess my suggestion is that (if my reasoning is right) there are
> pieces of information such as Mode which demand knowledge of each
> element. If we are going to implement something with storage, it might
> make more sense to keep it separate from the existing implementation. 
> Also, one implementation should delegate to an existing Array or List.  
> 
> Think about someone maintaining a JMX MBean which maintains a List of
> one usage statistic.  I could image that a Univariate that takes a
> reference to an existing List or array could come in handy.  Someone
> makes some changes to the a List or array, and calculations are
> performed every time a Median, Mode, etc. is needed.  Someone who has
> this requirement clearly has different performance and storage needs
> than someone building a system with very limited memory limitations.  
> 

True...

> 
>>>I'd like to get a sense from [math] of whether we should modify
>>>Univariate in place or make Univariate an interface and provide multiple
>>>implementations. 
>>>
>>
>>In my opinion, I'm not sure there would be enough other implmentations 
>>to warrant this.
> 
> 
> There are many ways to skin a cat. (such a horrible image)

What a terrible thing to say ;-)

> Having a
> Univariate interface and corresponding implementations would leave the
> door open to people who might have different approaches or ideas.  It
> would also adhere to principle of "Maximizing abstractions to maximize
> stability".
> 
> Also, let's think about a utility that would calculate the covariance of
> two sets of numbers.  I'd rather have that operate on Two
> StoredUnivariate interfaces than have to worry about operating on
> concrete classes.
> 

Again, very legitamate.

> 
>>>Also, using Integer.MAX_VALUE makes practical sense, but <snip/>
>>
>>Theres a limitation here on the size of the array itself we're dealing 
>>with. whats the largest int[] you can have in Java? This is a cap on 
>>"int" and array capabilities, having a Window of 
>>"Double.POSITIVE_INFINITY - 1" is impossible from an array size 
>>standpoint, even having a Window of Integer.MAX_VALUE + 1 is impossible, 
>>an array "Integer.MAX_VALUE - 1" is theoretically possible. 
>>Integer.MAX_VALUE is the cap (although difficult to achieve with todays 
>>memory constraints).
>>
> 
> 
> I understand that array size limitations may get in the way, but today's
> exabyte is tomorrow's kilobyte.  My point was conceptual, but I do think
> it important to shy away from choosing constants that could conceivably
> attain real meaning - even if that meaning is currently impractical.

Point taken. I've actually eliminated the constant in my implementation 
in favor of testing the "null" state of values array.
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: commons-dev-help@jakarta.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [math][PATCH] was Re: [math] exceptions or NaN from Univariate

Posted by "O'brien, Tim" <to...@transolutions.net>.
On Wed, 2003-05-14 at 10:31, Mark R. Diggory wrote:
<snip/> 
> > I do see that if window == Integer.MAX_VALUE no storage is used, but I'm
> > wondering if we might want to put this into another implementation -
> > this implementation should also provide Mode.
> > 
> 
> Possibly even higher order moments like kurtosis and skew.

I guess my suggestion is that (if my reasoning is right) there are
pieces of information such as Mode which demand knowledge of each
element. If we are going to implement something with storage, it might
make more sense to keep it separate from the existing implementation. 
Also, one implementation should delegate to an existing Array or List.  

Think about someone maintaining a JMX MBean which maintains a List of
one usage statistic.  I could image that a Univariate that takes a
reference to an existing List or array could come in handy.  Someone
makes some changes to the a List or array, and calculations are
performed every time a Median, Mode, etc. is needed.  Someone who has
this requirement clearly has different performance and storage needs
than someone building a system with very limited memory limitations.  

> > I'd like to get a sense from [math] of whether we should modify
> > Univariate in place or make Univariate an interface and provide multiple
> > implementations. 
> > 
> 
> In my opinion, I'm not sure there would be enough other implmentations 
> to warrant this.

There are many ways to skin a cat. (such a horrible image)   Having a
Univariate interface and corresponding implementations would leave the
door open to people who might have different approaches or ideas.  It
would also adhere to principle of "Maximizing abstractions to maximize
stability".

Also, let's think about a utility that would calculate the covariance of
two sets of numbers.  I'd rather have that operate on Two
StoredUnivariate interfaces than have to worry about operating on
concrete classes.

> > Also, using Integer.MAX_VALUE makes practical sense, but <snip/>
> 
> Theres a limitation here on the size of the array itself we're dealing 
> with. whats the largest int[] you can have in Java? This is a cap on 
> "int" and array capabilities, having a Window of 
> "Double.POSITIVE_INFINITY - 1" is impossible from an array size 
> standpoint, even having a Window of Integer.MAX_VALUE + 1 is impossible, 
> an array "Integer.MAX_VALUE - 1" is theoretically possible. 
> Integer.MAX_VALUE is the cap (although difficult to achieve with todays 
> memory constraints).
> 

I understand that array size limitations may get in the way, but today's
exabyte is tomorrow's kilobyte.  My point was conceptual, but I do think
it important to shy away from choosing constants that could conceivably
attain real meaning - even if that meaning is currently impractical.



---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [math][PATCH] was Re: [math] exceptions or NaN from Univariate

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.

O'brien, Tim wrote:
> I can see why someone might want to use the Univariate implementation as
> implemented currently, it is fast and efficient and requires no
> storage.  If I'm trying to get Univariate stats for a group of 1000
> longs in J2ME I might be interested in a storage-less implementation of
> this. 
> 

You can should still be able to get this as default behavior, even with 
the changes I've proposed.

> I do see that if window == Integer.MAX_VALUE no storage is used, but I'm
> wondering if we might want to put this into another implementation -
> this implementation should also provide Mode.
> 

Possibly even higher order moments like kurtosis and skew.


This is a tough call, is it so big a difference in implementation that 
it requires its own class, or is the window simply a feature of a 
Rolling Univariate Stat. It is a conceptual argument. I say, if everying 
other than that one decision on storage is the same in the two 
hypothetical implmentations, that its probibly not a great enough 
difference to warrant two different implmentations. However, if it is a 
feature the effects the performance of a significant number of 
properties in the Class, maybe it should be separate. so far I only see 
it effecting one method computationally "insertValue".



> I'd like to get a sense from [math] of whether we should modify
> Univariate in place or make Univariate an interface and provide multiple
> implementations. 
> 

In my opinion, I'm not sure there would be enough other implmentations 
to warrant this.

> Also, using Integer.MAX_VALUE makes practical sense, but it might be
> better to choose a more "meaningless" default value that signifies
> infinity.  Double has the concept of POSITIVE_INFINITY, but integers do
> not.  "-1" is a common signal that a process has no positive upper
> limit.  I know this is a little bit of hair splitting, but I'd like to
> see what people think about this one.  I cannot forsee anyone needing to
> collect Univariate statistics on more than 2^31 - 1 elements, but I
> don't want to get in the business of introducing an arbitrary constant
> that causes some catastrophic failure.

Theres a limitation here on the size of the array itself we're dealing 
with. whats the largest int[] you can have in Java? This is a cap on 
"int" and array capabilities, having a Window of 
"Double.POSITIVE_INFINITY - 1" is impossible from an array size 
standpoint, even having a Window of Integer.MAX_VALUE + 1 is impossible, 
an array "Integer.MAX_VALUE - 1" is theoretically possible. 
Integer.MAX_VALUE is the cap (although difficult to achieve with todays 
memory constraints).


On a side note:

I also think I can save "computational effort" during the array rolling 
by tracking an index to start from and looping the forloop around the 
ends of the array with a modulus.

-Mark


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Re: [math][PATCH] was Re: [math] exceptions or NaN from Univariate

Posted by "O'brien, Tim" <to...@transolutions.net>.
I can see why someone might want to use the Univariate implementation as
implemented currently, it is fast and efficient and requires no
storage.  If I'm trying to get Univariate stats for a group of 1000
longs in J2ME I might be interested in a storage-less implementation of
this. 

I do see that if window == Integer.MAX_VALUE no storage is used, but I'm
wondering if we might want to put this into another implementation -
this implementation should also provide Mode.

I'd like to get a sense from [math] of whether we should modify
Univariate in place or make Univariate an interface and provide multiple
implementations. 

Also, using Integer.MAX_VALUE makes practical sense, but it might be
better to choose a more "meaningless" default value that signifies
infinity.  Double has the concept of POSITIVE_INFINITY, but integers do
not.  "-1" is a common signal that a process has no positive upper
limit.  I know this is a little bit of hair splitting, but I'd like to
see what people think about this one.  I cannot forsee anyone needing to
collect Univariate statistics on more than 2^31 - 1 elements, but I
don't want to get in the business of introducing an arbitrary constant
that causes some catastrophic failure.


On Wed, 2003-05-14 at 09:24, Mark R. Diggory wrote:
> Thought I'd try creating a patch for this, let me know what you think.
> 
> -Mark
> ----
> 

> Index: Univariate.java
> ===================================================================
> RCS file: /home/cvspublic/jakarta-commons-sandbox/math/src/java/org/apache/commons/math/Univariate.java,v
> retrieving revision 1.1
> diff -u -r1.1 Univariate.java
> --- Univariate.java	12 May 2003 19:04:10 -0000	1.1
> +++ Univariate.java	14 May 2003 14:22:37 -0000
> @@ -85,6 +85,12 @@
>      /** display name */
>      private String name = "";
>  
> +	/** Array of values for rolling */ 
> +	private double[] values = null;
> +	
> +	/** Array of values for rolling */ 
> +	private int window = Integer.MAX_VALUE;
> +		
>      /** Creates new univariate */
>      public Univariate() {
>          clear();
> @@ -96,6 +102,18 @@
>          clear();
>      }
>  
> +	/** Creates new univariate */
> +	public Univariate(int window) {
> +		this();
> +		this.window = window;
> +	}
> +		
> +	/** Creates a new univariate with the given name */
> +	public Univariate(java.lang.String name, int window) {
> +		this(name);
> +		this.window = window;
> +	}
> +	
>      /**
>       * Adds the value, updating running sums.<br>
>       * Converts value to a double before adding.
> @@ -167,11 +185,24 @@
>       * @param v the value to be added 
>       */
>      private void insertValue(double v) {
> -        n += 1.0;
> +        
>          if (v < min) min = v;
>          if (v > max) max = v;
>          sum += v;
>          sumsq += v*v;
> +        
> +        if(window != Integer.MAX_VALUE){
> +			n = Math.min(n+=1.0, values.length );
> +			sum -= values[window];
> +			sumsq -= values[window]*values[window];
> +			for(int i = window; i > 0 ;i--){
> +				values[i] = values[i-1];
> +			}
> +			values[0] = v;
> +        }else{
> +			n += 1.0;
> +        }
> +		
>      }
>  
>      /** Getter for property max.
> 
> ----
> 

> ---------------------------------------------------------------------
> To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: commons-dev-help@jakarta.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org