Posted to dev@commons.apache.org by "O'brien, Tim" <to...@transolutions.net> on 2003/05/14 00:55:57 UTC

[math] exceptions or NaN from Univariate

Univariate getMean(), getVariance(), and getStandardDeviation() all
contain a note to throw an exception if n = 0.  It should be noted that
getVariance() and getStandardDeviation() don't make sense until n = 2.

The mean of an empty set is not a number, and currently calling (new
Univariate("blah")).getMean() returns NaN.  I'm just wondering if
throwing an exception is worth the trouble?  Any thoughts?
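
As a rough illustration of why NaN appears here without any special handling, the sketch below derives the mean and variance from running sums. It assumes the statistics are computed from n, sum and sumsq (the patch quoted elsewhere in this thread suggests as much); the class name RunningStats and the exact formulas are hypothetical, not the actual Univariate source.

```java
/** Hypothetical stand-in for a storage-free Univariate; not the real class. */
final class RunningStats {
    private double n = 0.0;      // count, kept as a double (assumption)
    private double sum = 0.0;    // running sum of values
    private double sumsq = 0.0;  // running sum of squared values

    void addValue(double v) {
        n += 1.0;
        sum += v;
        sumsq += v * v;
    }

    /** With n = 0 this is 0.0 / 0.0, which Java evaluates to NaN. */
    double getMean() {
        return sum / n;
    }

    /** Sample variance; n = 0 and n = 1 both reduce to 0.0 / 0.0, i.e. NaN. */
    double getVariance() {
        return (sumsq - sum * sum / n) / (n - 1.0);
    }

    public static void main(String[] args) {
        RunningStats s = new RunningStats();
        System.out.println(s.getMean());      // NaN for the empty set
        s.addValue(3.0);
        System.out.println(s.getVariance());  // still NaN with one value
        s.addValue(1.0);
        System.out.println(s.getVariance());  // defined from n = 2 onwards
    }
}
```

Under these assumptions no explicit check is needed to get NaN for n = 0; an exception would have to be added deliberately.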
 
Tim
 



---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org

Re: [math][PATCH] was Re: [math] exceptions or NaN from Univariate

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.

Just to let you know, I noticed a number of errors in my posted patch;
I'm cooking up a better one.

-Mark

P.S. If you do decide to provide separate implementations of Univariate,
that's OK. I'm not for it, but I'll still work with it. ;-)


O'brien, Tim wrote:
> I can see why someone might want to use the Univariate implementation as
> implemented currently, it is fast and efficient and requires no
> storage.  If I'm trying to get Univariate stats for a group of 1000
> longs in J2ME I might be interested in a storage-less implementation of
> this. 
> 
> I do see that if window == Integer.MAX_VALUE no storage is used, but I'm
> wondering if we might want to put this into another implementation -
> this implementation should also provide Mode.
> 
> I'd like to get a sense from [math] of whether we should modify
> Univariate in place or make Univariate an interface and provide multiple
> implementations. 
> 
> Also, using Integer.MAX_VALUE makes practical sense, but it might be
> better to choose a more "meaningless" default value that signifies
> infinity.  Double has the concept of POSITIVE_INFINITY, but integers do
> not.  "-1" is a common signal that a process has no positive upper
> limit.  I know this is a little bit of hair splitting, but I'd like to
> see what people think about this one.  I cannot foresee anyone needing to
> collect Univariate statistics on more than 2^31 - 1 elements, but I
> don't want to get in the business of introducing an arbitrary constant
> that causes some catastrophic failure.
> 
> 
> On Wed, 2003-05-14 at 09:24, Mark R. Diggory wrote:
> 
>>Thought I'd try creating a patch for this, let me know what you think.
>>
>>-Mark
>>----
>>
> 
> 
>>Index: Univariate.java
>>===================================================================
>>RCS file: /home/cvspublic/jakarta-commons-sandbox/math/src/java/org/apache/commons/math/Univariate.java,v
>>retrieving revision 1.1
>>diff -u -r1.1 Univariate.java
>>--- Univariate.java	12 May 2003 19:04:10 -0000	1.1
>>+++ Univariate.java	14 May 2003 14:22:37 -0000
>>@@ -85,6 +85,12 @@
>>     /** display name */
>>     private String name = "";
>> 
>>+	/** Array of values for rolling */ 
>>+	private double[] values = null;
>>+	
>>+	/** Array of values for rolling */ 
>>+	private int window = Integer.MAX_VALUE;
>>+		
>>     /** Creates new univariate */
>>     public Univariate() {
>>         clear();
>>@@ -96,6 +102,18 @@
>>         clear();
>>     }
>> 
>>+	/** Creates new univariate */
>>+	public Univariate(int window) {
>>+		this();
>>+		this.window = window;
>>+	}
>>+		
>>+	/** Creates a new univariate with the given name */
>>+	public Univariate(java.lang.String name, int window) {
>>+		this(name);
>>+		this.window = window;
>>+	}
>>+	
>>     /**
>>      * Adds the value, updating running sums.<br>
>>      * Converts value to a double before adding.
>>@@ -167,11 +185,24 @@
>>      * @param v the value to be added 
>>      */
>>     private void insertValue(double v) {
>>-        n += 1.0;
>>+        
>>         if (v < min) min = v;
>>         if (v > max) max = v;
>>         sum += v;
>>         sumsq += v*v;
>>+        
>>+        if(window != Integer.MAX_VALUE){
>>+			n = Math.min(n+=1.0, values.length );
>>+			sum -= values[window];
>>+			sumsq -= values[window]*values[window];
>>+			for(int i = window; i > 0 ;i--){
>>+				values[i] = values[i-1];
>>+			}
>>+			values[0] = v;
>>+        }else{
>>+			n += 1.0;
>>+        }
>>+		
>>     }
>> 
>>     /** Getter for property max.
>>
>>----
>>
> 
> 

Re: [math][PATCH] was Re: [math] exceptions or NaN from Univariate

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.


O'brien, Tim wrote:
> On Wed, 2003-05-14 at 10:31, Mark R. Diggory wrote:
> <snip/> 
> 
>>>I do see that if window == Integer.MAX_VALUE no storage is used, but I'm
>>>wondering if we might want to put this into another implementation -
>>>this implementation should also provide Mode.
>>>
>>
>>Possibly even higher order moments like kurtosis and skew.
> 
> 
> I guess my suggestion is that (if my reasoning is right) there are
> pieces of information such as Mode which demand knowledge of each
> element. If we are going to implement something with storage, it might
> make more sense to keep it separate from the existing implementation. 
> Also, one implementation should delegate to an existing Array or List.  
> 
> Think about someone maintaining a JMX MBean which maintains a List of
> one usage statistic.  I could imagine that a Univariate that takes a
> reference to an existing List or array could come in handy.  Someone
> makes some changes to the List or array, and calculations are
> performed every time a Median, Mode, etc. is needed.  Someone who has
> this requirement clearly has different performance and storage needs
> than someone building a system with very limited memory.
> 

True...

> 
>>>I'd like to get a sense from [math] of whether we should modify
>>>Univariate in place or make Univariate an interface and provide multiple
>>>implementations. 
>>>
>>
>>In my opinion, I'm not sure there would be enough other implementations 
>>to warrant this.
> 
> 
> There are many ways to skin a cat. (such a horrible image)

What a terrible thing to say ;-)

> Having a
> Univariate interface and corresponding implementations would leave the
> door open to people who might have different approaches or ideas.  It
> would also adhere to the principle of "Maximizing abstractions to maximize
> stability".
> 
> Also, let's think about a utility that would calculate the covariance of
> two sets of numbers.  I'd rather have that operate on Two
> StoredUnivariate interfaces than have to worry about operating on
> concrete classes.
> 

Again, very legitimate.

> 
>>>Also, using Integer.MAX_VALUE makes practical sense, but <snip/>
>>
>>There's a limitation here on the size of the array itself we're dealing 
>>with. What's the largest int[] you can have in Java? This is a cap on 
>>"int" and array capabilities: having a window of 
>>"Double.POSITIVE_INFINITY - 1" is impossible from an array-size 
>>standpoint, and even a window of Integer.MAX_VALUE + 1 is impossible, 
>>while an array of "Integer.MAX_VALUE - 1" elements is theoretically 
>>possible. Integer.MAX_VALUE is the cap (although difficult to achieve 
>>with today's memory constraints).
>>
> 
> 
> I understand that array size limitations may get in the way, but today's
> exabyte is tomorrow's kilobyte.  My point was conceptual, but I do think
> it important to shy away from choosing constants that could conceivably
> attain real meaning - even if that meaning is currently impractical.

Point taken. I've actually eliminated the constant in my implementation 
in favor of testing the "null" state of the values array.
> 
> 

Re: [math][PATCH] was Re: [math] exceptions or NaN from Univariate

Posted by "O'brien, Tim" <to...@transolutions.net>.

On Wed, 2003-05-14 at 10:31, Mark R. Diggory wrote:
<snip/> 
> > I do see that if window == Integer.MAX_VALUE no storage is used, but I'm
> > wondering if we might want to put this into another implementation -
> > this implementation should also provide Mode.
> > 
> 
> Possibly even higher order moments like kurtosis and skew.

I guess my suggestion is that (if my reasoning is right) there are
pieces of information such as Mode which demand knowledge of each
element. If we are going to implement something with storage, it might
make more sense to keep it separate from the existing implementation. 
Also, one implementation should delegate to an existing Array or List.  

Think about someone maintaining a JMX MBean which maintains a List of
one usage statistic.  I could imagine that a Univariate that takes a
reference to an existing List or array could come in handy.  Someone
makes some changes to the List or array, and calculations are
performed every time a Median, Mode, etc. is needed.  Someone who has
this requirement clearly has different performance and storage needs
than someone building a system with very limited memory.

> > I'd like to get a sense from [math] of whether we should modify
> > Univariate in place or make Univariate an interface and provide multiple
> > implementations. 
> > 
> 
> In my opinion, I'm not sure there would be enough other implementations 
> to warrant this.

There are many ways to skin a cat. (such a horrible image)   Having a
Univariate interface and corresponding implementations would leave the
door open to people who might have different approaches or ideas.  It
would also adhere to the principle of "Maximizing abstractions to maximize
stability".

Also, let's think about a utility that would calculate the covariance of
two sets of numbers.  I'd rather have that operate on Two
StoredUnivariate interfaces than have to worry about operating on
concrete classes.
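
To make the interface idea concrete, here is a minimal sketch of a covariance utility written against an interface rather than a concrete class. Only the name StoredUnivariate comes from this thread; the methods getN(), getValue(int) and getMean(), the ArrayStoredUnivariate implementation and the CovarianceSketch class are all hypothetical.

```java
/** Hypothetical read-only view of stored observations. */
interface StoredUnivariate {
    int getN();
    double getValue(int index);
    double getMean();
}

/** One possible implementation that delegates to an existing array without copying it. */
final class ArrayStoredUnivariate implements StoredUnivariate {
    private final double[] values;

    ArrayStoredUnivariate(double[] values) {
        this.values = values;
    }

    public int getN() {
        return values.length;
    }

    public double getValue(int index) {
        return values[index];
    }

    public double getMean() {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;  // 0.0 / 0 is NaN for an empty array
    }
}

final class CovarianceSketch {
    /** Sample covariance; returns NaN when fewer than two pairs are available. */
    static double covariance(StoredUnivariate x, StoredUnivariate y) {
        if (x.getN() != y.getN()) {
            throw new IllegalArgumentException("sample sizes differ");
        }
        int n = x.getN();
        if (n < 2) {
            return Double.NaN;
        }
        double meanX = x.getMean();
        double meanY = y.getMean();
        double acc = 0.0;
        for (int i = 0; i < n; i++) {
            acc += (x.getValue(i) - meanX) * (y.getValue(i) - meanY);
        }
        return acc / (n - 1);
    }
}
```

Because covariance() only sees the interface, a List-backed or windowed implementation could be passed in without changing the utility.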

> > Also, using Integer.MAX_VALUE makes practical sense, but <snip/>
> 
> There's a limitation here on the size of the array itself we're dealing 
> with. What's the largest int[] you can have in Java? This is a cap on 
> "int" and array capabilities: having a window of 
> "Double.POSITIVE_INFINITY - 1" is impossible from an array-size 
> standpoint, and even a window of Integer.MAX_VALUE + 1 is impossible, 
> while an array of "Integer.MAX_VALUE - 1" elements is theoretically 
> possible. Integer.MAX_VALUE is the cap (although difficult to achieve 
> with today's memory constraints).
> 

I understand that array size limitations may get in the way, but today's
exabyte is tomorrow's kilobyte.  My point was conceptual, but I do think
it important to shy away from choosing constants that could conceivably
attain real meaning - even if that meaning is currently impractical.


Re: [math][PATCH] was Re: [math] exceptions or NaN from Univariate

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.

O'brien, Tim wrote:
> I can see why someone might want to use the Univariate implementation as
> implemented currently, it is fast and efficient and requires no
> storage.  If I'm trying to get Univariate stats for a group of 1000
> longs in J2ME I might be interested in a storage-less implementation of
> this. 
> 

You should still be able to get this as the default behavior, even with 
the changes I've proposed.

> I do see that if window == Integer.MAX_VALUE no storage is used, but I'm
> wondering if we might want to put this into another implementation -
> this implementation should also provide Mode.
> 

Possibly even higher order moments like kurtosis and skew.

This is a tough call: is it so big a difference in implementation that 
it requires its own class, or is the window simply a feature of a 
Rolling Univariate Stat? It is a conceptual argument. I say, if everything 
other than that one decision on storage is the same in the two 
hypothetical implementations, then it's probably not a great enough 
difference to warrant two different implementations. However, if it is a 
feature that affects the performance of a significant number of 
properties in the class, maybe it should be separate. So far I only see 
it affecting one method computationally: "insertValue".

> I'd like to get a sense from [math] of whether we should modify
> Univariate in place or make Univariate an interface and provide multiple
> implementations. 
> 

In my opinion, I'm not sure there would be enough other implementations 
to warrant this.

> Also, using Integer.MAX_VALUE makes practical sense, but it might be
> better to choose a more "meaningless" default value that signifies
> infinity.  Double has the concept of POSITIVE_INFINITY, but integers do
> not.  "-1" is a common signal that a process has no positive upper
> limit.  I know this is a little bit of hair splitting, but I'd like to
> see what people think about this one.  I cannot foresee anyone needing to
> collect Univariate statistics on more than 2^31 - 1 elements, but I
> don't want to get in the business of introducing an arbitrary constant
> that causes some catastrophic failure.

There's a limitation here on the size of the array itself we're dealing 
with. What's the largest int[] you can have in Java? This is a cap on 
"int" and array capabilities: having a window of 
"Double.POSITIVE_INFINITY - 1" is impossible from an array-size 
standpoint, and even a window of Integer.MAX_VALUE + 1 is impossible, 
while an array of "Integer.MAX_VALUE - 1" elements is theoretically 
possible. Integer.MAX_VALUE is the cap (although difficult to achieve 
with today's memory constraints).

On a side note:

I also think I can save "computational effort" during the array rolling 
by tracking an index to start from and wrapping the for loop around the 
ends of the array with a modulus.
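
A minimal sketch of that modulus idea, under the assumption that the window is backed by a fixed-size double[]: instead of shifting every element on each insert, keep an index to the slot that will be overwritten next and wrap it with %. The class name CircularWindow and its members are hypothetical, not taken from the patch.

```java
/** Hypothetical fixed-size window that rolls by index instead of by shifting. */
final class CircularWindow {
    private final double[] values;
    private int next = 0;   // slot to write next; once full, it holds the oldest value
    private int count = 0;  // number of values currently in the window
    private double sum = 0.0;
    private double sumsq = 0.0;

    CircularWindow(int windowSize) {
        if (windowSize < 1) {
            throw new IllegalArgumentException("windowSize must be >= 1");
        }
        values = new double[windowSize];
    }

    void addValue(double v) {
        if (count == values.length) {
            // Window is full: back the oldest value out of the sums.
            double oldest = values[next];
            sum -= oldest;
            sumsq -= oldest * oldest;
        } else {
            count++;
        }
        values[next] = v;
        next = (next + 1) % values.length;  // wrap around the end of the array
        sum += v;
        sumsq += v * v;
    }

    int getN() {
        return count;
    }

    /** NaN while the window is empty (0.0 / 0). */
    double getMean() {
        return sum / count;
    }
}
```

Each insert then costs a constant number of operations regardless of the window size, at the price of the same storage as the shifting version.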

-Mark


Re: [math][PATCH] was Re: [math] exceptions or NaN from Univariate

Posted by "O'brien, Tim" <to...@transolutions.net>.

I can see why someone might want to use the Univariate implementation as
implemented currently, it is fast and efficient and requires no
storage.  If I'm trying to get Univariate stats for a group of 1000
longs in J2ME I might be interested in a storage-less implementation of
this. 

I do see that if window == Integer.MAX_VALUE no storage is used, but I'm
wondering if we might want to put this into another implementation -
this implementation should also provide Mode.

I'd like to get a sense from [math] of whether we should modify
Univariate in place or make Univariate an interface and provide multiple
implementations. 

Also, using Integer.MAX_VALUE makes practical sense, but it might be
better to choose a more "meaningless" default value that signifies
infinity.  Double has the concept of POSITIVE_INFINITY, but integers do
not.  "-1" is a common signal that a process has no positive upper
limit.  I know this is a little bit of hair splitting, but I'd like to
see what people think about this one.  I cannot foresee anyone needing to
collect Univariate statistics on more than 2^31 - 1 elements, but I
don't want to get in the business of introducing an arbitrary constant
that causes some catastrophic failure.

On Wed, 2003-05-14 at 09:24, Mark R. Diggory wrote:
> Thought I'd try creating a patch for this, let me know what you think.
> 
> -Mark
> ----
> 

> Index: Univariate.java
> ===================================================================
> RCS file: /home/cvspublic/jakarta-commons-sandbox/math/src/java/org/apache/commons/math/Univariate.java,v
> retrieving revision 1.1
> diff -u -r1.1 Univariate.java
> --- Univariate.java	12 May 2003 19:04:10 -0000	1.1
> +++ Univariate.java	14 May 2003 14:22:37 -0000
> @@ -85,6 +85,12 @@
>      /** display name */
>      private String name = "";
>  
> +	/** Array of values for rolling */ 
> +	private double[] values = null;
> +	
> +	/** Array of values for rolling */ 
> +	private int window = Integer.MAX_VALUE;
> +		
>      /** Creates new univariate */
>      public Univariate() {
>          clear();
> @@ -96,6 +102,18 @@
>          clear();
>      }
>  
> +	/** Creates new univariate */
> +	public Univariate(int window) {
> +		this();
> +		this.window = window;
> +	}
> +		
> +	/** Creates a new univariate with the given name */
> +	public Univariate(java.lang.String name, int window) {
> +		this(name);
> +		this.window = window;
> +	}
> +	
>      /**
>       * Adds the value, updating running sums.<br>
>       * Converts value to a double before adding.
> @@ -167,11 +185,24 @@
>       * @param v the value to be added 
>       */
>      private void insertValue(double v) {
> -        n += 1.0;
> +        
>          if (v < min) min = v;
>          if (v > max) max = v;
>          sum += v;
>          sumsq += v*v;
> +        
> +        if(window != Integer.MAX_VALUE){
> +			n = Math.min(n+=1.0, values.length );
> +			sum -= values[window];
> +			sumsq -= values[window]*values[window];
> +			for(int i = window; i > 0 ;i--){
> +				values[i] = values[i-1];
> +			}
> +			values[0] = v;
> +        }else{
> +			n += 1.0;
> +        }
> +		
>      }
>  
>      /** Getter for property max.
> 
> ----
> 


[math][PATCH] was Re: [math] exceptions or NaN from Univariate

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.

Thought I'd try creating a patch for this, let me know what you think.

-Mark

Re: [math] exceptions or NaN from Univariate

Posted by Phil Steitz <ph...@steitz.com>.

O'brien, Tim wrote:
> On Wed, 2003-05-14 at 09:04, Mark R. Diggory wrote:
> 
>>So, from what I can see, if the window isn't infinite, some storage 
>>needs to occur. (But again, I could easily be missing something really 
>>obvious here).
> 
> 
> there is no way to remove the influence of element[window+1] without
> knowing what that element is.
> 
> I think there might be a need for two Univariate implementations.  The
> current implementation would remain and not store raw values, and the
> second implementation would store a configurable number of values in a
> (optionally finite) FIFO.  In other words, the "StoredUnivariate" would
> have a no-arg constructor which would take into account every element,
> as well as a constructor with an int argument for the implementation of
> this Univariate "window".  Remembering raw values would allow us to use
> Freq internally to calculate the mode of a statistical population.
> 

Seems to me that we should be able to maintain just _one_ "lagged" value 
(the next one to be rolled off) and then adjust sums for the difference 
between the contribution of the new value and that of the value being 
rolled off.  I will play with this and submit a patch this eve. (unless 
someone else does first ;-)

Assuming that I am not out of my mind, a "bonus" question is what is the 
minimal number of values that we would need to retain to keep additional 
order statistics up to date (median, quartiles)?

I *strongly* agree that we should not include the need to store all of 
the values in this implementation, since I also use this thing for 
*large* datasets.

Phil

> Tim
> 
> 
> 
>>-Mark
>>
>>

Re: [math] exceptions or NaN from Univariate

Posted by "O'brien, Tim" <to...@transolutions.net>.

On Wed, 2003-05-14 at 09:04, Mark R. Diggory wrote:
> So, from what I can see, if the window isn't infinite, some storage 
> needs to occur. (But again, I could easily be missing something really 
> obvious here).

There is no way to remove the influence of element[window+1] without
knowing what that element is.

I think there might be a need for two Univariate implementations.  The
current implementation would remain and not store raw values, and the
second implementation would store a configurable number of values in a
(optionally finite) FIFO.  In other words, the "StoredUnivariate" would
have a no-arg constructor which would take into account every element,
as well as a constructor with an int argument for the implementation of
this Univariate "window".  Remembering raw values would allow us to use
Freq internally to calculate the mode of a statistical population.
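
To illustrate why the mode needs the raw values, here is a minimal sketch that counts occurrences in a stored array. It deliberately uses a plain HashMap rather than Freq, whose API is not shown in this thread; the class ModeSketch and the tie-breaking rule (earliest value among the most frequent) are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper; not part of the proposed StoredUnivariate. */
final class ModeSketch {
    /**
     * Returns the most frequent value, or NaN for an empty array.
     * Ties are resolved in favour of the value that appears first.
     */
    static double mode(double[] values) {
        Map<Double, Integer> counts = new HashMap<>();
        for (double v : values) {
            counts.merge(v, 1, Integer::sum);
        }
        double best = Double.NaN;
        int bestCount = 0;
        // Second pass in input order keeps the result deterministic.
        for (double v : values) {
            int c = counts.get(v);
            if (c > bestCount) {
                best = v;
                bestCount = c;
            }
        }
        return best;
    }
}
```

The counts depend on every stored element, which is why running sums alone cannot provide this statistic.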

Tim

> 
> -Mark
> 
> 

Re: [math] exceptions or NaN from Univariate

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.


Phil Steitz wrote:
> Mark R. Diggory wrote:
> 
>>
>>
>> Phil Steitz wrote:
>>
>>>
>>> A useful extension to Univariate would be to support a "rolling" 
>>> capability, as follows:
>>>
>>> Add a property called something like "windowSize" and change the 
>>> contract to mean that getMean(), getVariance, etc. always return 
>>> statistics on values {n, n-1, ... n-windowSize+1}.  Have the default 
>>> windowSize "infinity" (i.e., no restriction).  This would be useful 
>>> in applications (e.g simulation monitors) that need to compute 
>>> "rolling averages".
>>>
>>> Obviously, we would want to do this without storing all of the values 
>>> {n, n-1, ... n-windowSize+1}.
>>>
>>> Phil
>>>
>> Is there a strategy to do this without storing all the values of the 
>> current window? If you're just storing sum and sumsq, I'm not sure how 
>> you would "back out" of the computation at the end of the window. Of 
>> course, I may be missing something.
>>
> 
> I have not actually implemented this, so of course *I* may be missing 
> something, but it seems to me that you could just hold on to value 
> number n-windowSize+1 and adjust the sums as new values get added. You 
> would also have to hold onto some additional values for min, max to 
> handle the case where the value being "rolled off" is the current min or 
> max.
> 
> Phil
> 

This is where I draw a blank. To hold on to value number n-windowSize+1, 
it would seem you need to retain {n, n-1, ... n-windowSize+1}, because 
in the next roll {n --> n-1, n-1 --> n-2, ... n-windowSize --> 
n-windowSize+1}. This means that an array of values {n to 
n-windowSize+1} would need to be retained so you would be able to do the 
following calculations:

sum = sum - values[windowsize - 1];

sumsq = sumsq - Math.pow(values[windowsize - 1], 2);

Then the array would get rolled, where

for (int i = windowsize - 1; i > 0; i--) {
	values[i] = values[i-1];
}

and the new value added to it:

values[0] = n;

sum = sum + values[0];

sumsq = sumsq + Math.pow(values[0], 2);

So, from what I can see, if the window isn't infinite, some storage 
needs to occur. (But again, I could easily be missing something really 
obvious here).
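
The steps described in this message can be put together into a small runnable sketch. It assumes the array holds exactly windowSize elements, with the newest value at index 0 and the oldest at the last index, and it only backs a value out of the sums once the window is full. The class name ShiftingWindow and its accessors are hypothetical.

```java
/** Hypothetical window that shifts the stored values on every insert. */
final class ShiftingWindow {
    private final double[] values;  // values[0] is the newest, the last slot the oldest
    private int count = 0;
    private double sum = 0.0;
    private double sumsq = 0.0;

    ShiftingWindow(int windowSize) {
        if (windowSize < 1) {
            throw new IllegalArgumentException("windowSize must be >= 1");
        }
        values = new double[windowSize];
    }

    void addValue(double v) {
        int last = values.length - 1;
        if (count == values.length) {
            // Window is full: remove the oldest value's contribution.
            sum -= values[last];
            sumsq -= values[last] * values[last];
        } else {
            count++;
        }
        // Roll every value one slot towards the end, dropping the oldest.
        for (int i = last; i > 0; i--) {
            values[i] = values[i - 1];
        }
        values[0] = v;
        sum += v;
        sumsq += v * v;
    }

    int getN() {
        return count;
    }

    double getSum() {
        return sum;
    }

    double getSumsq() {
        return sumsq;
    }
}
```

The shift makes each insert proportional to the window size, which is fine for small windows but worth keeping in mind for large ones.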

-Mark



Re: [math] exceptions or NaN from Univariate

Posted by Phil Steitz <ph...@steitz.com>.

Mark R. Diggory wrote:
> 
> 
> Phil Steitz wrote:
> 
>>
>> A useful extension to Univariate would be to support a "rolling" 
>> capability, as follows:
>>
>> Add a property called something like "windowSize" and change the 
>> contract to mean that getMean(), getVariance, etc. always return 
>> statistics on values {n, n-1, ... n-windowSize+1}.  Have the default 
>> windowSize "infinity" (i.e., no restriction).  This would be useful in 
>> applications (e.g simulation monitors) that need to compute "rolling 
>> averages".
>>
>> Obviously, we would want to do this without storing all of the values 
>> {n, n-1, ... n-windowSize+1}.
>>
>> Phil
>>
> Is there a strategy to do this without storing all the values of the 
> current window? If you're just storing sum and sumsq, I'm not sure how you 
> would "back out" of the computation at the end of the window. Of course, 
> I may be missing something.
> 

I have not actually implemented this, so of course *I* may be missing 
something, but it seems to me that you could just hold on to value 
number n-windowSize+1 and adjust the sums as new values get added. You 
would also have to hold onto some additional values for min, max to 
handle the case where the value being "rolled off" is the current min or 
max.
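
On the min/max point, one well-known way to keep a rolling maximum is a monotonic deque of candidates. This is a different technique from anything proposed in this thread, shown only as a sketch of what "additional values" could look like; the class RollingMax is hypothetical, NaN inputs are not handled, and a rolling minimum would mirror it with the comparison reversed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical rolling maximum over the last windowSize values. */
final class RollingMax {
    private static final class Entry {
        final long index;
        final double value;

        Entry(long index, double value) {
            this.index = index;
            this.value = value;
        }
    }

    private final int windowSize;
    private final Deque<Entry> candidates = new ArrayDeque<>();
    private long added = 0;  // total number of values seen so far

    RollingMax(int windowSize) {
        if (windowSize < 1) {
            throw new IllegalArgumentException("windowSize must be >= 1");
        }
        this.windowSize = windowSize;
    }

    void addValue(double v) {
        // Older values that are <= v can never be the maximum again.
        while (!candidates.isEmpty() && candidates.peekLast().value <= v) {
            candidates.pollLast();
        }
        candidates.addLast(new Entry(added, v));
        added++;
        // Drop the front candidate once it has rolled out of the window.
        if (candidates.peekFirst().index < added - windowSize) {
            candidates.pollFirst();
        }
    }

    /** NaN while no value has been added. */
    double getMax() {
        return candidates.isEmpty() ? Double.NaN : candidates.peekFirst().value;
    }
}
```

In the worst case (strictly decreasing input) the deque still holds windowSize entries, so this reduces work per insert rather than guaranteeing less storage.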

Phil

> -Mark
> 
> 

Re: [math] exceptions or NaN from Univariate

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.


Phil Steitz wrote:
> 
> A useful extension to Univariate would be to support a "rolling" 
> capability, as follows:
> 
> Add a property called something like "windowSize" and change the 
> contract to mean that getMean(), getVariance, etc. always return 
> statistics on values {n, n-1, ... n-windowSize+1}.  Have the default 
> windowSize "infinity" (i.e., no restriction).  This would be useful in 
> applications (e.g simulation monitors) that need to compute "rolling 
> averages".
> 
> Obviously, we would want to do this without storing all of the values 
> {n, n-1, ... n-windowSize+1}.
> 
> Phil
> 
Is there a strategy to do this without storing all the values of the 
current window? If you're just storing sum and sumsq, I'm not sure how you 
would "back out" of the computation at the end of the window. Of course, 
I may be missing something.

-Mark



Re: [math] exceptions or NaN from Univariate

Posted by Phil Steitz <ph...@steitz.com>.

O'brien, Tim wrote:
> Univariate getMean(), getVariance(), and getStandardDeviation() all
> contain a note to throw an exception if n = 0.  It should be noted that
> getVariance() and getStandardDeviation() don't make sense until n = 2.
> 
> The mean of an empty set is not a number, and currently calling (new
> Univariate("blah")).getMean() returns NaN.  I'm just wondering if
> throwing an exception is worth the trouble?  Any thoughts?
>  
> Tim
>  

A useful extension to Univariate would be to support a "rolling" 
capability, as follows:

Add a property called something like "windowSize" and change the 
contract to mean that getMean(), getVariance(), etc. always return 
statistics on values {n, n-1, ... n-windowSize+1}.  Have the default 
windowSize be "infinity" (i.e., no restriction).  This would be useful in 
applications (e.g., simulation monitors) that need to compute "rolling 
averages".

Obviously, we would want to do this without storing all of the values 
{n, n-1, ... n-windowSize+1}.

Phil

> 
> 
> 

Re: [math] exceptions or NaN from Univariate

Posted by Phil Steitz <ph...@steitz.com>.

Mark R. Diggory wrote:
> The real issue in the case of returning NaN vs. an Exception is the 
> following question:
> 
> Is there "any other case" where the same methods could return a NaN or 
> an Exception? And if so, is there any reason you would want to tell the 
> user that there is "a difference in state" when you can't return an 
> expected value from the method.
> 
> Another hypothetical NaN issue arises if the input to the calculation 
> contains a NaN. I think in such a case the whole computation would get 
> locked in a NaN state.
> 
> For example: all the following standard Java methods and operators 
> return NaN
> 
> System.out.println(6.567 + Double.NaN);
> System.out.println(6.567 / Double.NaN);
> System.out.println(6.567 * Double.NaN);
> System.out.println(6.567 % Double.NaN);
> System.out.println(6.567 - Double.NaN);
>        
> System.out.println(Double.NaN + 6.567);
> System.out.println(Double.NaN / 6.567);
> System.out.println(Double.NaN * 6.567);
> System.out.println(Double.NaN % 6.567);
> System.out.println(Double.NaN - 6.567);
>        
> System.out.println(Math.pow(Double.NaN,6.567));
> System.out.println(Math.sin(Double.NaN));
> System.out.println(Math.sqrt(Double.NaN));
> System.out.println(Math.ceil(Double.NaN));
> 
> If it's the case that all these return NaN (especially the last 4), it 
> would be wise to do the same, if only because it is consistent with the 
> behavior already present in Java.

I agree. We should be consistent with the behaviour of the Math methods. 
This needs to be documented carefully.

> 
> But, in a rolling calculation, once sum and sumsq turn into NaNs, I 
> think you're stuck with the calculation always returning NaN. So you need 
> to deal with and document how the statistic will handle such a case.

Here again, I agree, and I agree we need to document the behavior.
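
A small sketch of the "locked in NaN" behavior, for reference when documenting it. It compares an incremental rolling sum (add the new value, subtract the one rolling off) with a sum recomputed from the stored window. The class NaNLockDemo and both method names are hypothetical.

```java
/** Hypothetical demonstration of NaN propagation in a rolling sum. */
final class NaNLockDemo {
    /** Incremental rolling sum over the last `window` inputs. */
    static double incrementalSum(double[] inputs, int window) {
        double sum = 0.0;
        for (int i = 0; i < inputs.length; i++) {
            sum += inputs[i];
            if (i >= window) {
                // Rolling a NaN back off does not help: NaN - NaN is NaN.
                sum -= inputs[i - window];
            }
        }
        return sum;
    }

    /** The same window, summed from scratch from the stored values. */
    static double recomputedSum(double[] inputs, int window) {
        double sum = 0.0;
        for (int i = Math.max(0, inputs.length - window); i < inputs.length; i++) {
            sum += inputs[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] inputs = {1.0, Double.NaN, 2.0, 3.0, 4.0};
        System.out.println(incrementalSum(inputs, 2));  // stays NaN
        System.out.println(recomputedSum(inputs, 2));   // sum of the last two values
    }
}
```

If the window values are stored, recomputing from them is one way to recover after a NaN has rolled out; a storage-free implementation has no such option short of clearing its state.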

> 
> -Mark
> 
> 
> 
> Phil Steitz wrote:
> 
>> O'brien, Tim wrote:
>>
>>> On Tue, 2003-05-13 at 23:13, Phil Steitz wrote:
>>> <snip/>
>>>
>>>> On second thought, I agree with you.  You are correct that if we 
>>>> really want to throw something for "insufficient data" situations, 
>>>> we should require n >= 1 for the mean and either force n >= 2 for 
>>>> variance, std dev or modify these to return 0 if n < 2.  May not be 
>>>> worth the trouble.
>>>
>>>
>>>
>>>
>>> I think NaN is a good answer for unanswerable questions like "What is
>>> the mean of an empty set?" or "What is the length of a point?"  Nothing
>>> prevents you from asking the question, and throwing an exception just
>>> seems unnecessary.
>>>
>>> Modifying variance and standard deviation to return 0 when n = 1 makes
>>> sense.  
>>
>>
>>
>> What about n = 0?
>>
>>>
>>> Tim
>>>
>>>
>>>> Phil

Re: [math] exceptions or NaN from Univariate

Posted by "Mark R. Diggory" <md...@latte.harvard.edu>.

The real issue in the case of returning NaN vs. an Exception is the 
following question:

Is there "any other case" where the same methods could return NaN or 
throw an exception? And if so, is there any reason you would want to tell 
the user that there is "a difference in state" when you can't return an 
expected value from the method?

Another hypothetical NaN issue arises if the input to the calculation 
contains a NaN. I think in such a case the whole computation would get 
locked in a NaN state.

For example, all of the following standard Java methods and operators 
return NaN:

System.out.println(6.567 + Double.NaN);
System.out.println(6.567 / Double.NaN);
System.out.println(6.567 * Double.NaN);
System.out.println(6.567 % Double.NaN);
System.out.println(6.567 - Double.NaN);
		
System.out.println(Double.NaN + 6.567);
System.out.println(Double.NaN / 6.567);
System.out.println(Double.NaN * 6.567);
System.out.println(Double.NaN % 6.567);
System.out.println(Double.NaN - 6.567);
		
System.out.println(Math.pow(Double.NaN,6.567));
System.out.println(Math.sin(Double.NaN));
System.out.println(Math.sqrt(Double.NaN));
System.out.println(Math.ceil(Double.NaN));

If it's the case that all of these return NaN (especially the last four), 
it would be wise for us to do the same, if only because it is consistent 
with the behavior already present in Java.

But in a rolling calculation, once sum and sumsq turn into NaNs, I 
think you're stuck with the calculation always returning NaN. So you need 
to deal with, and document, how the statistic will handle such a case.
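
To make the "locked in a NaN state" point concrete, here is a minimal 
sketch of a storage-less accumulator. The class name RunningStats and its 
methods are made up for illustration and are not the actual Univariate 
code; the only assumption is that the statistic keeps a running sum and 
sumsq, as described above.

```java
// Hypothetical accumulator for illustration only; not the Univariate API.
class RunningStats {
    private int n = 0;
    private double sum = 0.0;
    private double sumsq = 0.0;

    // Adds a value to the running totals without storing it.
    void addValue(double v) {
        n++;
        sum += v;
        sumsq += v * v;
    }

    // Returns NaN for an empty set, otherwise sum / n.
    double getMean() {
        return n == 0 ? Double.NaN : sum / n;
    }

    // True once a NaN has contaminated the running totals.
    boolean isLockedInNaN() {
        return Double.isNaN(sum) || Double.isNaN(sumsq);
    }

    public static void main(String[] args) {
        RunningStats s = new RunningStats();
        s.addValue(1.0);
        s.addValue(2.0);
        System.out.println(s.getMean());       // prints 1.5

        s.addValue(Double.NaN);                // a single NaN input
        s.addValue(3.0);                       // later valid values cannot undo it
        System.out.println(s.getMean());       // prints NaN
        System.out.println(s.isLockedInNaN()); // prints true
    }
}
```

Whether the statistic should silently stay at NaN, skip NaN inputs, or 
reject them is exactly the behavior that would need to be chosen and 
documented.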

-Mark



Phil Steitz wrote:
> O'brien, Tim wrote:
> 
>> On Tue, 2003-05-13 at 23:13, Phil Steitz wrote:
>> <snip/>
>>
>>> On second thought, I agree with you.  You are correct that if we 
>>> really want to throw something for "insufficient data" situations, we 
>>> should require n >= 1 for the mean and either force n >= 2 for 
>>> variance, std dev or modify these to return 0 if n < 2.  May not be 
>>> worth the trouble.
>>
>>
>>
>> I think NaN is a good answer for unanswerable questions like "What is
>> the mean of an empty set?" or "What is the length of a point?"  Nothing
>> prevents you from asking the question, and throwing an exception just
>> seems unnecessary.
>>
>> Modifying variance and standard deviation to return 0 when n = 1 makes
>> sense.  
> 
> 
> What about n = 0?
> 
>>
>> Tim
>>
>>
>>> Phil

Re: [math] exceptions or NaN from Univariate

Posted by Phil Steitz <ph...@steitz.com>.

O'brien, Tim wrote:
> On Tue, 2003-05-13 at 23:13, Phil Steitz wrote:
> <snip/>
> 
>>On second thought, I agree with you.  You are correct that if we really 
>>want to throw something for "insufficient data" situations, we should 
>>require n >= 1 for the mean and either force n >= 2 for variance, std 
>>dev or modify these to return 0 if n < 2.  May not be worth the trouble.
> 
> 
> I think NaN is a good answer for unanswerable questions like "What is
> the mean of an empty set?" or "What is the length of a point?"  Nothing
> prevents you from asking the question, and throwing an exception just
> seems unnecessary.
> 
> Modifying variance and standard deviation to return 0 when n = 1 makes
> sense.  

What about n = 0?

> 
> Tim
> 
> 
>>Phil

Re: [math] exceptions or NaN from Univariate

Posted by "O'brien, Tim" <to...@transolutions.net>.

On Tue, 2003-05-13 at 23:13, Phil Steitz wrote:
<snip/>

> On second thought, I agree with you.  You are correct that if we really 
> want to throw something for "insufficient data" situations, we should 
> require n >= 1 for the mean and either force n >= 2 for variance, std 
> dev or modify these to return 0 if n < 2.  May not be worth the trouble.

I think NaN is a good answer for unanswerable questions like "What is
the mean of an empty set?" or "What is the length of a point?"  Nothing
prevents you from asking the question, and throwing an exception just
seems unnecessary.

Modifying variance and standard deviation to return 0 when n = 1 makes
sense.  
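
As a rough sketch of what I mean (the class and method names here are 
hypothetical, not the current Univariate code): the mean of an empty set 
comes back as NaN, and the variance of a single value comes back as 0. 
Returning NaN for the variance when n = 0 is my own assumption in this 
sketch, by analogy with the mean.

```java
// Hypothetical helper for illustration only; not the Univariate API.
class SmallSampleStats {

    // Mean of the values; NaN when there are no values.
    static double mean(double[] values) {
        if (values.length == 0) {
            return Double.NaN;
        }
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
        }
        return sum / values.length;
    }

    // Sample variance (divides by n - 1).
    // Assumption: NaN when n = 0, and 0 when n = 1.
    static double variance(double[] values) {
        int n = values.length;
        if (n == 0) {
            return Double.NaN;
        }
        if (n == 1) {
            return 0.0;
        }
        double m = mean(values);
        double sumOfSquares = 0.0;
        for (int i = 0; i < n; i++) {
            double d = values[i] - m;
            sumOfSquares += d * d;
        }
        return sumOfSquares / (n - 1);
    }

    static double standardDeviation(double[] values) {
        return Math.sqrt(variance(values)); // sqrt(NaN) is NaN, sqrt(0) is 0
    }
}
```

No exception is needed anywhere in this version; the caller can check the 
result with Double.isNaN() if it cares.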

Tim

> 
> Phil

Re: [math] exceptions or NaN from Univariate

Posted by Phil Steitz <ph...@steitz.com>.

O'brien, Tim wrote:
> Univariate getMean(), getVariance(), and getStandardDeviation() all
> contain a note to throw an exception if n = 0.  It should be noted that
> getVariance() and getStandardDeviation() don't make sense until n = 2.
> 
> The mean of an empty set is not a number, and currently calling (new
> Univariate("blah")).getMean() returns NaN.  I'm just wondering if
> throwing an exception is worth the trouble?  Any thoughts?
>  
> Tim
>  
> 
On second thought, I agree with you.  You are correct that if we really 
want to throw something for "insufficient data" situations, we should 
require n >= 1 for the mean and either force n >= 2 for variance, std 
dev or modify these to return 0 if n < 2.  May not be worth the trouble.
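
For comparison, the exception-throwing alternative might look roughly 
like the sketch below. The class name, the static methods, and the choice 
of IllegalArgumentException are all assumptions for illustration; none of 
this is taken from the existing Univariate code.

```java
// Hypothetical strict variant for illustration only; not the Univariate API.
class StrictStats {

    // Requires n >= 1.
    static double mean(double[] values) {
        if (values.length < 1) {
            throw new IllegalArgumentException("mean requires n >= 1");
        }
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
        }
        return sum / values.length;
    }

    // Sample variance (divides by n - 1); requires n >= 2.
    static double variance(double[] values) {
        int n = values.length;
        if (n < 2) {
            throw new IllegalArgumentException("variance requires n >= 2");
        }
        double m = mean(values);
        double sumOfSquares = 0.0;
        for (int i = 0; i < n; i++) {
            double d = values[i] - m;
            sumOfSquares += d * d;
        }
        return sumOfSquares / (n - 1);
    }
}
```

Every caller then has to guard or catch around each call, which is the 
"trouble" being weighed here.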

Phil