Posted to dev@commons.apache.org by sebb <se...@gmail.com> on 2013/07/16 03:55:22 UTC

[MATH] Should Frequency treat NaN specially?

At present the Frequency class seems to treat NaN as just another Comparable.
On my system it sorts above POSITIVE_INFINITY.

So getMode() can return NaN entries, as can the iterators.
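
For illustration, here is a minimal stand-alone sketch of the behaviour described above. It assumes, as we understand the implementation, that Frequency keeps its counts in a java.util.TreeMap ordered by the keys' natural ordering. NaNFrequencyDemo, count and modes are hypothetical names and are not the Frequency API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical demo: a TreeMap-backed frequency table keyed by Double (not the real Frequency). */
final class NaNFrequencyDemo {

    /** Counts each value; the TreeMap orders keys by Double.compareTo. */
    static TreeMap<Double, Long> count(double... values) {
        TreeMap<Double, Long> table = new TreeMap<>();
        for (double v : values) {
            table.merge(v, 1L, Long::sum); // v is autoboxed; all NaNs compare equal, so they share one key
        }
        return table;
    }

    /** Returns every value whose count equals the maximum count, in key order. */
    static List<Double> modes(Map<Double, Long> table) {
        long max = 0;
        for (long c : table.values()) {
            max = Math.max(max, c);
        }
        List<Double> result = new ArrayList<>();
        for (Map.Entry<Double, Long> e : table.entrySet()) {
            if (e.getValue() == max) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<Double, Long> table =
                count(1.0, Double.POSITIVE_INFINITY, Double.NaN, Double.NaN, 1.0);
        // Iteration order puts NaN after +Infinity: prints [1.0, Infinity, NaN]
        System.out.println(table.keySet());
        // 1.0 and NaN both occur twice, so NaN is reported as a mode: prints [1.0, NaN]
        System.out.println(modes(table));
    }
}
```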

Is this reasonable behaviour?
Should it be documented and therefore tested?

Or should NaN be disallowed from entering the frequency table?

I suppose it would be easy enough to subclass Frequency and override
incrementValue(Comparable<?> v, long increment) to reject NaNs.

But then we would need to document that addValue(Comparable<?> v)
calls it - or the subclass would need to override both to be sure.
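
A sketch of the subclassing approach, written against a hypothetical MiniFrequency stand-in rather than the real Frequency class. The assumption here is that the base class documents that addValue delegates to incrementValue; only under that contract does overriding incrementValue alone also guard addValue.

```java
import java.util.TreeMap;

/** Hypothetical stand-in for a frequency table (NOT the real Frequency class). */
class MiniFrequency {
    private final TreeMap<Comparable<?>, Long> table = new TreeMap<>();

    /** Adds one occurrence of v; documented to delegate to incrementValue. */
    public void addValue(Comparable<?> v) {
        incrementValue(v, 1);
    }

    /** Adds 'increment' occurrences of v. */
    public void incrementValue(Comparable<?> v, long increment) {
        table.merge(v, increment, Long::sum);
    }

    public long getCount(Comparable<?> v) {
        return table.getOrDefault(v, 0L);
    }
}

/** Hypothetical subclass that refuses floating-point NaN through either entry point. */
final class NaNRejectingFrequency extends MiniFrequency {

    @Override
    public void incrementValue(Comparable<?> v, long increment) {
        boolean isNaN = (v instanceof Double && ((Double) v).isNaN())
                || (v instanceof Float && ((Float) v).isNaN());
        if (isNaN) {
            throw new IllegalArgumentException("NaN is not allowed");
        }
        super.incrementValue(v, increment);
    }

    public static void main(String[] args) {
        NaNRejectingFrequency f = new NaNRejectingFrequency();
        f.addValue(2.5);                 // accepted
        try {
            f.addValue(Double.NaN);      // rejected via the inherited addValue
        } catch (IllegalArgumentException expected) {
            System.out.println("rejected: " + expected.getMessage());
        }
        System.out.println(f.getCount(2.5));
    }
}
```

If a base class did not promise that delegation, a later change to addValue could silently bypass the override, which is why the subclass would otherwise have to override both methods.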

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [MATH] Should Frequency treat NaN specially?

Posted by Phil Steitz <ph...@gmail.com>.
On 7/16/13 2:07 AM, sebb wrote:
> On 16 July 2013 07:02, Phil Steitz <ph...@gmail.com> wrote:
>> On 7/15/13 6:55 PM, sebb wrote:
>>> At present the Frequency class seems to treat NaN as just another Comparable.
>>> On my system it sorts above POSITIVE_INFINITY.
>>>
>>> So getMode() can return NaN entries, as can the iterators.
>>>
>>> Is this reasonable behaviour?
>>> Should it be documented and therefore tested?
>>>
>>> Or should NaN be disallowed from entering the frequency table?
>>>
>>> I suppose it would be easy enough to subclass Frequency and override
>>> incrementValue(Comparable<?> v, long increment) to reject NaNs.
>>>
>>> But then we would need to document that addValue(Comparable<?> v)
>>> calls it - or the subclass would need to override both to be sure.
>> My opinion is that Frequency should not treat Double.NaN specially.
>> This is because Frequency *requires* a total ordering of whatever
>> domain objects it works with and the semantics of all of its methods
>> are defined in terms of the domain and the specified comparator.
>> The (default) natural ordering (expressed by Double.compareTo) puts
>> NaN at the top of the ordering (making it inconsistent with the
>> primitive ==, < and > operators, under which NaN is unordered, but
>> keeping it total).  I don't think it is a good idea to try to
>> put special code in to restrict the domain just for Doubles.   It
>> probably is a good idea to document and test the current behavior;
>> though I suspect that (other than the case below) Double will rarely
>> be used as the domain of a Frequency instance, as discrete
>> distribution values are most often mapped onto integers.
> OK, I'll add comments and tests for Double.NaN.
>
>> In StatUtils, on the other hand, I think it makes sense to drop NaNs
>> from mode(double[] sample), since in this case we know the arguments
>> are doubles and we know what NaN means (or more precisely, what it
>> does not mean :)
> So the StatUtils method must drop any NaN entries itself.
Right.

Phil
>
>> Phil


Re: [MATH] Should Frequency treat NaN specially?

Posted by sebb <se...@gmail.com>.
On 16 July 2013 07:02, Phil Steitz <ph...@gmail.com> wrote:
> On 7/15/13 6:55 PM, sebb wrote:
>> At present the Frequency class seems to treat NaN as just another Comparable.
>> On my system it sorts above POSITIVE_INFINITY.
>>
>> So getMode() can return NaN entries, as can the iterators.
>>
>> Is this reasonable behaviour?
>> Should it be documented and therefore tested?
>>
>> Or should NaN be disallowed from entering the frequency table?
>>
>> I suppose it would be easy enough to subclass Frequency and override
>> incrementValue(Comparable<?> v, long increment) to reject NaNs.
>>
>> But then we would need to document that addValue(Comparable<?> v)
>> calls it - or the subclass would need to override both to be sure.
>
> My opinion is that Frequency should not treat Double.NaN specially.
> This is because Frequency *requires* a total ordering of whatever
> domain objects it works with and the semantics of all of its methods
> are defined in terms of the domain and the specified comparator.
> The (default) natural ordering (expressed by Double.compareTo) puts
> NaN at the top of the ordering (making it inconsistent with the
> primitive ==, < and > operators, under which NaN is unordered, but
> keeping it total).  I don't think it is a good idea to try to
> put special code in to restrict the domain just for Doubles.   It
> probably is a good idea to document and test the current behavior;
> though I suspect that (other than the case below) Double will rarely
> be used as the domain of a Frequency instance, as discrete
> distribution values are most often mapped onto integers.

OK, I'll add comments and tests for Double.NaN.

> In StatUtils, on the other hand, I think it makes sense to drop NaNs
> from mode(double[] sample), since in this case we know the arguments
> are doubles and we know what NaN means (or more precisely, what it
> does not mean :)

So the StatUtils method must drop any NaN entries itself.

> Phil


Re: [MATH] Should Frequency treat NaN specially?

Posted by Phil Steitz <ph...@gmail.com>.
On 7/15/13 6:55 PM, sebb wrote:
> At present the Frequency class seems to treat NaN as just another Comparable.
> On my system it sorts above POSITIVE_INFINITY.
>
> So getMode() can return NaN entries, as can the iterators.
>
> Is this reasonable behaviour?
> Should it be documented and therefore tested?
>
> Or should NaN be disallowed from entering the frequency table?
>
> I suppose it would be easy enough to subclass Frequency and override
> incrementValue(Comparable<?> v, long increment) to reject NaNs.
>
> But then we would need to document that addValue(Comparable<?> v)
> calls it - or the subclass would need to override both to be sure.

My opinion is that Frequency should not treat Double.NaN specially. 
This is because Frequency *requires* a total ordering of whatever
domain objects it works with and the semantics of all of its methods
are defined in terms of the domain and the specified comparator. 
The (default) natural ordering (expressed by Double.compareTo) puts
NaN at the top of the ordering (making it inconsistent with the
primitive ==, < and > operators, under which NaN is unordered, but
keeping it total).  I don't think it is a good idea to try to
put special code in to restrict the domain just for Doubles.   It
probably is a good idea to document and test the current behavior;
though I suspect that (other than the case below) Double will rarely
be used as the domain of a Frequency instance, as discrete
distribution values are most often mapped onto integers.
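
A short stand-alone check of the ordering facts this argument relies on, using only java.lang.Double. DoubleOrderingDemo and its two helpers are hypothetical names. It shows that Double.compare (which Double.compareTo uses) is a total order that agrees with Double.equals: both treat NaN as equal to itself and distinguish -0.0 from 0.0. What it departs from is the primitive == operator.

```java
/** Hypothetical demo: Double.compare versus Double.equals versus primitive ==. */
final class DoubleOrderingDemo {

    /** True when Double.compare and Double.equals agree on whether a and b are equal. */
    static boolean compareAgreesWithEquals(double a, double b) {
        return (Double.compare(a, b) == 0) == Double.valueOf(a).equals(b);
    }

    /** True when Double.compare and the primitive == operator agree on whether a and b are equal. */
    static boolean compareAgreesWithPrimitiveEq(double a, double b) {
        return (Double.compare(a, b) == 0) == (a == b);
    }

    public static void main(String[] args) {
        double nan = Double.NaN;
        // NaN sorts above +Infinity under Double.compare ...
        System.out.println(Double.compare(nan, Double.POSITIVE_INFINITY) > 0); // true
        // ... even though every primitive comparison involving NaN is false.
        System.out.println(nan > Double.POSITIVE_INFINITY);                     // false
        // Double.compare agrees with Double.equals for NaN and for signed zeros ...
        System.out.println(compareAgreesWithEquals(nan, nan));                  // true
        System.out.println(compareAgreesWithEquals(-0.0, 0.0));                 // true
        // ... but not with primitive ==.
        System.out.println(compareAgreesWithPrimitiveEq(nan, nan));             // false
        System.out.println(compareAgreesWithPrimitiveEq(-0.0, 0.0));            // false
    }
}
```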

In StatUtils, on the other hand, I think it makes sense to drop NaNs
from mode(double[] sample), since in this case we know the arguments
are doubles and we know what NaN means (or more precisely, what it
does not mean :)
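
A minimal sketch of what dropping NaNs could look like for a double[] sample. NaNSafeMode.mode is a hypothetical helper, not the StatUtils implementation. Returning every tied value in ascending order is an assumption made here for illustration.

```java
import java.util.Arrays;
import java.util.TreeMap;

/** Hypothetical sketch of a mode(double[]) that ignores NaN entries. */
final class NaNSafeMode {

    /** Returns the most frequent non-NaN values in ascending order (empty if there are none). */
    static double[] mode(double[] sample) {
        TreeMap<Double, Long> counts = new TreeMap<>();
        for (double d : sample) {
            if (!Double.isNaN(d)) {              // drop NaN before counting
                counts.merge(d, 1L, Long::sum);
            }
        }
        long max = 0;
        for (long c : counts.values()) {
            max = Math.max(max, c);
        }
        final long best = max;
        // TreeMap iteration is ascending, so tied modes come out sorted.
        return counts.entrySet().stream()
                .filter(e -> e.getValue() == best)
                .mapToDouble(e -> e.getKey())
                .toArray();
    }

    public static void main(String[] args) {
        double[] sample = {Double.NaN, Double.NaN, Double.NaN, 2.0, 3.0, 2.0};
        // NaN is the most frequent entry but is ignored: prints [2.0]
        System.out.println(Arrays.toString(mode(sample)));
    }
}
```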

Phil

