You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by Luc Maisonobe <lu...@spaceroots.org> on 2014/06/23 20:38:13 UTC

[math] use case for NaNStrategy.FIXED?

Hi all,

While looking further in Percentile class for MATH-1120, I have found
another problem in the current implementation. NaNStrategy.FIXED should
leave the NaNs in place, but at the end of the KthSelector.select
method, a call to Arrays.sort moves the NaNs to the end of the small
sub-array. What is really implemented here is therefore closer to
NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always occur in the
test cases because they use very short arrays, and we directly switch to
this part of the select method.

When I implemented the method, I explicitly avoided calling Arrays.sort
because it is a general purpose method that is know to be efficient only
for arrays of at least medium size. In most cases, when the array is
small one falls back to a non-recursive method like a very simple
insertion sort, which is faster for smaller arrays. In the select
operation, we know we have small sub-arrays at the call point. Going
back to the former insertionSort would recover the good behavior for
small arrays, but would in fact not be sufficient to really implement a
NaNStrategy.FIXED. I guess it would be simple to make it behave like
NaNStrategy.MAXIMAL but I did not try yet.

My point here is rather: can we really and should we really implement
NaNStrategy.FIXED? Looking at how it is used elsewhere, it needs to
store the original position of the NaNs. It is quite cumbersome.

I wonder what is the use case for NaNStrategy.FIXED at all.

Going further and looking at the use cases for other NaNStrategy values,
the NaNs are replaced by +/- infinity before sorting, which is OK for
ranking as we only rely on the indices, but we use the values themselves
in Percentile. So sometimes, we end up with computing interpolations
like 0.5 * (x[k] + x[k+1]) or similar. If x[k] is a finite number and
x[k+1] has been changed to +infinity, we get +infinity, instead of the
NaN we should have retrieved without replacement. So here again, I'm not
sure we are doing the right thing.

What do you think?

best regards,
Luc

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] use case for NaNStrategy.FIXED?

Posted by venkatesha murthy <ve...@gmail.com>.
On Wed, Jun 25, 2014 at 12:45 PM, Luc Maisonobe <lu...@spaceroots.org> wrote:

> Le 25/06/2014 06:10, venkatesha murthy a écrit :
> > On Wed, Jun 25, 2014 at 9:35 AM, venkatesha murthy <
> > venkateshamurthyts@gmail.com> wrote:
> > Can i put a patch for this change?
>
> I think we should still discuss about FIXED.
>
> If you want to set up u patch for the documentation of NaNStrategy
> explaining the replacement done in MAXIMAL/MINIMAL, please go ahead.
> You should put it in a new JIRA issue for clarity.
>
While documenting this handling of NaN serves right; i was wondering if we
could *also* add a transform method in NaNStrategy that transforms an array
with included NaNs to somthing that Maximal/Minimal/Removable wanted.
Please let know

>
> best regards,
> Luc
>
> >
> >>
> >> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe <lu...@spaceroots.org>
> >> wrote:
> >>
> >>> Hi Venkat,
> >>>
> >>> Le 23/06/2014 21:08, venkatesha murthy a écrit :
> >>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe <lu...@spaceroots.org>
> >>> wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> While looking further in Percentile class for MATH-1120, I have found
> >>>>> another problem in the current implementation. NaNStrategy.FIXED
> should
> >>>>> leave the NaNs in place, but at the end of the KthSelector.select
> >>>>> method, a call to Arrays.sort moves the NaNs to the end of the small
> >>>>> sub-array. What is really implemented here is therefore closer to
> >>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always occur in the
> >>>>> test cases because they use very short arrays, and we directly switch
> >>> to
> >>>>> this part of the select method.
> >>>> Are NaNs considered higher than +Inf ?
> >>>> If MAXIMAL represent replacing for +inf ; you need something to
> >>>> indicate beyond this for NaN.
> >>>
> >>> Well, we can just keep the NaN themselves and handled them
> >>> appropriately, hoping not to trash performances too much.
> >>>
> >> Agreed.
> >>
> >>>
> >>>> What is the test input you see an issue and what is the actual error
> >>>> you are seeing. Please share the test case.
> >>>
> >>> Just look at PercentileTest.testReplaceNanInRange(). The first check in
> >>> the test corresponds to a Percentile configuration at 50% percentile,
> >>> and NaNStrategy.FIXED. The array has an odd number of entries, so the
> >>> 50% percentile is exactly one of the entries: the one at index 5 in the
> >>> final array.
> >>>
> >>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7, NaN, 8 }. So
> >>> for the NaNStrategy.FIXED setting, it should not be modified at all in
> >>> the selection algorithm and the result for 50% should be the first NaN
> >>> of the array, at index 5. In fact, due to the Arrays.sort, we *do*
> >>> reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so
> >>> the result is 5.
> >>>
> >>> Agreed. just verified by putting back the earlier insertionSort
> function.
> >>
> >>
> >>> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we get as a
> >>> result Double.POSITIVE_INFINITY instead of Double.NaN.
> >>>
> >>> If we agree to leave FIXED as unchanged behaviour with your
> insertionSort
> >> code; then atleast MAXIMAL/MINIMAL should be allowed for transformation
> of
> >> NaN to +/-Inf
> >>
> >>>  >>
> >>>>> When I implemented the method, I explicitly avoided calling
> Arrays.sort
> >>>>> because it is a general purpose method that is know to be efficient
> >>> only
> >>>>> for arrays of at least medium size. In most cases, when the array is
> >>>>> small one falls back to a non-recursive method like a very simple
> >>>>> insertion sort, which is faster for smaller arrays.
> >>>>
> >>>> Please help me understand here; even java primitive Arrays.sort does
> >>>> an insertion sort for less than 7 elements
> >>>> (Refer sort1(double x[], int off, int len))
> >>>> So what is it that the custom insertion sort that does differently or
> >>>> is advantageous. Is it the value 15 elements?
> >>>
> >>> I don't see a reference to 7 elements, neither in the Java6 nor in the
> >>> Java 7 doc
> >>
> >> Please take a look at the sort1 method where there is a first block in
> the
> >> code which clearly mentions len < 7
> >>     /**
> >>      * Sorts the specified sub-array of doubles into ascending order.
> >>      */
> >>     private static void sort1(double x[], int off, int len) {
> >>     // Insertion sort on smallest arrays
> >>     if (len < 7) {
> >>         for (int i=off; i<len+off; i++)
> >>         for (int j=i; j>off && x[j-1]>x[j]; j--)
> >>             swap(x, j, j-1);
> >>         return;
> >>     }
> >>     :
> >>     :
> >>     : code continues for the else part
> >>
> >> Also the grepcode url
> >> <
> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/Arrays.java#Arrays.sort1%28double[]%2Cint%2Cint%29
> >
> >> indicates the same
> >>
> >> (and in any case the doc explicitly states the algorithms
> >>> explained are only implementation notes and are not part of the
> >>> specification).
> >>>
> >> Yes its a part of comments anyways.
> >>
> >>>
> >>> However, the most important part for now is the fact that we control it
> >>> and may be able to implement different NaN strategies. What we have
> >>> currently fails.
> >>>
> >>> I agree on this  and hence here is my take:
> >> Leave FIXED as-is and use the earlier insertionSort code (just change
> the
> >> name to sort rather than hardcoding it as insertionsort) to handle the
> case
> >> you were mentioning
> >> Continue to use MAXIMAL/MINIMAL for +/-Inf transformation and that way
> we
> >> have covered both Inf and nan cases.
> >> Use REMOVED as default for all Percentile Estimation Types. (mostly
> >> influenced by R here perhaps)
> >>
> >> best regards,
> >>> Luc
> >>>
> >>>>
> >>>>> In the select
> >>>>> operation, we know we have small sub-arrays at the call point. Going
> >>>>> back to the former insertionSort would recover the good behavior for
> >>>>> small arrays, but would in fact not be sufficient to really
> implement a
> >>>>> NaNStrategy.FIXED. I guess it would be simple to make it behave like
> >>>>> NaNStrategy.MAXIMAL but I did not try yet.
> >>>>>
> >>>>> My point here is rather: can we really and should we really implement
> >>>>> NaNStrategy.FIXED? Looking at how it is used elsewhere, it needs to
> >>>>> store the original position of the NaNs. It is quite cumbersome.
> >>>>>
> >>>>> I wonder what is the use case for NaNStrategy.FIXED at all.
> >>>>>
> >>>>> Going further and looking at the use cases for other NaNStrategy
> >>> values,
> >>>>> the NaNs are replaced by +/- infinity before sorting, which is OK for
> >>>>> ranking as we only rely on the indices, but we use the values
> >>> themselves
> >>>>> in Percentile. So sometimes, we end up with computing interpolations
> >>>>> like 0.5 * (x[k] + x[k+1]) or similar. If x[k] is a finite number and
> >>>>> x[k+1] has been changed to +infinity, we get +infinity, instead of
> the
> >>>>> NaN we should have retrieved without replacement. So here again, I'm
> >>> not
> >>>>> sure we are doing the right thing.
> >>>>>
> >>>>> What do you think?
> >>>>>
> >>>>> best regards,
> >>>>> Luc
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> >>>>> For additional commands, e-mail: dev-help@commons.apache.org
> >>>>>
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> >>>> For additional commands, e-mail: dev-help@commons.apache.org
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> >>> For additional commands, e-mail: dev-help@commons.apache.org
> >>>
> >>>
> >>
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

Re: [math] use case for NaNStrategy.FIXED?

Posted by Luc Maisonobe <lu...@spaceroots.org>.
Le 25/06/2014 06:10, venkatesha murthy a écrit :
> On Wed, Jun 25, 2014 at 9:35 AM, venkatesha murthy <
> venkateshamurthyts@gmail.com> wrote:
> Can i put a patch for this change?

I think we should still discuss about FIXED.

If you want to set up u patch for the documentation of NaNStrategy
explaining the replacement done in MAXIMAL/MINIMAL, please go ahead.
You should put it in a new JIRA issue for clarity.

best regards,
Luc

> 
>>
>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe <lu...@spaceroots.org>
>> wrote:
>>
>>> Hi Venkat,
>>>
>>> Le 23/06/2014 21:08, venkatesha murthy a écrit :
>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe <lu...@spaceroots.org>
>>> wrote:
>>>>> Hi all,
>>>>>
>>>>> While looking further in Percentile class for MATH-1120, I have found
>>>>> another problem in the current implementation. NaNStrategy.FIXED should
>>>>> leave the NaNs in place, but at the end of the KthSelector.select
>>>>> method, a call to Arrays.sort moves the NaNs to the end of the small
>>>>> sub-array. What is really implemented here is therefore closer to
>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always occur in the
>>>>> test cases because they use very short arrays, and we directly switch
>>> to
>>>>> this part of the select method.
>>>> Are NaNs considered higher than +Inf ?
>>>> If MAXIMAL represent replacing for +inf ; you need something to
>>>> indicate beyond this for NaN.
>>>
>>> Well, we can just keep the NaN themselves and handled them
>>> appropriately, hoping not to trash performances too much.
>>>
>> Agreed.
>>
>>>
>>>> What is the test input you see an issue and what is the actual error
>>>> you are seeing. Please share the test case.
>>>
>>> Just look at PercentileTest.testReplaceNanInRange(). The first check in
>>> the test corresponds to a Percentile configuration at 50% percentile,
>>> and NaNStrategy.FIXED. The array has an odd number of entries, so the
>>> 50% percentile is exactly one of the entries: the one at index 5 in the
>>> final array.
>>>
>>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7, NaN, 8 }. So
>>> for the NaNStrategy.FIXED setting, it should not be modified at all in
>>> the selection algorithm and the result for 50% should be the first NaN
>>> of the array, at index 5. In fact, due to the Arrays.sort, we *do*
>>> reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so
>>> the result is 5.
>>>
>>> Agreed. just verified by putting back the earlier insertionSort function.
>>
>>
>>> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we get as a
>>> result Double.POSITIVE_INFINITY instead of Double.NaN.
>>>
>>> If we agree to leave FIXED as unchanged behaviour with your insertionSort
>> code; then atleast MAXIMAL/MINIMAL should be allowed for transformation of
>> NaN to +/-Inf
>>
>>>  >>
>>>>> When I implemented the method, I explicitly avoided calling Arrays.sort
>>>>> because it is a general purpose method that is know to be efficient
>>> only
>>>>> for arrays of at least medium size. In most cases, when the array is
>>>>> small one falls back to a non-recursive method like a very simple
>>>>> insertion sort, which is faster for smaller arrays.
>>>>
>>>> Please help me understand here; even java primitive Arrays.sort does
>>>> an insertion sort for less than 7 elements
>>>> (Refer sort1(double x[], int off, int len))
>>>> So what is it that the custom insertion sort that does differently or
>>>> is advantageous. Is it the value 15 elements?
>>>
>>> I don't see a reference to 7 elements, neither in the Java6 nor in the
>>> Java 7 doc
>>
>> Please take a look at the sort1 method where there is a first block in the
>> code which clearly mentions len < 7
>>     /**
>>      * Sorts the specified sub-array of doubles into ascending order.
>>      */
>>     private static void sort1(double x[], int off, int len) {
>>     // Insertion sort on smallest arrays
>>     if (len < 7) {
>>         for (int i=off; i<len+off; i++)
>>         for (int j=i; j>off && x[j-1]>x[j]; j--)
>>             swap(x, j, j-1);
>>         return;
>>     }
>>     :
>>     :
>>     : code continues for the else part
>>
>> Also the grepcode url
>> <http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/Arrays.java#Arrays.sort1%28double[]%2Cint%2Cint%29>
>> indicates the same
>>
>> (and in any case the doc explicitly states the algorithms
>>> explained are only implementation notes and are not part of the
>>> specification).
>>>
>> Yes its a part of comments anyways.
>>
>>>
>>> However, the most important part for now is the fact that we control it
>>> and may be able to implement different NaN strategies. What we have
>>> currently fails.
>>>
>>> I agree on this  and hence here is my take:
>> Leave FIXED as-is and use the earlier insertionSort code (just change the
>> name to sort rather than hardcoding it as insertionsort) to handle the case
>> you were mentioning
>> Continue to use MAXIMAL/MINIMAL for +/-Inf transformation and that way we
>> have covered both Inf and nan cases.
>> Use REMOVED as default for all Percentile Estimation Types. (mostly
>> influenced by R here perhaps)
>>
>> best regards,
>>> Luc
>>>
>>>>
>>>>> In the select
>>>>> operation, we know we have small sub-arrays at the call point. Going
>>>>> back to the former insertionSort would recover the good behavior for
>>>>> small arrays, but would in fact not be sufficient to really implement a
>>>>> NaNStrategy.FIXED. I guess it would be simple to make it behave like
>>>>> NaNStrategy.MAXIMAL but I did not try yet.
>>>>>
>>>>> My point here is rather: can we really and should we really implement
>>>>> NaNStrategy.FIXED? Looking at how it is used elsewhere, it needs to
>>>>> store the original position of the NaNs. It is quite cumbersome.
>>>>>
>>>>> I wonder what is the use case for NaNStrategy.FIXED at all.
>>>>>
>>>>> Going further and looking at the use cases for other NaNStrategy
>>> values,
>>>>> the NaNs are replaced by +/- infinity before sorting, which is OK for
>>>>> ranking as we only rely on the indices, but we use the values
>>> themselves
>>>>> in Percentile. So sometimes, we end up with computing interpolations
>>>>> like 0.5 * (x[k] + x[k+1]) or similar. If x[k] is a finite number and
>>>>> x[k+1] has been changed to +infinity, we get +infinity, instead of the
>>>>> NaN we should have retrieved without replacement. So here again, I'm
>>> not
>>>>> sure we are doing the right thing.
>>>>>
>>>>> What do you think?
>>>>>
>>>>> best regards,
>>>>> Luc
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>
>>>>
>>>>
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>
>>>
>>
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] use case for NaNStrategy.FIXED?

Posted by venkatesha murthy <ve...@gmail.com>.
On Wed, Jun 25, 2014 at 9:35 AM, venkatesha murthy <
venkateshamurthyts@gmail.com> wrote:
Can i put a patch for this change?

>
> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe <lu...@spaceroots.org>
> wrote:
>
>> Hi Venkat,
>>
>> Le 23/06/2014 21:08, venkatesha murthy a écrit :
>> > On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe <lu...@spaceroots.org>
>> wrote:
>> >> Hi all,
>> >>
>> >> While looking further in Percentile class for MATH-1120, I have found
>> >> another problem in the current implementation. NaNStrategy.FIXED should
>> >> leave the NaNs in place, but at the end of the KthSelector.select
>> >> method, a call to Arrays.sort moves the NaNs to the end of the small
>> >> sub-array. What is really implemented here is therefore closer to
>> >> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always occur in the
>> >> test cases because they use very short arrays, and we directly switch
>> to
>> >> this part of the select method.
>> > Are NaNs considered higher than +Inf ?
>> > If MAXIMAL represent replacing for +inf ; you need something to
>> > indicate beyond this for NaN.
>>
>> Well, we can just keep the NaN themselves and handled them
>> appropriately, hoping not to trash performances too much.
>>
> Agreed.
>
>>
>> > What is the test input you see an issue and what is the actual error
>> > you are seeing. Please share the test case.
>>
>> Just look at PercentileTest.testReplaceNanInRange(). The first check in
>> the test corresponds to a Percentile configuration at 50% percentile,
>> and NaNStrategy.FIXED. The array has an odd number of entries, so the
>> 50% percentile is exactly one of the entries: the one at index 5 in the
>> final array.
>>
>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7, NaN, 8 }. So
>> for the NaNStrategy.FIXED setting, it should not be modified at all in
>> the selection algorithm and the result for 50% should be the first NaN
>> of the array, at index 5. In fact, due to the Arrays.sort, we *do*
>> reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so
>> the result is 5.
>>
>> Agreed. just verified by putting back the earlier insertionSort function.
>
>
>> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we get as a
>> result Double.POSITIVE_INFINITY instead of Double.NaN.
>>
>> If we agree to leave FIXED as unchanged behaviour with your insertionSort
> code; then atleast MAXIMAL/MINIMAL should be allowed for transformation of
> NaN to +/-Inf
>
>>  >>
>> >> When I implemented the method, I explicitly avoided calling Arrays.sort
>> >> because it is a general purpose method that is know to be efficient
>> only
>> >> for arrays of at least medium size. In most cases, when the array is
>> >> small one falls back to a non-recursive method like a very simple
>> >> insertion sort, which is faster for smaller arrays.
>> >
>> > Please help me understand here; even java primitive Arrays.sort does
>> > an insertion sort for less than 7 elements
>> > (Refer sort1(double x[], int off, int len))
>> > So what is it that the custom insertion sort that does differently or
>> > is advantageous. Is it the value 15 elements?
>>
>> I don't see a reference to 7 elements, neither in the Java6 nor in the
>> Java 7 doc
>
> Please take a look at the sort1 method where there is a first block in the
> code which clearly mentions len < 7
>     /**
>      * Sorts the specified sub-array of doubles into ascending order.
>      */
>     private static void sort1(double x[], int off, int len) {
>     // Insertion sort on smallest arrays
>     if (len < 7) {
>         for (int i=off; i<len+off; i++)
>         for (int j=i; j>off && x[j-1]>x[j]; j--)
>             swap(x, j, j-1);
>         return;
>     }
>     :
>     :
>     : code continues for the else part
>
> Also the grepcode url
> <http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/Arrays.java#Arrays.sort1%28double[]%2Cint%2Cint%29>
> indicates the same
>
> (and in any case the doc explicitly states the algorithms
>> explained are only implementation notes and are not part of the
>> specification).
>>
> Yes its a part of comments anyways.
>
>>
>> However, the most important part for now is the fact that we control it
>> and may be able to implement different NaN strategies. What we have
>> currently fails.
>>
>> I agree on this  and hence here is my take:
> Leave FIXED as-is and use the earlier insertionSort code (just change the
> name to sort rather than hardcoding it as insertionsort) to handle the case
> you were mentioning
> Continue to use MAXIMAL/MINIMAL for +/-Inf transformation and that way we
> have covered both Inf and nan cases.
> Use REMOVED as default for all Percentile Estimation Types. (mostly
> influenced by R here perhaps)
>
> best regards,
>> Luc
>>
>> >
>> >> In the select
>> >> operation, we know we have small sub-arrays at the call point. Going
>> >> back to the former insertionSort would recover the good behavior for
>> >> small arrays, but would in fact not be sufficient to really implement a
>> >> NaNStrategy.FIXED. I guess it would be simple to make it behave like
>> >> NaNStrategy.MAXIMAL but I did not try yet.
>> >>
>> >> My point here is rather: can we really and should we really implement
>> >> NaNStrategy.FIXED? Looking at how it is used elsewhere, it needs to
>> >> store the original position of the NaNs. It is quite cumbersome.
>> >>
>> >> I wonder what is the use case for NaNStrategy.FIXED at all.
>> >>
>> >> Going further and looking at the use cases for other NaNStrategy
>> values,
>> >> the NaNs are replaced by +/- infinity before sorting, which is OK for
>> >> ranking as we only rely on the indices, but we use the values
>> themselves
>> >> in Percentile. So sometimes, we end up with computing interpolations
>> >> like 0.5 * (x[k] + x[k+1]) or similar. If x[k] is a finite number and
>> >> x[k+1] has been changed to +infinity, we get +infinity, instead of the
>> >> NaN we should have retrieved without replacement. So here again, I'm
>> not
>> >> sure we are doing the right thing.
>> >>
>> >> What do you think?
>> >>
>> >> best regards,
>> >> Luc
>> >>
>> >> ---------------------------------------------------------------------
>> >> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> >> For additional commands, e-mail: dev-help@commons.apache.org
>> >>
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> > For additional commands, e-mail: dev-help@commons.apache.org
>> >
>> >
>> >
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>
>

Re: [math] use case for NaNStrategy.FIXED?

Posted by venkatesha murthy <ve...@gmail.com>.
On Sun, Jun 29, 2014 at 10:55 PM, Phil Steitz <ph...@gmail.com> wrote:

> On 6/29/14, 9:48 AM, venkatesha murthy wrote:
> > On Sun, Jun 29, 2014 at 1:25 AM, Phil Steitz <ph...@gmail.com>
> wrote:
> >
> >> On 6/25/14, 12:12 AM, Luc Maisonobe wrote:
> >>> Le 25/06/2014 06:05, venkatesha murthy a écrit :
> >>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe <lu...@spaceroots.org>
> >> wrote:
> >>>>> Hi Venkat,
> >>>>>
> >>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit :
> >>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe <luc@spaceroots.org
> >
> >>>>> wrote:
> >>>>>>> Hi all,
> >>>>>>>
> >>>>>>> While looking further in Percentile class for MATH-1120, I have
> found
> >>>>>>> another problem in the current implementation. NaNStrategy.FIXED
> >> should
> >>>>>>> leave the NaNs in place, but at the end of the KthSelector.select
> >>>>>>> method, a call to Arrays.sort moves the NaNs to the end of the
> small
> >>>>>>> sub-array. What is really implemented here is therefore closer to
> >>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always occur in
> the
> >>>>>>> test cases because they use very short arrays, and we directly
> >> switch to
> >>>>>>> this part of the select method.
> >>>>>> Are NaNs considered higher than +Inf ?
> >>>>>> If MAXIMAL represent replacing for +inf ; you need something to
> >>>>>> indicate beyond this for NaN.
> >>>>> Well, we can just keep the NaN themselves and handled them
> >>>>> appropriately, hoping not to trash performances too much.
> >>>>>
> >>>> Agreed.
> >>>>
> >>>>>> What is the test input you see an issue and what is the actual error
> >>>>>> you are seeing. Please share the test case.
> >>>>> Just look at PercentileTest.testReplaceNanInRange(). The first check
> in
> >>>>> the test corresponds to a Percentile configuration at 50% percentile,
> >>>>> and NaNStrategy.FIXED. The array has an odd number of entries, so the
> >>>>> 50% percentile is exactly one of the entries: the one at index 5 in
> the
> >>>>> final array.
> >>>>>
> >>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7, NaN, 8 }. So
> >>>>> for the NaNStrategy.FIXED setting, it should not be modified at all
> in
> >>>>> the selection algorithm and the result for 50% should be the first
> NaN
> >>>>> of the array, at index 5. In fact, due to the Arrays.sort, we *do*
> >>>>> reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so
> >>>>> the result is 5.
> >>>>>
> >>>>> Agreed. just verified by putting back the earlier insertionSort
> >> function.
> >>>>> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we get as a
> >>>>> result Double.POSITIVE_INFINITY instead of Double.NaN.
> >>>>>
> >>>>> If we agree to leave FIXED as unchanged behaviour with your
> >> insertionSort
> >>>> code; then atleast MAXIMAL/MINIMAL should be allowed for
> transformation
> >> of
> >>>> NaN to +/-Inf
> >>> I'm fine with it, as long as we clearly state it in the NaNStrategy
> >>> documentation, saying somethig like:
> >>>
> >>>  When MAXIMAL/MINIMAL are used, the NaNs are effecively *replaced* by
> >>>  +/- infinity, hence the results will be computed as if infinities were
> >>>  there in the first place.
> >> -1 - this is mixing concerns.  NaNStrategy exists for one specific
> >> purpose - to turn extend the ordering on doubles a total ordering.
> >> Strategies prescribe only how NaNs are to be treated in the
> >> ordering.  Side effects on the input array don't make sense in this
> >> context.  The use case for which this class was created was rank
> >> transformations.  Returning infinities in rank transformations would
> >> blow up computations in these classes.  If a floating point
> >> computations need to transform NaNs, the specific stats / other
> >> classes that do this transformation should perform and document the
> >> behavior.
> >>
> >> Phil
> >>
> > OK. Agreed realized it later.  Hence i have not touched NaNStrategy in my
> > patch(MATH-1132) . Now i have added a separate transformer to transform
> NaNs
> > but it uses NanStrategy.  Please let me know on this as this
> trasnformation
> > itself
> > can be used in different places.
>
> Honestly, I don't see the value of this.  I would be more favorable
> to an addition to MathArrays supporting NaN (or infinity) filtering
> / transformation.  Several years back we made the decision to just
> let NaNs propagate in computations, which basically means that if
> you feed NaNs to [math] you are going to get NaNs out in results.
> If we do get into the business of filtering NaNs (or infinities for
> that matter), I think it is probably best to do that in MathArrays
> or somesuch - i.e., don't decorate APIs with NaN or
> infinity-handling handling descriptions.  That would be a huge task
> and would likely end up bleeding implementation detail into the
> APIs.  As I said above, NaNStrategy has to exist for rank
> transformations to be well-defined.  NaN-handling in floating point
> computations is defined and as long as we fully document what our
> algorithms do, I think it is fair enough to "let the NaNs fall where
> they may" - which basically means users need to do the filtering /
> transformation themselves.
>
> Phil
>
OK. i see the point.

> >
> >>>>>>> When I implemented the method, I explicitly avoided calling
> >> Arrays.sort
> >>>>>>> because it is a general purpose method that is know to be efficient
> >> only
> >>>>>>> for arrays of at least medium size. In most cases, when the array
> is
> >>>>>>> small one falls back to a non-recursive method like a very simple
> >>>>>>> insertion sort, which is faster for smaller arrays.
> >>>>>> Please help me understand here; even java primitive Arrays.sort does
> >>>>>> an insertion sort for less than 7 elements
> >>>>>> (Refer sort1(double x[], int off, int len))
> >>>>>> So what is it that the custom insertion sort that does differently
> or
> >>>>>> is advantageous. Is it the value 15 elements?
> >>>>> I don't see a reference to 7 elements, neither in the Java6 nor in
> the
> >>>>> Java 7 doc
> >>>> Please take a look at the sort1 method where there is a first block in
> >> the
> >>>> code which clearly mentions len < 7
> >>>>     /**
> >>>>      * Sorts the specified sub-array of doubles into ascending order.
> >>>>      */
> >>>>     private static void sort1(double x[], int off, int len) {
> >>>>     // Insertion sort on smallest arrays
> >>>>     if (len < 7) {
> >>>>         for (int i=off; i<len+off; i++)
> >>>>         for (int j=i; j>off && x[j-1]>x[j]; j--)
> >>>>             swap(x, j, j-1);
> >>>>         return;
> >>>>     }
> >>>>     :
> >>>>     :
> >>>>     : code continues for the else part
> >>>>
> >>>> Also the grepcode url
> >>>> <
> >>
> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/Arrays.java#Arrays.sort1%28double[]%2Cint%2Cint%29
> >>>> indicates the same
> >>> OK, you were refering to a specific implementation. I was thinking in
> >>> the general case.
> >>>
> >>>> (and in any case the doc explicitly states the algorithms
> >>>>> explained are only implementation notes and are not part of the
> >>>>> specification).
> >>>>>
> >>>> Yes its a part of comments anyways.
> >>>>
> >>>>> However, the most important part for now is the fact that we control
> it
> >>>>> and may be able to implement different NaN strategies. What we have
> >>>>> currently fails.
> >>>>>
> >>>>> I agree on this  and hence here is my take:
> >>>> Leave FIXED as-is and use the earlier insertionSort code (just change
> >> the
> >>>> name to sort rather than hardcoding it as insertionsort) to handle the
> >> case
> >>>> you were mentioning
> >>> If we leave FIXED it as it is done know, even with insertionSort we do
> >>> not really control what happens. It is deterministic as the pivoting
> and
> >>> if/else structure is well defined (both in the selection part and in
> the
> >>> final sort for sub-arrays), but it is untrackable so we can't document
> >> it.
> >>>> Continue to use MAXIMAL/MINIMAL for +/-Inf transformation and that way
> >> we
> >>>> have covered both Inf and nan cases.
> >>> OK.
> >>>
> >>>> Use REMOVED as default for all Percentile Estimation Types. (mostly
> >>>> influenced by R here perhaps)
> >>> This would be fine with me. I have concerns with FIXED only, the other
> >>> strategies all seem quite realistic.
> >>>
> >>> Does anybody else has an advice for FIXED? What are the use case?
> >>>
> >>> best regards,
> >>> Luc
> >>>
> >>>> best regards,
> >>>>> Luc
> >>>>>
> >>>>>>> In the select
> >>>>>>> operation, we know we have small sub-arrays at the call point.
> Going
> >>>>>>> back to the former insertionSort would recover the good behavior
> for
> >>>>>>> small arrays, but would in fact not be sufficient to really
> >> implement a
> >>>>>>> NaNStrategy.FIXED. I guess it would be simple to make it behave
> like
> >>>>>>> NaNStrategy.MAXIMAL but I did not try yet.
> >>>>>>>
> >>>>>>> My point here is rather: can we really and should we really
> implement
> >>>>>>> NaNStrategy.FIXED? Looking at how it is used elsewhere, it needs to
> >>>>>>> store the original position of the NaNs. It is quite cumbersome.
> >>>>>>>
> >>>>>>> I wonder what is the use case for NaNStrategy.FIXED at all.
> >>>>>>>
> >>>>>>> Going further and looking at the use cases for other NaNStrategy
> >> values,
> >>>>>>> the NaNs are replaced by +/- infinity before sorting, which is OK
> for
> >>>>>>> ranking as we only rely on the indices, but we use the values
> >> themselves
> >>>>>>> in Percentile. So sometimes, we end up with computing
> interpolations
> >>>>>>> like 0.5 * (x[k] + x[k+1]) or similar. If x[k] is a finite number
> and
> >>>>>>> x[k+1] has been changed to +infinity, we get +infinity, instead of
> >> the
> >>>>>>> NaN we should have retrieved without replacement. So here again,
> I'm
> >> not
> >>>>>>> sure we are doing the right thing.
> >>>>>>>
> >>>>>>> What do you think?
> >>>>>>>
> >>>>>>> best regards,
> >>>>>>> Luc
> >>>>>>>
> >>>>>>>
> ---------------------------------------------------------------------
> >>>>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> >>>>>>> For additional commands, e-mail: dev-help@commons.apache.org
> >>>>>>>
> >>>>>>
> ---------------------------------------------------------------------
> >>>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> >>>>>> For additional commands, e-mail: dev-help@commons.apache.org
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> >>>>> For additional commands, e-mail: dev-help@commons.apache.org
> >>>>>
> >>>>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> >>> For additional commands, e-mail: dev-help@commons.apache.org
> >>>
> >>>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> >> For additional commands, e-mail: dev-help@commons.apache.org
> >>
> >>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

Re: [math] use case for NaNStrategy.FIXED?

Posted by Gilles <gi...@harfang.homelinux.org>.
On Mon, 30 Jun 2014 18:48:02 +0530, venkatesha murthy wrote:
> On Mon, Jun 30, 2014 at 4:11 AM, Gilles 
> <gi...@harfang.homelinux.org>
> wrote:
>
>> On Sun, 29 Jun 2014 14:39:51 -0700, Phil Steitz wrote:
>>
>>> On 6/29/14, 2:30 PM, Gilles wrote:
>>>
>>>> On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote:
>>>>
>>>>> On 6/29/14, 9:48 AM, venkatesha murthy wrote:
>>>>>
>>>>>> On Sun, Jun 29, 2014 at 1:25 AM, Phil Steitz
>>>>>> <ph...@gmail.com> wrote:
>>>>>>
>>>>>>  On 6/25/14, 12:12 AM, Luc Maisonobe wrote:
>>>>>>>
>>>>>>>> Le 25/06/2014 06:05, venkatesha murthy a écrit :
>>>>>>>>
>>>>>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe
>>>>>>>>> <lu...@spaceroots.org>
>>>>>>>>>
>>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Venkat,
>>>>>>>>>>
>>>>>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit :
>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe
>>>>>>>>>>> <lu...@spaceroots.org>
>>>>>>>>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> While looking further in Percentile class for MATH-1120, I
>>>>>>>>>>>> have found
>>>>>>>>>>>> another problem in the current implementation.
>>>>>>>>>>>> NaNStrategy.FIXED
>>>>>>>>>>>>
>>>>>>>>>>> should
>>>>>>>
>>>>>>>> leave the NaNs in place, but at the end of the
>>>>>>>>>>>> KthSelector.select
>>>>>>>>>>>> method, a call to Arrays.sort moves the NaNs to the end of
>>>>>>>>>>>> the small
>>>>>>>>>>>> sub-array. What is really implemented here is therefore
>>>>>>>>>>>> closer to
>>>>>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always
>>>>>>>>>>>> occur in the
>>>>>>>>>>>> test cases because they use very short arrays, and we
>>>>>>>>>>>> directly
>>>>>>>>>>>>
>>>>>>>>>>> switch to
>>>>>>>
>>>>>>>> this part of the select method.
>>>>>>>>>>>>
>>>>>>>>>>> Are NaNs considered higher than +Inf ?
>>>>>>>>>>> If MAXIMAL represent replacing for +inf ; you need
>>>>>>>>>>> something to
>>>>>>>>>>> indicate beyond this for NaN.
>>>>>>>>>>>
>>>>>>>>>> Well, we can just keep the NaN themselves and handled them
>>>>>>>>>> appropriately, hoping not to trash performances too much.
>>>>>>>>>>
>>>>>>>>>>  Agreed.
>>>>>>>>>
>>>>>>>>>  What is the test input you see an issue and what is the
>>>>>>>>>>> actual error
>>>>>>>>>>> you are seeing. Please share the test case.
>>>>>>>>>>>
>>>>>>>>>> Just look at PercentileTest.testReplaceNanInRange(). The
>>>>>>>>>> first check in
>>>>>>>>>> the test corresponds to a Percentile configuration at 50%
>>>>>>>>>> percentile,
>>>>>>>>>> and NaNStrategy.FIXED. The array has an odd number of
>>>>>>>>>> entries, so the
>>>>>>>>>> 50% percentile is exactly one of the entries: the one at
>>>>>>>>>> index 5 in the
>>>>>>>>>> final array.
>>>>>>>>>>
>>>>>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7,
>>>>>>>>>> NaN, 8 }. So
>>>>>>>>>> for the NaNStrategy.FIXED setting, it should not be modified
>>>>>>>>>> at all in
>>>>>>>>>> the selection algorithm and the result for 50% should be the
>>>>>>>>>> first NaN
>>>>>>>>>> of the array, at index 5. In fact, due to the Arrays.sort,
>>>>>>>>>> we *do*
>>>>>>>>>> reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN,
>>>>>>>>>> NaN }, so
>>>>>>>>>> the result is 5.
>>>>>>>>>>
>>>>>>>>>> Agreed. just verified by putting back the earlier 
>>>>>>>>>> insertionSort
>>>>>>>>>>
>>>>>>>>> function.
>>>>>>>
>>>>>>>> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we
>>>>>>>>>> get as a
>>>>>>>>>> result Double.POSITIVE_INFINITY instead of Double.NaN.
>>>>>>>>>>
>>>>>>>>>> If we agree to leave FIXED as unchanged behaviour with your
>>>>>>>>>>
>>>>>>>>> insertionSort
>>>>>>>
>>>>>>>> code; then atleast MAXIMAL/MINIMAL should be allowed for
>>>>>>>>> transformation
>>>>>>>>>
>>>>>>>> of
>>>>>>>
>>>>>>>> NaN to +/-Inf
>>>>>>>>>
>>>>>>>> I'm fine with it, as long as we clearly state it in the
>>>>>>>> NaNStrategy
>>>>>>>> documentation, saying somethig like:
>>>>>>>>
>>>>>>>>  When MAXIMAL/MINIMAL are used, the NaNs are effecively
>>>>>>>> *replaced* by
>>>>>>>>  +/- infinity, hence the results will be computed as if
>>>>>>>> infinities were
>>>>>>>>  there in the first place.
>>>>>>>>
>>>>>>> -1 - this is mixing concerns.  NaNStrategy exists for one 
>>>>>>> specific
>>>>>>> purpose - to turn extend the ordering on doubles a total 
>>>>>>> ordering.
>>>>>>> Strategies prescribe only how NaNs are to be treated in the
>>>>>>> ordering.  Side effects on the input array don't make sense in
>>>>>>> this
>>>>>>> context.  The use case for which this class was created was 
>>>>>>> rank
>>>>>>> transformations.  Returning infinities in rank transformations
>>>>>>> would
>>>>>>> blow up computations in these classes.  If a floating point
>>>>>>> computations need to transform NaNs, the specific stats / other
>>>>>>> classes that do this transformation should perform and document
>>>>>>> the
>>>>>>> behavior.
>>>>>>>
>>>>>>> Phil
>>>>>>>
>>>>>>>  OK. Agreed realized it later.  Hence i have not touched
>>>>>> NaNStrategy in my
>>>>>> patch(MATH-1132) . Now i have added a separate transformer to
>>>>>> transform NaNs
>>>>>> but it uses NanStrategy.  Please let me know on this as this
>>>>>> trasnformation
>>>>>> itself
>>>>>> can be used in different places.
>>>>>>
>>>>>
>>>>> Honestly, I don't see the value of this.  I would be more 
>>>>> favorable
>>>>> to an addition to MathArrays supporting NaN (or infinity) 
>>>>> filtering
>>>>> / transformation.  Several years back we made the decision to 
>>>>> just
>>>>> let NaNs propagate in computations, which basically means that if
>>>>> you feed NaNs to [math] you are going to get NaNs out in results.
>>>>> If we do get into the business of filtering NaNs (or infinities 
>>>>> for
>>>>> that matter), I think it is probably best to do that in 
>>>>> MathArrays
>>>>> or somesuch - i.e., don't decorate APIs with NaN or
>>>>> infinity-handling handling descriptions.  That would be a huge 
>>>>> task
>>>>> and would likely end up bleeding implementation detail into the
>>>>> APIs.  As I said above, NaNStrategy has to exist for rank
>>>>> transformations to be well-defined.  NaN-handling in floating 
>>>>> point
>>>>> computations is defined and as long as we fully document what our
>>>>> algorithms do, I think it is fair enough to "let the NaNs fall 
>>>>> where
>>>>> they may" - which basically means users need to do the filtering 
>>>>> /
>>>>> transformation themselves.
>>>>>
>>>>
>>>> So, do I understand correctly that it's OK to add the "remove" and
>>>> "replace" utility methods (cf. MATH-1130) in "MathArrays"?
>>>> Or does this thread conclude that we don't have a use for this 
>>>> within
>>>> Commons Math?
>>>>
>>>
>>> You raise a good point here.  Personally, I don't see an internal
>>> use and if we are sticking with MathArrays including only things we
>>> have internal use for, I would say no, don't include them.
>>>
>>
> I feel IMO, replace and remove are too generic to be holed in 
> Percentile
> and see that they can have uses similar to copy or equals or any 
> other
> MathArrays function..
> (if not MathArrays; then can DoubleArray be an alternate home for 
> these?).

[That's not the point under discussion. Currently the advantage of 
putting
those utilities there is because we can modify or remove non-public 
code
without impacting users (backward compatiblity).]

The question is primarily whether we want to support those methods (in
the public API) even if they are not used _at all_ within Commons Math
(as would be the case if the answer to the last question below is in
the affirmative).

If the feature is deemed generally useful (to be confirmed by a 
motivated
feature request), the question would be how to best provide it. This 
comes
back to your proposal of a "Transformer" interface.


Regards,
Gilles


>>
>> A third way to look at it would be that since some algorithms 
>> require
>> being passed "clean" data (e.g. without NaNs), we could provide some
>> simple means to do so. A user could then write, e.g.
>> ---
>> double[] data = { 1, Double.NaN, 3, 6, 5 };
>>
>> Percentile p = new Percentile();
>> double q = p.evaluate(MathArrays.replace(data, Double.NaN,
>> Double.POSITIVE_INFINITY), 0.1);
>> ---
>>
>> Then, if we provide that utility, is it still necessary to 
>> complicate the
>> Percentile
>> code with a NaNStrategy parameter?
>>
>>
>> Gilles


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] use case for NaNStrategy.FIXED?

Posted by venkatesha murthy <ve...@gmail.com>.
On Mon, Jun 30, 2014 at 4:11 AM, Gilles <gi...@harfang.homelinux.org>
wrote:

> On Sun, 29 Jun 2014 14:39:51 -0700, Phil Steitz wrote:
>
>> On 6/29/14, 2:30 PM, Gilles wrote:
>>
>>> On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote:
>>>
>>>> On 6/29/14, 9:48 AM, venkatesha murthy wrote:
>>>>
>>>>> On Sun, Jun 29, 2014 at 1:25 AM, Phil Steitz
>>>>> <ph...@gmail.com> wrote:
>>>>>
>>>>>  On 6/25/14, 12:12 AM, Luc Maisonobe wrote:
>>>>>>
>>>>>>> Le 25/06/2014 06:05, venkatesha murthy a écrit :
>>>>>>>
>>>>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe
>>>>>>>> <lu...@spaceroots.org>
>>>>>>>>
>>>>>>> wrote:
>>>>>>
>>>>>>> Hi Venkat,
>>>>>>>>>
>>>>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit :
>>>>>>>>>
>>>>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe
>>>>>>>>>> <lu...@spaceroots.org>
>>>>>>>>>>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> While looking further in Percentile class for MATH-1120, I
>>>>>>>>>>> have found
>>>>>>>>>>> another problem in the current implementation.
>>>>>>>>>>> NaNStrategy.FIXED
>>>>>>>>>>>
>>>>>>>>>> should
>>>>>>
>>>>>>> leave the NaNs in place, but at the end of the
>>>>>>>>>>> KthSelector.select
>>>>>>>>>>> method, a call to Arrays.sort moves the NaNs to the end of
>>>>>>>>>>> the small
>>>>>>>>>>> sub-array. What is really implemented here is therefore
>>>>>>>>>>> closer to
>>>>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always
>>>>>>>>>>> occur in the
>>>>>>>>>>> test cases because they use very short arrays, and we
>>>>>>>>>>> directly
>>>>>>>>>>>
>>>>>>>>>> switch to
>>>>>>
>>>>>>> this part of the select method.
>>>>>>>>>>>
>>>>>>>>>> Are NaNs considered higher than +Inf ?
>>>>>>>>>> If MAXIMAL represent replacing for +inf ; you need
>>>>>>>>>> something to
>>>>>>>>>> indicate beyond this for NaN.
>>>>>>>>>>
>>>>>>>>> Well, we can just keep the NaN themselves and handled them
>>>>>>>>> appropriately, hoping not to trash performances too much.
>>>>>>>>>
>>>>>>>>>  Agreed.
>>>>>>>>
>>>>>>>>  What is the test input you see an issue and what is the
>>>>>>>>>> actual error
>>>>>>>>>> you are seeing. Please share the test case.
>>>>>>>>>>
>>>>>>>>> Just look at PercentileTest.testReplaceNanInRange(). The
>>>>>>>>> first check in
>>>>>>>>> the test corresponds to a Percentile configuration at 50%
>>>>>>>>> percentile,
>>>>>>>>> and NaNStrategy.FIXED. The array has an odd number of
>>>>>>>>> entries, so the
>>>>>>>>> 50% percentile is exactly one of the entries: the one at
>>>>>>>>> index 5 in the
>>>>>>>>> final array.
>>>>>>>>>
>>>>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7,
>>>>>>>>> NaN, 8 }. So
>>>>>>>>> for the NaNStrategy.FIXED setting, it should not be modified
>>>>>>>>> at all in
>>>>>>>>> the selection algorithm and the result for 50% should be the
>>>>>>>>> first NaN
>>>>>>>>> of the array, at index 5. In fact, due to the Arrays.sort,
>>>>>>>>> we *do*
>>>>>>>>> reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN,
>>>>>>>>> NaN }, so
>>>>>>>>> the result is 5.
>>>>>>>>>
>>>>>>>>> Agreed. just verified by putting back the earlier insertionSort
>>>>>>>>>
>>>>>>>> function.
>>>>>>
>>>>>>> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we
>>>>>>>>> get as a
>>>>>>>>> result Double.POSITIVE_INFINITY instead of Double.NaN.
>>>>>>>>>
>>>>>>>>> If we agree to leave FIXED as unchanged behaviour with your
>>>>>>>>>
>>>>>>>> insertionSort
>>>>>>
>>>>>>> code; then atleast MAXIMAL/MINIMAL should be allowed for
>>>>>>>> transformation
>>>>>>>>
>>>>>>> of
>>>>>>
>>>>>>> NaN to +/-Inf
>>>>>>>>
>>>>>>> I'm fine with it, as long as we clearly state it in the
>>>>>>> NaNStrategy
>>>>>>> documentation, saying somethig like:
>>>>>>>
>>>>>>>  When MAXIMAL/MINIMAL are used, the NaNs are effecively
>>>>>>> *replaced* by
>>>>>>>  +/- infinity, hence the results will be computed as if
>>>>>>> infinities were
>>>>>>>  there in the first place.
>>>>>>>
>>>>>> -1 - this is mixing concerns.  NaNStrategy exists for one specific
>>>>>> purpose - to turn extend the ordering on doubles a total ordering.
>>>>>> Strategies prescribe only how NaNs are to be treated in the
>>>>>> ordering.  Side effects on the input array don't make sense in
>>>>>> this
>>>>>> context.  The use case for which this class was created was rank
>>>>>> transformations.  Returning infinities in rank transformations
>>>>>> would
>>>>>> blow up computations in these classes.  If a floating point
>>>>>> computations need to transform NaNs, the specific stats / other
>>>>>> classes that do this transformation should perform and document
>>>>>> the
>>>>>> behavior.
>>>>>>
>>>>>> Phil
>>>>>>
>>>>>>  OK. Agreed realized it later.  Hence i have not touched
>>>>> NaNStrategy in my
>>>>> patch(MATH-1132) . Now i have added a separate transformer to
>>>>> transform NaNs
>>>>> but it uses NanStrategy.  Please let me know on this as this
>>>>> trasnformation
>>>>> itself
>>>>> can be used in different places.
>>>>>
>>>>
>>>> Honestly, I don't see the value of this.  I would be more favorable
>>>> to an addition to MathArrays supporting NaN (or infinity) filtering
>>>> / transformation.  Several years back we made the decision to just
>>>> let NaNs propagate in computations, which basically means that if
>>>> you feed NaNs to [math] you are going to get NaNs out in results.
>>>> If we do get into the business of filtering NaNs (or infinities for
>>>> that matter), I think it is probably best to do that in MathArrays
>>>> or somesuch - i.e., don't decorate APIs with NaN or
>>>> infinity-handling handling descriptions.  That would be a huge task
>>>> and would likely end up bleeding implementation detail into the
>>>> APIs.  As I said above, NaNStrategy has to exist for rank
>>>> transformations to be well-defined.  NaN-handling in floating point
>>>> computations is defined and as long as we fully document what our
>>>> algorithms do, I think it is fair enough to "let the NaNs fall where
>>>> they may" - which basically means users need to do the filtering /
>>>> transformation themselves.
>>>>
>>>
>>> So, do I understand correctly that it's OK to add the "remove" and
>>> "replace" utility methods (cf. MATH-1130) in "MathArrays"?
>>> Or does this thread conclude that we don't have a use for this within
>>> Commons Math?
>>>
>>
>> You raise a good point here.  Personally, I don't see an internal
>> use and if we are sticking with MathArrays including only things we
>> have internal use for, I would say no, don't include them.
>>
>
I feel IMO, replace and remove are too generic to be holed in Percentile
and see that they can have uses similar to copy or equals or any other
MathArrays function..
(if not MathArrays; then can DoubleArray be an alternate home for these?).

>
> A third way to look at it would be that since some algorithms require
> being passed "clean" data (e.g. without NaNs), we could provide some
> simple means to do so. A user could then write, e.g.
> ---
> double[] data = { 1, Double.NaN, 3, 6, 5 };
>
> Percentile p = new Percentile();
> double q = p.evaluate(MathArrays.replace(data, Double.NaN,
> Double.POSITIVE_INFINITY), 0.1);
> ---
>
> Then, if we provide that utility, is it still necessary to complicate the
> Percentile
> code with a NaNStrategy parameter?
>
>
> Gilles
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

Re: [math] use case for NaNStrategy.FIXED?

Posted by Phil Steitz <ph...@gmail.com>.
On 7/13/14, 2:40 AM, Luc Maisonobe wrote:
> Hi all,
>
> Le 01/07/2014 08:52, Luc Maisonobe a écrit :
>> Le 30/06/2014 22:00, Phil Steitz a écrit :
>>> On 6/30/14, 12:33 PM, Luc Maisonobe wrote:
>>>> Le 30/06/2014 19:15, Phil Steitz a écrit :
>>>>>> On Jun 30, 2014, at 9:59 AM, Gilles <gi...@harfang.homelinux.org>
>>>>>> wrote:
>>>>>>
>>>>>> On Mon, 30 Jun 2014 09:08:00 -0700, Phil Steitz wrote:
>>>>>>>> On Jun 29, 2014, at 3:41 PM, Gilles
>>>>>>>> <gi...@harfang.homelinux.org> wrote:
>>>>>>>>
>>>>>>>>>> On Sun, 29 Jun 2014 14:39:51 -0700, Phil Steitz wrote:
>>>>>>>>>>> On 6/29/14, 2:30 PM, Gilles wrote:
>>>>>>>>>>>> On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote: 
>>>>>>>>>>>> On 6/29/14, 9:48 AM, venkatesha murthy wrote: On Sun,
>>>>>>>>>>>> Jun 29, 2014 at 1:25 AM, Phil Steitz 
>>>>>>>>>>>> <ph...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote: Le
>>>>>>>>>>>>>> 25/06/2014 06:05, venkatesha murthy a écrit :
>>>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe 
>>>>>>>>>>>>>>> <lu...@spaceroots.org>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> Hi Venkat,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit
>>>>>>>>>>>>>>>> :
>>>>>>>>>>>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc
>>>>>>>>>>>>>>>>> Maisonobe <lu...@spaceroots.org>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> While looking further in Percentile class
>>>>>>>>>>>>>>>>>> for MATH-1120, I have found another problem
>>>>>>>>>>>>>>>>>> in the current implementation. 
>>>>>>>>>>>>>>>>>> NaNStrategy.FIXED
>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>> leave the NaNs in place, but at the end of
>>>>>>>>>>>>>>>>>> the KthSelector.select method, a call to
>>>>>>>>>>>>>>>>>> Arrays.sort moves the NaNs to the end of 
>>>>>>>>>>>>>>>>>> the small sub-array. What is really
>>>>>>>>>>>>>>>>>> implemented here is therefore closer to 
>>>>>>>>>>>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED.
>>>>>>>>>>>>>>>>>> This always occur in the test cases because
>>>>>>>>>>>>>>>>>> they use very short arrays, and we 
>>>>>>>>>>>>>>>>>> directly
>>>>>>>>>>>>> switch to
>>>>>>>>>>>>>>>>>> this part of the select method.
>>>>>>>>>>>>>>>>> Are NaNs considered higher than +Inf ? If
>>>>>>>>>>>>>>>>> MAXIMAL represent replacing for +inf ; you
>>>>>>>>>>>>>>>>> need something to indicate beyond this for
>>>>>>>>>>>>>>>>> NaN.
>>>>>>>>>>>>>>>> Well, we can just keep the NaN themselves and
>>>>>>>>>>>>>>>> handled them appropriately, hoping not to trash
>>>>>>>>>>>>>>>> performances too much.
>>>>>>>>>>>>>>> Agreed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> What is the test input you see an issue and
>>>>>>>>>>>>>>>>> what is the actual error you are seeing.
>>>>>>>>>>>>>>>>> Please share the test case.
>>>>>>>>>>>>>>>> Just look at
>>>>>>>>>>>>>>>> PercentileTest.testReplaceNanInRange(). The 
>>>>>>>>>>>>>>>> first check in the test corresponds to a
>>>>>>>>>>>>>>>> Percentile configuration at 50% percentile, and
>>>>>>>>>>>>>>>> NaNStrategy.FIXED. The array has an odd number
>>>>>>>>>>>>>>>> of entries, so the 50% percentile is exactly
>>>>>>>>>>>>>>>> one of the entries: the one at index 5 in the 
>>>>>>>>>>>>>>>> final array.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN,
>>>>>>>>>>>>>>>> NaN, 5, 7, NaN, 8 }. So for the
>>>>>>>>>>>>>>>> NaNStrategy.FIXED setting, it should not be
>>>>>>>>>>>>>>>> modified at all in the selection algorithm and
>>>>>>>>>>>>>>>> the result for 50% should be the first NaN of
>>>>>>>>>>>>>>>> the array, at index 5. In fact, due to the
>>>>>>>>>>>>>>>> Arrays.sort, we *do* reorder the array into  {
>>>>>>>>>>>>>>>> 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so the
>>>>>>>>>>>>>>>> result is 5.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Agreed. just verified by putting back the
>>>>>>>>>>>>>>>> earlier insertionSort
>>>>>>>>>>>>> function.
>>>>>>>>>>>>>>>> If we use NaNStrategy.MAXIMAL and any quantile
>>>>>>>>>>>>>>>> above 67%, we get as a result
>>>>>>>>>>>>>>>> Double.POSITIVE_INFINITY instead of
>>>>>>>>>>>>>>>> Double.NaN.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If we agree to leave FIXED as unchanged
>>>>>>>>>>>>>>>> behaviour with your
>>>>>>>>>>>>> insertionSort
>>>>>>>>>>>>>>> code; then atleast MAXIMAL/MINIMAL should be
>>>>>>>>>>>>>>> allowed for transformation
>>>>>>>>>>>>> of
>>>>>>>>>>>>>>> NaN to +/-Inf
>>>>>>>>>>>>>> I'm fine with it, as long as we clearly state it in
>>>>>>>>>>>>>> the NaNStrategy documentation, saying somethig
>>>>>>>>>>>>>> like:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> When MAXIMAL/MINIMAL are used, the NaNs are
>>>>>>>>>>>>>> effecively *replaced* by +/- infinity, hence the
>>>>>>>>>>>>>> results will be computed as if infinities were 
>>>>>>>>>>>>>> there in the first place.
>>>>>>>>>>>>> -1 - this is mixing concerns.  NaNStrategy exists for
>>>>>>>>>>>>> one specific purpose - to turn extend the ordering on
>>>>>>>>>>>>> doubles a total ordering. Strategies prescribe only
>>>>>>>>>>>>> how NaNs are to be treated in the ordering.  Side
>>>>>>>>>>>>> effects on the input array don't make sense in this 
>>>>>>>>>>>>> context.  The use case for which this class was
>>>>>>>>>>>>> created was rank transformations.  Returning
>>>>>>>>>>>>> infinities in rank transformations would blow up
>>>>>>>>>>>>> computations in these classes.  If a floating point 
>>>>>>>>>>>>> computations need to transform NaNs, the specific
>>>>>>>>>>>>> stats / other classes that do this transformation
>>>>>>>>>>>>> should perform and document the behavior.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Phil
>>>>>>>>>>>> OK. Agreed realized it later.  Hence i have not
>>>>>>>>>>>> touched NaNStrategy in my patch(MATH-1132) . Now i have
>>>>>>>>>>>> added a separate transformer to transform NaNs but it
>>>>>>>>>>>> uses NanStrategy.  Please let me know on this as this 
>>>>>>>>>>>> trasnformation itself can be used in different places.
>>>>>>>>>>> Honestly, I don't see the value of this.  I would be more
>>>>>>>>>>> favorable to an addition to MathArrays supporting NaN (or
>>>>>>>>>>> infinity) filtering / transformation.  Several years back
>>>>>>>>>>> we made the decision to just let NaNs propagate in
>>>>>>>>>>> computations, which basically means that if you feed NaNs
>>>>>>>>>>> to [math] you are going to get NaNs out in results. If we
>>>>>>>>>>> do get into the business of filtering NaNs (or infinities
>>>>>>>>>>> for that matter), I think it is probably best to do that
>>>>>>>>>>> in MathArrays or somesuch - i.e., don't decorate APIs
>>>>>>>>>>> with NaN or infinity-handling handling descriptions.
>>>>>>>>>>> That would be a huge task and would likely end up
>>>>>>>>>>> bleeding implementation detail into the APIs.  As I said
>>>>>>>>>>> above, NaNStrategy has to exist for rank transformations
>>>>>>>>>>> to be well-defined.  NaN-handling in floating point 
>>>>>>>>>>> computations is defined and as long as we fully document
>>>>>>>>>>> what our algorithms do, I think it is fair enough to "let
>>>>>>>>>>> the NaNs fall where they may" - which basically means
>>>>>>>>>>> users need to do the filtering / transformation
>>>>>>>>>>> themselves.
>>>>>>>>>> So, do I understand correctly that it's OK to add the
>>>>>>>>>> "remove" and "replace" utility methods (cf. MATH-1130) in
>>>>>>>>>> "MathArrays"? Or does this thread conclude that we don't
>>>>>>>>>> have a use for this within Commons Math?
>>>>>>>>> You raise a good point here.  Personally, I don't see an
>>>>>>>>> internal use and if we are sticking with MathArrays including
>>>>>>>>> only things we have internal use for, I would say no, don't
>>>>>>>>> include them.
>>>>>>>> A third way to look at it would be that since some algorithms
>>>>>>>> require being passed "clean" data (e.g. without NaNs), we could
>>>>>>>> provide some simple means to do so. A user could then write,
>>>>>>>> e.g. --- double[] data = { 1, Double.NaN, 3, 6, 5 };
>>>>>>>>
>>>>>>>> Percentile p = new Percentile(); double q =
>>>>>>>> p.evaluate(MathArrays.replace(data, Double.NaN,
>>>>>>>> Double.POSITIVE_INFINITY), 0.1); ---
>>>>>>>>
>>>>>>>> Then, if we provide that utility, is it still necessary to
>>>>>>>> complicate the Percentile code with a NaNStrategy parameter?
>>>>>>> I may be missing something, but it seems to me that percentile
>>>>>>> does need a NaNStrategy in the presence if NaNs like the other
>>>>>>> order statistics do (basically to define their position In the
>>>>>>> ordering).  It doesn't logically need more than that.  I would
>>>>>>> be ok defaulting the strategy or even returning NaN in the
>>>>>>> presence of NaNs (or whenever interpolation involves a NaN). All
>>>>>>> of this can be documented.
>>>>>> I am _surely_ missing something because I don't figure out how NaNs
>>>>>> can be useful when computing those statistics... :-}
>>>>> Good point.  Basically the different strategies allow you to
>>>>> effectively ignore or work around them so stats can be well-defined.
>>>>> Without NaNStrategy, order statistics including NaNs are undefined.
>>>> The problem with Percentile is that the elements are both reordered
>>>> *and* used when interpolating between two values. Tor the rank methods,
>>>> only reordering was needed so the various NaNStrategies could be
>>>> implemented by replacing at will with +/- infinity and a classical sort
>>>> would do the ordering for us.
>>>>
>>>> Here, we do interpolate on the values, so if we first replace NaN with
>>>> +infinity to put them at the end, we may interpolate between a finite
>>>> value and +inf and have +inf as a result, whereas if the NaN "value" was
>>>> preserved and some magic used to sort the NaN, we would end up
>>>> interpolating with a NaN and get a NaN as a result. Setting up the magic
>>>> for MINIMAL and MAXIMAL seems achievable withing the sort, as we already
>>>> have a specific sort (in fact not a sort but a selection algorithm,
>>>> which is the core of the class). Setting up the magic for FIXED seems
>>>> more difficult.
>>>>
>>>> So I guess implementing NaNStrategies may be done in different ways
>>>> depending on how they will be used by the calling algorithm. I don't yet
>>>> know if we can set up a general method to be used for all statistics.
>>>> Looking only at Percentile, replacement with +/- infinity already seems
>>>> to not be an appropriate solution.
>>> I don't think we should try to redefine algorithms so they can
>>> handle NaNs.  When you do floating point computations with NaNs, you
>>> get NaNs.  That should be understood to be the contract.  For rank
>>> statistics, such as Percentile, we need to extend the ordering on
>>> doubles to include them in ordering for the algorithm to be
>>> well-defined.  If we subsequently do floating point computations
>>> with the NaNs, the result should be NaN.  I think we should either
>>> just simplify things and say we *always* return NaN whenever NaNs
>>> are present for Percentile, or do the following:
>>>
>>> 0) use the prescribed NaN strategy to determine the ordering - so
>>> NaNs end up either at the bottom, at the top or in their original
>>> position (FIXED).
>>> 1) if interpolation involves a NaN element, the result returned is NaN.
>> This is exactly what I say. If NaNs are in, NaNs should be out. As for
>> now we do replace them, we get infinities instead of NaNs. I would
>> prefer getting NaNs there as it is more consistent with the purpose of
>> NaNs in floating point encoding.
>>
>> So I think we agree: NaNs should not be changed, but they should be
>> reordered. Perhaps we could have something simple like NaNStrategy
>> having a method:
>>
>>   boolean shouldSwap(int i1, double d1, int i2, double i2)
>>
>> that would use both the index and values of the element to check if they
>> should be swapped or not. This method would be used both for traditional
>> sorting for ranking and in selection algorithms for percentile. I first
>> thought we should simply have NaN implement Comparator<Double> but this
>> would not work for FIXED, whereas the above method would work (simply
>> returning false as soon as there is a NaN, regardless of it being at a
>> lower or upper index).
>>
>> I will take a look at this option.
> This does not work as is. The problem is lack of transitivity. Suppose
> that we have three indices i < j < k, with a[i] > a[k] and a[j] = NaN.
>
> Then if during the sort/selection we compare elements at indices i and k
> first, we will swap them, which is correct behaviour regardless of the
> NaN strategy used. If we compare elemntes at indices i and j first and
> then elements at indices j and k and we use FIXED NaNStrategy, then
> we won't swap the two elements and the middle NaN would induced a
> non-globally sorted array. I think FIXED should act as a river flowing
> over rocks : the water (non-NaN elements) would move freely around and
> be sorted, and the rocks (NaN elements) would stay were they are but do
> not prevent water flow.

That is what FIXED is supposed to mean.  If the original array is
x_0, ..., x_n and x_i is NaN, then after applying the FIXED
NaNStrategy to sort the array, x_i will still be NaN.  If this is
inconvenient to support for Percentile, I would be fine with making
selection of this NaNStrategy illegal - i.e, throwing IAE if it is
specified.  As I said before, I would also be OK with making the
presence of NaN values in the input trigger (documented) IAE.

Phil
>
> I have found another way to handle this, using an indirection array that
> could "jump" over NaNs for FIXED strategy or push them to either end for
> MINIMAL or MAXIMAL strategy.
>
> I will try this in the next few days, as time permit.
>
> best regards,
> Luc
>
>> best regards,
>> Luc
>>
>>> Phil
>>>> best regards,
>>>> Luc
>>>>
>>>>> Phil
>>>>>
>>>>>>> I still don't see a within-math use for recoding methods; though
>>>>>>> I do see the value from a user perspective.  The problem is if we
>>>>>>> open this door, we risk MathArrays becoming a mini commons lang.
>>>>>> If Commons Lang already provides a feature, then we indeed don't
>>>>>> need to offer the same within CM (if there is no internal use). Can
>>>>>> someone confirm?
>>>>>>
>>>>>>
>>>>>> Otherwise we could define a new inner interface in MathArrays,
>>>>>> similar to "MathArrays.Function": --- public class MathArrays {
>>>>>>
>>>>>> // ...
>>>>>>
>>>>>> public interface Transformer { /** Transform the whole array */ 
>>>>>> double[] transform(double[] input); /** Transform the array in the
>>>>>> interval [from, to) and return the whole array */ double[]
>>>>>> transform(double[] input, int from, int to); /** Transform the
>>>>>> array in the interval [from, to) and return the transformed
>>>>>> sub-array */ double[] transformAndSplice(double[] input, int from,
>>>>>> int to); } } ---
>>>>>>
>>>>>> Then we can provide "replace" and "remove" transformers through
>>>>>> factory methods (example/untested code): --- public class
>>>>>> MathArrays {
>>>>>>
>>>>>> // ...
>>>>>>
>>>>>> public static Transformer createReplaceTransformer(final double
>>>>>> old, final double replace) { return new Transformer() { public
>>>>>> double[] transform(double[] input) { final int len = input.length; 
>>>>>> final double[] output = new double[len]; for (int i = 0; i < len;
>>>>>> i++) { output[i] = Precision.equalsIncludingNaN(input[i], old) ? 
>>>>>> replace : input[i]; } return output; }
>>>>>>
>>>>>> // etc. }; } } ---
>>>>>>
>>>>>>
>>>>>> Gilles
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>>
>>>>>>
>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>>>
>>>>> ---------------------------------------------------------------------
>>>>>
>>>>>
>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>>
>>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] use case for NaNStrategy.FIXED?

Posted by Luc Maisonobe <lu...@spaceroots.org>.
Hi all,

Le 01/07/2014 08:52, Luc Maisonobe a écrit :
> Le 30/06/2014 22:00, Phil Steitz a écrit :
>> On 6/30/14, 12:33 PM, Luc Maisonobe wrote:
>>> Le 30/06/2014 19:15, Phil Steitz a écrit :
>>>>
>>>>> On Jun 30, 2014, at 9:59 AM, Gilles <gi...@harfang.homelinux.org>
>>>>> wrote:
>>>>>
>>>>> On Mon, 30 Jun 2014 09:08:00 -0700, Phil Steitz wrote:
>>>>>>> On Jun 29, 2014, at 3:41 PM, Gilles
>>>>>>> <gi...@harfang.homelinux.org> wrote:
>>>>>>>
>>>>>>>>> On Sun, 29 Jun 2014 14:39:51 -0700, Phil Steitz wrote:
>>>>>>>>>> On 6/29/14, 2:30 PM, Gilles wrote:
>>>>>>>>>>> On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote: 
>>>>>>>>>>> On 6/29/14, 9:48 AM, venkatesha murthy wrote: On Sun,
>>>>>>>>>>> Jun 29, 2014 at 1:25 AM, Phil Steitz 
>>>>>>>>>>> <ph...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote: Le
>>>>>>>>>>>>> 25/06/2014 06:05, venkatesha murthy a écrit :
>>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe 
>>>>>>>>>>>>>> <lu...@spaceroots.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> Hi Venkat,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit
>>>>>>>>>>>>>>> :
>>>>>>>>>>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc
>>>>>>>>>>>>>>>> Maisonobe <lu...@spaceroots.org>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> While looking further in Percentile class
>>>>>>>>>>>>>>>>> for MATH-1120, I have found another problem
>>>>>>>>>>>>>>>>> in the current implementation. 
>>>>>>>>>>>>>>>>> NaNStrategy.FIXED
>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>> leave the NaNs in place, but at the end of
>>>>>>>>>>>>>>>>> the KthSelector.select method, a call to
>>>>>>>>>>>>>>>>> Arrays.sort moves the NaNs to the end of 
>>>>>>>>>>>>>>>>> the small sub-array. What is really
>>>>>>>>>>>>>>>>> implemented here is therefore closer to 
>>>>>>>>>>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED.
>>>>>>>>>>>>>>>>> This always occur in the test cases because
>>>>>>>>>>>>>>>>> they use very short arrays, and we 
>>>>>>>>>>>>>>>>> directly
>>>>>>>>>>>> switch to
>>>>>>>>>>>>>>>>> this part of the select method.
>>>>>>>>>>>>>>>> Are NaNs considered higher than +Inf ? If
>>>>>>>>>>>>>>>> MAXIMAL represent replacing for +inf ; you
>>>>>>>>>>>>>>>> need something to indicate beyond this for
>>>>>>>>>>>>>>>> NaN.
>>>>>>>>>>>>>>> Well, we can just keep the NaN themselves and
>>>>>>>>>>>>>>> handled them appropriately, hoping not to trash
>>>>>>>>>>>>>>> performances too much.
>>>>>>>>>>>>>> Agreed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What is the test input you see an issue and
>>>>>>>>>>>>>>>> what is the actual error you are seeing.
>>>>>>>>>>>>>>>> Please share the test case.
>>>>>>>>>>>>>>> Just look at
>>>>>>>>>>>>>>> PercentileTest.testReplaceNanInRange(). The 
>>>>>>>>>>>>>>> first check in the test corresponds to a
>>>>>>>>>>>>>>> Percentile configuration at 50% percentile, and
>>>>>>>>>>>>>>> NaNStrategy.FIXED. The array has an odd number
>>>>>>>>>>>>>>> of entries, so the 50% percentile is exactly
>>>>>>>>>>>>>>> one of the entries: the one at index 5 in the 
>>>>>>>>>>>>>>> final array.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN,
>>>>>>>>>>>>>>> NaN, 5, 7, NaN, 8 }. So for the
>>>>>>>>>>>>>>> NaNStrategy.FIXED setting, it should not be
>>>>>>>>>>>>>>> modified at all in the selection algorithm and
>>>>>>>>>>>>>>> the result for 50% should be the first NaN of
>>>>>>>>>>>>>>> the array, at index 5. In fact, due to the
>>>>>>>>>>>>>>> Arrays.sort, we *do* reorder the array into  {
>>>>>>>>>>>>>>> 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so the
>>>>>>>>>>>>>>> result is 5.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Agreed. just verified by putting back the
>>>>>>>>>>>>>>> earlier insertionSort
>>>>>>>>>>>> function.
>>>>>>>>>>>>>>> If we use NaNStrategy.MAXIMAL and any quantile
>>>>>>>>>>>>>>> above 67%, we get as a result
>>>>>>>>>>>>>>> Double.POSITIVE_INFINITY instead of
>>>>>>>>>>>>>>> Double.NaN.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If we agree to leave FIXED as unchanged
>>>>>>>>>>>>>>> behaviour with your
>>>>>>>>>>>> insertionSort
>>>>>>>>>>>>>> code; then atleast MAXIMAL/MINIMAL should be
>>>>>>>>>>>>>> allowed for transformation
>>>>>>>>>>>> of
>>>>>>>>>>>>>> NaN to +/-Inf
>>>>>>>>>>>>> I'm fine with it, as long as we clearly state it in
>>>>>>>>>>>>> the NaNStrategy documentation, saying somethig
>>>>>>>>>>>>> like:
>>>>>>>>>>>>>
>>>>>>>>>>>>> When MAXIMAL/MINIMAL are used, the NaNs are
>>>>>>>>>>>>> effecively *replaced* by +/- infinity, hence the
>>>>>>>>>>>>> results will be computed as if infinities were 
>>>>>>>>>>>>> there in the first place.
>>>>>>>>>>>> -1 - this is mixing concerns.  NaNStrategy exists for
>>>>>>>>>>>> one specific purpose - to turn extend the ordering on
>>>>>>>>>>>> doubles a total ordering. Strategies prescribe only
>>>>>>>>>>>> how NaNs are to be treated in the ordering.  Side
>>>>>>>>>>>> effects on the input array don't make sense in this 
>>>>>>>>>>>> context.  The use case for which this class was
>>>>>>>>>>>> created was rank transformations.  Returning
>>>>>>>>>>>> infinities in rank transformations would blow up
>>>>>>>>>>>> computations in these classes.  If a floating point 
>>>>>>>>>>>> computations need to transform NaNs, the specific
>>>>>>>>>>>> stats / other classes that do this transformation
>>>>>>>>>>>> should perform and document the behavior.
>>>>>>>>>>>>
>>>>>>>>>>>> Phil
>>>>>>>>>>> OK. Agreed realized it later.  Hence i have not
>>>>>>>>>>> touched NaNStrategy in my patch(MATH-1132) . Now i have
>>>>>>>>>>> added a separate transformer to transform NaNs but it
>>>>>>>>>>> uses NanStrategy.  Please let me know on this as this 
>>>>>>>>>>> trasnformation itself can be used in different places.
>>>>>>>>>> Honestly, I don't see the value of this.  I would be more
>>>>>>>>>> favorable to an addition to MathArrays supporting NaN (or
>>>>>>>>>> infinity) filtering / transformation.  Several years back
>>>>>>>>>> we made the decision to just let NaNs propagate in
>>>>>>>>>> computations, which basically means that if you feed NaNs
>>>>>>>>>> to [math] you are going to get NaNs out in results. If we
>>>>>>>>>> do get into the business of filtering NaNs (or infinities
>>>>>>>>>> for that matter), I think it is probably best to do that
>>>>>>>>>> in MathArrays or somesuch - i.e., don't decorate APIs
>>>>>>>>>> with NaN or infinity-handling handling descriptions.
>>>>>>>>>> That would be a huge task and would likely end up
>>>>>>>>>> bleeding implementation detail into the APIs.  As I said
>>>>>>>>>> above, NaNStrategy has to exist for rank transformations
>>>>>>>>>> to be well-defined.  NaN-handling in floating point 
>>>>>>>>>> computations is defined and as long as we fully document
>>>>>>>>>> what our algorithms do, I think it is fair enough to "let
>>>>>>>>>> the NaNs fall where they may" - which basically means
>>>>>>>>>> users need to do the filtering / transformation
>>>>>>>>>> themselves.
>>>>>>>>> So, do I understand correctly that it's OK to add the
>>>>>>>>> "remove" and "replace" utility methods (cf. MATH-1130) in
>>>>>>>>> "MathArrays"? Or does this thread conclude that we don't
>>>>>>>>> have a use for this within Commons Math?
>>>>>>>> You raise a good point here.  Personally, I don't see an
>>>>>>>> internal use and if we are sticking with MathArrays including
>>>>>>>> only things we have internal use for, I would say no, don't
>>>>>>>> include them.
>>>>>>> A third way to look at it would be that since some algorithms
>>>>>>> require being passed "clean" data (e.g. without NaNs), we could
>>>>>>> provide some simple means to do so. A user could then write,
>>>>>>> e.g. --- double[] data = { 1, Double.NaN, 3, 6, 5 };
>>>>>>>
>>>>>>> Percentile p = new Percentile(); double q =
>>>>>>> p.evaluate(MathArrays.replace(data, Double.NaN,
>>>>>>> Double.POSITIVE_INFINITY), 0.1); ---
>>>>>>>
>>>>>>> Then, if we provide that utility, is it still necessary to
>>>>>>> complicate the Percentile code with a NaNStrategy parameter?
>>>>>> I may be missing something, but it seems to me that percentile
>>>>>> does need a NaNStrategy in the presence if NaNs like the other
>>>>>> order statistics do (basically to define their position In the
>>>>>> ordering).  It doesn't logically need more than that.  I would
>>>>>> be ok defaulting the strategy or even returning NaN in the
>>>>>> presence of NaNs (or whenever interpolation involves a NaN). All
>>>>>> of this can be documented.
>>>>> I am _surely_ missing something because I don't figure out how NaNs
>>>>> can be useful when computing those statistics... :-}
>>>> Good point.  Basically the different strategies allow you to
>>>> effectively ignore or work around them so stats can be well-defined.
>>>> Without NaNStrategy, order statistics including NaNs are undefined.
>>> The problem with Percentile is that the elements are both reordered
>>> *and* used when interpolating between two values. Tor the rank methods,
>>> only reordering was needed so the various NaNStrategies could be
>>> implemented by replacing at will with +/- infinity and a classical sort
>>> would do the ordering for us.
>>>
>>> Here, we do interpolate on the values, so if we first replace NaN with
>>> +infinity to put them at the end, we may interpolate between a finite
>>> value and +inf and have +inf as a result, whereas if the NaN "value" was
>>> preserved and some magic used to sort the NaN, we would end up
>>> interpolating with a NaN and get a NaN as a result. Setting up the magic
>>> for MINIMAL and MAXIMAL seems achievable withing the sort, as we already
>>> have a specific sort (in fact not a sort but a selection algorithm,
>>> which is the core of the class). Setting up the magic for FIXED seems
>>> more difficult.
>>>
>>> So I guess implementing NaNStrategies may be done in different ways
>>> depending on how they will be used by the calling algorithm. I don't yet
>>> know if we can set up a general method to be used for all statistics.
>>> Looking only at Percentile, replacement with +/- infinity already seems
>>> to not be an appropriate solution.
>>
>> I don't think we should try to redefine algorithms so they can
>> handle NaNs.  When you do floating point computations with NaNs, you
>> get NaNs.  That should be understood to be the contract.  For rank
>> statistics, such as Percentile, we need to extend the ordering on
>> doubles to include them in ordering for the algorithm to be
>> well-defined.  If we subsequently do floating point computations
>> with the NaNs, the result should be NaN.  I think we should either
>> just simplify things and say we *always* return NaN whenever NaNs
>> are present for Percentile, or do the following:
>>
>> 0) use the prescribed NaN strategy to determine the ordering - so
>> NaNs end up either at the bottom, at the top or in their original
>> position (FIXED).
>> 1) if interpolation involves a NaN element, the result returned is NaN.
> 
> This is exactly what I say. If NaNs are in, NaNs should be out. As for
> now we do replace them, we get infinities instead of NaNs. I would
> prefer getting NaNs there as it is more consistent with the purpose of
> NaNs in floating point encoding.
> 
> So I think we agree: NaNs should not be changed, but they should be
> reordered. Perhaps we could have something simple like NaNStrategy
> having a method:
> 
>   boolean shouldSwap(int i1, double d1, int i2, double i2)
> 
> that would use both the index and values of the element to check if they
> should be swapped or not. This method would be used both for traditional
> sorting for ranking and in selection algorithms for percentile. I first
> thought we should simply have NaN implement Comparator<Double> but this
> would not work for FIXED, whereas the above method would work (simply
> returning false as soon as there is a NaN, regardless of it being at a
> lower or upper index).
> 
> I will take a look at this option.

This does not work as is. The problem is lack of transitivity. Suppose
that we have three indices i < j < k, with a[i] > a[k] and a[j] = NaN.

Then if during the sort/selection we compare elements at indices i and k
first, we will swap them, which is correct behaviour regardless of the
NaN strategy used. If we compare elemntes at indices i and j first and
then elements at indices j and k and we use FIXED NaNStrategy, then
we won't swap the two elements and the middle NaN would induced a
non-globally sorted array. I think FIXED should act as a river flowing
over rocks : the water (non-NaN elements) would move freely around and
be sorted, and the rocks (NaN elements) would stay were they are but do
not prevent water flow.

I have found another way to handle this, using an indirection array that
could "jump" over NaNs for FIXED strategy or push them to either end for
MINIMAL or MAXIMAL strategy.

I will try this in the next few days, as time permit.

best regards,
Luc

> 
> best regards,
> Luc
> 
>>
>> Phil
>>>
>>> best regards,
>>> Luc
>>>
>>>>
>>>> Phil
>>>>
>>>>>> I still don't see a within-math use for recoding methods; though
>>>>>> I do see the value from a user perspective.  The problem is if we
>>>>>> open this door, we risk MathArrays becoming a mini commons lang.
>>>>> If Commons Lang already provides a feature, then we indeed don't
>>>>> need to offer the same within CM (if there is no internal use). Can
>>>>> someone confirm?
>>>>>
>>>>>
>>>>> Otherwise we could define a new inner interface in MathArrays,
>>>>> similar to "MathArrays.Function": --- public class MathArrays {
>>>>>
>>>>> // ...
>>>>>
>>>>> public interface Transformer { /** Transform the whole array */ 
>>>>> double[] transform(double[] input); /** Transform the array in the
>>>>> interval [from, to) and return the whole array */ double[]
>>>>> transform(double[] input, int from, int to); /** Transform the
>>>>> array in the interval [from, to) and return the transformed
>>>>> sub-array */ double[] transformAndSplice(double[] input, int from,
>>>>> int to); } } ---
>>>>>
>>>>> Then we can provide "replace" and "remove" transformers through
>>>>> factory methods (example/untested code): --- public class
>>>>> MathArrays {
>>>>>
>>>>> // ...
>>>>>
>>>>> public static Transformer createReplaceTransformer(final double
>>>>> old, final double replace) { return new Transformer() { public
>>>>> double[] transform(double[] input) { final int len = input.length; 
>>>>> final double[] output = new double[len]; for (int i = 0; i < len;
>>>>> i++) { output[i] = Precision.equalsIncludingNaN(input[i], old) ? 
>>>>> replace : input[i]; } return output; }
>>>>>
>>>>> // etc. }; } } ---
>>>>>
>>>>>
>>>>> Gilles
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>>
>>>>>
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>>
>>>> ---------------------------------------------------------------------
>>>>
>>>>
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>
>>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] use case for NaNStrategy.FIXED?

Posted by Luc Maisonobe <lu...@spaceroots.org>.
Le 30/06/2014 22:00, Phil Steitz a écrit :
> On 6/30/14, 12:33 PM, Luc Maisonobe wrote:
>> Le 30/06/2014 19:15, Phil Steitz a écrit :
>>>
>>>> On Jun 30, 2014, at 9:59 AM, Gilles <gi...@harfang.homelinux.org>
>>>> wrote:
>>>>
>>>> On Mon, 30 Jun 2014 09:08:00 -0700, Phil Steitz wrote:
>>>>>> On Jun 29, 2014, at 3:41 PM, Gilles
>>>>>> <gi...@harfang.homelinux.org> wrote:
>>>>>>
>>>>>>>> On Sun, 29 Jun 2014 14:39:51 -0700, Phil Steitz wrote:
>>>>>>>>> On 6/29/14, 2:30 PM, Gilles wrote:
>>>>>>>>>> On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote: 
>>>>>>>>>> On 6/29/14, 9:48 AM, venkatesha murthy wrote: On Sun,
>>>>>>>>>> Jun 29, 2014 at 1:25 AM, Phil Steitz 
>>>>>>>>>> <ph...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote: Le
>>>>>>>>>>>> 25/06/2014 06:05, venkatesha murthy a écrit :
>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe 
>>>>>>>>>>>>> <lu...@spaceroots.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> Hi Venkat,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit
>>>>>>>>>>>>>> :
>>>>>>>>>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc
>>>>>>>>>>>>>>> Maisonobe <lu...@spaceroots.org>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> While looking further in Percentile class
>>>>>>>>>>>>>>>> for MATH-1120, I have found another problem
>>>>>>>>>>>>>>>> in the current implementation. 
>>>>>>>>>>>>>>>> NaNStrategy.FIXED
>>>>>>>>>>> should
>>>>>>>>>>>>>>>> leave the NaNs in place, but at the end of
>>>>>>>>>>>>>>>> the KthSelector.select method, a call to
>>>>>>>>>>>>>>>> Arrays.sort moves the NaNs to the end of 
>>>>>>>>>>>>>>>> the small sub-array. What is really
>>>>>>>>>>>>>>>> implemented here is therefore closer to 
>>>>>>>>>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED.
>>>>>>>>>>>>>>>> This always occur in the test cases because
>>>>>>>>>>>>>>>> they use very short arrays, and we 
>>>>>>>>>>>>>>>> directly
>>>>>>>>>>> switch to
>>>>>>>>>>>>>>>> this part of the select method.
>>>>>>>>>>>>>>> Are NaNs considered higher than +Inf ? If
>>>>>>>>>>>>>>> MAXIMAL represent replacing for +inf ; you
>>>>>>>>>>>>>>> need something to indicate beyond this for
>>>>>>>>>>>>>>> NaN.
>>>>>>>>>>>>>> Well, we can just keep the NaN themselves and
>>>>>>>>>>>>>> handled them appropriately, hoping not to trash
>>>>>>>>>>>>>> performances too much.
>>>>>>>>>>>>> Agreed.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What is the test input you see an issue and
>>>>>>>>>>>>>>> what is the actual error you are seeing.
>>>>>>>>>>>>>>> Please share the test case.
>>>>>>>>>>>>>> Just look at
>>>>>>>>>>>>>> PercentileTest.testReplaceNanInRange(). The 
>>>>>>>>>>>>>> first check in the test corresponds to a
>>>>>>>>>>>>>> Percentile configuration at 50% percentile, and
>>>>>>>>>>>>>> NaNStrategy.FIXED. The array has an odd number
>>>>>>>>>>>>>> of entries, so the 50% percentile is exactly
>>>>>>>>>>>>>> one of the entries: the one at index 5 in the 
>>>>>>>>>>>>>> final array.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN,
>>>>>>>>>>>>>> NaN, 5, 7, NaN, 8 }. So for the
>>>>>>>>>>>>>> NaNStrategy.FIXED setting, it should not be
>>>>>>>>>>>>>> modified at all in the selection algorithm and
>>>>>>>>>>>>>> the result for 50% should be the first NaN of
>>>>>>>>>>>>>> the array, at index 5. In fact, due to the
>>>>>>>>>>>>>> Arrays.sort, we *do* reorder the array into  {
>>>>>>>>>>>>>> 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so the
>>>>>>>>>>>>>> result is 5.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Agreed. just verified by putting back the
>>>>>>>>>>>>>> earlier insertionSort
>>>>>>>>>>> function.
>>>>>>>>>>>>>> If we use NaNStrategy.MAXIMAL and any quantile
>>>>>>>>>>>>>> above 67%, we get as a result
>>>>>>>>>>>>>> Double.POSITIVE_INFINITY instead of
>>>>>>>>>>>>>> Double.NaN.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If we agree to leave FIXED as unchanged
>>>>>>>>>>>>>> behaviour with your
>>>>>>>>>>> insertionSort
>>>>>>>>>>>>> code; then atleast MAXIMAL/MINIMAL should be
>>>>>>>>>>>>> allowed for transformation
>>>>>>>>>>> of
>>>>>>>>>>>>> NaN to +/-Inf
>>>>>>>>>>>> I'm fine with it, as long as we clearly state it in
>>>>>>>>>>>> the NaNStrategy documentation, saying somethig
>>>>>>>>>>>> like:
>>>>>>>>>>>>
>>>>>>>>>>>> When MAXIMAL/MINIMAL are used, the NaNs are
>>>>>>>>>>>> effecively *replaced* by +/- infinity, hence the
>>>>>>>>>>>> results will be computed as if infinities were 
>>>>>>>>>>>> there in the first place.
>>>>>>>>>>> -1 - this is mixing concerns.  NaNStrategy exists for
>>>>>>>>>>> one specific purpose - to turn extend the ordering on
>>>>>>>>>>> doubles a total ordering. Strategies prescribe only
>>>>>>>>>>> how NaNs are to be treated in the ordering.  Side
>>>>>>>>>>> effects on the input array don't make sense in this 
>>>>>>>>>>> context.  The use case for which this class was
>>>>>>>>>>> created was rank transformations.  Returning
>>>>>>>>>>> infinities in rank transformations would blow up
>>>>>>>>>>> computations in these classes.  If a floating point 
>>>>>>>>>>> computations need to transform NaNs, the specific
>>>>>>>>>>> stats / other classes that do this transformation
>>>>>>>>>>> should perform and document the behavior.
>>>>>>>>>>>
>>>>>>>>>>> Phil
>>>>>>>>>> OK. Agreed realized it later.  Hence i have not
>>>>>>>>>> touched NaNStrategy in my patch(MATH-1132) . Now i have
>>>>>>>>>> added a separate transformer to transform NaNs but it
>>>>>>>>>> uses NanStrategy.  Please let me know on this as this 
>>>>>>>>>> trasnformation itself can be used in different places.
>>>>>>>>> Honestly, I don't see the value of this.  I would be more
>>>>>>>>> favorable to an addition to MathArrays supporting NaN (or
>>>>>>>>> infinity) filtering / transformation.  Several years back
>>>>>>>>> we made the decision to just let NaNs propagate in
>>>>>>>>> computations, which basically means that if you feed NaNs
>>>>>>>>> to [math] you are going to get NaNs out in results. If we
>>>>>>>>> do get into the business of filtering NaNs (or infinities
>>>>>>>>> for that matter), I think it is probably best to do that
>>>>>>>>> in MathArrays or somesuch - i.e., don't decorate APIs
>>>>>>>>> with NaN or infinity-handling handling descriptions.
>>>>>>>>> That would be a huge task and would likely end up
>>>>>>>>> bleeding implementation detail into the APIs.  As I said
>>>>>>>>> above, NaNStrategy has to exist for rank transformations
>>>>>>>>> to be well-defined.  NaN-handling in floating point 
>>>>>>>>> computations is defined and as long as we fully document
>>>>>>>>> what our algorithms do, I think it is fair enough to "let
>>>>>>>>> the NaNs fall where they may" - which basically means
>>>>>>>>> users need to do the filtering / transformation
>>>>>>>>> themselves.
>>>>>>>> So, do I understand correctly that it's OK to add the
>>>>>>>> "remove" and "replace" utility methods (cf. MATH-1130) in
>>>>>>>> "MathArrays"? Or does this thread conclude that we don't
>>>>>>>> have a use for this within Commons Math?
>>>>>>> You raise a good point here.  Personally, I don't see an
>>>>>>> internal use and if we are sticking with MathArrays including
>>>>>>> only things we have internal use for, I would say no, don't
>>>>>>> include them.
>>>>>> A third way to look at it would be that since some algorithms
>>>>>> require being passed "clean" data (e.g. without NaNs), we could
>>>>>> provide some simple means to do so. A user could then write,
>>>>>> e.g. --- double[] data = { 1, Double.NaN, 3, 6, 5 };
>>>>>>
>>>>>> Percentile p = new Percentile(); double q =
>>>>>> p.evaluate(MathArrays.replace(data, Double.NaN,
>>>>>> Double.POSITIVE_INFINITY), 0.1); ---
>>>>>>
>>>>>> Then, if we provide that utility, is it still necessary to
>>>>>> complicate the Percentile code with a NaNStrategy parameter?
>>>>> I may be missing something, but it seems to me that percentile
>>>>> does need a NaNStrategy in the presence if NaNs like the other
>>>>> order statistics do (basically to define their position In the
>>>>> ordering).  It doesn't logically need more than that.  I would
>>>>> be ok defaulting the strategy or even returning NaN in the
>>>>> presence of NaNs (or whenever interpolation involves a NaN). All
>>>>> of this can be documented.
>>>> I am _surely_ missing something because I don't figure out how NaNs
>>>> can be useful when computing those statistics... :-}
>>> Good point.  Basically the different strategies allow you to
>>> effectively ignore or work around them so stats can be well-defined.
>>> Without NaNStrategy, order statistics including NaNs are undefined.
>> The problem with Percentile is that the elements are both reordered
>> *and* used when interpolating between two values. Tor the rank methods,
>> only reordering was needed so the various NaNStrategies could be
>> implemented by replacing at will with +/- infinity and a classical sort
>> would do the ordering for us.
>>
>> Here, we do interpolate on the values, so if we first replace NaN with
>> +infinity to put them at the end, we may interpolate between a finite
>> value and +inf and have +inf as a result, whereas if the NaN "value" was
>> preserved and some magic used to sort the NaN, we would end up
>> interpolating with a NaN and get a NaN as a result. Setting up the magic
>> for MINIMAL and MAXIMAL seems achievable withing the sort, as we already
>> have a specific sort (in fact not a sort but a selection algorithm,
>> which is the core of the class). Setting up the magic for FIXED seems
>> more difficult.
>>
>> So I guess implementing NaNStrategies may be done in different ways
>> depending on how they will be used by the calling algorithm. I don't yet
>> know if we can set up a general method to be used for all statistics.
>> Looking only at Percentile, replacement with +/- infinity already seems
>> to not be an appropriate solution.
> 
> I don't think we should try to redefine algorithms so they can
> handle NaNs.  When you do floating point computations with NaNs, you
> get NaNs.  That should be understood to be the contract.  For rank
> statistics, such as Percentile, we need to extend the ordering on
> doubles to include them in ordering for the algorithm to be
> well-defined.  If we subsequently do floating point computations
> with the NaNs, the result should be NaN.  I think we should either
> just simplify things and say we *always* return NaN whenever NaNs
> are present for Percentile, or do the following:
> 
> 0) use the prescribed NaN strategy to determine the ordering - so
> NaNs end up either at the bottom, at the top or in their original
> position (FIXED).
> 1) if interpolation involves a NaN element, the result returned is NaN.

This is exactly what I say. If NaNs are in, NaNs should be out. As for
now we do replace them, we get infinities instead of NaNs. I would
prefer getting NaNs there as it is more consistent with the purpose of
NaNs in floating point encoding.

So I think we agree: NaNs should not be changed, but they should be
reordered. Perhaps we could have something simple like NaNStrategy
having a method:

  boolean shouldSwap(int i1, double d1, int i2, double i2)

that would use both the index and values of the element to check if they
should be swapped or not. This method would be used both for traditional
sorting for ranking and in selection algorithms for percentile. I first
thought we should simply have NaN implement Comparator<Double> but this
would not work for FIXED, whereas the above method would work (simply
returning false as soon as there is a NaN, regardless of it being at a
lower or upper index).

I will take a look at this option.

best regards,
Luc

> 
> Phil
>>
>> best regards,
>> Luc
>>
>>>
>>> Phil
>>>
>>>>> I still don't see a within-math use for recoding methods; though
>>>>> I do see the value from a user perspective.  The problem is if we
>>>>> open this door, we risk MathArrays becoming a mini commons lang.
>>>> If Commons Lang already provides a feature, then we indeed don't
>>>> need to offer the same within CM (if there is no internal use). Can
>>>> someone confirm?
>>>>
>>>>
>>>> Otherwise we could define a new inner interface in MathArrays,
>>>> similar to "MathArrays.Function": --- public class MathArrays {
>>>>
>>>> // ...
>>>>
>>>> public interface Transformer { /** Transform the whole array */ 
>>>> double[] transform(double[] input); /** Transform the array in the
>>>> interval [from, to) and return the whole array */ double[]
>>>> transform(double[] input, int from, int to); /** Transform the
>>>> array in the interval [from, to) and return the transformed
>>>> sub-array */ double[] transformAndSplice(double[] input, int from,
>>>> int to); } } ---
>>>>
>>>> Then we can provide "replace" and "remove" transformers through
>>>> factory methods (example/untested code): --- public class
>>>> MathArrays {
>>>>
>>>> // ...
>>>>
>>>> public static Transformer createReplaceTransformer(final double
>>>> old, final double replace) { return new Transformer() { public
>>>> double[] transform(double[] input) { final int len = input.length; 
>>>> final double[] output = new double[len]; for (int i = 0; i < len;
>>>> i++) { output[i] = Precision.equalsIncludingNaN(input[i], old) ? 
>>>> replace : input[i]; } return output; }
>>>>
>>>> // etc. }; } } ---
>>>>
>>>>
>>>> Gilles
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>>
>>>>
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>
>>> ---------------------------------------------------------------------
>>>
>>>
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] use case for NaNStrategy.FIXED?

Posted by Phil Steitz <ph...@gmail.com>.
On 6/30/14, 12:33 PM, Luc Maisonobe wrote:
> Le 30/06/2014 19:15, Phil Steitz a écrit :
>>
>>> On Jun 30, 2014, at 9:59 AM, Gilles <gi...@harfang.homelinux.org>
>>> wrote:
>>>
>>> On Mon, 30 Jun 2014 09:08:00 -0700, Phil Steitz wrote:
>>>>> On Jun 29, 2014, at 3:41 PM, Gilles
>>>>> <gi...@harfang.homelinux.org> wrote:
>>>>>
>>>>>>> On Sun, 29 Jun 2014 14:39:51 -0700, Phil Steitz wrote:
>>>>>>>> On 6/29/14, 2:30 PM, Gilles wrote:
>>>>>>>>> On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote: 
>>>>>>>>> On 6/29/14, 9:48 AM, venkatesha murthy wrote: On Sun,
>>>>>>>>> Jun 29, 2014 at 1:25 AM, Phil Steitz 
>>>>>>>>> <ph...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote: Le
>>>>>>>>>>> 25/06/2014 06:05, venkatesha murthy a écrit :
>>>>>>>>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe 
>>>>>>>>>>>> <lu...@spaceroots.org>
>>>>>>>>>> wrote:
>>>>>>>>>>>>> Hi Venkat,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit
>>>>>>>>>>>>> :
>>>>>>>>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc
>>>>>>>>>>>>>> Maisonobe <lu...@spaceroots.org>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> While looking further in Percentile class
>>>>>>>>>>>>>>> for MATH-1120, I have found another problem
>>>>>>>>>>>>>>> in the current implementation. 
>>>>>>>>>>>>>>> NaNStrategy.FIXED
>>>>>>>>>> should
>>>>>>>>>>>>>>> leave the NaNs in place, but at the end of
>>>>>>>>>>>>>>> the KthSelector.select method, a call to
>>>>>>>>>>>>>>> Arrays.sort moves the NaNs to the end of 
>>>>>>>>>>>>>>> the small sub-array. What is really
>>>>>>>>>>>>>>> implemented here is therefore closer to 
>>>>>>>>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED.
>>>>>>>>>>>>>>> This always occur in the test cases because
>>>>>>>>>>>>>>> they use very short arrays, and we 
>>>>>>>>>>>>>>> directly
>>>>>>>>>> switch to
>>>>>>>>>>>>>>> this part of the select method.
>>>>>>>>>>>>>> Are NaNs considered higher than +Inf ? If
>>>>>>>>>>>>>> MAXIMAL represent replacing for +inf ; you
>>>>>>>>>>>>>> need something to indicate beyond this for
>>>>>>>>>>>>>> NaN.
>>>>>>>>>>>>> Well, we can just keep the NaN themselves and
>>>>>>>>>>>>> handled them appropriately, hoping not to trash
>>>>>>>>>>>>> performances too much.
>>>>>>>>>>>> Agreed.
>>>>>>>>>>>>
>>>>>>>>>>>>>> What is the test input you see an issue and
>>>>>>>>>>>>>> what is the actual error you are seeing.
>>>>>>>>>>>>>> Please share the test case.
>>>>>>>>>>>>> Just look at
>>>>>>>>>>>>> PercentileTest.testReplaceNanInRange(). The 
>>>>>>>>>>>>> first check in the test corresponds to a
>>>>>>>>>>>>> Percentile configuration at 50% percentile, and
>>>>>>>>>>>>> NaNStrategy.FIXED. The array has an odd number
>>>>>>>>>>>>> of entries, so the 50% percentile is exactly
>>>>>>>>>>>>> one of the entries: the one at index 5 in the 
>>>>>>>>>>>>> final array.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN,
>>>>>>>>>>>>> NaN, 5, 7, NaN, 8 }. So for the
>>>>>>>>>>>>> NaNStrategy.FIXED setting, it should not be
>>>>>>>>>>>>> modified at all in the selection algorithm and
>>>>>>>>>>>>> the result for 50% should be the first NaN of
>>>>>>>>>>>>> the array, at index 5. In fact, due to the
>>>>>>>>>>>>> Arrays.sort, we *do* reorder the array into  {
>>>>>>>>>>>>> 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so the
>>>>>>>>>>>>> result is 5.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Agreed. just verified by putting back the
>>>>>>>>>>>>> earlier insertionSort
>>>>>>>>>> function.
>>>>>>>>>>>>> If we use NaNStrategy.MAXIMAL and any quantile
>>>>>>>>>>>>> above 67%, we get as a result
>>>>>>>>>>>>> Double.POSITIVE_INFINITY instead of
>>>>>>>>>>>>> Double.NaN.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If we agree to leave FIXED as unchanged
>>>>>>>>>>>>> behaviour with your
>>>>>>>>>> insertionSort
>>>>>>>>>>>> code; then atleast MAXIMAL/MINIMAL should be
>>>>>>>>>>>> allowed for transformation
>>>>>>>>>> of
>>>>>>>>>>>> NaN to +/-Inf
>>>>>>>>>>> I'm fine with it, as long as we clearly state it in
>>>>>>>>>>> the NaNStrategy documentation, saying somethig
>>>>>>>>>>> like:
>>>>>>>>>>>
>>>>>>>>>>> When MAXIMAL/MINIMAL are used, the NaNs are
>>>>>>>>>>> effecively *replaced* by +/- infinity, hence the
>>>>>>>>>>> results will be computed as if infinities were 
>>>>>>>>>>> there in the first place.
>>>>>>>>>> -1 - this is mixing concerns.  NaNStrategy exists for
>>>>>>>>>> one specific purpose - to turn extend the ordering on
>>>>>>>>>> doubles a total ordering. Strategies prescribe only
>>>>>>>>>> how NaNs are to be treated in the ordering.  Side
>>>>>>>>>> effects on the input array don't make sense in this 
>>>>>>>>>> context.  The use case for which this class was
>>>>>>>>>> created was rank transformations.  Returning
>>>>>>>>>> infinities in rank transformations would blow up
>>>>>>>>>> computations in these classes.  If a floating point 
>>>>>>>>>> computations need to transform NaNs, the specific
>>>>>>>>>> stats / other classes that do this transformation
>>>>>>>>>> should perform and document the behavior.
>>>>>>>>>>
>>>>>>>>>> Phil
>>>>>>>>> OK. Agreed realized it later.  Hence i have not
>>>>>>>>> touched NaNStrategy in my patch(MATH-1132) . Now i have
>>>>>>>>> added a separate transformer to transform NaNs but it
>>>>>>>>> uses NanStrategy.  Please let me know on this as this 
>>>>>>>>> trasnformation itself can be used in different places.
>>>>>>>> Honestly, I don't see the value of this.  I would be more
>>>>>>>> favorable to an addition to MathArrays supporting NaN (or
>>>>>>>> infinity) filtering / transformation.  Several years back
>>>>>>>> we made the decision to just let NaNs propagate in
>>>>>>>> computations, which basically means that if you feed NaNs
>>>>>>>> to [math] you are going to get NaNs out in results. If we
>>>>>>>> do get into the business of filtering NaNs (or infinities
>>>>>>>> for that matter), I think it is probably best to do that
>>>>>>>> in MathArrays or somesuch - i.e., don't decorate APIs
>>>>>>>> with NaN or infinity-handling handling descriptions.
>>>>>>>> That would be a huge task and would likely end up
>>>>>>>> bleeding implementation detail into the APIs.  As I said
>>>>>>>> above, NaNStrategy has to exist for rank transformations
>>>>>>>> to be well-defined.  NaN-handling in floating point 
>>>>>>>> computations is defined and as long as we fully document
>>>>>>>> what our algorithms do, I think it is fair enough to "let
>>>>>>>> the NaNs fall where they may" - which basically means
>>>>>>>> users need to do the filtering / transformation
>>>>>>>> themselves.
>>>>>>> So, do I understand correctly that it's OK to add the
>>>>>>> "remove" and "replace" utility methods (cf. MATH-1130) in
>>>>>>> "MathArrays"? Or does this thread conclude that we don't
>>>>>>> have a use for this within Commons Math?
>>>>>> You raise a good point here.  Personally, I don't see an
>>>>>> internal use and if we are sticking with MathArrays including
>>>>>> only things we have internal use for, I would say no, don't
>>>>>> include them.
>>>>> A third way to look at it would be that since some algorithms
>>>>> require being passed "clean" data (e.g. without NaNs), we could
>>>>> provide some simple means to do so. A user could then write,
>>>>> e.g. --- double[] data = { 1, Double.NaN, 3, 6, 5 };
>>>>>
>>>>> Percentile p = new Percentile(); double q =
>>>>> p.evaluate(MathArrays.replace(data, Double.NaN,
>>>>> Double.POSITIVE_INFINITY), 0.1); ---
>>>>>
>>>>> Then, if we provide that utility, is it still necessary to
>>>>> complicate the Percentile code with a NaNStrategy parameter?
>>>> I may be missing something, but it seems to me that percentile
>>>> does need a NaNStrategy in the presence if NaNs like the other
>>>> order statistics do (basically to define their position In the
>>>> ordering).  It doesn't logically need more than that.  I would
>>>> be ok defaulting the strategy or even returning NaN in the
>>>> presence of NaNs (or whenever interpolation involves a NaN). All
>>>> of this can be documented.
>>> I am _surely_ missing something because I don't figure out how NaNs
>>> can be useful when computing those statistics... :-}
>> Good point.  Basically the different strategies allow you to
>> effectively ignore or work around them so stats can be well-defined.
>> Without NaNStrategy, order statistics including NaNs are undefined.
> The problem with Percentile is that the elements are both reordered
> *and* used when interpolating between two values. Tor the rank methods,
> only reordering was needed so the various NaNStrategies could be
> implemented by replacing at will with +/- infinity and a classical sort
> would do the ordering for us.
>
> Here, we do interpolate on the values, so if we first replace NaN with
> +infinity to put them at the end, we may interpolate between a finite
> value and +inf and have +inf as a result, whereas if the NaN "value" was
> preserved and some magic used to sort the NaN, we would end up
> interpolating with a NaN and get a NaN as a result. Setting up the magic
> for MINIMAL and MAXIMAL seems achievable withing the sort, as we already
> have a specific sort (in fact not a sort but a selection algorithm,
> which is the core of the class). Setting up the magic for FIXED seems
> more difficult.
>
> So I guess implementing NaNStrategies may be done in different ways
> depending on how they will be used by the calling algorithm. I don't yet
> know if we can set up a general method to be used for all statistics.
> Looking only at Percentile, replacement with +/- infinity already seems
> to not be an appropriate solution.

I don't think we should try to redefine algorithms so they can
handle NaNs.  When you do floating point computations with NaNs, you
get NaNs.  That should be understood to be the contract.  For rank
statistics, such as Percentile, we need to extend the ordering on
doubles to include them in ordering for the algorithm to be
well-defined.  If we subsequently do floating point computations
with the NaNs, the result should be NaN.  I think we should either
just simplify things and say we *always* return NaN whenever NaNs
are present for Percentile, or do the following:

0) use the prescribed NaN strategy to determine the ordering - so
NaNs end up either at the bottom, at the top or in their original
position (FIXED).
1) if interpolation involves a NaN element, the result returned is NaN.

Phil
>
> best regards,
> Luc
>
>>
>> Phil
>>
>>>> I still don't see a within-math use for recoding methods; though
>>>> I do see the value from a user perspective.  The problem is if we
>>>> open this door, we risk MathArrays becoming a mini commons lang.
>>> If Commons Lang already provides a feature, then we indeed don't
>>> need to offer the same within CM (if there is no internal use). Can
>>> someone confirm?
>>>
>>>
>>> Otherwise we could define a new inner interface in MathArrays,
>>> similar to "MathArrays.Function": --- public class MathArrays {
>>>
>>> // ...
>>>
>>> public interface Transformer { /** Transform the whole array */ 
>>> double[] transform(double[] input); /** Transform the array in the
>>> interval [from, to) and return the whole array */ double[]
>>> transform(double[] input, int from, int to); /** Transform the
>>> array in the interval [from, to) and return the transformed
>>> sub-array */ double[] transformAndSplice(double[] input, int from,
>>> int to); } } ---
>>>
>>> Then we can provide "replace" and "remove" transformers through
>>> factory methods (example/untested code): --- public class
>>> MathArrays {
>>>
>>> // ...
>>>
>>> public static Transformer createReplaceTransformer(final double
>>> old, final double replace) { return new Transformer() { public
>>> double[] transform(double[] input) { final int len = input.length; 
>>> final double[] output = new double[len]; for (int i = 0; i < len;
>>> i++) { output[i] = Precision.equalsIncludingNaN(input[i], old) ? 
>>> replace : input[i]; } return output; }
>>>
>>> // etc. }; } } ---
>>>
>>>
>>> Gilles
>>>
>>>
>>> ---------------------------------------------------------------------
>>>
>>>
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>
>> ---------------------------------------------------------------------
>>
>>
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] use case for NaNStrategy.FIXED?

Posted by Luc Maisonobe <lu...@spaceroots.org>.
Le 30/06/2014 19:15, Phil Steitz a écrit :
> 
> 
>> On Jun 30, 2014, at 9:59 AM, Gilles <gi...@harfang.homelinux.org>
>> wrote:
>> 
>> On Mon, 30 Jun 2014 09:08:00 -0700, Phil Steitz wrote:
>>>> On Jun 29, 2014, at 3:41 PM, Gilles
>>>> <gi...@harfang.homelinux.org> wrote:
>>>> 
>>>>>> On Sun, 29 Jun 2014 14:39:51 -0700, Phil Steitz wrote:
>>>>>>> On 6/29/14, 2:30 PM, Gilles wrote:
>>>>>>>> On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote: 
>>>>>>>> On 6/29/14, 9:48 AM, venkatesha murthy wrote: On Sun,
>>>>>>>> Jun 29, 2014 at 1:25 AM, Phil Steitz 
>>>>>>>> <ph...@gmail.com> wrote:
>>>>>>>> 
>>>>>>>>>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote: Le
>>>>>>>>>> 25/06/2014 06:05, venkatesha murthy a écrit :
>>>>>>>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe 
>>>>>>>>>>> <lu...@spaceroots.org>
>>>>>>>>> wrote:
>>>>>>>>>>>> Hi Venkat,
>>>>>>>>>>>> 
>>>>>>>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit
>>>>>>>>>>>> :
>>>>>>>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc
>>>>>>>>>>>>> Maisonobe <lu...@spaceroots.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> While looking further in Percentile class
>>>>>>>>>>>>>> for MATH-1120, I have found another problem
>>>>>>>>>>>>>> in the current implementation. 
>>>>>>>>>>>>>> NaNStrategy.FIXED
>>>>>>>>> should
>>>>>>>>>>>>>> leave the NaNs in place, but at the end of
>>>>>>>>>>>>>> the KthSelector.select method, a call to
>>>>>>>>>>>>>> Arrays.sort moves the NaNs to the end of 
>>>>>>>>>>>>>> the small sub-array. What is really
>>>>>>>>>>>>>> implemented here is therefore closer to 
>>>>>>>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED.
>>>>>>>>>>>>>> This always occur in the test cases because
>>>>>>>>>>>>>> they use very short arrays, and we 
>>>>>>>>>>>>>> directly
>>>>>>>>> switch to
>>>>>>>>>>>>>> this part of the select method.
>>>>>>>>>>>>> Are NaNs considered higher than +Inf ? If
>>>>>>>>>>>>> MAXIMAL represent replacing for +inf ; you
>>>>>>>>>>>>> need something to indicate beyond this for
>>>>>>>>>>>>> NaN.
>>>>>>>>>>>> Well, we can just keep the NaN themselves and
>>>>>>>>>>>> handled them appropriately, hoping not to trash
>>>>>>>>>>>> performances too much.
>>>>>>>>>>> Agreed.
>>>>>>>>>>> 
>>>>>>>>>>>>> What is the test input you see an issue and
>>>>>>>>>>>>> what is the actual error you are seeing.
>>>>>>>>>>>>> Please share the test case.
>>>>>>>>>>>> Just look at
>>>>>>>>>>>> PercentileTest.testReplaceNanInRange(). The 
>>>>>>>>>>>> first check in the test corresponds to a
>>>>>>>>>>>> Percentile configuration at 50% percentile, and
>>>>>>>>>>>> NaNStrategy.FIXED. The array has an odd number
>>>>>>>>>>>> of entries, so the 50% percentile is exactly
>>>>>>>>>>>> one of the entries: the one at index 5 in the 
>>>>>>>>>>>> final array.
>>>>>>>>>>>> 
>>>>>>>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN,
>>>>>>>>>>>> NaN, 5, 7, NaN, 8 }. So for the
>>>>>>>>>>>> NaNStrategy.FIXED setting, it should not be
>>>>>>>>>>>> modified at all in the selection algorithm and
>>>>>>>>>>>> the result for 50% should be the first NaN of
>>>>>>>>>>>> the array, at index 5. In fact, due to the
>>>>>>>>>>>> Arrays.sort, we *do* reorder the array into  {
>>>>>>>>>>>> 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so the
>>>>>>>>>>>> result is 5.
>>>>>>>>>>>> 
>>>>>>>>>>>> Agreed. just verified by putting back the
>>>>>>>>>>>> earlier insertionSort
>>>>>>>>> function.
>>>>>>>>>>>> If we use NaNStrategy.MAXIMAL and any quantile
>>>>>>>>>>>> above 67%, we get as a result
>>>>>>>>>>>> Double.POSITIVE_INFINITY instead of
>>>>>>>>>>>> Double.NaN.
>>>>>>>>>>>> 
>>>>>>>>>>>> If we agree to leave FIXED as unchanged
>>>>>>>>>>>> behaviour with your
>>>>>>>>> insertionSort
>>>>>>>>>>> code; then atleast MAXIMAL/MINIMAL should be
>>>>>>>>>>> allowed for transformation
>>>>>>>>> of
>>>>>>>>>>> NaN to +/-Inf
>>>>>>>>>> I'm fine with it, as long as we clearly state it in
>>>>>>>>>> the NaNStrategy documentation, saying somethig
>>>>>>>>>> like:
>>>>>>>>>> 
>>>>>>>>>> When MAXIMAL/MINIMAL are used, the NaNs are
>>>>>>>>>> effecively *replaced* by +/- infinity, hence the
>>>>>>>>>> results will be computed as if infinities were 
>>>>>>>>>> there in the first place.
>>>>>>>>> -1 - this is mixing concerns.  NaNStrategy exists for
>>>>>>>>> one specific purpose - to turn extend the ordering on
>>>>>>>>> doubles a total ordering. Strategies prescribe only
>>>>>>>>> how NaNs are to be treated in the ordering.  Side
>>>>>>>>> effects on the input array don't make sense in this 
>>>>>>>>> context.  The use case for which this class was
>>>>>>>>> created was rank transformations.  Returning
>>>>>>>>> infinities in rank transformations would blow up
>>>>>>>>> computations in these classes.  If a floating point 
>>>>>>>>> computations need to transform NaNs, the specific
>>>>>>>>> stats / other classes that do this transformation
>>>>>>>>> should perform and document the behavior.
>>>>>>>>> 
>>>>>>>>> Phil
>>>>>>>> OK. Agreed realized it later.  Hence i have not
>>>>>>>> touched NaNStrategy in my patch(MATH-1132) . Now i have
>>>>>>>> added a separate transformer to transform NaNs but it
>>>>>>>> uses NanStrategy.  Please let me know on this as this 
>>>>>>>> trasnformation itself can be used in different places.
>>>>>>> 
>>>>>>> Honestly, I don't see the value of this.  I would be more
>>>>>>> favorable to an addition to MathArrays supporting NaN (or
>>>>>>> infinity) filtering / transformation.  Several years back
>>>>>>> we made the decision to just let NaNs propagate in
>>>>>>> computations, which basically means that if you feed NaNs
>>>>>>> to [math] you are going to get NaNs out in results. If we
>>>>>>> do get into the business of filtering NaNs (or infinities
>>>>>>> for that matter), I think it is probably best to do that
>>>>>>> in MathArrays or somesuch - i.e., don't decorate APIs
>>>>>>> with NaN or infinity-handling handling descriptions.
>>>>>>> That would be a huge task and would likely end up
>>>>>>> bleeding implementation detail into the APIs.  As I said
>>>>>>> above, NaNStrategy has to exist for rank transformations
>>>>>>> to be well-defined.  NaN-handling in floating point 
>>>>>>> computations is defined and as long as we fully document
>>>>>>> what our algorithms do, I think it is fair enough to "let
>>>>>>> the NaNs fall where they may" - which basically means
>>>>>>> users need to do the filtering / transformation
>>>>>>> themselves.
>>>>>> 
>>>>>> So, do I understand correctly that it's OK to add the
>>>>>> "remove" and "replace" utility methods (cf. MATH-1130) in
>>>>>> "MathArrays"? Or does this thread conclude that we don't
>>>>>> have a use for this within Commons Math?
>>>>> 
>>>>> You raise a good point here.  Personally, I don't see an
>>>>> internal use and if we are sticking with MathArrays including
>>>>> only things we have internal use for, I would say no, don't
>>>>> include them.
>>>> 
>>>> A third way to look at it would be that since some algorithms
>>>> require being passed "clean" data (e.g. without NaNs), we could
>>>> provide some simple means to do so. A user could then write,
>>>> e.g. --- double[] data = { 1, Double.NaN, 3, 6, 5 };
>>>> 
>>>> Percentile p = new Percentile(); double q =
>>>> p.evaluate(MathArrays.replace(data, Double.NaN,
>>>> Double.POSITIVE_INFINITY), 0.1); ---
>>>> 
>>>> Then, if we provide that utility, is it still necessary to
>>>> complicate the Percentile code with a NaNStrategy parameter?
>>> 
>>> I may be missing something, but it seems to me that percentile
>>> does need a NaNStrategy in the presence if NaNs like the other
>>> order statistics do (basically to define their position In the
>>> ordering).  It doesn't logically need more than that.  I would
>>> be ok defaulting the strategy or even returning NaN in the
>>> presence of NaNs (or whenever interpolation involves a NaN). All
>>> of this can be documented.
>> 
>> I am _surely_ missing something because I don't figure out how NaNs
>> can be useful when computing those statistics... :-}
> 
> Good point.  Basically the different strategies allow you to
> effectively ignore or work around them so stats can be well-defined.
> Without NaNStrategy, order statistics including NaNs are undefined.

The problem with Percentile is that the elements are both reordered
*and* used when interpolating between two values. Tor the rank methods,
only reordering was needed so the various NaNStrategies could be
implemented by replacing at will with +/- infinity and a classical sort
would do the ordering for us.

Here, we do interpolate on the values, so if we first replace NaN with
+infinity to put them at the end, we may interpolate between a finite
value and +inf and have +inf as a result, whereas if the NaN "value" was
preserved and some magic used to sort the NaN, we would end up
interpolating with a NaN and get a NaN as a result. Setting up the magic
for MINIMAL and MAXIMAL seems achievable withing the sort, as we already
have a specific sort (in fact not a sort but a selection algorithm,
which is the core of the class). Setting up the magic for FIXED seems
more difficult.

So I guess implementing NaNStrategies may be done in different ways
depending on how they will be used by the calling algorithm. I don't yet
know if we can set up a general method to be used for all statistics.
Looking only at Percentile, replacement with +/- infinity already seems
to not be an appropriate solution.

best regards,
Luc

> 
> 
> Phil
> 
>> 
>>> I still don't see a within-math use for recoding methods; though
>>> I do see the value from a user perspective.  The problem is if we
>>> open this door, we risk MathArrays becoming a mini commons lang.
>> 
>> If Commons Lang already provides a feature, then we indeed don't
>> need to offer the same within CM (if there is no internal use). Can
>> someone confirm?
>> 
>> 
>> Otherwise we could define a new inner interface in MathArrays,
>> similar to "MathArrays.Function": --- public class MathArrays {
>> 
>> // ...
>> 
>> public interface Transformer { /** Transform the whole array */ 
>> double[] transform(double[] input); /** Transform the array in the
>> interval [from, to) and return the whole array */ double[]
>> transform(double[] input, int from, int to); /** Transform the
>> array in the interval [from, to) and return the transformed
>> sub-array */ double[] transformAndSplice(double[] input, int from,
>> int to); } } ---
>> 
>> Then we can provide "replace" and "remove" transformers through
>> factory methods (example/untested code): --- public class
>> MathArrays {
>> 
>> // ...
>> 
>> public static Transformer createReplaceTransformer(final double
>> old, final double replace) { return new Transformer() { public
>> double[] transform(double[] input) { final int len = input.length; 
>> final double[] output = new double[len]; for (int i = 0; i < len;
>> i++) { output[i] = Precision.equalsIncludingNaN(input[i], old) ? 
>> replace : input[i]; } return output; }
>> 
>> // etc. }; } } ---
>> 
>> 
>> Gilles
>> 
>> 
>> ---------------------------------------------------------------------
>>
>> 
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>> 
> 
> ---------------------------------------------------------------------
>
> 
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] use case for NaNStrategy.FIXED?

Posted by Phil Steitz <ph...@gmail.com>.

> On Jun 30, 2014, at 9:59 AM, Gilles <gi...@harfang.homelinux.org> wrote:
> 
> On Mon, 30 Jun 2014 09:08:00 -0700, Phil Steitz wrote:
>>> On Jun 29, 2014, at 3:41 PM, Gilles <gi...@harfang.homelinux.org> wrote:
>>> 
>>>>> On Sun, 29 Jun 2014 14:39:51 -0700, Phil Steitz wrote:
>>>>>> On 6/29/14, 2:30 PM, Gilles wrote:
>>>>>>> On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote:
>>>>>>> On 6/29/14, 9:48 AM, venkatesha murthy wrote:
>>>>>>> On Sun, Jun 29, 2014 at 1:25 AM, Phil Steitz
>>>>>>> <ph...@gmail.com> wrote:
>>>>>>> 
>>>>>>>>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote:
>>>>>>>>> Le 25/06/2014 06:05, venkatesha murthy a écrit :
>>>>>>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe
>>>>>>>>>> <lu...@spaceroots.org>
>>>>>>>> wrote:
>>>>>>>>>>> Hi Venkat,
>>>>>>>>>>> 
>>>>>>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit :
>>>>>>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe
>>>>>>>>>>>> <lu...@spaceroots.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> While looking further in Percentile class for MATH-1120, I
>>>>>>>>>>>>> have found
>>>>>>>>>>>>> another problem in the current implementation.
>>>>>>>>>>>>> NaNStrategy.FIXED
>>>>>>>> should
>>>>>>>>>>>>> leave the NaNs in place, but at the end of the
>>>>>>>>>>>>> KthSelector.select
>>>>>>>>>>>>> method, a call to Arrays.sort moves the NaNs to the end of
>>>>>>>>>>>>> the small
>>>>>>>>>>>>> sub-array. What is really implemented here is therefore
>>>>>>>>>>>>> closer to
>>>>>>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always
>>>>>>>>>>>>> occur in the
>>>>>>>>>>>>> test cases because they use very short arrays, and we
>>>>>>>>>>>>> directly
>>>>>>>> switch to
>>>>>>>>>>>>> this part of the select method.
>>>>>>>>>>>> Are NaNs considered higher than +Inf ?
>>>>>>>>>>>> If MAXIMAL represent replacing for +inf ; you need
>>>>>>>>>>>> something to
>>>>>>>>>>>> indicate beyond this for NaN.
>>>>>>>>>>> Well, we can just keep the NaN themselves and handled them
>>>>>>>>>>> appropriately, hoping not to trash performances too much.
>>>>>>>>>> Agreed.
>>>>>>>>>> 
>>>>>>>>>>>> What is the test input you see an issue and what is the
>>>>>>>>>>>> actual error
>>>>>>>>>>>> you are seeing. Please share the test case.
>>>>>>>>>>> Just look at PercentileTest.testReplaceNanInRange(). The
>>>>>>>>>>> first check in
>>>>>>>>>>> the test corresponds to a Percentile configuration at 50%
>>>>>>>>>>> percentile,
>>>>>>>>>>> and NaNStrategy.FIXED. The array has an odd number of
>>>>>>>>>>> entries, so the
>>>>>>>>>>> 50% percentile is exactly one of the entries: the one at
>>>>>>>>>>> index 5 in the
>>>>>>>>>>> final array.
>>>>>>>>>>> 
>>>>>>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7,
>>>>>>>>>>> NaN, 8 }. So
>>>>>>>>>>> for the NaNStrategy.FIXED setting, it should not be modified
>>>>>>>>>>> at all in
>>>>>>>>>>> the selection algorithm and the result for 50% should be the
>>>>>>>>>>> first NaN
>>>>>>>>>>> of the array, at index 5. In fact, due to the Arrays.sort,
>>>>>>>>>>> we *do*
>>>>>>>>>>> reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN,
>>>>>>>>>>> NaN }, so
>>>>>>>>>>> the result is 5.
>>>>>>>>>>> 
>>>>>>>>>>> Agreed. just verified by putting back the earlier insertionSort
>>>>>>>> function.
>>>>>>>>>>> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we
>>>>>>>>>>> get as a
>>>>>>>>>>> result Double.POSITIVE_INFINITY instead of Double.NaN.
>>>>>>>>>>> 
>>>>>>>>>>> If we agree to leave FIXED as unchanged behaviour with your
>>>>>>>> insertionSort
>>>>>>>>>> code; then atleast MAXIMAL/MINIMAL should be allowed for
>>>>>>>>>> transformation
>>>>>>>> of
>>>>>>>>>> NaN to +/-Inf
>>>>>>>>> I'm fine with it, as long as we clearly state it in the
>>>>>>>>> NaNStrategy
>>>>>>>>> documentation, saying somethig like:
>>>>>>>>> 
>>>>>>>>> When MAXIMAL/MINIMAL are used, the NaNs are effecively
>>>>>>>>> *replaced* by
>>>>>>>>> +/- infinity, hence the results will be computed as if
>>>>>>>>> infinities were
>>>>>>>>> there in the first place.
>>>>>>>> -1 - this is mixing concerns.  NaNStrategy exists for one specific
>>>>>>>> purpose - to turn extend the ordering on doubles a total ordering.
>>>>>>>> Strategies prescribe only how NaNs are to be treated in the
>>>>>>>> ordering.  Side effects on the input array don't make sense in
>>>>>>>> this
>>>>>>>> context.  The use case for which this class was created was rank
>>>>>>>> transformations.  Returning infinities in rank transformations
>>>>>>>> would
>>>>>>>> blow up computations in these classes.  If a floating point
>>>>>>>> computations need to transform NaNs, the specific stats / other
>>>>>>>> classes that do this transformation should perform and document
>>>>>>>> the
>>>>>>>> behavior.
>>>>>>>> 
>>>>>>>> Phil
>>>>>>> OK. Agreed realized it later.  Hence i have not touched
>>>>>>> NaNStrategy in my
>>>>>>> patch(MATH-1132) . Now i have added a separate transformer to
>>>>>>> transform NaNs
>>>>>>> but it uses NanStrategy.  Please let me know on this as this
>>>>>>> trasnformation
>>>>>>> itself
>>>>>>> can be used in different places.
>>>>>> 
>>>>>> Honestly, I don't see the value of this.  I would be more favorable
>>>>>> to an addition to MathArrays supporting NaN (or infinity) filtering
>>>>>> / transformation.  Several years back we made the decision to just
>>>>>> let NaNs propagate in computations, which basically means that if
>>>>>> you feed NaNs to [math] you are going to get NaNs out in results.
>>>>>> If we do get into the business of filtering NaNs (or infinities for
>>>>>> that matter), I think it is probably best to do that in MathArrays
>>>>>> or somesuch - i.e., don't decorate APIs with NaN or
>>>>>> infinity-handling handling descriptions.  That would be a huge task
>>>>>> and would likely end up bleeding implementation detail into the
>>>>>> APIs.  As I said above, NaNStrategy has to exist for rank
>>>>>> transformations to be well-defined.  NaN-handling in floating point
>>>>>> computations is defined and as long as we fully document what our
>>>>>> algorithms do, I think it is fair enough to "let the NaNs fall where
>>>>>> they may" - which basically means users need to do the filtering /
>>>>>> transformation themselves.
>>>>> 
>>>>> So, do I understand correctly that it's OK to add the "remove" and
>>>>> "replace" utility methods (cf. MATH-1130) in "MathArrays"?
>>>>> Or does this thread conclude that we don't have a use for this within
>>>>> Commons Math?
>>>> 
>>>> You raise a good point here.  Personally, I don't see an internal
>>>> use and if we are sticking with MathArrays including only things we
>>>> have internal use for, I would say no, don't include them.
>>> 
>>> A third way to look at it would be that since some algorithms require
>>> being passed "clean" data (e.g. without NaNs), we could provide some
>>> simple means to do so. A user could then write, e.g.
>>> ---
>>> double[] data = { 1, Double.NaN, 3, 6, 5 };
>>> 
>>> Percentile p = new Percentile();
>>> double q = p.evaluate(MathArrays.replace(data, Double.NaN, Double.POSITIVE_INFINITY), 0.1);
>>> ---
>>> 
>>> Then, if we provide that utility, is it still necessary to complicate the Percentile
>>> code with a NaNStrategy parameter?
>> 
>> I may be missing something, but it seems to me that percentile does
>> need a NaNStrategy in the presence if NaNs like the other order
>> statistics do (basically to define their position In
>> the ordering).  It doesn't logically need more than that.  I would be
>> ok defaulting the strategy or even returning NaN in the presence of
>> NaNs (or whenever interpolation involves a NaN). All of this can be
>> documented.
> 
> I am _surely_ missing something because I don't figure out how NaNs can
> be useful when computing those statistics... :-}

Good point.  Basically the different strategies allow you to effectively ignore or work around them so stats can be well-defined.  Without NaNStrategy, order statistics including NaNs are undefined.  

Phil

> 
>> I still don't see a within-math use for recoding methods; though I do
>> see the value from a user perspective.  The problem is if we open this
>> door, we risk MathArrays becoming a mini commons lang.
> 
> If Commons Lang already provides a feature, then we indeed don't need
> to offer the same within CM (if there is no internal use).
> Can someone confirm?
> 
> 
> Otherwise we could define a new inner interface in MathArrays, similar
> to "MathArrays.Function":
> ---
> public class MathArrays {
> 
>    // ...
> 
>    public interface Transformer {
>        /** Transform the whole array */
>        double[] transform(double[] input);
>        /** Transform the array in the interval [from, to) and return the whole array */
>        double[] transform(double[] input, int from, int to);
>        /** Transform the array in the interval [from, to) and return the transformed sub-array */
>        double[] transformAndSplice(double[] input, int from, int to);
>    }
> }
> ---
> 
> Then we can provide "replace" and "remove" transformers through factory methods
> (example/untested code):
> ---
> public class MathArrays {
> 
>    // ...
> 
>    public static Transformer createReplaceTransformer(final double old,
>                                                       final double replace) {
>        return new Transformer() {
>            public double[] transform(double[] input) {
>                final int len = input.length;
>                final double[] output = new double[len];
>                for (int i = 0; i < len; i++) {
>                    output[i] = Precision.equalsIncludingNaN(input[i], old) ?
>                        replace : input[i];
>                }
>                return output;
>            }
> 
>            // etc.
>        };
>    }
> }
> ---
> 
> 
> Gilles
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] use case for NaNStrategy.FIXED?

Posted by Gilles <gi...@harfang.homelinux.org>.
On Mon, 30 Jun 2014 09:08:00 -0700, Phil Steitz wrote:
>> On Jun 29, 2014, at 3:41 PM, Gilles <gi...@harfang.homelinux.org> 
>> wrote:
>>
>>> On Sun, 29 Jun 2014 14:39:51 -0700, Phil Steitz wrote:
>>>> On 6/29/14, 2:30 PM, Gilles wrote:
>>>>> On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote:
>>>>>> On 6/29/14, 9:48 AM, venkatesha murthy wrote:
>>>>>> On Sun, Jun 29, 2014 at 1:25 AM, Phil Steitz
>>>>>> <ph...@gmail.com> wrote:
>>>>>>
>>>>>>>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote:
>>>>>>>> Le 25/06/2014 06:05, venkatesha murthy a écrit :
>>>>>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe
>>>>>>>>> <lu...@spaceroots.org>
>>>>>>> wrote:
>>>>>>>>>> Hi Venkat,
>>>>>>>>>>
>>>>>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit :
>>>>>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe
>>>>>>>>>>> <lu...@spaceroots.org>
>>>>>>>>>> wrote:
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>
>>>>>>>>>>>> While looking further in Percentile class for MATH-1120, I
>>>>>>>>>>>> have found
>>>>>>>>>>>> another problem in the current implementation.
>>>>>>>>>>>> NaNStrategy.FIXED
>>>>>>> should
>>>>>>>>>>>> leave the NaNs in place, but at the end of the
>>>>>>>>>>>> KthSelector.select
>>>>>>>>>>>> method, a call to Arrays.sort moves the NaNs to the end of
>>>>>>>>>>>> the small
>>>>>>>>>>>> sub-array. What is really implemented here is therefore
>>>>>>>>>>>> closer to
>>>>>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always
>>>>>>>>>>>> occur in the
>>>>>>>>>>>> test cases because they use very short arrays, and we
>>>>>>>>>>>> directly
>>>>>>> switch to
>>>>>>>>>>>> this part of the select method.
>>>>>>>>>>> Are NaNs considered higher than +Inf ?
>>>>>>>>>>> If MAXIMAL represent replacing for +inf ; you need
>>>>>>>>>>> something to
>>>>>>>>>>> indicate beyond this for NaN.
>>>>>>>>>> Well, we can just keep the NaN themselves and handled them
>>>>>>>>>> appropriately, hoping not to trash performances too much.
>>>>>>>>> Agreed.
>>>>>>>>>
>>>>>>>>>>> What is the test input you see an issue and what is the
>>>>>>>>>>> actual error
>>>>>>>>>>> you are seeing. Please share the test case.
>>>>>>>>>> Just look at PercentileTest.testReplaceNanInRange(). The
>>>>>>>>>> first check in
>>>>>>>>>> the test corresponds to a Percentile configuration at 50%
>>>>>>>>>> percentile,
>>>>>>>>>> and NaNStrategy.FIXED. The array has an odd number of
>>>>>>>>>> entries, so the
>>>>>>>>>> 50% percentile is exactly one of the entries: the one at
>>>>>>>>>> index 5 in the
>>>>>>>>>> final array.
>>>>>>>>>>
>>>>>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7,
>>>>>>>>>> NaN, 8 }. So
>>>>>>>>>> for the NaNStrategy.FIXED setting, it should not be modified
>>>>>>>>>> at all in
>>>>>>>>>> the selection algorithm and the result for 50% should be the
>>>>>>>>>> first NaN
>>>>>>>>>> of the array, at index 5. In fact, due to the Arrays.sort,
>>>>>>>>>> we *do*
>>>>>>>>>> reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN,
>>>>>>>>>> NaN }, so
>>>>>>>>>> the result is 5.
>>>>>>>>>>
>>>>>>>>>> Agreed. just verified by putting back the earlier 
>>>>>>>>>> insertionSort
>>>>>>> function.
>>>>>>>>>> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we
>>>>>>>>>> get as a
>>>>>>>>>> result Double.POSITIVE_INFINITY instead of Double.NaN.
>>>>>>>>>>
>>>>>>>>>> If we agree to leave FIXED as unchanged behaviour with your
>>>>>>> insertionSort
>>>>>>>>> code; then atleast MAXIMAL/MINIMAL should be allowed for
>>>>>>>>> transformation
>>>>>>> of
>>>>>>>>> NaN to +/-Inf
>>>>>>>> I'm fine with it, as long as we clearly state it in the
>>>>>>>> NaNStrategy
>>>>>>>> documentation, saying somethig like:
>>>>>>>>
>>>>>>>> When MAXIMAL/MINIMAL are used, the NaNs are effecively
>>>>>>>> *replaced* by
>>>>>>>> +/- infinity, hence the results will be computed as if
>>>>>>>> infinities were
>>>>>>>> there in the first place.
>>>>>>> -1 - this is mixing concerns.  NaNStrategy exists for one 
>>>>>>> specific
>>>>>>> purpose - to turn extend the ordering on doubles a total 
>>>>>>> ordering.
>>>>>>> Strategies prescribe only how NaNs are to be treated in the
>>>>>>> ordering.  Side effects on the input array don't make sense in
>>>>>>> this
>>>>>>> context.  The use case for which this class was created was 
>>>>>>> rank
>>>>>>> transformations.  Returning infinities in rank transformations
>>>>>>> would
>>>>>>> blow up computations in these classes.  If a floating point
>>>>>>> computations need to transform NaNs, the specific stats / other
>>>>>>> classes that do this transformation should perform and document
>>>>>>> the
>>>>>>> behavior.
>>>>>>>
>>>>>>> Phil
>>>>>> OK. Agreed realized it later.  Hence i have not touched
>>>>>> NaNStrategy in my
>>>>>> patch(MATH-1132) . Now i have added a separate transformer to
>>>>>> transform NaNs
>>>>>> but it uses NanStrategy.  Please let me know on this as this
>>>>>> trasnformation
>>>>>> itself
>>>>>> can be used in different places.
>>>>>
>>>>> Honestly, I don't see the value of this.  I would be more 
>>>>> favorable
>>>>> to an addition to MathArrays supporting NaN (or infinity) 
>>>>> filtering
>>>>> / transformation.  Several years back we made the decision to 
>>>>> just
>>>>> let NaNs propagate in computations, which basically means that if
>>>>> you feed NaNs to [math] you are going to get NaNs out in results.
>>>>> If we do get into the business of filtering NaNs (or infinities 
>>>>> for
>>>>> that matter), I think it is probably best to do that in 
>>>>> MathArrays
>>>>> or somesuch - i.e., don't decorate APIs with NaN or
>>>>> infinity-handling handling descriptions.  That would be a huge 
>>>>> task
>>>>> and would likely end up bleeding implementation detail into the
>>>>> APIs.  As I said above, NaNStrategy has to exist for rank
>>>>> transformations to be well-defined.  NaN-handling in floating 
>>>>> point
>>>>> computations is defined and as long as we fully document what our
>>>>> algorithms do, I think it is fair enough to "let the NaNs fall 
>>>>> where
>>>>> they may" - which basically means users need to do the filtering 
>>>>> /
>>>>> transformation themselves.
>>>>
>>>> So, do I understand correctly that it's OK to add the "remove" and
>>>> "replace" utility methods (cf. MATH-1130) in "MathArrays"?
>>>> Or does this thread conclude that we don't have a use for this 
>>>> within
>>>> Commons Math?
>>>
>>> You raise a good point here.  Personally, I don't see an internal
>>> use and if we are sticking with MathArrays including only things we
>>> have internal use for, I would say no, don't include them.
>>
>> A third way to look at it would be that since some algorithms 
>> require
>> being passed "clean" data (e.g. without NaNs), we could provide some
>> simple means to do so. A user could then write, e.g.
>> ---
>> double[] data = { 1, Double.NaN, 3, 6, 5 };
>>
>> Percentile p = new Percentile();
>> double q = p.evaluate(MathArrays.replace(data, Double.NaN, 
>> Double.POSITIVE_INFINITY), 0.1);
>> ---
>>
>> Then, if we provide that utility, is it still necessary to 
>> complicate the Percentile
>> code with a NaNStrategy parameter?
>
> I may be missing something, but it seems to me that percentile does
> need a NaNStrategy in the presence if NaNs like the other order
> statistics do (basically to define their position In
> the ordering).  It doesn't logically need more than that.  I would be
> ok defaulting the strategy or even returning NaN in the presence of
> NaNs (or whenever interpolation involves a NaN). All of this can be
> documented.

I am _surely_ missing something because I don't figure out how NaNs can
be useful when computing those statistics... :-}

> I still don't see a within-math use for recoding methods; though I do
> see the value from a user perspective.  The problem is if we open 
> this
> door, we risk MathArrays becoming a mini commons lang.

If Commons Lang already provides a feature, then we indeed don't need
to offer the same within CM (if there is no internal use).
Can someone confirm?


Otherwise we could define a new inner interface in MathArrays, similar
to "MathArrays.Function":
---
public class MathArrays {

     // ...

     public interface Transformer {
         /** Transform the whole array */
         double[] transform(double[] input);
         /** Transform the array in the interval [from, to) and return 
the whole array */
         double[] transform(double[] input, int from, int to);
         /** Transform the array in the interval [from, to) and return 
the transformed sub-array */
         double[] transformAndSplice(double[] input, int from, int to);
     }
}
---

Then we can provide "replace" and "remove" transformers through factory 
methods
(example/untested code):
---
public class MathArrays {

     // ...

     public static Transformer createReplaceTransformer(final double 
old,
                                                        final double 
replace) {
         return new Transformer() {
             public double[] transform(double[] input) {
                 final int len = input.length;
                 final double[] output = new double[len];
                 for (int i = 0; i < len; i++) {
                     output[i] = Precision.equalsIncludingNaN(input[i], 
old) ?
                         replace : input[i];
                 }
                 return output;
             }

             // etc.
         };
     }
}
---


Gilles


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] use case for NaNStrategy.FIXED?

Posted by Phil Steitz <ph...@gmail.com>.

> On Jun 29, 2014, at 3:41 PM, Gilles <gi...@harfang.homelinux.org> wrote:
> 
>> On Sun, 29 Jun 2014 14:39:51 -0700, Phil Steitz wrote:
>>> On 6/29/14, 2:30 PM, Gilles wrote:
>>>> On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote:
>>>>> On 6/29/14, 9:48 AM, venkatesha murthy wrote:
>>>>> On Sun, Jun 29, 2014 at 1:25 AM, Phil Steitz
>>>>> <ph...@gmail.com> wrote:
>>>>> 
>>>>>>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote:
>>>>>>> Le 25/06/2014 06:05, venkatesha murthy a écrit :
>>>>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe
>>>>>>>> <lu...@spaceroots.org>
>>>>>> wrote:
>>>>>>>>> Hi Venkat,
>>>>>>>>> 
>>>>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit :
>>>>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe
>>>>>>>>>> <lu...@spaceroots.org>
>>>>>>>>> wrote:
>>>>>>>>>>> Hi all,
>>>>>>>>>>> 
>>>>>>>>>>> While looking further in Percentile class for MATH-1120, I
>>>>>>>>>>> have found
>>>>>>>>>>> another problem in the current implementation.
>>>>>>>>>>> NaNStrategy.FIXED
>>>>>> should
>>>>>>>>>>> leave the NaNs in place, but at the end of the
>>>>>>>>>>> KthSelector.select
>>>>>>>>>>> method, a call to Arrays.sort moves the NaNs to the end of
>>>>>>>>>>> the small
>>>>>>>>>>> sub-array. What is really implemented here is therefore
>>>>>>>>>>> closer to
>>>>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always
>>>>>>>>>>> occur in the
>>>>>>>>>>> test cases because they use very short arrays, and we
>>>>>>>>>>> directly
>>>>>> switch to
>>>>>>>>>>> this part of the select method.
>>>>>>>>>> Are NaNs considered higher than +Inf ?
>>>>>>>>>> If MAXIMAL represent replacing for +inf ; you need
>>>>>>>>>> something to
>>>>>>>>>> indicate beyond this for NaN.
>>>>>>>>> Well, we can just keep the NaN themselves and handled them
>>>>>>>>> appropriately, hoping not to trash performances too much.
>>>>>>>> Agreed.
>>>>>>>> 
>>>>>>>>>> What is the test input you see an issue and what is the
>>>>>>>>>> actual error
>>>>>>>>>> you are seeing. Please share the test case.
>>>>>>>>> Just look at PercentileTest.testReplaceNanInRange(). The
>>>>>>>>> first check in
>>>>>>>>> the test corresponds to a Percentile configuration at 50%
>>>>>>>>> percentile,
>>>>>>>>> and NaNStrategy.FIXED. The array has an odd number of
>>>>>>>>> entries, so the
>>>>>>>>> 50% percentile is exactly one of the entries: the one at
>>>>>>>>> index 5 in the
>>>>>>>>> final array.
>>>>>>>>> 
>>>>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7,
>>>>>>>>> NaN, 8 }. So
>>>>>>>>> for the NaNStrategy.FIXED setting, it should not be modified
>>>>>>>>> at all in
>>>>>>>>> the selection algorithm and the result for 50% should be the
>>>>>>>>> first NaN
>>>>>>>>> of the array, at index 5. In fact, due to the Arrays.sort,
>>>>>>>>> we *do*
>>>>>>>>> reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN,
>>>>>>>>> NaN }, so
>>>>>>>>> the result is 5.
>>>>>>>>> 
>>>>>>>>> Agreed. just verified by putting back the earlier insertionSort
>>>>>> function.
>>>>>>>>> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we
>>>>>>>>> get as a
>>>>>>>>> result Double.POSITIVE_INFINITY instead of Double.NaN.
>>>>>>>>> 
>>>>>>>>> If we agree to leave FIXED as unchanged behaviour with your
>>>>>> insertionSort
>>>>>>>> code; then atleast MAXIMAL/MINIMAL should be allowed for
>>>>>>>> transformation
>>>>>> of
>>>>>>>> NaN to +/-Inf
>>>>>>> I'm fine with it, as long as we clearly state it in the
>>>>>>> NaNStrategy
>>>>>>> documentation, saying somethig like:
>>>>>>> 
>>>>>>> When MAXIMAL/MINIMAL are used, the NaNs are effecively
>>>>>>> *replaced* by
>>>>>>> +/- infinity, hence the results will be computed as if
>>>>>>> infinities were
>>>>>>> there in the first place.
>>>>>> -1 - this is mixing concerns.  NaNStrategy exists for one specific
>>>>>> purpose - to turn extend the ordering on doubles a total ordering.
>>>>>> Strategies prescribe only how NaNs are to be treated in the
>>>>>> ordering.  Side effects on the input array don't make sense in
>>>>>> this
>>>>>> context.  The use case for which this class was created was rank
>>>>>> transformations.  Returning infinities in rank transformations
>>>>>> would
>>>>>> blow up computations in these classes.  If a floating point
>>>>>> computations need to transform NaNs, the specific stats / other
>>>>>> classes that do this transformation should perform and document
>>>>>> the
>>>>>> behavior.
>>>>>> 
>>>>>> Phil
>>>>> OK. Agreed realized it later.  Hence i have not touched
>>>>> NaNStrategy in my
>>>>> patch(MATH-1132) . Now i have added a separate transformer to
>>>>> transform NaNs
>>>>> but it uses NanStrategy.  Please let me know on this as this
>>>>> trasnformation
>>>>> itself
>>>>> can be used in different places.
>>>> 
>>>> Honestly, I don't see the value of this.  I would be more favorable
>>>> to an addition to MathArrays supporting NaN (or infinity) filtering
>>>> / transformation.  Several years back we made the decision to just
>>>> let NaNs propagate in computations, which basically means that if
>>>> you feed NaNs to [math] you are going to get NaNs out in results.
>>>> If we do get into the business of filtering NaNs (or infinities for
>>>> that matter), I think it is probably best to do that in MathArrays
>>>> or somesuch - i.e., don't decorate APIs with NaN or
>>>> infinity-handling handling descriptions.  That would be a huge task
>>>> and would likely end up bleeding implementation detail into the
>>>> APIs.  As I said above, NaNStrategy has to exist for rank
>>>> transformations to be well-defined.  NaN-handling in floating point
>>>> computations is defined and as long as we fully document what our
>>>> algorithms do, I think it is fair enough to "let the NaNs fall where
>>>> they may" - which basically means users need to do the filtering /
>>>> transformation themselves.
>>> 
>>> So, do I understand correctly that it's OK to add the "remove" and
>>> "replace" utility methods (cf. MATH-1130) in "MathArrays"?
>>> Or does this thread conclude that we don't have a use for this within
>>> Commons Math?
>> 
>> You raise a good point here.  Personally, I don't see an internal
>> use and if we are sticking with MathArrays including only things we
>> have internal use for, I would say no, don't include them.
> 
> A third way to look at it would be that since some algorithms require
> being passed "clean" data (e.g. without NaNs), we could provide some
> simple means to do so. A user could then write, e.g.
> ---
> double[] data = { 1, Double.NaN, 3, 6, 5 };
> 
> Percentile p = new Percentile();
> double q = p.evaluate(MathArrays.replace(data, Double.NaN, Double.POSITIVE_INFINITY), 0.1);
> ---
> 
> Then, if we provide that utility, is it still necessary to complicate the Percentile
> code with a NaNStrategy parameter?

I may be missing something, but it seems to me that percentile does need a NaNStrategy in the presence if NaNs like the other order statistics do (basically to define their position In 
the ordering).  It doesn't logically need more than that.  I would be ok defaulting the strategy or even returning NaN in the presence of NaNs (or whenever interpolation involves a NaN). All of this can be documented.

I still don't see a within-math use for recoding methods; though I do see the value from a user perspective.  The problem is if we open this door, we risk MathArrays becoming a mini commons lang.

Phil
> 
> 
> Gilles
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] use case for NaNStrategy.FIXED?

Posted by Gilles <gi...@harfang.homelinux.org>.
On Sun, 29 Jun 2014 14:39:51 -0700, Phil Steitz wrote:
> On 6/29/14, 2:30 PM, Gilles wrote:
>> On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote:
>>> On 6/29/14, 9:48 AM, venkatesha murthy wrote:
>>>> On Sun, Jun 29, 2014 at 1:25 AM, Phil Steitz
>>>> <ph...@gmail.com> wrote:
>>>>
>>>>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote:
>>>>>> Le 25/06/2014 06:05, venkatesha murthy a écrit :
>>>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe
>>>>>>> <lu...@spaceroots.org>
>>>>> wrote:
>>>>>>>> Hi Venkat,
>>>>>>>>
>>>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit :
>>>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe
>>>>>>>>> <lu...@spaceroots.org>
>>>>>>>> wrote:
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> While looking further in Percentile class for MATH-1120, I
>>>>>>>>>> have found
>>>>>>>>>> another problem in the current implementation.
>>>>>>>>>> NaNStrategy.FIXED
>>>>> should
>>>>>>>>>> leave the NaNs in place, but at the end of the
>>>>>>>>>> KthSelector.select
>>>>>>>>>> method, a call to Arrays.sort moves the NaNs to the end of
>>>>>>>>>> the small
>>>>>>>>>> sub-array. What is really implemented here is therefore
>>>>>>>>>> closer to
>>>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always
>>>>>>>>>> occur in the
>>>>>>>>>> test cases because they use very short arrays, and we
>>>>>>>>>> directly
>>>>> switch to
>>>>>>>>>> this part of the select method.
>>>>>>>>> Are NaNs considered higher than +Inf ?
>>>>>>>>> If MAXIMAL represent replacing for +inf ; you need
>>>>>>>>> something to
>>>>>>>>> indicate beyond this for NaN.
>>>>>>>> Well, we can just keep the NaN themselves and handled them
>>>>>>>> appropriately, hoping not to trash performances too much.
>>>>>>>>
>>>>>>> Agreed.
>>>>>>>
>>>>>>>>> What is the test input you see an issue and what is the
>>>>>>>>> actual error
>>>>>>>>> you are seeing. Please share the test case.
>>>>>>>> Just look at PercentileTest.testReplaceNanInRange(). The
>>>>>>>> first check in
>>>>>>>> the test corresponds to a Percentile configuration at 50%
>>>>>>>> percentile,
>>>>>>>> and NaNStrategy.FIXED. The array has an odd number of
>>>>>>>> entries, so the
>>>>>>>> 50% percentile is exactly one of the entries: the one at
>>>>>>>> index 5 in the
>>>>>>>> final array.
>>>>>>>>
>>>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7,
>>>>>>>> NaN, 8 }. So
>>>>>>>> for the NaNStrategy.FIXED setting, it should not be modified
>>>>>>>> at all in
>>>>>>>> the selection algorithm and the result for 50% should be the
>>>>>>>> first NaN
>>>>>>>> of the array, at index 5. In fact, due to the Arrays.sort,
>>>>>>>> we *do*
>>>>>>>> reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN,
>>>>>>>> NaN }, so
>>>>>>>> the result is 5.
>>>>>>>>
>>>>>>>> Agreed. just verified by putting back the earlier 
>>>>>>>> insertionSort
>>>>> function.
>>>>>>>> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we
>>>>>>>> get as a
>>>>>>>> result Double.POSITIVE_INFINITY instead of Double.NaN.
>>>>>>>>
>>>>>>>> If we agree to leave FIXED as unchanged behaviour with your
>>>>> insertionSort
>>>>>>> code; then atleast MAXIMAL/MINIMAL should be allowed for
>>>>>>> transformation
>>>>> of
>>>>>>> NaN to +/-Inf
>>>>>> I'm fine with it, as long as we clearly state it in the
>>>>>> NaNStrategy
>>>>>> documentation, saying somethig like:
>>>>>>
>>>>>>  When MAXIMAL/MINIMAL are used, the NaNs are effecively
>>>>>> *replaced* by
>>>>>>  +/- infinity, hence the results will be computed as if
>>>>>> infinities were
>>>>>>  there in the first place.
>>>>> -1 - this is mixing concerns.  NaNStrategy exists for one 
>>>>> specific
>>>>> purpose - to turn extend the ordering on doubles a total 
>>>>> ordering.
>>>>> Strategies prescribe only how NaNs are to be treated in the
>>>>> ordering.  Side effects on the input array don't make sense in
>>>>> this
>>>>> context.  The use case for which this class was created was rank
>>>>> transformations.  Returning infinities in rank transformations
>>>>> would
>>>>> blow up computations in these classes.  If a floating point
>>>>> computations need to transform NaNs, the specific stats / other
>>>>> classes that do this transformation should perform and document
>>>>> the
>>>>> behavior.
>>>>>
>>>>> Phil
>>>>>
>>>> OK. Agreed realized it later.  Hence i have not touched
>>>> NaNStrategy in my
>>>> patch(MATH-1132) . Now i have added a separate transformer to
>>>> transform NaNs
>>>> but it uses NanStrategy.  Please let me know on this as this
>>>> trasnformation
>>>> itself
>>>> can be used in different places.
>>>
>>> Honestly, I don't see the value of this.  I would be more favorable
>>> to an addition to MathArrays supporting NaN (or infinity) filtering
>>> / transformation.  Several years back we made the decision to just
>>> let NaNs propagate in computations, which basically means that if
>>> you feed NaNs to [math] you are going to get NaNs out in results.
>>> If we do get into the business of filtering NaNs (or infinities for
>>> that matter), I think it is probably best to do that in MathArrays
>>> or somesuch - i.e., don't decorate APIs with NaN or
>>> infinity-handling handling descriptions.  That would be a huge task
>>> and would likely end up bleeding implementation detail into the
>>> APIs.  As I said above, NaNStrategy has to exist for rank
>>> transformations to be well-defined.  NaN-handling in floating point
>>> computations is defined and as long as we fully document what our
>>> algorithms do, I think it is fair enough to "let the NaNs fall 
>>> where
>>> they may" - which basically means users need to do the filtering /
>>> transformation themselves.
>>
>> So, do I understand correctly that it's OK to add the "remove" and
>> "replace" utility methods (cf. MATH-1130) in "MathArrays"?
>> Or does this thread conclude that we don't have a use for this 
>> within
>> Commons Math?
>
> You raise a good point here.  Personally, I don't see an internal
> use and if we are sticking with MathArrays including only things we
> have internal use for, I would say no, don't include them.

A third way to look at it would be that since some algorithms require
being passed "clean" data (e.g. without NaNs), we could provide some
simple means to do so. A user could then write, e.g.
---
double[] data = { 1, Double.NaN, 3, 6, 5 };

Percentile p = new Percentile();
double q = p.evaluate(MathArrays.replace(data, Double.NaN, 
Double.POSITIVE_INFINITY), 0.1);
---

Then, if we provide that utility, is it still necessary to complicate 
the Percentile
code with a NaNStrategy parameter?


Gilles


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] use case for NaNStrategy.FIXED?

Posted by Phil Steitz <ph...@gmail.com>.
On 6/29/14, 2:30 PM, Gilles wrote:
> On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote:
>> On 6/29/14, 9:48 AM, venkatesha murthy wrote:
>>> On Sun, Jun 29, 2014 at 1:25 AM, Phil Steitz
>>> <ph...@gmail.com> wrote:
>>>
>>>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote:
>>>>> Le 25/06/2014 06:05, venkatesha murthy a écrit :
>>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe
>>>>>> <lu...@spaceroots.org>
>>>> wrote:
>>>>>>> Hi Venkat,
>>>>>>>
>>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit :
>>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe
>>>>>>>> <lu...@spaceroots.org>
>>>>>>> wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> While looking further in Percentile class for MATH-1120, I
>>>>>>>>> have found
>>>>>>>>> another problem in the current implementation.
>>>>>>>>> NaNStrategy.FIXED
>>>> should
>>>>>>>>> leave the NaNs in place, but at the end of the
>>>>>>>>> KthSelector.select
>>>>>>>>> method, a call to Arrays.sort moves the NaNs to the end of
>>>>>>>>> the small
>>>>>>>>> sub-array. What is really implemented here is therefore
>>>>>>>>> closer to
>>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always
>>>>>>>>> occur in the
>>>>>>>>> test cases because they use very short arrays, and we
>>>>>>>>> directly
>>>> switch to
>>>>>>>>> this part of the select method.
>>>>>>>> Are NaNs considered higher than +Inf ?
>>>>>>>> If MAXIMAL represent replacing for +inf ; you need
>>>>>>>> something to
>>>>>>>> indicate beyond this for NaN.
>>>>>>> Well, we can just keep the NaN themselves and handled them
>>>>>>> appropriately, hoping not to trash performances too much.
>>>>>>>
>>>>>> Agreed.
>>>>>>
>>>>>>>> What is the test input you see an issue and what is the
>>>>>>>> actual error
>>>>>>>> you are seeing. Please share the test case.
>>>>>>> Just look at PercentileTest.testReplaceNanInRange(). The
>>>>>>> first check in
>>>>>>> the test corresponds to a Percentile configuration at 50%
>>>>>>> percentile,
>>>>>>> and NaNStrategy.FIXED. The array has an odd number of
>>>>>>> entries, so the
>>>>>>> 50% percentile is exactly one of the entries: the one at
>>>>>>> index 5 in the
>>>>>>> final array.
>>>>>>>
>>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7,
>>>>>>> NaN, 8 }. So
>>>>>>> for the NaNStrategy.FIXED setting, it should not be modified
>>>>>>> at all in
>>>>>>> the selection algorithm and the result for 50% should be the
>>>>>>> first NaN
>>>>>>> of the array, at index 5. In fact, due to the Arrays.sort,
>>>>>>> we *do*
>>>>>>> reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN,
>>>>>>> NaN }, so
>>>>>>> the result is 5.
>>>>>>>
>>>>>>> Agreed. just verified by putting back the earlier insertionSort
>>>> function.
>>>>>>> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we
>>>>>>> get as a
>>>>>>> result Double.POSITIVE_INFINITY instead of Double.NaN.
>>>>>>>
>>>>>>> If we agree to leave FIXED as unchanged behaviour with your
>>>> insertionSort
>>>>>> code; then atleast MAXIMAL/MINIMAL should be allowed for
>>>>>> transformation
>>>> of
>>>>>> NaN to +/-Inf
>>>>> I'm fine with it, as long as we clearly state it in the
>>>>> NaNStrategy
>>>>> documentation, saying somethig like:
>>>>>
>>>>>  When MAXIMAL/MINIMAL are used, the NaNs are effecively
>>>>> *replaced* by
>>>>>  +/- infinity, hence the results will be computed as if
>>>>> infinities were
>>>>>  there in the first place.
>>>> -1 - this is mixing concerns.  NaNStrategy exists for one specific
>>>> purpose - to turn extend the ordering on doubles a total ordering.
>>>> Strategies prescribe only how NaNs are to be treated in the
>>>> ordering.  Side effects on the input array don't make sense in
>>>> this
>>>> context.  The use case for which this class was created was rank
>>>> transformations.  Returning infinities in rank transformations
>>>> would
>>>> blow up computations in these classes.  If a floating point
>>>> computations need to transform NaNs, the specific stats / other
>>>> classes that do this transformation should perform and document
>>>> the
>>>> behavior.
>>>>
>>>> Phil
>>>>
>>> OK. Agreed realized it later.  Hence i have not touched
>>> NaNStrategy in my
>>> patch(MATH-1132) . Now i have added a separate transformer to
>>> transform NaNs
>>> but it uses NanStrategy.  Please let me know on this as this
>>> trasnformation
>>> itself
>>> can be used in different places.
>>
>> Honestly, I don't see the value of this.  I would be more favorable
>> to an addition to MathArrays supporting NaN (or infinity) filtering
>> / transformation.  Several years back we made the decision to just
>> let NaNs propagate in computations, which basically means that if
>> you feed NaNs to [math] you are going to get NaNs out in results.
>> If we do get into the business of filtering NaNs (or infinities for
>> that matter), I think it is probably best to do that in MathArrays
>> or somesuch - i.e., don't decorate APIs with NaN or
>> infinity-handling handling descriptions.  That would be a huge task
>> and would likely end up bleeding implementation detail into the
>> APIs.  As I said above, NaNStrategy has to exist for rank
>> transformations to be well-defined.  NaN-handling in floating point
>> computations is defined and as long as we fully document what our
>> algorithms do, I think it is fair enough to "let the NaNs fall where
>> they may" - which basically means users need to do the filtering /
>> transformation themselves.
>
> So, do I understand correctly that it's OK to add the "remove" and
> "replace" utility methods (cf. MATH-1130) in "MathArrays"?
> Or does this thread conclude that we don't have a use for this within
> Commons Math?

You raise a good point here.  Personally, I don't see an internal
use and if we are sticking with MathArrays including only things we
have internal use for, I would say no, don't include them. 

Phil
>
> Gilles
>
>>
>> Phil
>>>
>>>>> [...]
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] use case for NaNStrategy.FIXED?

Posted by Gilles <gi...@harfang.homelinux.org>.
On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote:
> On 6/29/14, 9:48 AM, venkatesha murthy wrote:
>> On Sun, Jun 29, 2014 at 1:25 AM, Phil Steitz <ph...@gmail.com> 
>> wrote:
>>
>>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote:
>>>> Le 25/06/2014 06:05, venkatesha murthy a écrit :
>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe 
>>>>> <lu...@spaceroots.org>
>>> wrote:
>>>>>> Hi Venkat,
>>>>>>
>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit :
>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe 
>>>>>>> <lu...@spaceroots.org>
>>>>>> wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> While looking further in Percentile class for MATH-1120, I 
>>>>>>>> have found
>>>>>>>> another problem in the current implementation. 
>>>>>>>> NaNStrategy.FIXED
>>> should
>>>>>>>> leave the NaNs in place, but at the end of the 
>>>>>>>> KthSelector.select
>>>>>>>> method, a call to Arrays.sort moves the NaNs to the end of the 
>>>>>>>> small
>>>>>>>> sub-array. What is really implemented here is therefore closer 
>>>>>>>> to
>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always occur 
>>>>>>>> in the
>>>>>>>> test cases because they use very short arrays, and we directly
>>> switch to
>>>>>>>> this part of the select method.
>>>>>>> Are NaNs considered higher than +Inf ?
>>>>>>> If MAXIMAL represent replacing for +inf ; you need something to
>>>>>>> indicate beyond this for NaN.
>>>>>> Well, we can just keep the NaN themselves and handled them
>>>>>> appropriately, hoping not to trash performances too much.
>>>>>>
>>>>> Agreed.
>>>>>
>>>>>>> What is the test input you see an issue and what is the actual 
>>>>>>> error
>>>>>>> you are seeing. Please share the test case.
>>>>>> Just look at PercentileTest.testReplaceNanInRange(). The first 
>>>>>> check in
>>>>>> the test corresponds to a Percentile configuration at 50% 
>>>>>> percentile,
>>>>>> and NaNStrategy.FIXED. The array has an odd number of entries, 
>>>>>> so the
>>>>>> 50% percentile is exactly one of the entries: the one at index 5 
>>>>>> in the
>>>>>> final array.
>>>>>>
>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7, NaN, 8 
>>>>>> }. So
>>>>>> for the NaNStrategy.FIXED setting, it should not be modified at 
>>>>>> all in
>>>>>> the selection algorithm and the result for 50% should be the 
>>>>>> first NaN
>>>>>> of the array, at index 5. In fact, due to the Arrays.sort, we 
>>>>>> *do*
>>>>>> reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN 
>>>>>> }, so
>>>>>> the result is 5.
>>>>>>
>>>>>> Agreed. just verified by putting back the earlier insertionSort
>>> function.
>>>>>> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we get 
>>>>>> as a
>>>>>> result Double.POSITIVE_INFINITY instead of Double.NaN.
>>>>>>
>>>>>> If we agree to leave FIXED as unchanged behaviour with your
>>> insertionSort
>>>>> code; then atleast MAXIMAL/MINIMAL should be allowed for 
>>>>> transformation
>>> of
>>>>> NaN to +/-Inf
>>>> I'm fine with it, as long as we clearly state it in the 
>>>> NaNStrategy
>>>> documentation, saying somethig like:
>>>>
>>>>  When MAXIMAL/MINIMAL are used, the NaNs are effecively *replaced* 
>>>> by
>>>>  +/- infinity, hence the results will be computed as if infinities 
>>>> were
>>>>  there in the first place.
>>> -1 - this is mixing concerns.  NaNStrategy exists for one specific
>>> purpose - to turn extend the ordering on doubles a total ordering.
>>> Strategies prescribe only how NaNs are to be treated in the
>>> ordering.  Side effects on the input array don't make sense in this
>>> context.  The use case for which this class was created was rank
>>> transformations.  Returning infinities in rank transformations 
>>> would
>>> blow up computations in these classes.  If a floating point
>>> computations need to transform NaNs, the specific stats / other
>>> classes that do this transformation should perform and document the
>>> behavior.
>>>
>>> Phil
>>>
>> OK. Agreed realized it later.  Hence i have not touched NaNStrategy 
>> in my
>> patch(MATH-1132) . Now i have added a separate transformer to 
>> transform NaNs
>> but it uses NanStrategy.  Please let me know on this as this 
>> trasnformation
>> itself
>> can be used in different places.
>
> Honestly, I don't see the value of this.  I would be more favorable
> to an addition to MathArrays supporting NaN (or infinity) filtering
> / transformation.  Several years back we made the decision to just
> let NaNs propagate in computations, which basically means that if
> you feed NaNs to [math] you are going to get NaNs out in results.
> If we do get into the business of filtering NaNs (or infinities for
> that matter), I think it is probably best to do that in MathArrays
> or somesuch - i.e., don't decorate APIs with NaN or
> infinity-handling handling descriptions.  That would be a huge task
> and would likely end up bleeding implementation detail into the
> APIs.  As I said above, NaNStrategy has to exist for rank
> transformations to be well-defined.  NaN-handling in floating point
> computations is defined and as long as we fully document what our
> algorithms do, I think it is fair enough to "let the NaNs fall where
> they may" - which basically means users need to do the filtering /
> transformation themselves.

So, do I understand correctly that it's OK to add the "remove" and
"replace" utility methods (cf. MATH-1130) in "MathArrays"?
Or does this thread conclude that we don't have a use for this within
Commons Math?

Gilles

>
> Phil
>>
>>>> [...]


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] use case for NaNStrategy.FIXED?

Posted by Phil Steitz <ph...@gmail.com>.
On 6/29/14, 9:48 AM, venkatesha murthy wrote:
> On Sun, Jun 29, 2014 at 1:25 AM, Phil Steitz <ph...@gmail.com> wrote:
>
>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote:
>>> Le 25/06/2014 06:05, venkatesha murthy a écrit :
>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe <lu...@spaceroots.org>
>> wrote:
>>>>> Hi Venkat,
>>>>>
>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit :
>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe <lu...@spaceroots.org>
>>>>> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> While looking further in Percentile class for MATH-1120, I have found
>>>>>>> another problem in the current implementation. NaNStrategy.FIXED
>> should
>>>>>>> leave the NaNs in place, but at the end of the KthSelector.select
>>>>>>> method, a call to Arrays.sort moves the NaNs to the end of the small
>>>>>>> sub-array. What is really implemented here is therefore closer to
>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always occur in the
>>>>>>> test cases because they use very short arrays, and we directly
>> switch to
>>>>>>> this part of the select method.
>>>>>> Are NaNs considered higher than +Inf ?
>>>>>> If MAXIMAL represent replacing for +inf ; you need something to
>>>>>> indicate beyond this for NaN.
>>>>> Well, we can just keep the NaN themselves and handled them
>>>>> appropriately, hoping not to trash performances too much.
>>>>>
>>>> Agreed.
>>>>
>>>>>> What is the test input you see an issue and what is the actual error
>>>>>> you are seeing. Please share the test case.
>>>>> Just look at PercentileTest.testReplaceNanInRange(). The first check in
>>>>> the test corresponds to a Percentile configuration at 50% percentile,
>>>>> and NaNStrategy.FIXED. The array has an odd number of entries, so the
>>>>> 50% percentile is exactly one of the entries: the one at index 5 in the
>>>>> final array.
>>>>>
>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7, NaN, 8 }. So
>>>>> for the NaNStrategy.FIXED setting, it should not be modified at all in
>>>>> the selection algorithm and the result for 50% should be the first NaN
>>>>> of the array, at index 5. In fact, due to the Arrays.sort, we *do*
>>>>> reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so
>>>>> the result is 5.
>>>>>
>>>>> Agreed. just verified by putting back the earlier insertionSort
>> function.
>>>>> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we get as a
>>>>> result Double.POSITIVE_INFINITY instead of Double.NaN.
>>>>>
>>>>> If we agree to leave FIXED as unchanged behaviour with your
>> insertionSort
>>>> code; then atleast MAXIMAL/MINIMAL should be allowed for transformation
>> of
>>>> NaN to +/-Inf
>>> I'm fine with it, as long as we clearly state it in the NaNStrategy
>>> documentation, saying somethig like:
>>>
>>>  When MAXIMAL/MINIMAL are used, the NaNs are effecively *replaced* by
>>>  +/- infinity, hence the results will be computed as if infinities were
>>>  there in the first place.
>> -1 - this is mixing concerns.  NaNStrategy exists for one specific
>> purpose - to turn extend the ordering on doubles a total ordering.
>> Strategies prescribe only how NaNs are to be treated in the
>> ordering.  Side effects on the input array don't make sense in this
>> context.  The use case for which this class was created was rank
>> transformations.  Returning infinities in rank transformations would
>> blow up computations in these classes.  If a floating point
>> computations need to transform NaNs, the specific stats / other
>> classes that do this transformation should perform and document the
>> behavior.
>>
>> Phil
>>
> OK. Agreed realized it later.  Hence i have not touched NaNStrategy in my
> patch(MATH-1132) . Now i have added a separate transformer to transform NaNs
> but it uses NanStrategy.  Please let me know on this as this trasnformation
> itself
> can be used in different places.

Honestly, I don't see the value of this.  I would be more favorable
to an addition to MathArrays supporting NaN (or infinity) filtering
/ transformation.  Several years back we made the decision to just
let NaNs propagate in computations, which basically means that if
you feed NaNs to [math] you are going to get NaNs out in results. 
If we do get into the business of filtering NaNs (or infinities for
that matter), I think it is probably best to do that in MathArrays
or somesuch - i.e., don't decorate APIs with NaN or
infinity-handling handling descriptions.  That would be a huge task
and would likely end up bleeding implementation detail into the
APIs.  As I said above, NaNStrategy has to exist for rank
transformations to be well-defined.  NaN-handling in floating point
computations is defined and as long as we fully document what our
algorithms do, I think it is fair enough to "let the NaNs fall where
they may" - which basically means users need to do the filtering /
transformation themselves.

Phil
>
>>>>>>> When I implemented the method, I explicitly avoided calling
>> Arrays.sort
>>>>>>> because it is a general purpose method that is know to be efficient
>> only
>>>>>>> for arrays of at least medium size. In most cases, when the array is
>>>>>>> small one falls back to a non-recursive method like a very simple
>>>>>>> insertion sort, which is faster for smaller arrays.
>>>>>> Please help me understand here; even java primitive Arrays.sort does
>>>>>> an insertion sort for less than 7 elements
>>>>>> (Refer sort1(double x[], int off, int len))
>>>>>> So what is it that the custom insertion sort that does differently or
>>>>>> is advantageous. Is it the value 15 elements?
>>>>> I don't see a reference to 7 elements, neither in the Java6 nor in the
>>>>> Java 7 doc
>>>> Please take a look at the sort1 method where there is a first block in
>> the
>>>> code which clearly mentions len < 7
>>>>     /**
>>>>      * Sorts the specified sub-array of doubles into ascending order.
>>>>      */
>>>>     private static void sort1(double x[], int off, int len) {
>>>>     // Insertion sort on smallest arrays
>>>>     if (len < 7) {
>>>>         for (int i=off; i<len+off; i++)
>>>>         for (int j=i; j>off && x[j-1]>x[j]; j--)
>>>>             swap(x, j, j-1);
>>>>         return;
>>>>     }
>>>>     :
>>>>     :
>>>>     : code continues for the else part
>>>>
>>>> Also the grepcode url
>>>> <
>> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/Arrays.java#Arrays.sort1%28double[]%2Cint%2Cint%29
>>>> indicates the same
>>> OK, you were refering to a specific implementation. I was thinking in
>>> the general case.
>>>
>>>> (and in any case the doc explicitly states the algorithms
>>>>> explained are only implementation notes and are not part of the
>>>>> specification).
>>>>>
>>>> Yes its a part of comments anyways.
>>>>
>>>>> However, the most important part for now is the fact that we control it
>>>>> and may be able to implement different NaN strategies. What we have
>>>>> currently fails.
>>>>>
>>>>> I agree on this  and hence here is my take:
>>>> Leave FIXED as-is and use the earlier insertionSort code (just change
>> the
>>>> name to sort rather than hardcoding it as insertionsort) to handle the
>> case
>>>> you were mentioning
>>> If we leave FIXED it as it is done know, even with insertionSort we do
>>> not really control what happens. It is deterministic as the pivoting and
>>> if/else structure is well defined (both in the selection part and in the
>>> final sort for sub-arrays), but it is untrackable so we can't document
>> it.
>>>> Continue to use MAXIMAL/MINIMAL for +/-Inf transformation and that way
>> we
>>>> have covered both Inf and nan cases.
>>> OK.
>>>
>>>> Use REMOVED as default for all Percentile Estimation Types. (mostly
>>>> influenced by R here perhaps)
>>> This would be fine with me. I have concerns with FIXED only, the other
>>> strategies all seem quite realistic.
>>>
>>> Does anybody else has an advice for FIXED? What are the use case?
>>>
>>> best regards,
>>> Luc
>>>
>>>> best regards,
>>>>> Luc
>>>>>
>>>>>>> In the select
>>>>>>> operation, we know we have small sub-arrays at the call point. Going
>>>>>>> back to the former insertionSort would recover the good behavior for
>>>>>>> small arrays, but would in fact not be sufficient to really
>> implement a
>>>>>>> NaNStrategy.FIXED. I guess it would be simple to make it behave like
>>>>>>> NaNStrategy.MAXIMAL but I did not try yet.
>>>>>>>
>>>>>>> My point here is rather: can we really and should we really implement
>>>>>>> NaNStrategy.FIXED? Looking at how it is used elsewhere, it needs to
>>>>>>> store the original position of the NaNs. It is quite cumbersome.
>>>>>>>
>>>>>>> I wonder what is the use case for NaNStrategy.FIXED at all.
>>>>>>>
>>>>>>> Going further and looking at the use cases for other NaNStrategy
>> values,
>>>>>>> the NaNs are replaced by +/- infinity before sorting, which is OK for
>>>>>>> ranking as we only rely on the indices, but we use the values
>> themselves
>>>>>>> in Percentile. So sometimes, we end up with computing interpolations
>>>>>>> like 0.5 * (x[k] + x[k+1]) or similar. If x[k] is a finite number and
>>>>>>> x[k+1] has been changed to +infinity, we get +infinity, instead of
>> the
>>>>>>> NaN we should have retrieved without replacement. So here again, I'm
>> not
>>>>>>> sure we are doing the right thing.
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>> best regards,
>>>>>>> Luc
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>>>
>>>>>>
>>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>>
>>>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] use case for NaNStrategy.FIXED?

Posted by venkatesha murthy <ve...@gmail.com>.
On Sun, Jun 29, 2014 at 1:25 AM, Phil Steitz <ph...@gmail.com> wrote:

> On 6/25/14, 12:12 AM, Luc Maisonobe wrote:
> > Le 25/06/2014 06:05, venkatesha murthy a écrit :
> >> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe <lu...@spaceroots.org>
> wrote:
> >>
> >>> Hi Venkat,
> >>>
> >>> Le 23/06/2014 21:08, venkatesha murthy a écrit :
> >>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe <lu...@spaceroots.org>
> >>> wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> While looking further in Percentile class for MATH-1120, I have found
> >>>>> another problem in the current implementation. NaNStrategy.FIXED
> should
> >>>>> leave the NaNs in place, but at the end of the KthSelector.select
> >>>>> method, a call to Arrays.sort moves the NaNs to the end of the small
> >>>>> sub-array. What is really implemented here is therefore closer to
> >>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always occur in the
> >>>>> test cases because they use very short arrays, and we directly
> switch to
> >>>>> this part of the select method.
> >>>> Are NaNs considered higher than +Inf ?
> >>>> If MAXIMAL represent replacing for +inf ; you need something to
> >>>> indicate beyond this for NaN.
> >>> Well, we can just keep the NaN themselves and handled them
> >>> appropriately, hoping not to trash performances too much.
> >>>
> >> Agreed.
> >>
> >>>> What is the test input you see an issue and what is the actual error
> >>>> you are seeing. Please share the test case.
> >>> Just look at PercentileTest.testReplaceNanInRange(). The first check in
> >>> the test corresponds to a Percentile configuration at 50% percentile,
> >>> and NaNStrategy.FIXED. The array has an odd number of entries, so the
> >>> 50% percentile is exactly one of the entries: the one at index 5 in the
> >>> final array.
> >>>
> >>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7, NaN, 8 }. So
> >>> for the NaNStrategy.FIXED setting, it should not be modified at all in
> >>> the selection algorithm and the result for 50% should be the first NaN
> >>> of the array, at index 5. In fact, due to the Arrays.sort, we *do*
> >>> reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so
> >>> the result is 5.
> >>>
> >>> Agreed. just verified by putting back the earlier insertionSort
> function.
> >>
> >>> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we get as a
> >>> result Double.POSITIVE_INFINITY instead of Double.NaN.
> >>>
> >>> If we agree to leave FIXED as unchanged behaviour with your
> insertionSort
> >> code; then atleast MAXIMAL/MINIMAL should be allowed for transformation
> of
> >> NaN to +/-Inf
> > I'm fine with it, as long as we clearly state it in the NaNStrategy
> > documentation, saying somethig like:
> >
> >  When MAXIMAL/MINIMAL are used, the NaNs are effecively *replaced* by
> >  +/- infinity, hence the results will be computed as if infinities were
> >  there in the first place.
>
> -1 - this is mixing concerns.  NaNStrategy exists for one specific
> purpose - to turn extend the ordering on doubles a total ordering.
> Strategies prescribe only how NaNs are to be treated in the
> ordering.  Side effects on the input array don't make sense in this
> context.  The use case for which this class was created was rank
> transformations.  Returning infinities in rank transformations would
> blow up computations in these classes.  If a floating point
> computations need to transform NaNs, the specific stats / other
> classes that do this transformation should perform and document the
> behavior.
>
> Phil
>
OK. Agreed realized it later.  Hence i have not touched NaNStrategy in my
patch(MATH-1132) . Now i have added a separate transformer to transform NaNs
but it uses NanStrategy.  Please let me know on this as this trasnformation
itself
can be used in different places.

>
> >>>>> When I implemented the method, I explicitly avoided calling
> Arrays.sort
> >>>>> because it is a general purpose method that is know to be efficient
> only
> >>>>> for arrays of at least medium size. In most cases, when the array is
> >>>>> small one falls back to a non-recursive method like a very simple
> >>>>> insertion sort, which is faster for smaller arrays.
> >>>> Please help me understand here; even java primitive Arrays.sort does
> >>>> an insertion sort for less than 7 elements
> >>>> (Refer sort1(double x[], int off, int len))
> >>>> So what is it that the custom insertion sort that does differently or
> >>>> is advantageous. Is it the value 15 elements?
> >>> I don't see a reference to 7 elements, neither in the Java6 nor in the
> >>> Java 7 doc
> >> Please take a look at the sort1 method where there is a first block in
> the
> >> code which clearly mentions len < 7
> >>     /**
> >>      * Sorts the specified sub-array of doubles into ascending order.
> >>      */
> >>     private static void sort1(double x[], int off, int len) {
> >>     // Insertion sort on smallest arrays
> >>     if (len < 7) {
> >>         for (int i=off; i<len+off; i++)
> >>         for (int j=i; j>off && x[j-1]>x[j]; j--)
> >>             swap(x, j, j-1);
> >>         return;
> >>     }
> >>     :
> >>     :
> >>     : code continues for the else part
> >>
> >> Also the grepcode url
> >> <
> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/Arrays.java#Arrays.sort1%28double[]%2Cint%2Cint%29
> >
> >> indicates the same
> > OK, you were refering to a specific implementation. I was thinking in
> > the general case.
> >
> >> (and in any case the doc explicitly states the algorithms
> >>> explained are only implementation notes and are not part of the
> >>> specification).
> >>>
> >> Yes its a part of comments anyways.
> >>
> >>> However, the most important part for now is the fact that we control it
> >>> and may be able to implement different NaN strategies. What we have
> >>> currently fails.
> >>>
> >>> I agree on this  and hence here is my take:
> >> Leave FIXED as-is and use the earlier insertionSort code (just change
> the
> >> name to sort rather than hardcoding it as insertionsort) to handle the
> case
> >> you were mentioning
> > If we leave FIXED it as it is done know, even with insertionSort we do
> > not really control what happens. It is deterministic as the pivoting and
> > if/else structure is well defined (both in the selection part and in the
> > final sort for sub-arrays), but it is untrackable so we can't document
> it.
> >
> >> Continue to use MAXIMAL/MINIMAL for +/-Inf transformation and that way
> we
> >> have covered both Inf and nan cases.
> > OK.
> >
> >> Use REMOVED as default for all Percentile Estimation Types. (mostly
> >> influenced by R here perhaps)
> > This would be fine with me. I have concerns with FIXED only, the other
> > strategies all seem quite realistic.
> >
> > Does anybody else has an advice for FIXED? What are the use case?
> >
> > best regards,
> > Luc
> >
> >> best regards,
> >>> Luc
> >>>
> >>>>> In the select
> >>>>> operation, we know we have small sub-arrays at the call point. Going
> >>>>> back to the former insertionSort would recover the good behavior for
> >>>>> small arrays, but would in fact not be sufficient to really
> implement a
> >>>>> NaNStrategy.FIXED. I guess it would be simple to make it behave like
> >>>>> NaNStrategy.MAXIMAL but I did not try yet.
> >>>>>
> >>>>> My point here is rather: can we really and should we really implement
> >>>>> NaNStrategy.FIXED? Looking at how it is used elsewhere, it needs to
> >>>>> store the original position of the NaNs. It is quite cumbersome.
> >>>>>
> >>>>> I wonder what is the use case for NaNStrategy.FIXED at all.
> >>>>>
> >>>>> Going further and looking at the use cases for other NaNStrategy
> values,
> >>>>> the NaNs are replaced by +/- infinity before sorting, which is OK for
> >>>>> ranking as we only rely on the indices, but we use the values
> themselves
> >>>>> in Percentile. So sometimes, we end up with computing interpolations
> >>>>> like 0.5 * (x[k] + x[k+1]) or similar. If x[k] is a finite number and
> >>>>> x[k+1] has been changed to +infinity, we get +infinity, instead of
> the
> >>>>> NaN we should have retrieved without replacement. So here again, I'm
> not
> >>>>> sure we are doing the right thing.
> >>>>>
> >>>>> What do you think?
> >>>>>
> >>>>> best regards,
> >>>>> Luc
> >>>>>
> >>>>> ---------------------------------------------------------------------
> >>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> >>>>> For additional commands, e-mail: dev-help@commons.apache.org
> >>>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> >>>> For additional commands, e-mail: dev-help@commons.apache.org
> >>>>
> >>>>
> >>>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> >>> For additional commands, e-mail: dev-help@commons.apache.org
> >>>
> >>>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > For additional commands, e-mail: dev-help@commons.apache.org
> >
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

Re: [math] use case for NaNStrategy.FIXED?

Posted by Phil Steitz <ph...@gmail.com>.
On 6/25/14, 12:12 AM, Luc Maisonobe wrote:
> Le 25/06/2014 06:05, venkatesha murthy a écrit :
>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe <lu...@spaceroots.org> wrote:
>>
>>> Hi Venkat,
>>>
>>> Le 23/06/2014 21:08, venkatesha murthy a écrit :
>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe <lu...@spaceroots.org>
>>> wrote:
>>>>> Hi all,
>>>>>
>>>>> While looking further in Percentile class for MATH-1120, I have found
>>>>> another problem in the current implementation. NaNStrategy.FIXED should
>>>>> leave the NaNs in place, but at the end of the KthSelector.select
>>>>> method, a call to Arrays.sort moves the NaNs to the end of the small
>>>>> sub-array. What is really implemented here is therefore closer to
>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always occur in the
>>>>> test cases because they use very short arrays, and we directly switch to
>>>>> this part of the select method.
>>>> Are NaNs considered higher than +Inf ?
>>>> If MAXIMAL represent replacing for +inf ; you need something to
>>>> indicate beyond this for NaN.
>>> Well, we can just keep the NaN themselves and handled them
>>> appropriately, hoping not to trash performances too much.
>>>
>> Agreed.
>>
>>>> What is the test input you see an issue and what is the actual error
>>>> you are seeing. Please share the test case.
>>> Just look at PercentileTest.testReplaceNanInRange(). The first check in
>>> the test corresponds to a Percentile configuration at 50% percentile,
>>> and NaNStrategy.FIXED. The array has an odd number of entries, so the
>>> 50% percentile is exactly one of the entries: the one at index 5 in the
>>> final array.
>>>
>>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7, NaN, 8 }. So
>>> for the NaNStrategy.FIXED setting, it should not be modified at all in
>>> the selection algorithm and the result for 50% should be the first NaN
>>> of the array, at index 5. In fact, due to the Arrays.sort, we *do*
>>> reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so
>>> the result is 5.
>>>
>>> Agreed. just verified by putting back the earlier insertionSort function.
>>
>>> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we get as a
>>> result Double.POSITIVE_INFINITY instead of Double.NaN.
>>>
>>> If we agree to leave FIXED as unchanged behaviour with your insertionSort
>> code; then atleast MAXIMAL/MINIMAL should be allowed for transformation of
>> NaN to +/-Inf
> I'm fine with it, as long as we clearly state it in the NaNStrategy
> documentation, saying somethig like:
>
>  When MAXIMAL/MINIMAL are used, the NaNs are effecively *replaced* by
>  +/- infinity, hence the results will be computed as if infinities were
>  there in the first place.

-1 - this is mixing concerns.  NaNStrategy exists for one specific
purpose - to turn extend the ordering on doubles a total ordering. 
Strategies prescribe only how NaNs are to be treated in the
ordering.  Side effects on the input array don't make sense in this
context.  The use case for which this class was created was rank
transformations.  Returning infinities in rank transformations would
blow up computations in these classes.  If a floating point
computations need to transform NaNs, the specific stats / other
classes that do this transformation should perform and document the
behavior.

Phil
>
>>>>> When I implemented the method, I explicitly avoided calling Arrays.sort
>>>>> because it is a general purpose method that is know to be efficient only
>>>>> for arrays of at least medium size. In most cases, when the array is
>>>>> small one falls back to a non-recursive method like a very simple
>>>>> insertion sort, which is faster for smaller arrays.
>>>> Please help me understand here; even java primitive Arrays.sort does
>>>> an insertion sort for less than 7 elements
>>>> (Refer sort1(double x[], int off, int len))
>>>> So what is it that the custom insertion sort that does differently or
>>>> is advantageous. Is it the value 15 elements?
>>> I don't see a reference to 7 elements, neither in the Java6 nor in the
>>> Java 7 doc
>> Please take a look at the sort1 method where there is a first block in the
>> code which clearly mentions len < 7
>>     /**
>>      * Sorts the specified sub-array of doubles into ascending order.
>>      */
>>     private static void sort1(double x[], int off, int len) {
>>     // Insertion sort on smallest arrays
>>     if (len < 7) {
>>         for (int i=off; i<len+off; i++)
>>         for (int j=i; j>off && x[j-1]>x[j]; j--)
>>             swap(x, j, j-1);
>>         return;
>>     }
>>     :
>>     :
>>     : code continues for the else part
>>
>> Also the grepcode url
>> <http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/Arrays.java#Arrays.sort1%28double[]%2Cint%2Cint%29>
>> indicates the same
> OK, you were refering to a specific implementation. I was thinking in
> the general case.
>
>> (and in any case the doc explicitly states the algorithms
>>> explained are only implementation notes and are not part of the
>>> specification).
>>>
>> Yes its a part of comments anyways.
>>
>>> However, the most important part for now is the fact that we control it
>>> and may be able to implement different NaN strategies. What we have
>>> currently fails.
>>>
>>> I agree on this  and hence here is my take:
>> Leave FIXED as-is and use the earlier insertionSort code (just change the
>> name to sort rather than hardcoding it as insertionsort) to handle the case
>> you were mentioning
> If we leave FIXED it as it is done know, even with insertionSort we do
> not really control what happens. It is deterministic as the pivoting and
> if/else structure is well defined (both in the selection part and in the
> final sort for sub-arrays), but it is untrackable so we can't document it.
>
>> Continue to use MAXIMAL/MINIMAL for +/-Inf transformation and that way we
>> have covered both Inf and nan cases.
> OK.
>
>> Use REMOVED as default for all Percentile Estimation Types. (mostly
>> influenced by R here perhaps)
> This would be fine with me. I have concerns with FIXED only, the other
> strategies all seem quite realistic.
>
> Does anybody else has an advice for FIXED? What are the use case?
>
> best regards,
> Luc
>
>> best regards,
>>> Luc
>>>
>>>>> In the select
>>>>> operation, we know we have small sub-arrays at the call point. Going
>>>>> back to the former insertionSort would recover the good behavior for
>>>>> small arrays, but would in fact not be sufficient to really implement a
>>>>> NaNStrategy.FIXED. I guess it would be simple to make it behave like
>>>>> NaNStrategy.MAXIMAL but I did not try yet.
>>>>>
>>>>> My point here is rather: can we really and should we really implement
>>>>> NaNStrategy.FIXED? Looking at how it is used elsewhere, it needs to
>>>>> store the original position of the NaNs. It is quite cumbersome.
>>>>>
>>>>> I wonder what is the use case for NaNStrategy.FIXED at all.
>>>>>
>>>>> Going further and looking at the use cases for other NaNStrategy values,
>>>>> the NaNs are replaced by +/- infinity before sorting, which is OK for
>>>>> ranking as we only rely on the indices, but we use the values themselves
>>>>> in Percentile. So sometimes, we end up with computing interpolations
>>>>> like 0.5 * (x[k] + x[k+1]) or similar. If x[k] is a finite number and
>>>>> x[k+1] has been changed to +infinity, we get +infinity, instead of the
>>>>> NaN we should have retrieved without replacement. So here again, I'm not
>>>>> sure we are doing the right thing.
>>>>>
>>>>> What do you think?
>>>>>
>>>>> best regards,
>>>>> Luc
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>
>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>
>>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] use case for NaNStrategy.FIXED?

Posted by Luc Maisonobe <lu...@spaceroots.org>.
Le 25/06/2014 06:05, venkatesha murthy a écrit :
> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe <lu...@spaceroots.org> wrote:
> 
>> Hi Venkat,
>>
>> Le 23/06/2014 21:08, venkatesha murthy a écrit :
>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe <lu...@spaceroots.org>
>> wrote:
>>>> Hi all,
>>>>
>>>> While looking further in Percentile class for MATH-1120, I have found
>>>> another problem in the current implementation. NaNStrategy.FIXED should
>>>> leave the NaNs in place, but at the end of the KthSelector.select
>>>> method, a call to Arrays.sort moves the NaNs to the end of the small
>>>> sub-array. What is really implemented here is therefore closer to
>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always occur in the
>>>> test cases because they use very short arrays, and we directly switch to
>>>> this part of the select method.
>>> Are NaNs considered higher than +Inf ?
>>> If MAXIMAL represent replacing for +inf ; you need something to
>>> indicate beyond this for NaN.
>>
>> Well, we can just keep the NaN themselves and handled them
>> appropriately, hoping not to trash performances too much.
>>
> Agreed.
> 
>>
>>> What is the test input you see an issue and what is the actual error
>>> you are seeing. Please share the test case.
>>
>> Just look at PercentileTest.testReplaceNanInRange(). The first check in
>> the test corresponds to a Percentile configuration at 50% percentile,
>> and NaNStrategy.FIXED. The array has an odd number of entries, so the
>> 50% percentile is exactly one of the entries: the one at index 5 in the
>> final array.
>>
>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7, NaN, 8 }. So
>> for the NaNStrategy.FIXED setting, it should not be modified at all in
>> the selection algorithm and the result for 50% should be the first NaN
>> of the array, at index 5. In fact, due to the Arrays.sort, we *do*
>> reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so
>> the result is 5.
>>
>> Agreed. just verified by putting back the earlier insertionSort function.
> 
> 
>> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we get as a
>> result Double.POSITIVE_INFINITY instead of Double.NaN.
>>
>> If we agree to leave FIXED as unchanged behaviour with your insertionSort
> code; then atleast MAXIMAL/MINIMAL should be allowed for transformation of
> NaN to +/-Inf

I'm fine with it, as long as we clearly state it in the NaNStrategy
documentation, saying somethig like:

 When MAXIMAL/MINIMAL are used, the NaNs are effecively *replaced* by
 +/- infinity, hence the results will be computed as if infinities were
 there in the first place.

> 
>>>>
>>>> When I implemented the method, I explicitly avoided calling Arrays.sort
>>>> because it is a general purpose method that is know to be efficient only
>>>> for arrays of at least medium size. In most cases, when the array is
>>>> small one falls back to a non-recursive method like a very simple
>>>> insertion sort, which is faster for smaller arrays.
>>>
>>> Please help me understand here; even java primitive Arrays.sort does
>>> an insertion sort for less than 7 elements
>>> (Refer sort1(double x[], int off, int len))
>>> So what is it that the custom insertion sort that does differently or
>>> is advantageous. Is it the value 15 elements?
>>
>> I don't see a reference to 7 elements, neither in the Java6 nor in the
>> Java 7 doc
> 
> Please take a look at the sort1 method where there is a first block in the
> code which clearly mentions len < 7
>     /**
>      * Sorts the specified sub-array of doubles into ascending order.
>      */
>     private static void sort1(double x[], int off, int len) {
>     // Insertion sort on smallest arrays
>     if (len < 7) {
>         for (int i=off; i<len+off; i++)
>         for (int j=i; j>off && x[j-1]>x[j]; j--)
>             swap(x, j, j-1);
>         return;
>     }
>     :
>     :
>     : code continues for the else part
> 
> Also the grepcode url
> <http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/Arrays.java#Arrays.sort1%28double[]%2Cint%2Cint%29>
> indicates the same

OK, you were refering to a specific implementation. I was thinking in
the general case.

> 
> (and in any case the doc explicitly states the algorithms
>> explained are only implementation notes and are not part of the
>> specification).
>>
> Yes its a part of comments anyways.
> 
>>
>> However, the most important part for now is the fact that we control it
>> and may be able to implement different NaN strategies. What we have
>> currently fails.
>>
>> I agree on this  and hence here is my take:
> Leave FIXED as-is and use the earlier insertionSort code (just change the
> name to sort rather than hardcoding it as insertionsort) to handle the case
> you were mentioning

If we leave FIXED it as it is done know, even with insertionSort we do
not really control what happens. It is deterministic as the pivoting and
if/else structure is well defined (both in the selection part and in the
final sort for sub-arrays), but it is untrackable so we can't document it.

> Continue to use MAXIMAL/MINIMAL for +/-Inf transformation and that way we
> have covered both Inf and nan cases.

OK.

> Use REMOVED as default for all Percentile Estimation Types. (mostly
> influenced by R here perhaps)

This would be fine with me. I have concerns with FIXED only, the other
strategies all seem quite realistic.

Does anybody else has an advice for FIXED? What are the use case?

best regards,
Luc

> 
> best regards,
>> Luc
>>
>>>
>>>> In the select
>>>> operation, we know we have small sub-arrays at the call point. Going
>>>> back to the former insertionSort would recover the good behavior for
>>>> small arrays, but would in fact not be sufficient to really implement a
>>>> NaNStrategy.FIXED. I guess it would be simple to make it behave like
>>>> NaNStrategy.MAXIMAL but I did not try yet.
>>>>
>>>> My point here is rather: can we really and should we really implement
>>>> NaNStrategy.FIXED? Looking at how it is used elsewhere, it needs to
>>>> store the original position of the NaNs. It is quite cumbersome.
>>>>
>>>> I wonder what is the use case for NaNStrategy.FIXED at all.
>>>>
>>>> Going further and looking at the use cases for other NaNStrategy values,
>>>> the NaNs are replaced by +/- infinity before sorting, which is OK for
>>>> ranking as we only rely on the indices, but we use the values themselves
>>>> in Percentile. So sometimes, we end up with computing interpolations
>>>> like 0.5 * (x[k] + x[k+1]) or similar. If x[k] is a finite number and
>>>> x[k+1] has been changed to +infinity, we get +infinity, instead of the
>>>> NaN we should have retrieved without replacement. So here again, I'm not
>>>> sure we are doing the right thing.
>>>>
>>>> What do you think?
>>>>
>>>> best regards,
>>>> Luc
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>
>>>
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] use case for NaNStrategy.FIXED?

Posted by venkatesha murthy <ve...@gmail.com>.
On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe <lu...@spaceroots.org> wrote:

> Hi Venkat,
>
> Le 23/06/2014 21:08, venkatesha murthy a écrit :
> > On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe <lu...@spaceroots.org>
> wrote:
> >> Hi all,
> >>
> >> While looking further in Percentile class for MATH-1120, I have found
> >> another problem in the current implementation. NaNStrategy.FIXED should
> >> leave the NaNs in place, but at the end of the KthSelector.select
> >> method, a call to Arrays.sort moves the NaNs to the end of the small
> >> sub-array. What is really implemented here is therefore closer to
> >> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always occur in the
> >> test cases because they use very short arrays, and we directly switch to
> >> this part of the select method.
> > Are NaNs considered higher than +Inf ?
> > If MAXIMAL represent replacing for +inf ; you need something to
> > indicate beyond this for NaN.
>
> Well, we can just keep the NaN themselves and handled them
> appropriately, hoping not to trash performances too much.
>
Agreed.

>
> > What is the test input you see an issue and what is the actual error
> > you are seeing. Please share the test case.
>
> Just look at PercentileTest.testReplaceNanInRange(). The first check in
> the test corresponds to a Percentile configuration at 50% percentile,
> and NaNStrategy.FIXED. The array has an odd number of entries, so the
> 50% percentile is exactly one of the entries: the one at index 5 in the
> final array.
>
> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7, NaN, 8 }. So
> for the NaNStrategy.FIXED setting, it should not be modified at all in
> the selection algorithm and the result for 50% should be the first NaN
> of the array, at index 5. In fact, due to the Arrays.sort, we *do*
> reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so
> the result is 5.
>
> Agreed. just verified by putting back the earlier insertionSort function.


> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we get as a
> result Double.POSITIVE_INFINITY instead of Double.NaN.
>
> If we agree to leave FIXED as unchanged behaviour with your insertionSort
code; then atleast MAXIMAL/MINIMAL should be allowed for transformation of
NaN to +/-Inf

> >>
> >> When I implemented the method, I explicitly avoided calling Arrays.sort
> >> because it is a general purpose method that is know to be efficient only
> >> for arrays of at least medium size. In most cases, when the array is
> >> small one falls back to a non-recursive method like a very simple
> >> insertion sort, which is faster for smaller arrays.
> >
> > Please help me understand here; even java primitive Arrays.sort does
> > an insertion sort for less than 7 elements
> > (Refer sort1(double x[], int off, int len))
> > So what is it that the custom insertion sort that does differently or
> > is advantageous. Is it the value 15 elements?
>
> I don't see a reference to 7 elements, neither in the Java6 nor in the
> Java 7 doc

Please take a look at the sort1 method where there is a first block in the
code which clearly mentions len < 7
    /**
     * Sorts the specified sub-array of doubles into ascending order.
     */
    private static void sort1(double x[], int off, int len) {
    // Insertion sort on smallest arrays
    if (len < 7) {
        for (int i=off; i<len+off; i++)
        for (int j=i; j>off && x[j-1]>x[j]; j--)
            swap(x, j, j-1);
        return;
    }
    :
    :
    : code continues for the else part

Also the grepcode url
<http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/Arrays.java#Arrays.sort1%28double[]%2Cint%2Cint%29>
indicates the same

(and in any case the doc explicitly states the algorithms
> explained are only implementation notes and are not part of the
> specification).
>
Yes its a part of comments anyways.

>
> However, the most important part for now is the fact that we control it
> and may be able to implement different NaN strategies. What we have
> currently fails.
>
> I agree on this  and hence here is my take:
Leave FIXED as-is and use the earlier insertionSort code (just change the
name to sort rather than hardcoding it as insertionsort) to handle the case
you were mentioning
Continue to use MAXIMAL/MINIMAL for +/-Inf transformation and that way we
have covered both Inf and nan cases.
Use REMOVED as default for all Percentile Estimation Types. (mostly
influenced by R here perhaps)

best regards,
> Luc
>
> >
> >> In the select
> >> operation, we know we have small sub-arrays at the call point. Going
> >> back to the former insertionSort would recover the good behavior for
> >> small arrays, but would in fact not be sufficient to really implement a
> >> NaNStrategy.FIXED. I guess it would be simple to make it behave like
> >> NaNStrategy.MAXIMAL but I did not try yet.
> >>
> >> My point here is rather: can we really and should we really implement
> >> NaNStrategy.FIXED? Looking at how it is used elsewhere, it needs to
> >> store the original position of the NaNs. It is quite cumbersome.
> >>
> >> I wonder what is the use case for NaNStrategy.FIXED at all.
> >>
> >> Going further and looking at the use cases for other NaNStrategy values,
> >> the NaNs are replaced by +/- infinity before sorting, which is OK for
> >> ranking as we only rely on the indices, but we use the values themselves
> >> in Percentile. So sometimes, we end up with computing interpolations
> >> like 0.5 * (x[k] + x[k+1]) or similar. If x[k] is a finite number and
> >> x[k+1] has been changed to +infinity, we get +infinity, instead of the
> >> NaN we should have retrieved without replacement. So here again, I'm not
> >> sure we are doing the right thing.
> >>
> >> What do you think?
> >>
> >> best regards,
> >> Luc
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> >> For additional commands, e-mail: dev-help@commons.apache.org
> >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> > For additional commands, e-mail: dev-help@commons.apache.org
> >
> >
> >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>

Re: [math] use case for NaNStrategy.FIXED?

Posted by Luc Maisonobe <lu...@spaceroots.org>.
Hi Venkat,

Le 23/06/2014 21:08, venkatesha murthy a écrit :
> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe <lu...@spaceroots.org> wrote:
>> Hi all,
>>
>> While looking further in Percentile class for MATH-1120, I have found
>> another problem in the current implementation. NaNStrategy.FIXED should
>> leave the NaNs in place, but at the end of the KthSelector.select
>> method, a call to Arrays.sort moves the NaNs to the end of the small
>> sub-array. What is really implemented here is therefore closer to
>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always occur in the
>> test cases because they use very short arrays, and we directly switch to
>> this part of the select method.
> Are NaNs considered higher than +Inf ?
> If MAXIMAL represent replacing for +inf ; you need something to
> indicate beyond this for NaN.

Well, we can just keep the NaN themselves and handled them
appropriately, hoping not to trash performances too much.

> What is the test input you see an issue and what is the actual error
> you are seeing. Please share the test case.

Just look at PercentileTest.testReplaceNanInRange(). The first check in
the test corresponds to a Percentile configuration at 50% percentile,
and NaNStrategy.FIXED. The array has an odd number of entries, so the
50% percentile is exactly one of the entries: the one at index 5 in the
final array.

The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7, NaN, 8 }. So
for the NaNStrategy.FIXED setting, it should not be modified at all in
the selection algorithm and the result for 50% should be the first NaN
of the array, at index 5. In fact, due to the Arrays.sort, we *do*
reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so
the result is 5.

If we use NaNStrategy.MAXIMAL and any quantile above 67%, we get as a
result Double.POSITIVE_INFINITY instead of Double.NaN.

>>
>> When I implemented the method, I explicitly avoided calling Arrays.sort
>> because it is a general purpose method that is know to be efficient only
>> for arrays of at least medium size. In most cases, when the array is
>> small one falls back to a non-recursive method like a very simple
>> insertion sort, which is faster for smaller arrays.
> 
> Please help me understand here; even java primitive Arrays.sort does
> an insertion sort for less than 7 elements
> (Refer sort1(double x[], int off, int len))
> So what is it that the custom insertion sort that does differently or
> is advantageous. Is it the value 15 elements?

I don't see a reference to 7 elements, neither in the Java6 nor in the
Java 7 doc (and in any case the doc explicitly states the algorithms
explained are only implementation notes and are not part of the
specification).

However, the most important part for now is the fact that we control it
and may be able to implement different NaN strategies. What we have
currently fails.

best regards,
Luc

> 
>> In the select
>> operation, we know we have small sub-arrays at the call point. Going
>> back to the former insertionSort would recover the good behavior for
>> small arrays, but would in fact not be sufficient to really implement a
>> NaNStrategy.FIXED. I guess it would be simple to make it behave like
>> NaNStrategy.MAXIMAL but I did not try yet.
>>
>> My point here is rather: can we really and should we really implement
>> NaNStrategy.FIXED? Looking at how it is used elsewhere, it needs to
>> store the original position of the NaNs. It is quite cumbersome.
>>
>> I wonder what is the use case for NaNStrategy.FIXED at all.
>>
>> Going further and looking at the use cases for other NaNStrategy values,
>> the NaNs are replaced by +/- infinity before sorting, which is OK for
>> ranking as we only rely on the indices, but we use the values themselves
>> in Percentile. So sometimes, we end up with computing interpolations
>> like 0.5 * (x[k] + x[k+1]) or similar. If x[k] is a finite number and
>> x[k+1] has been changed to +infinity, we get +infinity, instead of the
>> NaN we should have retrieved without replacement. So here again, I'm not
>> sure we are doing the right thing.
>>
>> What do you think?
>>
>> best regards,
>> Luc
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] use case for NaNStrategy.FIXED?

Posted by venkatesha murthy <ve...@gmail.com>.
On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe <lu...@spaceroots.org> wrote:
> Hi all,
>
> While looking further in Percentile class for MATH-1120, I have found
> another problem in the current implementation. NaNStrategy.FIXED should
> leave the NaNs in place, but at the end of the KthSelector.select
> method, a call to Arrays.sort moves the NaNs to the end of the small
> sub-array. What is really implemented here is therefore closer to
> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always occur in the
> test cases because they use very short arrays, and we directly switch to
> this part of the select method.
Are NaNs considered higher than +Inf ?
If MAXIMAL represent replacing for +inf ; you need something to
indicate beyond this for NaN.
What is the test input you see an issue and what is the actual error
you are seeing. Please share the test case.
>
> When I implemented the method, I explicitly avoided calling Arrays.sort
> because it is a general purpose method that is know to be efficient only
> for arrays of at least medium size. In most cases, when the array is
> small one falls back to a non-recursive method like a very simple
> insertion sort, which is faster for smaller arrays.

Please help me understand here; even java primitive Arrays.sort does
an insertion sort for less than 7 elements
(Refer sort1(double x[], int off, int len))
So what is it that the custom insertion sort that does differently or
is advantageous. Is it the value 15 elements?

> In the select
> operation, we know we have small sub-arrays at the call point. Going
> back to the former insertionSort would recover the good behavior for
> small arrays, but would in fact not be sufficient to really implement a
> NaNStrategy.FIXED. I guess it would be simple to make it behave like
> NaNStrategy.MAXIMAL but I did not try yet.
>
> My point here is rather: can we really and should we really implement
> NaNStrategy.FIXED? Looking at how it is used elsewhere, it needs to
> store the original position of the NaNs. It is quite cumbersome.
>
> I wonder what is the use case for NaNStrategy.FIXED at all.
>
> Going further and looking at the use cases for other NaNStrategy values,
> the NaNs are replaced by +/- infinity before sorting, which is OK for
> ranking as we only rely on the indices, but we use the values themselves
> in Percentile. So sometimes, we end up with computing interpolations
> like 0.5 * (x[k] + x[k+1]) or similar. If x[k] is a finite number and
> x[k+1] has been changed to +infinity, we get +infinity, instead of the
> NaN we should have retrieved without replacement. So here again, I'm not
> sure we are doing the right thing.
>
> What do you think?
>
> best regards,
> Luc
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org