You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by venkatesha murthy <ve...@gmail.com> on 2014/06/28 05:36:11 UTC

Re: [math] use case for NaNStrategy.FIXED? {Need Opinion for a NaNTransformer]

On Thu, Jun 26, 2014 at 12:39 AM, venkatesha murthy <
venkateshamurthyts@gmail.com> wrote:

>
>
>
> On Wed, Jun 25, 2014 at 12:45 PM, Luc Maisonobe <lu...@spaceroots.org>
> wrote:
>
>> Le 25/06/2014 06:10, venkatesha murthy a écrit :
>> > On Wed, Jun 25, 2014 at 9:35 AM, venkatesha murthy <
>> > venkateshamurthyts@gmail.com> wrote:
>> > Can i put a patch for this change?
>>
>> I think we should still discuss about FIXED.
>>
>> If you want to set up u patch for the documentation of NaNStrategy
>> explaining the replacement done in MAXIMAL/MINIMAL, please go ahead.
>> You should put it in a new JIRA issue for clarity.
>>
>
Done putting MATH-1132 for tracking.



> While documenting this handling of NaN serves right; i was wondering if we
> could *also* add a transform method in NaNStrategy that transforms an array
> with included NaNs to somthing that Maximal/Minimal/Removable wanted.
> Please let know.
>

All: Just reiterating the point that; a transformer that can replace or
remove NaNs as per NaNStrategy is some thing that can alleviate a
dependency on documentation(Though i agree that we should add documenting).
 A class such as Percentile (or for that matter any other class) can
readily make use of this transformer (rather than duplicating switch cases
every where).

Please let know on what you think of this transformer (I have added a small
patch file in MATH-1132 only for reference and discussion).

Thanks
Venkat.

>
>> best regards,
>> Luc
>>
>> >
>> >>
>> >> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe <lu...@spaceroots.org>
>> >> wrote:
>> >>
>> >>> Hi Venkat,
>> >>>
>> >>> Le 23/06/2014 21:08, venkatesha murthy a écrit :
>> >>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc Maisonobe <lu...@spaceroots.org>
>> >>> wrote:
>> >>>>> Hi all,
>> >>>>>
>> >>>>> While looking further in Percentile class for MATH-1120, I have
>> found
>> >>>>> another problem in the current implementation. NaNStrategy.FIXED
>> should
>> >>>>> leave the NaNs in place, but at the end of the KthSelector.select
>> >>>>> method, a call to Arrays.sort moves the NaNs to the end of the small
>> >>>>> sub-array. What is really implemented here is therefore closer to
>> >>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED. This always occur in the
>> >>>>> test cases because they use very short arrays, and we directly
>> switch
>> >>> to
>> >>>>> this part of the select method.
>> >>>> Are NaNs considered higher than +Inf ?
>> >>>> If MAXIMAL represent replacing for +inf ; you need something to
>> >>>> indicate beyond this for NaN.
>> >>>
>> >>> Well, we can just keep the NaN themselves and handled them
>> >>> appropriately, hoping not to trash performances too much.
>> >>>
>> >> Agreed.
>> >>
>> >>>
>> >>>> What is the test input you see an issue and what is the actual error
>> >>>> you are seeing. Please share the test case.
>> >>>
>> >>> Just look at PercentileTest.testReplaceNanInRange(). The first check
>> in
>> >>> the test corresponds to a Percentile configuration at 50% percentile,
>> >>> and NaNStrategy.FIXED. The array has an odd number of entries, so the
>> >>> 50% percentile is exactly one of the entries: the one at index 5 in
>> the
>> >>> final array.
>> >>>
>> >>> The initial ordering is { 0, 1, 2, 3, 4, NaN, NaN, 5, 7, NaN, 8 }. So
>> >>> for the NaNStrategy.FIXED setting, it should not be modified at all in
>> >>> the selection algorithm and the result for 50% should be the first NaN
>> >>> of the array, at index 5. In fact, due to the Arrays.sort, we *do*
>> >>> reorder the array into  { 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so
>> >>> the result is 5.
>> >>>
>> >>> Agreed. just verified by putting back the earlier insertionSort
>> function.
>> >>
>> >>
>> >>> If we use NaNStrategy.MAXIMAL and any quantile above 67%, we get as a
>> >>> result Double.POSITIVE_INFINITY instead of Double.NaN.
>> >>>
>> >>> If we agree to leave FIXED as unchanged behaviour with your
>> insertionSort
>> >> code; then atleast MAXIMAL/MINIMAL should be allowed for
>> transformation of
>> >> NaN to +/-Inf
>> >>
>> >>>  >>
>> >>>>> When I implemented the method, I explicitly avoided calling
>> Arrays.sort
>> >>>>> because it is a general purpose method that is know to be efficient
>> >>> only
>> >>>>> for arrays of at least medium size. In most cases, when the array is
>> >>>>> small one falls back to a non-recursive method like a very simple
>> >>>>> insertion sort, which is faster for smaller arrays.
>> >>>>
>> >>>> Please help me understand here; even java primitive Arrays.sort does
>> >>>> an insertion sort for less than 7 elements
>> >>>> (Refer sort1(double x[], int off, int len))
>> >>>> So what is it that the custom insertion sort that does differently or
>> >>>> is advantageous. Is it the value 15 elements?
>> >>>
>> >>> I don't see a reference to 7 elements, neither in the Java6 nor in the
>> >>> Java 7 doc
>> >>
>> >> Please take a look at the sort1 method where there is a first block in
>> the
>> >> code which clearly mentions len < 7
>> >>     /**
>> >>      * Sorts the specified sub-array of doubles into ascending order.
>> >>      */
>> >>     private static void sort1(double x[], int off, int len) {
>> >>     // Insertion sort on smallest arrays
>> >>     if (len < 7) {
>> >>         for (int i=off; i<len+off; i++)
>> >>         for (int j=i; j>off && x[j-1]>x[j]; j--)
>> >>             swap(x, j, j-1);
>> >>         return;
>> >>     }
>> >>     :
>> >>     :
>> >>     : code continues for the else part
>> >>
>> >> Also the grepcode url
>> >> <
>> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/Arrays.java#Arrays.sort1%28double[]%2Cint%2Cint%29
>> >
>> >> indicates the same
>> >>
>> >> (and in any case the doc explicitly states the algorithms
>> >>> explained are only implementation notes and are not part of the
>> >>> specification).
>> >>>
>> >> Yes its a part of comments anyways.
>> >>
>> >>>
>> >>> However, the most important part for now is the fact that we control
>> it
>> >>> and may be able to implement different NaN strategies. What we have
>> >>> currently fails.
>> >>>
>> >>> I agree on this  and hence here is my take:
>> >> Leave FIXED as-is and use the earlier insertionSort code (just change
>> the
>> >> name to sort rather than hardcoding it as insertionsort) to handle the
>> case
>> >> you were mentioning
>> >> Continue to use MAXIMAL/MINIMAL for +/-Inf transformation and that way
>> we
>> >> have covered both Inf and nan cases.
>> >> Use REMOVED as default for all Percentile Estimation Types. (mostly
>> >> influenced by R here perhaps)
>> >>
>> >> best regards,
>> >>> Luc
>> >>>
>> >>>>
>> >>>>> In the select
>> >>>>> operation, we know we have small sub-arrays at the call point. Going
>> >>>>> back to the former insertionSort would recover the good behavior for
>> >>>>> small arrays, but would in fact not be sufficient to really
>> implement a
>> >>>>> NaNStrategy.FIXED. I guess it would be simple to make it behave like
>> >>>>> NaNStrategy.MAXIMAL but I did not try yet.
>> >>>>>
>> >>>>> My point here is rather: can we really and should we really
>> implement
>> >>>>> NaNStrategy.FIXED? Looking at how it is used elsewhere, it needs to
>> >>>>> store the original position of the NaNs. It is quite cumbersome.
>> >>>>>
>> >>>>> I wonder what is the use case for NaNStrategy.FIXED at all.
>> >>>>>
>> >>>>> Going further and looking at the use cases for other NaNStrategy
>> >>> values,
>> >>>>> the NaNs are replaced by +/- infinity before sorting, which is OK
>> for
>> >>>>> ranking as we only rely on the indices, but we use the values
>> >>> themselves
>> >>>>> in Percentile. So sometimes, we end up with computing interpolations
>> >>>>> like 0.5 * (x[k] + x[k+1]) or similar. If x[k] is a finite number
>> and
>> >>>>> x[k+1] has been changed to +infinity, we get +infinity, instead of
>> the
>> >>>>> NaN we should have retrieved without replacement. So here again, I'm
>> >>> not
>> >>>>> sure we are doing the right thing.
>> >>>>>
>> >>>>> What do you think?
>> >>>>>
>> >>>>> best regards,
>> >>>>> Luc
>> >>>>>
>> >>>>>
>> ---------------------------------------------------------------------
>> >>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> >>>>> For additional commands, e-mail: dev-help@commons.apache.org
>> >>>>>
>> >>>>
>> >>>> ---------------------------------------------------------------------
>> >>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> >>>> For additional commands, e-mail: dev-help@commons.apache.org
>> >>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>> ---------------------------------------------------------------------
>> >>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> >>> For additional commands, e-mail: dev-help@commons.apache.org
>> >>>
>> >>>
>> >>
>> >
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>
>