You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by Luc Maisonobe <lu...@spaceroots.org> on 2014/07/01 08:52:43 UTC

Re: [math] use case for NaNStrategy.FIXED?

Le 30/06/2014 22:00, Phil Steitz a écrit :
> On 6/30/14, 12:33 PM, Luc Maisonobe wrote:
>> Le 30/06/2014 19:15, Phil Steitz a écrit :
>>>
>>>> On Jun 30, 2014, at 9:59 AM, Gilles <gi...@harfang.homelinux.org>
>>>> wrote:
>>>>
>>>> On Mon, 30 Jun 2014 09:08:00 -0700, Phil Steitz wrote:
>>>>>> On Jun 29, 2014, at 3:41 PM, Gilles
>>>>>> <gi...@harfang.homelinux.org> wrote:
>>>>>>
>>>>>>>> On Sun, 29 Jun 2014 14:39:51 -0700, Phil Steitz wrote:
>>>>>>>>> On 6/29/14, 2:30 PM, Gilles wrote:
>>>>>>>>>> On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote: 
>>>>>>>>>> On 6/29/14, 9:48 AM, venkatesha murthy wrote: On Sun,
>>>>>>>>>> Jun 29, 2014 at 1:25 AM, Phil Steitz 
>>>>>>>>>> <ph...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote: Le
>>>>>>>>>>>> 25/06/2014 06:05, venkatesha murthy a écrit :
>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe 
>>>>>>>>>>>>> <lu...@spaceroots.org>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>> Hi Venkat,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit
>>>>>>>>>>>>>> :
>>>>>>>>>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc
>>>>>>>>>>>>>>> Maisonobe <lu...@spaceroots.org>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> While looking further in Percentile class
>>>>>>>>>>>>>>>> for MATH-1120, I have found another problem
>>>>>>>>>>>>>>>> in the current implementation. 
>>>>>>>>>>>>>>>> NaNStrategy.FIXED
>>>>>>>>>>> should
>>>>>>>>>>>>>>>> leave the NaNs in place, but at the end of
>>>>>>>>>>>>>>>> the KthSelector.select method, a call to
>>>>>>>>>>>>>>>> Arrays.sort moves the NaNs to the end of 
>>>>>>>>>>>>>>>> the small sub-array. What is really
>>>>>>>>>>>>>>>> implemented here is therefore closer to 
>>>>>>>>>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED.
>>>>>>>>>>>>>>>> This always occur in the test cases because
>>>>>>>>>>>>>>>> they use very short arrays, and we 
>>>>>>>>>>>>>>>> directly
>>>>>>>>>>> switch to
>>>>>>>>>>>>>>>> this part of the select method.
>>>>>>>>>>>>>>> Are NaNs considered higher than +Inf ? If
>>>>>>>>>>>>>>> MAXIMAL represent replacing for +inf ; you
>>>>>>>>>>>>>>> need something to indicate beyond this for
>>>>>>>>>>>>>>> NaN.
>>>>>>>>>>>>>> Well, we can just keep the NaN themselves and
>>>>>>>>>>>>>> handled them appropriately, hoping not to trash
>>>>>>>>>>>>>> performances too much.
>>>>>>>>>>>>> Agreed.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What is the test input you see an issue and
>>>>>>>>>>>>>>> what is the actual error you are seeing.
>>>>>>>>>>>>>>> Please share the test case.
>>>>>>>>>>>>>> Just look at
>>>>>>>>>>>>>> PercentileTest.testReplaceNanInRange(). The 
>>>>>>>>>>>>>> first check in the test corresponds to a
>>>>>>>>>>>>>> Percentile configuration at 50% percentile, and
>>>>>>>>>>>>>> NaNStrategy.FIXED. The array has an odd number
>>>>>>>>>>>>>> of entries, so the 50% percentile is exactly
>>>>>>>>>>>>>> one of the entries: the one at index 5 in the 
>>>>>>>>>>>>>> final array.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN,
>>>>>>>>>>>>>> NaN, 5, 7, NaN, 8 }. So for the
>>>>>>>>>>>>>> NaNStrategy.FIXED setting, it should not be
>>>>>>>>>>>>>> modified at all in the selection algorithm and
>>>>>>>>>>>>>> the result for 50% should be the first NaN of
>>>>>>>>>>>>>> the array, at index 5. In fact, due to the
>>>>>>>>>>>>>> Arrays.sort, we *do* reorder the array into  {
>>>>>>>>>>>>>> 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so the
>>>>>>>>>>>>>> result is 5.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Agreed. just verified by putting back the
>>>>>>>>>>>>>> earlier insertionSort
>>>>>>>>>>> function.
>>>>>>>>>>>>>> If we use NaNStrategy.MAXIMAL and any quantile
>>>>>>>>>>>>>> above 67%, we get as a result
>>>>>>>>>>>>>> Double.POSITIVE_INFINITY instead of
>>>>>>>>>>>>>> Double.NaN.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> If we agree to leave FIXED as unchanged
>>>>>>>>>>>>>> behaviour with your
>>>>>>>>>>> insertionSort
>>>>>>>>>>>>> code; then atleast MAXIMAL/MINIMAL should be
>>>>>>>>>>>>> allowed for transformation
>>>>>>>>>>> of
>>>>>>>>>>>>> NaN to +/-Inf
>>>>>>>>>>>> I'm fine with it, as long as we clearly state it in
>>>>>>>>>>>> the NaNStrategy documentation, saying somethig
>>>>>>>>>>>> like:
>>>>>>>>>>>>
>>>>>>>>>>>> When MAXIMAL/MINIMAL are used, the NaNs are
>>>>>>>>>>>> effecively *replaced* by +/- infinity, hence the
>>>>>>>>>>>> results will be computed as if infinities were 
>>>>>>>>>>>> there in the first place.
>>>>>>>>>>> -1 - this is mixing concerns.  NaNStrategy exists for
>>>>>>>>>>> one specific purpose - to turn extend the ordering on
>>>>>>>>>>> doubles a total ordering. Strategies prescribe only
>>>>>>>>>>> how NaNs are to be treated in the ordering.  Side
>>>>>>>>>>> effects on the input array don't make sense in this 
>>>>>>>>>>> context.  The use case for which this class was
>>>>>>>>>>> created was rank transformations.  Returning
>>>>>>>>>>> infinities in rank transformations would blow up
>>>>>>>>>>> computations in these classes.  If a floating point 
>>>>>>>>>>> computations need to transform NaNs, the specific
>>>>>>>>>>> stats / other classes that do this transformation
>>>>>>>>>>> should perform and document the behavior.
>>>>>>>>>>>
>>>>>>>>>>> Phil
>>>>>>>>>> OK. Agreed realized it later.  Hence i have not
>>>>>>>>>> touched NaNStrategy in my patch(MATH-1132) . Now i have
>>>>>>>>>> added a separate transformer to transform NaNs but it
>>>>>>>>>> uses NanStrategy.  Please let me know on this as this 
>>>>>>>>>> trasnformation itself can be used in different places.
>>>>>>>>> Honestly, I don't see the value of this.  I would be more
>>>>>>>>> favorable to an addition to MathArrays supporting NaN (or
>>>>>>>>> infinity) filtering / transformation.  Several years back
>>>>>>>>> we made the decision to just let NaNs propagate in
>>>>>>>>> computations, which basically means that if you feed NaNs
>>>>>>>>> to [math] you are going to get NaNs out in results. If we
>>>>>>>>> do get into the business of filtering NaNs (or infinities
>>>>>>>>> for that matter), I think it is probably best to do that
>>>>>>>>> in MathArrays or somesuch - i.e., don't decorate APIs
>>>>>>>>> with NaN or infinity-handling handling descriptions.
>>>>>>>>> That would be a huge task and would likely end up
>>>>>>>>> bleeding implementation detail into the APIs.  As I said
>>>>>>>>> above, NaNStrategy has to exist for rank transformations
>>>>>>>>> to be well-defined.  NaN-handling in floating point 
>>>>>>>>> computations is defined and as long as we fully document
>>>>>>>>> what our algorithms do, I think it is fair enough to "let
>>>>>>>>> the NaNs fall where they may" - which basically means
>>>>>>>>> users need to do the filtering / transformation
>>>>>>>>> themselves.
>>>>>>>> So, do I understand correctly that it's OK to add the
>>>>>>>> "remove" and "replace" utility methods (cf. MATH-1130) in
>>>>>>>> "MathArrays"? Or does this thread conclude that we don't
>>>>>>>> have a use for this within Commons Math?
>>>>>>> You raise a good point here.  Personally, I don't see an
>>>>>>> internal use and if we are sticking with MathArrays including
>>>>>>> only things we have internal use for, I would say no, don't
>>>>>>> include them.
>>>>>> A third way to look at it would be that since some algorithms
>>>>>> require being passed "clean" data (e.g. without NaNs), we could
>>>>>> provide some simple means to do so. A user could then write,
>>>>>> e.g. --- double[] data = { 1, Double.NaN, 3, 6, 5 };
>>>>>>
>>>>>> Percentile p = new Percentile(); double q =
>>>>>> p.evaluate(MathArrays.replace(data, Double.NaN,
>>>>>> Double.POSITIVE_INFINITY), 0.1); ---
>>>>>>
>>>>>> Then, if we provide that utility, is it still necessary to
>>>>>> complicate the Percentile code with a NaNStrategy parameter?
>>>>> I may be missing something, but it seems to me that percentile
>>>>> does need a NaNStrategy in the presence if NaNs like the other
>>>>> order statistics do (basically to define their position In the
>>>>> ordering).  It doesn't logically need more than that.  I would
>>>>> be ok defaulting the strategy or even returning NaN in the
>>>>> presence of NaNs (or whenever interpolation involves a NaN). All
>>>>> of this can be documented.
>>>> I am _surely_ missing something because I don't figure out how NaNs
>>>> can be useful when computing those statistics... :-}
>>> Good point.  Basically the different strategies allow you to
>>> effectively ignore or work around them so stats can be well-defined.
>>> Without NaNStrategy, order statistics including NaNs are undefined.
>> The problem with Percentile is that the elements are both reordered
>> *and* used when interpolating between two values. Tor the rank methods,
>> only reordering was needed so the various NaNStrategies could be
>> implemented by replacing at will with +/- infinity and a classical sort
>> would do the ordering for us.
>>
>> Here, we do interpolate on the values, so if we first replace NaN with
>> +infinity to put them at the end, we may interpolate between a finite
>> value and +inf and have +inf as a result, whereas if the NaN "value" was
>> preserved and some magic used to sort the NaN, we would end up
>> interpolating with a NaN and get a NaN as a result. Setting up the magic
>> for MINIMAL and MAXIMAL seems achievable withing the sort, as we already
>> have a specific sort (in fact not a sort but a selection algorithm,
>> which is the core of the class). Setting up the magic for FIXED seems
>> more difficult.
>>
>> So I guess implementing NaNStrategies may be done in different ways
>> depending on how they will be used by the calling algorithm. I don't yet
>> know if we can set up a general method to be used for all statistics.
>> Looking only at Percentile, replacement with +/- infinity already seems
>> to not be an appropriate solution.
> 
> I don't think we should try to redefine algorithms so they can
> handle NaNs.  When you do floating point computations with NaNs, you
> get NaNs.  That should be understood to be the contract.  For rank
> statistics, such as Percentile, we need to extend the ordering on
> doubles to include them in ordering for the algorithm to be
> well-defined.  If we subsequently do floating point computations
> with the NaNs, the result should be NaN.  I think we should either
> just simplify things and say we *always* return NaN whenever NaNs
> are present for Percentile, or do the following:
> 
> 0) use the prescribed NaN strategy to determine the ordering - so
> NaNs end up either at the bottom, at the top or in their original
> position (FIXED).
> 1) if interpolation involves a NaN element, the result returned is NaN.

This is exactly what I say. If NaNs are in, NaNs should be out. As for
now we do replace them, we get infinities instead of NaNs. I would
prefer getting NaNs there as it is more consistent with the purpose of
NaNs in floating point encoding.

So I think we agree: NaNs should not be changed, but they should be
reordered. Perhaps we could have something simple like NaNStrategy
having a method:

  boolean shouldSwap(int i1, double d1, int i2, double i2)

that would use both the index and values of the element to check if they
should be swapped or not. This method would be used both for traditional
sorting for ranking and in selection algorithms for percentile. I first
thought we should simply have NaN implement Comparator<Double> but this
would not work for FIXED, whereas the above method would work (simply
returning false as soon as there is a NaN, regardless of it being at a
lower or upper index).

I will take a look at this option.

best regards,
Luc

> 
> Phil
>>
>> best regards,
>> Luc
>>
>>>
>>> Phil
>>>
>>>>> I still don't see a within-math use for recoding methods; though
>>>>> I do see the value from a user perspective.  The problem is if we
>>>>> open this door, we risk MathArrays becoming a mini commons lang.
>>>> If Commons Lang already provides a feature, then we indeed don't
>>>> need to offer the same within CM (if there is no internal use). Can
>>>> someone confirm?
>>>>
>>>>
>>>> Otherwise we could define a new inner interface in MathArrays,
>>>> similar to "MathArrays.Function": --- public class MathArrays {
>>>>
>>>> // ...
>>>>
>>>> public interface Transformer { /** Transform the whole array */ 
>>>> double[] transform(double[] input); /** Transform the array in the
>>>> interval [from, to) and return the whole array */ double[]
>>>> transform(double[] input, int from, int to); /** Transform the
>>>> array in the interval [from, to) and return the transformed
>>>> sub-array */ double[] transformAndSplice(double[] input, int from,
>>>> int to); } } ---
>>>>
>>>> Then we can provide "replace" and "remove" transformers through
>>>> factory methods (example/untested code): --- public class
>>>> MathArrays {
>>>>
>>>> // ...
>>>>
>>>> public static Transformer createReplaceTransformer(final double
>>>> old, final double replace) { return new Transformer() { public
>>>> double[] transform(double[] input) { final int len = input.length; 
>>>> final double[] output = new double[len]; for (int i = 0; i < len;
>>>> i++) { output[i] = Precision.equalsIncludingNaN(input[i], old) ? 
>>>> replace : input[i]; } return output; }
>>>>
>>>> // etc. }; } } ---
>>>>
>>>>
>>>> Gilles
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>>
>>>>
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>
>>> ---------------------------------------------------------------------
>>>
>>>
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] use case for NaNStrategy.FIXED?

Posted by Phil Steitz <ph...@gmail.com>.
On 7/13/14, 2:40 AM, Luc Maisonobe wrote:
> Hi all,
>
> Le 01/07/2014 08:52, Luc Maisonobe a écrit :
>> Le 30/06/2014 22:00, Phil Steitz a écrit :
>>> On 6/30/14, 12:33 PM, Luc Maisonobe wrote:
>>>> Le 30/06/2014 19:15, Phil Steitz a écrit :
>>>>>> On Jun 30, 2014, at 9:59 AM, Gilles <gi...@harfang.homelinux.org>
>>>>>> wrote:
>>>>>>
>>>>>> On Mon, 30 Jun 2014 09:08:00 -0700, Phil Steitz wrote:
>>>>>>>> On Jun 29, 2014, at 3:41 PM, Gilles
>>>>>>>> <gi...@harfang.homelinux.org> wrote:
>>>>>>>>
>>>>>>>>>> On Sun, 29 Jun 2014 14:39:51 -0700, Phil Steitz wrote:
>>>>>>>>>>> On 6/29/14, 2:30 PM, Gilles wrote:
>>>>>>>>>>>> On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote: 
>>>>>>>>>>>> On 6/29/14, 9:48 AM, venkatesha murthy wrote: On Sun,
>>>>>>>>>>>> Jun 29, 2014 at 1:25 AM, Phil Steitz 
>>>>>>>>>>>> <ph...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote: Le
>>>>>>>>>>>>>> 25/06/2014 06:05, venkatesha murthy a écrit :
>>>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe 
>>>>>>>>>>>>>>> <lu...@spaceroots.org>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>> Hi Venkat,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit
>>>>>>>>>>>>>>>> :
>>>>>>>>>>>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc
>>>>>>>>>>>>>>>>> Maisonobe <lu...@spaceroots.org>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> While looking further in Percentile class
>>>>>>>>>>>>>>>>>> for MATH-1120, I have found another problem
>>>>>>>>>>>>>>>>>> in the current implementation. 
>>>>>>>>>>>>>>>>>> NaNStrategy.FIXED
>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>> leave the NaNs in place, but at the end of
>>>>>>>>>>>>>>>>>> the KthSelector.select method, a call to
>>>>>>>>>>>>>>>>>> Arrays.sort moves the NaNs to the end of 
>>>>>>>>>>>>>>>>>> the small sub-array. What is really
>>>>>>>>>>>>>>>>>> implemented here is therefore closer to 
>>>>>>>>>>>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED.
>>>>>>>>>>>>>>>>>> This always occur in the test cases because
>>>>>>>>>>>>>>>>>> they use very short arrays, and we 
>>>>>>>>>>>>>>>>>> directly
>>>>>>>>>>>>> switch to
>>>>>>>>>>>>>>>>>> this part of the select method.
>>>>>>>>>>>>>>>>> Are NaNs considered higher than +Inf ? If
>>>>>>>>>>>>>>>>> MAXIMAL represent replacing for +inf ; you
>>>>>>>>>>>>>>>>> need something to indicate beyond this for
>>>>>>>>>>>>>>>>> NaN.
>>>>>>>>>>>>>>>> Well, we can just keep the NaN themselves and
>>>>>>>>>>>>>>>> handled them appropriately, hoping not to trash
>>>>>>>>>>>>>>>> performances too much.
>>>>>>>>>>>>>>> Agreed.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> What is the test input you see an issue and
>>>>>>>>>>>>>>>>> what is the actual error you are seeing.
>>>>>>>>>>>>>>>>> Please share the test case.
>>>>>>>>>>>>>>>> Just look at
>>>>>>>>>>>>>>>> PercentileTest.testReplaceNanInRange(). The 
>>>>>>>>>>>>>>>> first check in the test corresponds to a
>>>>>>>>>>>>>>>> Percentile configuration at 50% percentile, and
>>>>>>>>>>>>>>>> NaNStrategy.FIXED. The array has an odd number
>>>>>>>>>>>>>>>> of entries, so the 50% percentile is exactly
>>>>>>>>>>>>>>>> one of the entries: the one at index 5 in the 
>>>>>>>>>>>>>>>> final array.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN,
>>>>>>>>>>>>>>>> NaN, 5, 7, NaN, 8 }. So for the
>>>>>>>>>>>>>>>> NaNStrategy.FIXED setting, it should not be
>>>>>>>>>>>>>>>> modified at all in the selection algorithm and
>>>>>>>>>>>>>>>> the result for 50% should be the first NaN of
>>>>>>>>>>>>>>>> the array, at index 5. In fact, due to the
>>>>>>>>>>>>>>>> Arrays.sort, we *do* reorder the array into  {
>>>>>>>>>>>>>>>> 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so the
>>>>>>>>>>>>>>>> result is 5.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Agreed. just verified by putting back the
>>>>>>>>>>>>>>>> earlier insertionSort
>>>>>>>>>>>>> function.
>>>>>>>>>>>>>>>> If we use NaNStrategy.MAXIMAL and any quantile
>>>>>>>>>>>>>>>> above 67%, we get as a result
>>>>>>>>>>>>>>>> Double.POSITIVE_INFINITY instead of
>>>>>>>>>>>>>>>> Double.NaN.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If we agree to leave FIXED as unchanged
>>>>>>>>>>>>>>>> behaviour with your
>>>>>>>>>>>>> insertionSort
>>>>>>>>>>>>>>> code; then atleast MAXIMAL/MINIMAL should be
>>>>>>>>>>>>>>> allowed for transformation
>>>>>>>>>>>>> of
>>>>>>>>>>>>>>> NaN to +/-Inf
>>>>>>>>>>>>>> I'm fine with it, as long as we clearly state it in
>>>>>>>>>>>>>> the NaNStrategy documentation, saying somethig
>>>>>>>>>>>>>> like:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> When MAXIMAL/MINIMAL are used, the NaNs are
>>>>>>>>>>>>>> effecively *replaced* by +/- infinity, hence the
>>>>>>>>>>>>>> results will be computed as if infinities were 
>>>>>>>>>>>>>> there in the first place.
>>>>>>>>>>>>> -1 - this is mixing concerns.  NaNStrategy exists for
>>>>>>>>>>>>> one specific purpose - to turn extend the ordering on
>>>>>>>>>>>>> doubles a total ordering. Strategies prescribe only
>>>>>>>>>>>>> how NaNs are to be treated in the ordering.  Side
>>>>>>>>>>>>> effects on the input array don't make sense in this 
>>>>>>>>>>>>> context.  The use case for which this class was
>>>>>>>>>>>>> created was rank transformations.  Returning
>>>>>>>>>>>>> infinities in rank transformations would blow up
>>>>>>>>>>>>> computations in these classes.  If a floating point 
>>>>>>>>>>>>> computations need to transform NaNs, the specific
>>>>>>>>>>>>> stats / other classes that do this transformation
>>>>>>>>>>>>> should perform and document the behavior.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Phil
>>>>>>>>>>>> OK. Agreed realized it later.  Hence i have not
>>>>>>>>>>>> touched NaNStrategy in my patch(MATH-1132) . Now i have
>>>>>>>>>>>> added a separate transformer to transform NaNs but it
>>>>>>>>>>>> uses NanStrategy.  Please let me know on this as this 
>>>>>>>>>>>> trasnformation itself can be used in different places.
>>>>>>>>>>> Honestly, I don't see the value of this.  I would be more
>>>>>>>>>>> favorable to an addition to MathArrays supporting NaN (or
>>>>>>>>>>> infinity) filtering / transformation.  Several years back
>>>>>>>>>>> we made the decision to just let NaNs propagate in
>>>>>>>>>>> computations, which basically means that if you feed NaNs
>>>>>>>>>>> to [math] you are going to get NaNs out in results. If we
>>>>>>>>>>> do get into the business of filtering NaNs (or infinities
>>>>>>>>>>> for that matter), I think it is probably best to do that
>>>>>>>>>>> in MathArrays or somesuch - i.e., don't decorate APIs
>>>>>>>>>>> with NaN or infinity-handling handling descriptions.
>>>>>>>>>>> That would be a huge task and would likely end up
>>>>>>>>>>> bleeding implementation detail into the APIs.  As I said
>>>>>>>>>>> above, NaNStrategy has to exist for rank transformations
>>>>>>>>>>> to be well-defined.  NaN-handling in floating point 
>>>>>>>>>>> computations is defined and as long as we fully document
>>>>>>>>>>> what our algorithms do, I think it is fair enough to "let
>>>>>>>>>>> the NaNs fall where they may" - which basically means
>>>>>>>>>>> users need to do the filtering / transformation
>>>>>>>>>>> themselves.
>>>>>>>>>> So, do I understand correctly that it's OK to add the
>>>>>>>>>> "remove" and "replace" utility methods (cf. MATH-1130) in
>>>>>>>>>> "MathArrays"? Or does this thread conclude that we don't
>>>>>>>>>> have a use for this within Commons Math?
>>>>>>>>> You raise a good point here.  Personally, I don't see an
>>>>>>>>> internal use and if we are sticking with MathArrays including
>>>>>>>>> only things we have internal use for, I would say no, don't
>>>>>>>>> include them.
>>>>>>>> A third way to look at it would be that since some algorithms
>>>>>>>> require being passed "clean" data (e.g. without NaNs), we could
>>>>>>>> provide some simple means to do so. A user could then write,
>>>>>>>> e.g. --- double[] data = { 1, Double.NaN, 3, 6, 5 };
>>>>>>>>
>>>>>>>> Percentile p = new Percentile(); double q =
>>>>>>>> p.evaluate(MathArrays.replace(data, Double.NaN,
>>>>>>>> Double.POSITIVE_INFINITY), 0.1); ---
>>>>>>>>
>>>>>>>> Then, if we provide that utility, is it still necessary to
>>>>>>>> complicate the Percentile code with a NaNStrategy parameter?
>>>>>>> I may be missing something, but it seems to me that percentile
>>>>>>> does need a NaNStrategy in the presence if NaNs like the other
>>>>>>> order statistics do (basically to define their position In the
>>>>>>> ordering).  It doesn't logically need more than that.  I would
>>>>>>> be ok defaulting the strategy or even returning NaN in the
>>>>>>> presence of NaNs (or whenever interpolation involves a NaN). All
>>>>>>> of this can be documented.
>>>>>> I am _surely_ missing something because I don't figure out how NaNs
>>>>>> can be useful when computing those statistics... :-}
>>>>> Good point.  Basically the different strategies allow you to
>>>>> effectively ignore or work around them so stats can be well-defined.
>>>>> Without NaNStrategy, order statistics including NaNs are undefined.
>>>> The problem with Percentile is that the elements are both reordered
>>>> *and* used when interpolating between two values. Tor the rank methods,
>>>> only reordering was needed so the various NaNStrategies could be
>>>> implemented by replacing at will with +/- infinity and a classical sort
>>>> would do the ordering for us.
>>>>
>>>> Here, we do interpolate on the values, so if we first replace NaN with
>>>> +infinity to put them at the end, we may interpolate between a finite
>>>> value and +inf and have +inf as a result, whereas if the NaN "value" was
>>>> preserved and some magic used to sort the NaN, we would end up
>>>> interpolating with a NaN and get a NaN as a result. Setting up the magic
>>>> for MINIMAL and MAXIMAL seems achievable withing the sort, as we already
>>>> have a specific sort (in fact not a sort but a selection algorithm,
>>>> which is the core of the class). Setting up the magic for FIXED seems
>>>> more difficult.
>>>>
>>>> So I guess implementing NaNStrategies may be done in different ways
>>>> depending on how they will be used by the calling algorithm. I don't yet
>>>> know if we can set up a general method to be used for all statistics.
>>>> Looking only at Percentile, replacement with +/- infinity already seems
>>>> to not be an appropriate solution.
>>> I don't think we should try to redefine algorithms so they can
>>> handle NaNs.  When you do floating point computations with NaNs, you
>>> get NaNs.  That should be understood to be the contract.  For rank
>>> statistics, such as Percentile, we need to extend the ordering on
>>> doubles to include them in ordering for the algorithm to be
>>> well-defined.  If we subsequently do floating point computations
>>> with the NaNs, the result should be NaN.  I think we should either
>>> just simplify things and say we *always* return NaN whenever NaNs
>>> are present for Percentile, or do the following:
>>>
>>> 0) use the prescribed NaN strategy to determine the ordering - so
>>> NaNs end up either at the bottom, at the top or in their original
>>> position (FIXED).
>>> 1) if interpolation involves a NaN element, the result returned is NaN.
>> This is exactly what I say. If NaNs are in, NaNs should be out. As for
>> now we do replace them, we get infinities instead of NaNs. I would
>> prefer getting NaNs there as it is more consistent with the purpose of
>> NaNs in floating point encoding.
>>
>> So I think we agree: NaNs should not be changed, but they should be
>> reordered. Perhaps we could have something simple like NaNStrategy
>> having a method:
>>
>>   boolean shouldSwap(int i1, double d1, int i2, double i2)
>>
>> that would use both the index and values of the element to check if they
>> should be swapped or not. This method would be used both for traditional
>> sorting for ranking and in selection algorithms for percentile. I first
>> thought we should simply have NaN implement Comparator<Double> but this
>> would not work for FIXED, whereas the above method would work (simply
>> returning false as soon as there is a NaN, regardless of it being at a
>> lower or upper index).
>>
>> I will take a look at this option.
> This does not work as is. The problem is lack of transitivity. Suppose
> that we have three indices i < j < k, with a[i] > a[k] and a[j] = NaN.
>
> Then if during the sort/selection we compare elements at indices i and k
> first, we will swap them, which is correct behaviour regardless of the
> NaN strategy used. If we compare elemntes at indices i and j first and
> then elements at indices j and k and we use FIXED NaNStrategy, then
> we won't swap the two elements and the middle NaN would induced a
> non-globally sorted array. I think FIXED should act as a river flowing
> over rocks : the water (non-NaN elements) would move freely around and
> be sorted, and the rocks (NaN elements) would stay were they are but do
> not prevent water flow.

That is what FIXED is supposed to mean.  If the original array is
x_0, ..., x_n and x_i is NaN, then after applying the FIXED
NaNStrategy to sort the array, x_i will still be NaN.  If this is
inconvenient to support for Percentile, I would be fine with making
selection of this NaNStrategy illegal - i.e, throwing IAE if it is
specified.  As I said before, I would also be OK with making the
presence of NaN values in the input trigger (documented) IAE.

Phil
>
> I have found another way to handle this, using an indirection array that
> could "jump" over NaNs for FIXED strategy or push them to either end for
> MINIMAL or MAXIMAL strategy.
>
> I will try this in the next few days, as time permit.
>
> best regards,
> Luc
>
>> best regards,
>> Luc
>>
>>> Phil
>>>> best regards,
>>>> Luc
>>>>
>>>>> Phil
>>>>>
>>>>>>> I still don't see a within-math use for recoding methods; though
>>>>>>> I do see the value from a user perspective.  The problem is if we
>>>>>>> open this door, we risk MathArrays becoming a mini commons lang.
>>>>>> If Commons Lang already provides a feature, then we indeed don't
>>>>>> need to offer the same within CM (if there is no internal use). Can
>>>>>> someone confirm?
>>>>>>
>>>>>>
>>>>>> Otherwise we could define a new inner interface in MathArrays,
>>>>>> similar to "MathArrays.Function": --- public class MathArrays {
>>>>>>
>>>>>> // ...
>>>>>>
>>>>>> public interface Transformer { /** Transform the whole array */ 
>>>>>> double[] transform(double[] input); /** Transform the array in the
>>>>>> interval [from, to) and return the whole array */ double[]
>>>>>> transform(double[] input, int from, int to); /** Transform the
>>>>>> array in the interval [from, to) and return the transformed
>>>>>> sub-array */ double[] transformAndSplice(double[] input, int from,
>>>>>> int to); } } ---
>>>>>>
>>>>>> Then we can provide "replace" and "remove" transformers through
>>>>>> factory methods (example/untested code): --- public class
>>>>>> MathArrays {
>>>>>>
>>>>>> // ...
>>>>>>
>>>>>> public static Transformer createReplaceTransformer(final double
>>>>>> old, final double replace) { return new Transformer() { public
>>>>>> double[] transform(double[] input) { final int len = input.length; 
>>>>>> final double[] output = new double[len]; for (int i = 0; i < len;
>>>>>> i++) { output[i] = Precision.equalsIncludingNaN(input[i], old) ? 
>>>>>> replace : input[i]; } return output; }
>>>>>>
>>>>>> // etc. }; } } ---
>>>>>>
>>>>>>
>>>>>> Gilles
>>>>>>
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>>
>>>>>>
>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>>>
>>>>> ---------------------------------------------------------------------
>>>>>
>>>>>
>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>>
>>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] use case for NaNStrategy.FIXED?

Posted by Luc Maisonobe <lu...@spaceroots.org>.
Hi all,

Le 01/07/2014 08:52, Luc Maisonobe a écrit :
> Le 30/06/2014 22:00, Phil Steitz a écrit :
>> On 6/30/14, 12:33 PM, Luc Maisonobe wrote:
>>> Le 30/06/2014 19:15, Phil Steitz a écrit :
>>>>
>>>>> On Jun 30, 2014, at 9:59 AM, Gilles <gi...@harfang.homelinux.org>
>>>>> wrote:
>>>>>
>>>>> On Mon, 30 Jun 2014 09:08:00 -0700, Phil Steitz wrote:
>>>>>>> On Jun 29, 2014, at 3:41 PM, Gilles
>>>>>>> <gi...@harfang.homelinux.org> wrote:
>>>>>>>
>>>>>>>>> On Sun, 29 Jun 2014 14:39:51 -0700, Phil Steitz wrote:
>>>>>>>>>> On 6/29/14, 2:30 PM, Gilles wrote:
>>>>>>>>>>> On Sun, 29 Jun 2014 10:25:58 -0700, Phil Steitz wrote: 
>>>>>>>>>>> On 6/29/14, 9:48 AM, venkatesha murthy wrote: On Sun,
>>>>>>>>>>> Jun 29, 2014 at 1:25 AM, Phil Steitz 
>>>>>>>>>>> <ph...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>>> On 6/25/14, 12:12 AM, Luc Maisonobe wrote: Le
>>>>>>>>>>>>> 25/06/2014 06:05, venkatesha murthy a écrit :
>>>>>>>>>>>>>> On Wed, Jun 25, 2014 at 12:21 AM, Luc Maisonobe 
>>>>>>>>>>>>>> <lu...@spaceroots.org>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> Hi Venkat,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Le 23/06/2014 21:08, venkatesha murthy a écrit
>>>>>>>>>>>>>>> :
>>>>>>>>>>>>>>>> On Tue, Jun 24, 2014 at 12:08 AM, Luc
>>>>>>>>>>>>>>>> Maisonobe <lu...@spaceroots.org>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> While looking further in Percentile class
>>>>>>>>>>>>>>>>> for MATH-1120, I have found another problem
>>>>>>>>>>>>>>>>> in the current implementation. 
>>>>>>>>>>>>>>>>> NaNStrategy.FIXED
>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>> leave the NaNs in place, but at the end of
>>>>>>>>>>>>>>>>> the KthSelector.select method, a call to
>>>>>>>>>>>>>>>>> Arrays.sort moves the NaNs to the end of 
>>>>>>>>>>>>>>>>> the small sub-array. What is really
>>>>>>>>>>>>>>>>> implemented here is therefore closer to 
>>>>>>>>>>>>>>>>> NaNStrategy.MAXIMAL than NaNStrategy.FIXED.
>>>>>>>>>>>>>>>>> This always occur in the test cases because
>>>>>>>>>>>>>>>>> they use very short arrays, and we 
>>>>>>>>>>>>>>>>> directly
>>>>>>>>>>>> switch to
>>>>>>>>>>>>>>>>> this part of the select method.
>>>>>>>>>>>>>>>> Are NaNs considered higher than +Inf ? If
>>>>>>>>>>>>>>>> MAXIMAL represent replacing for +inf ; you
>>>>>>>>>>>>>>>> need something to indicate beyond this for
>>>>>>>>>>>>>>>> NaN.
>>>>>>>>>>>>>>> Well, we can just keep the NaN themselves and
>>>>>>>>>>>>>>> handled them appropriately, hoping not to trash
>>>>>>>>>>>>>>> performances too much.
>>>>>>>>>>>>>> Agreed.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What is the test input you see an issue and
>>>>>>>>>>>>>>>> what is the actual error you are seeing.
>>>>>>>>>>>>>>>> Please share the test case.
>>>>>>>>>>>>>>> Just look at
>>>>>>>>>>>>>>> PercentileTest.testReplaceNanInRange(). The 
>>>>>>>>>>>>>>> first check in the test corresponds to a
>>>>>>>>>>>>>>> Percentile configuration at 50% percentile, and
>>>>>>>>>>>>>>> NaNStrategy.FIXED. The array has an odd number
>>>>>>>>>>>>>>> of entries, so the 50% percentile is exactly
>>>>>>>>>>>>>>> one of the entries: the one at index 5 in the 
>>>>>>>>>>>>>>> final array.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The initial ordering is { 0, 1, 2, 3, 4, NaN,
>>>>>>>>>>>>>>> NaN, 5, 7, NaN, 8 }. So for the
>>>>>>>>>>>>>>> NaNStrategy.FIXED setting, it should not be
>>>>>>>>>>>>>>> modified at all in the selection algorithm and
>>>>>>>>>>>>>>> the result for 50% should be the first NaN of
>>>>>>>>>>>>>>> the array, at index 5. In fact, due to the
>>>>>>>>>>>>>>> Arrays.sort, we *do* reorder the array into  {
>>>>>>>>>>>>>>> 0, 1, 2, 3, 4, 5, 7, 8, NaN, NaN, NaN }, so the
>>>>>>>>>>>>>>> result is 5.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Agreed. just verified by putting back the
>>>>>>>>>>>>>>> earlier insertionSort
>>>>>>>>>>>> function.
>>>>>>>>>>>>>>> If we use NaNStrategy.MAXIMAL and any quantile
>>>>>>>>>>>>>>> above 67%, we get as a result
>>>>>>>>>>>>>>> Double.POSITIVE_INFINITY instead of
>>>>>>>>>>>>>>> Double.NaN.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> If we agree to leave FIXED as unchanged
>>>>>>>>>>>>>>> behaviour with your
>>>>>>>>>>>> insertionSort
>>>>>>>>>>>>>> code; then atleast MAXIMAL/MINIMAL should be
>>>>>>>>>>>>>> allowed for transformation
>>>>>>>>>>>> of
>>>>>>>>>>>>>> NaN to +/-Inf
>>>>>>>>>>>>> I'm fine with it, as long as we clearly state it in
>>>>>>>>>>>>> the NaNStrategy documentation, saying somethig
>>>>>>>>>>>>> like:
>>>>>>>>>>>>>
>>>>>>>>>>>>> When MAXIMAL/MINIMAL are used, the NaNs are
>>>>>>>>>>>>> effecively *replaced* by +/- infinity, hence the
>>>>>>>>>>>>> results will be computed as if infinities were 
>>>>>>>>>>>>> there in the first place.
>>>>>>>>>>>> -1 - this is mixing concerns.  NaNStrategy exists for
>>>>>>>>>>>> one specific purpose - to turn extend the ordering on
>>>>>>>>>>>> doubles a total ordering. Strategies prescribe only
>>>>>>>>>>>> how NaNs are to be treated in the ordering.  Side
>>>>>>>>>>>> effects on the input array don't make sense in this 
>>>>>>>>>>>> context.  The use case for which this class was
>>>>>>>>>>>> created was rank transformations.  Returning
>>>>>>>>>>>> infinities in rank transformations would blow up
>>>>>>>>>>>> computations in these classes.  If a floating point 
>>>>>>>>>>>> computations need to transform NaNs, the specific
>>>>>>>>>>>> stats / other classes that do this transformation
>>>>>>>>>>>> should perform and document the behavior.
>>>>>>>>>>>>
>>>>>>>>>>>> Phil
>>>>>>>>>>> OK. Agreed realized it later.  Hence i have not
>>>>>>>>>>> touched NaNStrategy in my patch(MATH-1132) . Now i have
>>>>>>>>>>> added a separate transformer to transform NaNs but it
>>>>>>>>>>> uses NanStrategy.  Please let me know on this as this 
>>>>>>>>>>> trasnformation itself can be used in different places.
>>>>>>>>>> Honestly, I don't see the value of this.  I would be more
>>>>>>>>>> favorable to an addition to MathArrays supporting NaN (or
>>>>>>>>>> infinity) filtering / transformation.  Several years back
>>>>>>>>>> we made the decision to just let NaNs propagate in
>>>>>>>>>> computations, which basically means that if you feed NaNs
>>>>>>>>>> to [math] you are going to get NaNs out in results. If we
>>>>>>>>>> do get into the business of filtering NaNs (or infinities
>>>>>>>>>> for that matter), I think it is probably best to do that
>>>>>>>>>> in MathArrays or somesuch - i.e., don't decorate APIs
>>>>>>>>>> with NaN or infinity-handling handling descriptions.
>>>>>>>>>> That would be a huge task and would likely end up
>>>>>>>>>> bleeding implementation detail into the APIs.  As I said
>>>>>>>>>> above, NaNStrategy has to exist for rank transformations
>>>>>>>>>> to be well-defined.  NaN-handling in floating point 
>>>>>>>>>> computations is defined and as long as we fully document
>>>>>>>>>> what our algorithms do, I think it is fair enough to "let
>>>>>>>>>> the NaNs fall where they may" - which basically means
>>>>>>>>>> users need to do the filtering / transformation
>>>>>>>>>> themselves.
>>>>>>>>> So, do I understand correctly that it's OK to add the
>>>>>>>>> "remove" and "replace" utility methods (cf. MATH-1130) in
>>>>>>>>> "MathArrays"? Or does this thread conclude that we don't
>>>>>>>>> have a use for this within Commons Math?
>>>>>>>> You raise a good point here.  Personally, I don't see an
>>>>>>>> internal use and if we are sticking with MathArrays including
>>>>>>>> only things we have internal use for, I would say no, don't
>>>>>>>> include them.
>>>>>>> A third way to look at it would be that since some algorithms
>>>>>>> require being passed "clean" data (e.g. without NaNs), we could
>>>>>>> provide some simple means to do so. A user could then write,
>>>>>>> e.g. --- double[] data = { 1, Double.NaN, 3, 6, 5 };
>>>>>>>
>>>>>>> Percentile p = new Percentile(); double q =
>>>>>>> p.evaluate(MathArrays.replace(data, Double.NaN,
>>>>>>> Double.POSITIVE_INFINITY), 0.1); ---
>>>>>>>
>>>>>>> Then, if we provide that utility, is it still necessary to
>>>>>>> complicate the Percentile code with a NaNStrategy parameter?
>>>>>> I may be missing something, but it seems to me that percentile
>>>>>> does need a NaNStrategy in the presence if NaNs like the other
>>>>>> order statistics do (basically to define their position In the
>>>>>> ordering).  It doesn't logically need more than that.  I would
>>>>>> be ok defaulting the strategy or even returning NaN in the
>>>>>> presence of NaNs (or whenever interpolation involves a NaN). All
>>>>>> of this can be documented.
>>>>> I am _surely_ missing something because I don't figure out how NaNs
>>>>> can be useful when computing those statistics... :-}
>>>> Good point.  Basically the different strategies allow you to
>>>> effectively ignore or work around them so stats can be well-defined.
>>>> Without NaNStrategy, order statistics including NaNs are undefined.
>>> The problem with Percentile is that the elements are both reordered
>>> *and* used when interpolating between two values. Tor the rank methods,
>>> only reordering was needed so the various NaNStrategies could be
>>> implemented by replacing at will with +/- infinity and a classical sort
>>> would do the ordering for us.
>>>
>>> Here, we do interpolate on the values, so if we first replace NaN with
>>> +infinity to put them at the end, we may interpolate between a finite
>>> value and +inf and have +inf as a result, whereas if the NaN "value" was
>>> preserved and some magic used to sort the NaN, we would end up
>>> interpolating with a NaN and get a NaN as a result. Setting up the magic
>>> for MINIMAL and MAXIMAL seems achievable withing the sort, as we already
>>> have a specific sort (in fact not a sort but a selection algorithm,
>>> which is the core of the class). Setting up the magic for FIXED seems
>>> more difficult.
>>>
>>> So I guess implementing NaNStrategies may be done in different ways
>>> depending on how they will be used by the calling algorithm. I don't yet
>>> know if we can set up a general method to be used for all statistics.
>>> Looking only at Percentile, replacement with +/- infinity already seems
>>> to not be an appropriate solution.
>>
>> I don't think we should try to redefine algorithms so they can
>> handle NaNs.  When you do floating point computations with NaNs, you
>> get NaNs.  That should be understood to be the contract.  For rank
>> statistics, such as Percentile, we need to extend the ordering on
>> doubles to include them in ordering for the algorithm to be
>> well-defined.  If we subsequently do floating point computations
>> with the NaNs, the result should be NaN.  I think we should either
>> just simplify things and say we *always* return NaN whenever NaNs
>> are present for Percentile, or do the following:
>>
>> 0) use the prescribed NaN strategy to determine the ordering - so
>> NaNs end up either at the bottom, at the top or in their original
>> position (FIXED).
>> 1) if interpolation involves a NaN element, the result returned is NaN.
> 
> This is exactly what I say. If NaNs are in, NaNs should be out. As for
> now we do replace them, we get infinities instead of NaNs. I would
> prefer getting NaNs there as it is more consistent with the purpose of
> NaNs in floating point encoding.
> 
> So I think we agree: NaNs should not be changed, but they should be
> reordered. Perhaps we could have something simple like NaNStrategy
> having a method:
> 
>   boolean shouldSwap(int i1, double d1, int i2, double i2)
> 
> that would use both the index and values of the element to check if they
> should be swapped or not. This method would be used both for traditional
> sorting for ranking and in selection algorithms for percentile. I first
> thought we should simply have NaN implement Comparator<Double> but this
> would not work for FIXED, whereas the above method would work (simply
> returning false as soon as there is a NaN, regardless of it being at a
> lower or upper index).
> 
> I will take a look at this option.

This does not work as is. The problem is lack of transitivity. Suppose
that we have three indices i < j < k, with a[i] > a[k] and a[j] = NaN.

Then if during the sort/selection we compare elements at indices i and k
first, we will swap them, which is correct behaviour regardless of the
NaN strategy used. If we compare elemntes at indices i and j first and
then elements at indices j and k and we use FIXED NaNStrategy, then
we won't swap the two elements and the middle NaN would induced a
non-globally sorted array. I think FIXED should act as a river flowing
over rocks : the water (non-NaN elements) would move freely around and
be sorted, and the rocks (NaN elements) would stay were they are but do
not prevent water flow.

I have found another way to handle this, using an indirection array that
could "jump" over NaNs for FIXED strategy or push them to either end for
MINIMAL or MAXIMAL strategy.

I will try this in the next few days, as time permit.

best regards,
Luc

> 
> best regards,
> Luc
> 
>>
>> Phil
>>>
>>> best regards,
>>> Luc
>>>
>>>>
>>>> Phil
>>>>
>>>>>> I still don't see a within-math use for recoding methods; though
>>>>>> I do see the value from a user perspective.  The problem is if we
>>>>>> open this door, we risk MathArrays becoming a mini commons lang.
>>>>> If Commons Lang already provides a feature, then we indeed don't
>>>>> need to offer the same within CM (if there is no internal use). Can
>>>>> someone confirm?
>>>>>
>>>>>
>>>>> Otherwise we could define a new inner interface in MathArrays,
>>>>> similar to "MathArrays.Function": --- public class MathArrays {
>>>>>
>>>>> // ...
>>>>>
>>>>> public interface Transformer { /** Transform the whole array */ 
>>>>> double[] transform(double[] input); /** Transform the array in the
>>>>> interval [from, to) and return the whole array */ double[]
>>>>> transform(double[] input, int from, int to); /** Transform the
>>>>> array in the interval [from, to) and return the transformed
>>>>> sub-array */ double[] transformAndSplice(double[] input, int from,
>>>>> int to); } } ---
>>>>>
>>>>> Then we can provide "replace" and "remove" transformers through
>>>>> factory methods (example/untested code): --- public class
>>>>> MathArrays {
>>>>>
>>>>> // ...
>>>>>
>>>>> public static Transformer createReplaceTransformer(final double
>>>>> old, final double replace) { return new Transformer() { public
>>>>> double[] transform(double[] input) { final int len = input.length; 
>>>>> final double[] output = new double[len]; for (int i = 0; i < len;
>>>>> i++) { output[i] = Precision.equalsIncludingNaN(input[i], old) ? 
>>>>> replace : input[i]; } return output; }
>>>>>
>>>>> // etc. }; } } ---
>>>>>
>>>>>
>>>>> Gilles
>>>>>
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>>
>>>>>
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>>
>>>> ---------------------------------------------------------------------
>>>>
>>>>
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>>
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>>> For additional commands, e-mail: dev-help@commons.apache.org
>>>
>>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>
>>
>>
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org