Posted to issues@arrow.apache.org by "coady (via GitHub)" <gi...@apache.org> on 2023/06/16 01:08:40 UTC

[GitHub] [arrow] coady opened a new issue, #36118: Inconsistent ordering of descending nulls in `sort_indices` and `rank` functions.

coady opened a new issue, #36118:
URL: https://github.com/apache/arrow/issues/36118

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   ```python
   In []: import pyarrow as pa
   
   In []: import pyarrow.compute as pc
   
   In []: arr = pa.array(['a', None, 'b'])
   
   In []: pc.sort_indices(arr, [('', 'descending')])
   Out[]: 
   [
     2,
     0,  # 'at_end' means >
     1
   ]
   
   In []: pc.sort_indices(arr, [('', 'descending')], null_placement='at_start')
   Out[]: 
   [
     1,
     2,  # 'at_start' means <
     0
   ]
   
   In []: pc.rank(arr, 'descending')
   Out[]: 
   [
     2,
     3, # literally 'at_end', not >
     1
   ]
   
   In []: pc.rank(arr, 'descending', null_placement='at_start')
   Out[]: 
   [
     3,
     1, # literally 'at_start', not <
     2
   ]
   ```
   
   I guess there is some genuine ambiguity. I'm inclined to think `sort_indices` is right because:
   * although the terms are `at_start` and `at_end`, docs also refer to nulls being `>`
   * it matches the expectation that `descending` would reverse `ascending`
   
   But either way, it's inconsistent.
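The second bullet can be checked with a small pure-Python model, assuming the literal reading of `at_start`/`at_end`: non-null values are sorted by value, and nulls are then placed at the start or end regardless of sort direction. `model_sort_indices` is a hypothetical helper written for illustration, not a pyarrow API. Under this model, reversing an ascending sort with nulls `at_end` gives the same indices as a descending sort with nulls `at_start`, so "descending reverses ascending" only holds if `null_placement` is flipped as well (for arrays without ties).

```python
def model_sort_indices(values, descending=False, null_placement="at_end"):
    """Hypothetical model of sort_indices with literal null placement.

    Returns the original indices of `values` in sorted order. Non-null
    values are sorted stably by value (reversed when `descending`);
    nulls (None) are then put at the start or end as requested,
    independent of the sort direction.
    """
    non_null = [i for i, v in enumerate(values) if v is not None]
    nulls = [i for i, v in enumerate(values) if v is None]
    # list.sort is stable, and stays stable with reverse=True
    non_null.sort(key=lambda i: values[i], reverse=descending)
    if null_placement == "at_start":
        return nulls + non_null
    return non_null + nulls


values = ['a', None, 'b']
print(model_sort_indices(values, descending=True))             # [2, 0, 1]
print(model_sort_indices(values, descending=True,
                         null_placement="at_start"))          # [1, 2, 0]
# Reversing an ascending, nulls-at-end sort puts the null first:
print(model_sort_indices(values)[::-1])                        # [1, 2, 0]
```

The first two results match the `pc.sort_indices` outputs shown above.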
   
   ### Component(s)
   
   C++, Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] coady closed issue #36118: Inconsistent ordering of descending nulls in `sort_indices` and `rank` functions.

Posted by "coady (via GitHub)" <gi...@apache.org>.
coady closed issue #36118: Inconsistent ordering of descending nulls in `sort_indices` and `rank` functions.
URL: https://github.com/apache/arrow/issues/36118


[GitHub] [arrow] js8544 commented on issue #36118: Inconsistent ordering of descending nulls in `sort_indices` and `rank` functions.

Posted by "js8544 (via GitHub)" <gi...@apache.org>.
js8544 commented on issue #36118:
URL: https://github.com/apache/arrow/issues/36118#issuecomment-1594017284

   Hi, I think you misunderstood the meaning of `sort_indices`. It returns the sorted array with each element represented by its original index. In your example, None has index 1, and that index is correctly placed at the end (or at the start with `null_placement='at_start'`). You can see this if you call `take` on the result:
   ```python
   import pyarrow as pa
   import pyarrow.compute as pc
   
   arr = pa.array(['a', None, 'b'])
   sorted_indices = pc.sort_indices(arr, [('', 'descending')])
   pc.take(arr, sorted_indices)
   # <pyarrow.lib.StringArray object at 0x7fee5066de40>
   # [
   #   "b",
   #   "a",
   #   null
   # ]
   ```
   
   ```python
   import pyarrow as pa
   import pyarrow.compute as pc
   
   arr = pa.array(['a', None, 'b'])
   sorted_indices = pc.sort_indices(arr, [('', 'descending')], null_placement='at_start')
   pc.take(arr, sorted_indices)
   # <pyarrow.lib.StringArray object at 0x7fee98482260>
   # [
   #   null,
   #   "b",
   #   "a"
   # ]
   ```
   
   The behavior is consistent with `rank`: in both functions nulls are placed literally at the start or at the end (`at_start` / `at_end`).
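The consistency can be made concrete: for an array without ties, the 1-based `rank` of each element is its position in the `sort_indices` result plus one, i.e. `rank` is the inverse permutation of `sort_indices`. The sketch below is pure Python and takes the `sort_indices` outputs quoted in the issue as input; `ranks_from_sort_indices` is a hypothetical helper, not part of pyarrow.

```python
def ranks_from_sort_indices(sort_indices):
    """Invert a permutation of sort indices into 1-based ranks.

    sort_indices[k] is the original index of the element placed at
    sorted position k, so the rank of original element i is
    (position of i in sort_indices) + 1. Assumes no ties.
    """
    ranks = [0] * len(sort_indices)
    for position, original_index in enumerate(sort_indices):
        ranks[original_index] = position + 1
    return ranks


# sort_indices outputs from the issue for ['a', None, 'b'], descending
at_end_indices = [2, 0, 1]    # null_placement='at_end' (default)
at_start_indices = [1, 2, 0]  # null_placement='at_start'

print(ranks_from_sort_indices(at_end_indices))    # [2, 3, 1]
print(ranks_from_sort_indices(at_start_indices))  # [3, 1, 2]
```

Both results match the `pc.rank` outputs in the issue, which is what one would expect if the two functions share the same literal null placement.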

