You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/11/10 02:12:08 UTC

[GitHub] [arrow] cyb70289 commented on pull request #8524: ARROW-10345: [C++][Compute] Fix NaN error in sorting and topn kernels

cyb70289 commented on pull request #8524:
URL: https://github.com/apache/arrow/pull/8524#issuecomment-724403787


   > Is "sort_indices" actually stable? The [documentation](https://arrow.apache.org/docs/cpp/compute.html#sorts-and-partitions) says "non-stable".
   
   In current code, "sort_indices" is stable, it uses "std::stable_sort". "partition_nth_indices" is not stable, it uses "std::nth_element".
   [numpy.argsort](https://numpy.org/doc/stable/reference/generated/numpy.argsort.html) has a parameter to select sorting algorithm, maybe arrow can also implement unstable quicksort (std::sort) and provide a kernel option to select.
   
   ```
   # comparison of numpy stable and unstable sort
   # quicksort gives better performance
   
   In [1]: import numpy as np
   
   In [2]: a = np.random.randint(0, 100, 12345678)
   
   In [3]: time i1 = np.argsort(a, kind='quicksort')
   CPU times: user 909 ms, sys: 4.67 ms, total: 914 ms
   Wall time: 913 ms
   
   In [4]: time i2 = np.argsort(a, kind='stable')
   CPU times: user 998 ms, sys: 20.8 ms, total: 1.02 s
   Wall time: 1.02 s
   
   # stable and quick sort gives different result
   In [5]: (i1 == i2).all()
   Out[5]: False
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org