Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2020/11/03 14:02:00 UTC
[jira] [Commented] (ARROW-10345) [C++] NaN breaks sorting
[ https://issues.apache.org/jira/browse/ARROW-10345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17225420#comment-17225420 ]
Joris Van den Bossche commented on ARROW-10345:
-----------------------------------------------
+1 on sorting NaNs last, but before nulls. That is also what e.g. Julia does, and what many databases do (e.g. https://clickhouse.tech/docs/en/sql-reference/statements/select/order-by/#sorting-of-special-values, https://www.postgresql.org/docs/9.0/datatype-numeric.html), although databases typically achieve this by defining a different comparison in which NaN == NaN.
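
To make the proposed ordering concrete, here is a minimal pure-Python sketch (not Arrow's implementation, and not necessarily how the fix will be written). The helper name `sort_indices_nan_last` is hypothetical. It ranks each value by a small category key so that ordinary numbers come first, NaNs next, and nulls last, which sidesteps the fact that every ordering comparison involving NaN evaluates to false:

```python
import math


def sort_indices_nan_last(values):
    """Return indices that order `values` as: numbers ascending, then NaN, then None.

    Hypothetical helper for illustration only; it is not part of pyarrow.
    """
    def key(i):
        v = values[i]
        if v is None:
            return (2, 0.0)   # nulls sort after everything else
        if isinstance(v, float) and math.isnan(v):
            return (1, 0.0)   # NaNs sort after numbers, before nulls
        return (0, v)         # ordinary numbers sort by value

    # sorted() is stable, so equal keys keep their original relative order.
    return sorted(range(len(values)), key=key)


print(sort_indices_nan_last([3.0, 4.0, float("nan"), 1.0, 2.0, None]))
# [3, 4, 0, 1, 2, 5]
```

Because all NaNs share the same key, they compare as equal to each other here, which is roughly the "NaN == NaN" comparison the databases above rely on.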
> [C++] NaN breaks sorting
> ------------------------
>
> Key: ARROW-10345
> URL: https://issues.apache.org/jira/browse/ARROW-10345
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Affects Versions: 2.0.0
> Reporter: Antoine Pitrou
> Assignee: Yibo Cai
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.0.0
>
> Time Spent: 40m
> Remaining Estimate: 0h
>
> {code:python}
> >>> import numpy as np
> >>> import pyarrow.compute as pc
> >>> pc.sort_indices([3.0, 4.0, 1.0, 2.0, None])
> <pyarrow.lib.UInt64Array object at 0x7f78368a0c90>
> [
> 2,
> 3,
> 0,
> 1,
> 4
> ]
> >>> pc.sort_indices([3.0, 4.0, np.nan, 1.0, 2.0, None])
> <pyarrow.lib.UInt64Array object at 0x7f783684bf30>
> [
> 0,
> 1,
> 2,
> 3,
> 4,
> 5
> ]
> {code}