Posted to jira@arrow.apache.org by "Ian Cook (Jira)" <ji...@apache.org> on 2021/05/06 15:35:00 UTC
[jira] [Updated] (ARROW-12042) [C++] Change or rationalize output of array_sort_indices on ChunkedArray
[ https://issues.apache.org/jira/browse/ARROW-12042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ian Cook updated ARROW-12042:
-----------------------------
Description:
Currently when the {{array_sort_indices}} compute function is called on a ChunkedArray of two or more Arrays, it returns a ChunkedArray of Arrays of _local_ sort indices for each Array. Demonstrating this with the R bindings:
{code:java}
> x <- ChunkedArray$create(c(2L, 1L), c(4L, 3L))
> arrow:::call_function("array_sort_indices", x, options = list(order = FALSE))
ChunkedArray
[
  [
    1,
    0
  ],
  [
    1,
    0
  ]
]
{code}
Compare to the {{sort_indices}} compute function which returns an Array of _global_ sort indices in this case:
{code:java}
> arrow:::call_function("sort_indices", x, options = list(names = "", orders = 0L))
Array
<uint64>
[
  1,
  0,
  3,
  2
]{code}
Is this behavior deliberate? If so, we should document it clearly. If not, we should change it.
Note that the docs currently state that {{array_sort_indices}} only works on Arrays [https://arrow.apache.org/docs/cpp/compute.html#sorts-and-partitions] (see note (4)), but evidently that is not exactly correct.
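To make the local/global distinction concrete, here is a minimal sketch in plain Python. It does not call Arrow at all; the helper names {{local_sort_indices}} and {{global_sort_indices}} are hypothetical and exist only for this illustration, and chunks are modeled as plain lists. It reproduces the two outputs shown above and adds a second input where the chunks interleave, so the global result cannot be obtained by simply adding chunk offsets to the local indices.

```python
# Illustration only: plain Python lists stand in for the chunks of a ChunkedArray.
# Neither function is an Arrow API; both names are made up for this sketch.

def local_sort_indices(chunks):
    """Sort each chunk independently; indices are relative to their own chunk."""
    return [sorted(range(len(chunk)), key=chunk.__getitem__) for chunk in chunks]


def global_sort_indices(chunks):
    """Sort across all chunks; indices refer to positions in the concatenated values."""
    flat = [value for chunk in chunks for value in chunk]
    return sorted(range(len(flat)), key=flat.__getitem__)


if __name__ == "__main__":
    chunks = [[2, 1], [4, 3]]
    print(local_sort_indices(chunks))   # [[1, 0], [1, 0]]
    print(global_sort_indices(chunks))  # [1, 0, 3, 2]

    # Interleaved values: the local indices are unchanged, but the global
    # order now crosses chunk boundaries.
    interleaved = [[3, 1], [2, 0]]
    print(local_sort_indices(interleaved))   # [[1, 0], [1, 0]]
    print(global_sort_indices(interleaved))  # [3, 1, 2, 0]
```

In the second input, the per-chunk result is identical to the first, which is why local indices alone are not enough to recover a global ordering.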
was:
Currently when the {{array_sort_indices}} compute function is called on a ChunkedArray of two or more Arrays, it returns a ChunkedArray of Arrays of _local_ sort indices for each Array. Demonstrating this with the R bindings (but note that these R examples will not run until ARROW-11703 is merged):
{code:java}
> x <- ChunkedArray$create(c(2L, 1L), c(4L, 3L))
> arrow:::call_function("array_sort_indices", x, options = list(order = FALSE))
ChunkedArray
[
  [
    1,
    0
  ],
  [
    1,
    0
  ]
]
{code}
Compare to the {{sort_indices}} compute function which returns an Array of _global_ sort indices in this case:
{code:java}
> arrow:::call_function("sort_indices", x, options = list(names = "", orders = 0L))
Array
<uint64>
[
  1,
  0,
  3,
  2
]{code}
Is this behavior deliberate? If so, we should document it clearly. If not, we should change it.
Note that the docs currently state that {{array_sort_indices}} only works on Arrays [https://arrow.apache.org/docs/cpp/compute.html#sorts-and-partitions] (see note (4)), but evidently that is not exactly correct.
> [C++] Change or rationalize output of array_sort_indices on ChunkedArray
> ------------------------------------------------------------------------
>
> Key: ARROW-12042
> URL: https://issues.apache.org/jira/browse/ARROW-12042
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Affects Versions: 4.0.0
> Reporter: Ian Cook
> Priority: Major
>
> Currently when the {{array_sort_indices}} compute function is called on a ChunkedArray of two or more Arrays, it returns a ChunkedArray of Arrays of _local_ sort indices for each Array. Demonstrating this with the R bindings:
> {code:java}
> > x <- ChunkedArray$create(c(2L, 1L), c(4L, 3L))
> > arrow:::call_function("array_sort_indices", x, options = list(order = FALSE))
> ChunkedArray
> [
>   [
>     1,
>     0
>   ],
>   [
>     1,
>     0
>   ]
> ]
> {code}
> Compare to the {{sort_indices}} compute function which returns an Array of _global_ sort indices in this case:
> {code:java}
> > arrow:::call_function("sort_indices", x, options = list(names = "", orders = 0L))
> Array
> <uint64>
> [
>   1,
>   0,
>   3,
>   2
> ]{code}
> Is this behavior deliberate? If so, we should document it clearly. If not, we should change it.
> Note that the docs currently state that {{array_sort_indices}} only works on Arrays [https://arrow.apache.org/docs/cpp/compute.html#sorts-and-partitions] (see note (4)), but evidently that is not exactly correct.