Posted to jira@arrow.apache.org by "Ian Cook (Jira)" <ji...@apache.org> on 2021/03/25 16:39:00 UTC

[jira] [Updated] (ARROW-12087) [C++] Fix sort_indices, array_sort_indices timestamp support discrepancy

     [ https://issues.apache.org/jira/browse/ARROW-12087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ian Cook updated ARROW-12087:
-----------------------------
    Description: 
{{sort_indices}} supports sorting by timestamp arrays, but {{array_sort_indices}} does not. The following R code demonstrates the discrepancy (it depends on ARROW-11703 to run):
{code:java}
tbl <- tibble::tibble(
  dttm = lubridate::ymd_hms(c("2021-01-01 00:00:00", "1900-01-01 00:00:00")),
)
rb <- arrow::record_batch(tbl)

# this fails:
arrow:::call_function(
  "array_sort_indices",
  rb$dttm,
  options = list(order = FALSE)
)
## Error: NotImplemented: Function array_sort_indices has no kernel matching input types (array[timestamp[us, tz=UTC]])

# this fails because it internally calls array_sort_indices
arrow:::call_function(
  "sort_indices",
  rb,
  options = list(names = "dttm", orders = 0L)
)
## Error: NotImplemented: Function array_sort_indices has no kernel matching input types (array[timestamp[us, tz=UTC]])

# this succeeds
arrow:::call_function(
  "sort_indices",
  rb,
  options = list(names = c("dttm", "dttm"), orders = 0L)
) 
## Array
## <uint64>
## [
##   1,
##   0
## ]{code}
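
For reference, the indices returned by the succeeding call are what an ascending sort of the two timestamps should produce, so the failing calls would be expected to return the same result once timestamp support is consistent. The following is only a minimal pure-Python sketch of that expectation, not Arrow code: {{reference_sort_indices}} is a hypothetical helper introduced here for illustration, and it assumes that {{orders = 0L}} means ascending order, which is consistent with the output shown above.

```python
from datetime import datetime, timezone


def reference_sort_indices(values, descending=False):
    """Return the zero-based positions that would sort `values`.

    Hypothetical helper for illustration only; it mirrors the idea of a
    "sort indices" result and is not part of Arrow.
    """
    return sorted(range(len(values)), key=values.__getitem__, reverse=descending)


# Same two timestamps as the dttm column in the R example (UTC).
dttm = [
    datetime(2021, 1, 1, 0, 0, 0, tzinfo=timezone.utc),
    datetime(1900, 1, 1, 0, 0, 0, tzinfo=timezone.utc),
]

ascending = reference_sort_indices(dttm)
print(ascending)  # [1, 0]
```

The 1900 timestamp sits at position 1 and sorts first, which gives the same indices as the uint64 array in the report.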

  was:
{{sort_indices}} supports sorting by timestamp arrays, but {{array_sort_indices}} does not. The following R code demonstrates the discrepancy (it depends on ARROW-11703 to run):
{code:java}
tbl <- tibble::tibble(
  dttm = lubridate::ymd_hms(c("2021-01-01 00:00:00", "1900-01-01 00:00:00")),
)
rb <- arrow::record_batch(tbl)

# this fails:
arrow:::call_function(
  "array_sort_indices",
  rb$dttm,
  options = list(order = FALSE)
)
## Error: NotImplemented: Function array_sort_indices has no kernel matching input types (array[timestamp[us, tz=UTC]])

# this fails
arrow:::call_function(
  "sort_indices",
  rb,
  options = list(names = "dttm", orders = 0L)
)
## Error: NotImplemented: Function array_sort_indices has no kernel matching input types (array[timestamp[us, tz=UTC]])

# this succeeds
arrow:::call_function(
  "sort_indices",
  rb,
  options = list(names = c("dttm", "dttm"), orders = 0L)
) 
## Array
## <uint64>
## [
##   1,
##   0
## ]{code}


> [C++] Fix sort_indices, array_sort_indices timestamp support discrepancy
> ------------------------------------------------------------------------
>
>                 Key: ARROW-12087
>                 URL: https://issues.apache.org/jira/browse/ARROW-12087
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Ian Cook
>            Priority: Major


