Posted to jira@arrow.apache.org by "Dewey Dunnington (Jira)" <ji...@apache.org> on 2022/09/13 12:00:00 UTC

[jira] [Commented] (ARROW-12778) [R] Support tidyselect where() selection helper in dplyr verbs

    [ https://issues.apache.org/jira/browse/ARROW-12778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17603536#comment-17603536 ] 

Dewey Dunnington commented on ARROW-12778:
------------------------------------------

I think the missing piece is something like {{simulate_data_frame()}} (which is what dbplyr and the substrait R package use to support {{where()}}):

{code:R}
library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.

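simulate_data_frame <- function(schema) {
  # Build a zero-length array of each field's type
  arrays <- lapply(schema$fields, function(field) concat_arrays(type = field$type))
  # Convert each array to an R vector; fall back to vctrs::unspecified()
  # when the conversion errors
  vectors <- lapply(
    arrays,
    function(array) tryCatch(
      as.vector(array),
      error = function(...) vctrs::unspecified()
    )
  )

  # Name the columns after the schema fields and return a zero-row tibble
  names(vectors) <- names(schema)
  tibble::new_tibble(vectors, nrow = 0)
}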

simulate_data_frame(schema(col1 = int32(), col2 = string()))
#> # A tibble: 0 × 2
#> # … with 2 variables: col1 <int>, col2 <chr>
{code}


In substrait R, for example, [~thisisnic] implemented {{where()}} support with only a few lines! https://github.com/voltrondata/substrait-r/blob/ed6f4057b90c91275bcce5a125b0836e60ab4e8d/R/pkg-dplyr.R#L242
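
As a rough sketch of why the zero-row data frame is enough: tidyselect only needs column names and types to evaluate a {{where()}} predicate, so the simulated data frame can be passed straight to {{tidyselect::eval_select()}}. The {{select_columns()}} helper below is hypothetical (it is not part of arrow or substrait), and the tibble is built by hand as a stand-in for the output of {{simulate_data_frame()}} so the example runs without arrow installed:

```r
library(tidyselect)

# Stand-in for simulate_data_frame(schema(col1 = int32(), col2 = string())):
# a zero-row data frame whose column types mirror the schema.
simulated <- tibble::tibble(col1 = integer(), col2 = character())

# Hypothetical helper: resolve a tidyselect expression against the
# zero-row prototype and return the names of the selected columns.
select_columns <- function(simulated, expr) {
  names(eval_select(expr, data = simulated))
}

# where() applies the predicate to each (empty) column, so only the
# column types matter, not the data.
selected <- select_columns(simulated, rlang::expr(where(is.numeric)))
```

The resulting names could then be used to select fields from the Arrow object itself; how that is wired into the arrow dplyr verbs is not shown here.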

> [R] Support tidyselect where() selection helper in dplyr verbs
> --------------------------------------------------------------
>
>                 Key: ARROW-12778
>                 URL: https://issues.apache.org/jira/browse/ARROW-12778
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Ian Cook
>            Priority: Major
>              Labels: dplyr
>             Fix For: 10.0.0
>
>
> Since we can now determine the data type of an unevaluated array expression (ARROW-12291), I think we should be able to support the {{where()}} selection helper.
> This is already done for the {{relocate()}} verb (in ARROW-12781) but not for any other verbs.
> Steps required to do this:
>  # ARROW-12781 
>  # ARROW-12105
>  # Remove the {{check_select_helpers()}} function definition and remove all the calls to it
>  # Modify any remaining {{expect_error()}} tests that test {{where()}} and check for the error message {{"Unsupported selection helper"}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)