Posted to jira@arrow.apache.org by "Dewey Dunnington (Jira)" <ji...@apache.org> on 2021/12/07 17:20:00 UTC

[jira] [Commented] (ARROW-15010) [R] Create a function registry for our NSE funcs

    [ https://issues.apache.org/jira/browse/ARROW-15010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17454764#comment-17454764 ] 

Dewey Dunnington commented on ARROW-15010:
------------------------------------------

I think the main goal is to split the translation definitions across a few files: the current file is getting quite long and mixes the translation machinery with the actual translations. Splitting also makes it easier to manage several function-translation PRs at once (and the tests are already separated anyway).

Short reprex of what I was thinking:

{code:R}
nse_funcs <- new.env(parent = emptyenv())
agg_funcs <- new.env(parent = emptyenv())

translation_registry <- function() {
  nse_funcs
}

translation_registry_agg <- function() {
  agg_funcs
}

register_translation <- function(fun_name, fun, registry = translation_registry()) {
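  # Strip an optional "pkg::" prefix to get the bare function name
  name <- gsub("^.*?::", "", fun_name)
  # Record a namespace only when a "pkg::" prefix was actually supplied
  has_namespace <- grepl("::", fun_name, fixed = TRUE)
  namespace <- if (has_namespace) gsub("::.*$", "", fun_name) else NA_character_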
  
  attr(fun, "arrow_namespace") <- namespace
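  # Assign into the registry that was passed in (not always nse_funcs),
  # so that register_translation_agg() ends up in agg_funcs
  registry[[name]] <- fun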
  invisible(fun)
}

register_translation_agg <- function(fun_name, fun, registry = translation_registry_agg()) {
  register_translation(fun_name, fun, registry = registry)
}

# ...then one of these per package
register_lubridate_translations <- function() {
  register_translation("lubridate::year", function(x) {
    # ...
  })
}

# ...then define them at package load
.onLoad <- function(...) {
  register_lubridate_translations()
}
{code}

This has some nice features:

- You can define related functions, aggregate functions, and (soon!) window functions next to each other
- No need to worry about collate order (you're already doing the onLoad thing, caching all the functions into {{.cache$funcs}})
- You can export {{register_translation()}} and make translating to Arrow expressions another package's problem (e.g., geoarrow translations for {{st_literally_everything()}})
- There's a place for namespacing if or when that gets supported 
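
To make the third and fourth points a bit more concrete, here is a rough, self-contained sketch of how a downstream package might register a translation and how a lookup could work. The helper {{get_translation()}}, the package name {{otherpkg}}, and the stub function bodies are all made up for illustration; none of this is existing arrow API, and the real translations would build Arrow expressions rather than strings.

{code:R}
# Hypothetical registry: an environment that holds no inherited bindings
nse_funcs <- new.env(parent = emptyenv())

register_translation <- function(fun_name, fun, registry = nse_funcs) {
  # Bare name with any "pkg::" prefix removed
  name <- sub("^.*::", "", fun_name)
  # Keep the namespace around as an attribute when one was given
  has_namespace <- grepl("::", fun_name, fixed = TRUE)
  namespace <- if (has_namespace) sub("::.*$", "", fun_name) else NA_character_
  attr(fun, "arrow_namespace") <- namespace
  assign(name, fun, envir = registry)
  invisible(fun)
}

# Hypothetical lookup helper: returns NULL when nothing is registered
get_translation <- function(fun_name, registry = nse_funcs) {
  name <- sub("^.*::", "", fun_name)
  if (!exists(name, envir = registry, inherits = FALSE)) {
    return(NULL)
  }
  get(name, envir = registry, inherits = FALSE)
}

# What arrow itself might register (stub body, for illustration only)
register_translation("lubridate::year", function(x) paste0("year(", x, ")"))

# What another package might register from its own .onLoad()
register_translation("otherpkg::my_fun", function(x) paste0("my_fun(", x, ")"))

# Both qualified and unqualified names resolve to the same entry
year_fun <- get_translation("year")
attr(get_translation("lubridate::year"), "arrow_namespace")
{code}

Because the registry is an environment, registration happens by reference, so nothing needs to be returned or reassigned after {{register_translation()}} is called.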


> [R] Create a function registry for our NSE funcs
> ------------------------------------------------
>
>                 Key: ARROW-15010
>                 URL: https://issues.apache.org/jira/browse/ARROW-15010
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Jonathan Keane
>            Priority: Major
>
> This came up in the comments of ARROW-14575, but would be helpful even without doing anything with the namespace: instead of {{nse_funcs <- ...}}, create a registry + function to register each function and then use that registry in place of {{nse_funcs}}. 
> This will: 
> * Give us a bit more freedom for defining bindings in separate files (though with nse_funcs + collate we could do the same). E.g. have one file per package
> * Possibly give a place to add some documentation (though see ARROW-15011)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)