You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/16 16:45:52 UTC

[GitHub] [arrow] nealrichardson commented on pull request #11904: ARROW-15010: [R] Create a function registry for our NSE funcs

nealrichardson commented on pull request #11904:
URL: https://github.com/apache/arrow/pull/11904#issuecomment-995991878


   I'll read through the PR, but first some high-level questions.
   
   > This PR:
   > 
   > * Defines `register_translation()` and `register_translation_agg()` instead of direct assignment into `nse_funcs` and `agg_funcs`. This enables attaching a package name when the function is being registered (e.g., `register_translation("stringr::str_dup", function(x, ...) ...)`) and makes it possible in the future to allow other packages to define translations and/or for us to change how we evaluate translated expressions without changing many function assignments.
   
   Why is attaching a package name useful?
   
   I see the value of providing an extension point to users/other package developers.
   
   I don't understand what you mean by allowing "us to change how we evaluate translated expressions without changing many function assignments".
   
   > * Moves the registration of translations to package load time rather than package build time. This enables splitting up translations into multiple files, adds the usual CMD check that normal functions undergo (e.g., for use of missing variables), and opens up the possibility of defining different translations for different versions of packages (or omitting translations if a package isn't installed).
   
   Why is it a good idea to move this to package load time? Some might see that as a negative if it is significant in run time. It also doesn't seem like it is necessary to move the assignment to load time, is it? You could have the registry function but still assign at build time.
   
   We can split up translations into multiple files without this change.
   
   Why is "the possibility of defining different translations for different versions of packages" valuable?
   
   Why is "omitting translations if a package isn't installed" valuable?
   
   > * Splits up the definition of translations into dplyr-funcs-type.R, dplyr-funcs-conditional.R, dplyr-funcs-string.R, dplyr-funcs-datetime.R, and dplyr-funcs-math.R. This matches where the translations are tested (the test files were named test-dplyr-funcs-string.R, etc.). Some translations were moved to dplyr-summarise.R because the translations were being tested in test-dplyr-summarise.R. This makes it easier for parallel PRs defining translations and makes it easier for new contributors to figure out where tests should go.
   > * Consolidates internal documentation on how to write translations into one .Rd file rather than scattered around as comments in dplyr-functions.R as they were before.
   > * Removes direct references to `nse_funcs` and `agg_funcs` except where used to implement evaluation
   > 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org