Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/06 22:58:06 UTC

[GitHub] [arrow] nealrichardson opened a new pull request #9916: ARROW-12200: [R] Export and document list_compute_functions

nealrichardson opened a new pull request #9916:
URL: https://github.com/apache/arrow/pull/9916


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9916: ARROW-12200: [R] Export and document list_compute_functions

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9916:
URL: https://github.com/apache/arrow/pull/9916#discussion_r609295183



##########
File path: r/R/compute.R
##########
@@ -34,6 +48,30 @@ call_function <- function(function_name, ..., args = list(...), options = empty_
   compute__CallFunction(function_name, args, options)
 }
 
+#' List available Arrow C++ compute functions
+#'
+#' This function lists the names of all available Arrow compute functions.
+#' These can be called by passing to [call_function()], or they can be
+#' called by name with an `arrow_` prefix inside a `dplyr` verb.
+#'
+#' The resulting list describes the capabilities of your `arrow` build.
+#' Some functions, such as string and regular expression functions,
+#' require optional build-time C++ dependencies. If your `arrow` package
+#' was not compiled with those features enabled, those functions will
+#' not appear in this list.
+#'
+#' @param pattern Optional regular expression to filter the function list
+#' @param ... Additional parameters passed to `grep()`
+#' @return A character vector of available Arrow C++ function names
+#' @export
+list_compute_functions <- function(pattern = NULL, ...) {
+  funcs <- compute__GetFunctionNames()
+  if (!is.null(pattern)) {
+    funcs <- grep(pattern, funcs, value = TRUE, ...)
+  }
+  funcs
+}

Review comment:
       Oh, disregard—I see this is coming straight from the C++ and not through our R mappings.
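
   For readers without an `arrow` build at hand, the filtering step in the quoted `list_compute_functions()` can be tried in base R. This is only a sketch: the vector `funcs` is a hypothetical stand-in for whatever `compute__GetFunctionNames()` returns, and `filter_names()` is an illustrative helper, not part of the package.

   ```r
   # Hypothetical stand-in for the names returned by the C++ function registry
   funcs <- c("add", "fill_null", "quantile", "utf8_upper", "utf8_lower")

   # Illustrative helper mirroring the filtering logic in the diff above
   filter_names <- function(funcs, pattern = NULL, ...) {
     if (!is.null(pattern)) {
       # Extra arguments (e.g. ignore.case, fixed) are forwarded to grep()
       funcs <- grep(pattern, funcs, value = TRUE, ...)
     }
     funcs
   }

   # Keep only names starting with "utf8_"
   utf8_funcs <- filter_names(funcs, "^utf8_")

   # Case-insensitive match, relying on the `...` pass-through
   fill_funcs <- filter_names(funcs, "FILL", ignore.case = TRUE)
   ```

   With no `pattern`, the helper returns the input unchanged, which matches the `NULL` default in the diff.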

##########
File path: r/R/compute.R
##########
@@ -34,6 +48,30 @@ call_function <- function(function_name, ..., args = list(...), options = empty_
   compute__CallFunction(function_name, args, options)
 }
 
+#' List available Arrow C++ compute functions
+#'
+#' This function lists the names of all available Arrow compute functions.
+#' These can be called by passing to [call_function()], or they can be
+#' called by name with an `arrow_` prefix inside a `dplyr` verb.
+#'
+#' The resulting list describes the capabilities of your `arrow` build.
+#' Some functions, such as string and regular expression functions,
+#' require optional build-time C++ dependencies. If your `arrow` package
+#' was not compiled with those features enabled, those functions will
+#' not appear in this list.
+#'
+#' @param pattern Optional regular expression to filter the function list
+#' @param ... Additional parameters passed to `grep()`
+#' @return A character vector of available Arrow C++ function names
+#' @export
+list_compute_functions <- function(pattern = NULL, ...) {
+  funcs <- compute__GetFunctionNames()
+  if (!is.null(pattern)) {
+    funcs <- grep(pattern, funcs, value = TRUE, ...)
+  }
+  funcs
+}

Review comment:
       Would it be possible to name the output vector with the names of the R functions mapped to these compute functions (where such mappings exist, and for the others leave them nameless)?







[GitHub] [arrow] ianmcook commented on a change in pull request #9916: ARROW-12200: [R] Export and document list_compute_functions

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9916:
URL: https://github.com/apache/arrow/pull/9916#discussion_r609279547



##########
File path: r/R/compute.R
##########
@@ -15,10 +15,24 @@
 # specific language governing permissions and limitations
 # under the License.
 
+#' Call an Arrow compute function
+#'
+#' This function provides a lower-level API to calling Arrow functions by their
+#' string function name. You won't use it directly for most applications.
+#' Many Arrow compute functions are mapped to R methods,
+#' and in a `dplyr` evaluation context, [all Arrow functions][list_compute_functions()]
+#' are callable with an `arrow_` prefix.
+#' @param function_name string Arrow compute function name
+#' @param ... Function arguments, which may include `Array`, `ChunkedArray`, `Scalar`,
+#' `RecordBatch`, or `Table`.
+#' @param args list arguments as an alternative to specifying in `...`
+#' @param options named list of C++ function options.
+#' @return An `Array`, `ChunkedArray`, `Scalar`, `RecordBatch`, or `Table`, whatever the compute function results in.

Review comment:
       ```suggestion
   #' @return An `Array`, `ChunkedArray`, `Scalar`, `RecordBatch`, or `Table`, whichever the compute function results in.
   ```







[GitHub] [arrow] ianmcook commented on a change in pull request #9916: ARROW-12200: [R] Export and document list_compute_functions

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9916:
URL: https://github.com/apache/arrow/pull/9916#discussion_r609278479



##########
File path: r/R/compute.R
##########
@@ -15,10 +15,24 @@
 # specific language governing permissions and limitations
 # under the License.
 
+#' Call an Arrow compute function
+#'
+#' This function provides a lower-level API to calling Arrow functions by their

Review comment:
       tiny nit
   ```suggestion
   #' This function provides a lower-level API for calling Arrow functions by their
   ```







[GitHub] [arrow] nealrichardson commented on a change in pull request #9916: ARROW-12200: [R] Export and document list_compute_functions

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #9916:
URL: https://github.com/apache/arrow/pull/9916#discussion_r609882167



##########
File path: r/R/compute.R
##########
@@ -34,6 +48,30 @@ call_function <- function(function_name, ..., args = list(...), options = empty_
   compute__CallFunction(function_name, args, options)
 }
 
+#' List available Arrow C++ compute functions
+#'
+#' This function lists the names of all available Arrow compute functions.
+#' These can be called by passing to [call_function()], or they can be
+#' called by name with an `arrow_` prefix inside a `dplyr` verb.

Review comment:
       Good call.







[GitHub] [arrow] nealrichardson commented on pull request #9916: ARROW-12200: [R] Export and document list_compute_functions

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #9916:
URL: https://github.com/apache/arrow/pull/9916#issuecomment-816103529


   CI from my fork: https://github.com/nealrichardson/arrow/actions/runs/730429264





[GitHub] [arrow] ianmcook commented on a change in pull request #9916: ARROW-12200: [R] Export and document list_compute_functions

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9916:
URL: https://github.com/apache/arrow/pull/9916#discussion_r609294737



##########
File path: r/R/compute.R
##########
@@ -34,6 +48,30 @@ call_function <- function(function_name, ..., args = list(...), options = empty_
   compute__CallFunction(function_name, args, options)
 }
 
+#' List available Arrow C++ compute functions
+#'
+#' This function lists the names of all available Arrow compute functions.
+#' These can be called by passing to [call_function()], or they can be
+#' called by name with an `arrow_` prefix inside a `dplyr` verb.
+#'
+#' The resulting list describes the capabilities of your `arrow` build.
+#' Some functions, such as string and regular expression functions,
+#' require optional build-time C++ dependencies. If your `arrow` package
+#' was not compiled with those features enabled, those functions will
+#' not appear in this list.
+#'

Review comment:
       I'd suggest adding an additional paragraph here saying something like:
   >This is not a complete list of all the compute functions available in the Arrow R package. The R package also includes Arrow translations of many base R functions and some popular functions from tidyverse packages that can be called directly on Arrow objects or inside `dplyr` verbs.







[GitHub] [arrow] github-actions[bot] commented on pull request #9916: ARROW-12200: [R] Export and document list_compute_functions

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9916:
URL: https://github.com/apache/arrow/pull/9916#issuecomment-814523370


   https://issues.apache.org/jira/browse/ARROW-12200





[GitHub] [arrow] ianmcook commented on a change in pull request #9916: ARROW-12200: [R] Export and document list_compute_functions

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9916:
URL: https://github.com/apache/arrow/pull/9916#discussion_r609276936



##########
File path: r/R/compute.R
##########
@@ -34,6 +48,30 @@ call_function <- function(function_name, ..., args = list(...), options = empty_
   compute__CallFunction(function_name, args, options)
 }
 
+#' List available Arrow C++ compute functions
+#'
+#' This function lists the names of all available Arrow compute functions.
+#' These can be called by passing to [call_function()], or they can be
+#' called by name with an `arrow_` prefix inside a `dplyr` verb.
+#'
+#' The resulting list describes the capabilities of your `arrow` build.
+#' Some functions, such as string and regular expression functions,
+#' require optional build-time C++ dependencies. If your `arrow` package
+#' was not compiled with those features enabled, those functions will
+#' not appear in this list.
+#'
+#' @param pattern Optional regular expression to filter the function list
+#' @param ... Additional parameters passed to `grep()`
+#' @return A character vector of available Arrow C++ function names
+#' @export
+list_compute_functions <- function(pattern = NULL, ...) {
+  funcs <- compute__GetFunctionNames()
+  if (!is.null(pattern)) {
+    funcs <- grep(pattern, funcs, value = TRUE, ...)
+  }
+  funcs
+}

Review comment:
       Would it be possible to name the output vector with the names of the R functions mapped to these compute functions (where such mappings exist, and for the others leave them nameless)?
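
   One way the naming asked about here might look, as a base R sketch only. Both `funcs` and `r_mappings` are hypothetical stand-ins (the real package may store its mappings differently), and `name_by_mapping()` is an illustrative helper rather than anything in the PR.

   ```r
   # Hypothetical stand-in for the C++ compute function names
   funcs <- c("add", "fill_null", "utf8_upper")

   # Hypothetical mapping: names are R functions, values are compute functions
   r_mappings <- c("+" = "add", toupper = "utf8_upper")

   # Name each entry with its mapped R function; unmapped entries get ""
   name_by_mapping <- function(funcs, mappings) {
     nms <- names(mappings)[match(funcs, mappings)]
     nms[is.na(nms)] <- ""
     names(funcs) <- nms
     funcs
   }

   named <- name_by_mapping(funcs, r_mappings)
   ```

   In this sketch, entries without a mapping carry an empty name, since an R character vector cannot be named for only some of its elements.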







[GitHub] [arrow] nealrichardson closed pull request #9916: ARROW-12200: [R] Export and document list_compute_functions

Posted by GitBox <gi...@apache.org>.
nealrichardson closed pull request #9916:
URL: https://github.com/apache/arrow/pull/9916


   





[GitHub] [arrow] ianmcook commented on a change in pull request #9916: ARROW-12200: [R] Export and document list_compute_functions

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9916:
URL: https://github.com/apache/arrow/pull/9916#discussion_r609289774



##########
File path: r/R/compute.R
##########
@@ -15,10 +15,24 @@
 # specific language governing permissions and limitations
 # under the License.
 
+#' Call an Arrow compute function
+#'
+#' This function provides a lower-level API to calling Arrow functions by their
+#' string function name. You won't use it directly for most applications.
+#' Many Arrow compute functions are mapped to R methods,
+#' and in a `dplyr` evaluation context, [all Arrow functions][list_compute_functions()]
+#' are callable with an `arrow_` prefix.
+#' @param function_name string Arrow compute function name
+#' @param ... Function arguments, which may include `Array`, `ChunkedArray`, `Scalar`,
+#' `RecordBatch`, or `Table`.
+#' @param args list arguments as an alternative to specifying in `...`
+#' @param options named list of C++ function options.
+#' @return An `Array`, `ChunkedArray`, `Scalar`, `RecordBatch`, or `Table`, whatever the compute function results in.
+#' @seealso [Arrow C++ documentation](https://arrow.apache.org/docs/cpp/compute.html) for the functions and their respective options.
+#' @export

Review comment:
       It would be good to have an example or two:
   ```suggestion
   #' @examples
   #' \donttest{
   #' a <- Array$create(c(1L, 2L, 3L, NA, 5L))
   #' s <- Scalar$create(4L)
   #' call_function("fill_null", a, s)
   #'
   #' a <- Array$create(rnorm(10000))
   #' call_function("quantile", a, options = list(q = seq(0, 1, 0.25)))
   #' }
   #' @export
   ```







[GitHub] [arrow] ianmcook commented on a change in pull request #9916: ARROW-12200: [R] Export and document list_compute_functions

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9916:
URL: https://github.com/apache/arrow/pull/9916#discussion_r609295875



##########
File path: r/R/compute.R
##########
@@ -34,6 +48,30 @@ call_function <- function(function_name, ..., args = list(...), options = empty_
   compute__CallFunction(function_name, args, options)
 }
 
+#' List available Arrow C++ compute functions
+#'
+#' This function lists the names of all available Arrow compute functions.

Review comment:
       ```suggestion
   #' This function lists the names of all available Arrow C++ library compute functions.
   ```
   (also see my comment below)







[GitHub] [arrow] ianmcook commented on a change in pull request #9916: ARROW-12200: [R] Export and document list_compute_functions

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9916:
URL: https://github.com/apache/arrow/pull/9916#discussion_r609278479



##########
File path: r/R/compute.R
##########
@@ -15,10 +15,24 @@
 # specific language governing permissions and limitations
 # under the License.
 
+#' Call an Arrow compute function
+#'
+#' This function provides a lower-level API to calling Arrow functions by their

Review comment:
       tiny nit: s/to/for/







[GitHub] [arrow] ianmcook commented on a change in pull request #9916: ARROW-12200: [R] Export and document list_compute_functions

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9916:
URL: https://github.com/apache/arrow/pull/9916#discussion_r609296958



##########
File path: r/R/compute.R
##########
@@ -34,6 +48,30 @@ call_function <- function(function_name, ..., args = list(...), options = empty_
   compute__CallFunction(function_name, args, options)
 }
 
+#' List available Arrow C++ compute functions
+#'
+#' This function lists the names of all available Arrow compute functions.
+#' These can be called by passing to [call_function()], or they can be
+#' called by name with an `arrow_` prefix inside a `dplyr` verb.

Review comment:
       I assume some of these would not actually work because the options are not wired up in `make_compute_options`. Do you think it's worth addressing that?







[GitHub] [arrow] ianmcook commented on a change in pull request #9916: ARROW-12200: [R] Export and document list_compute_functions

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9916:
URL: https://github.com/apache/arrow/pull/9916#discussion_r609294737



##########
File path: r/R/compute.R
##########
@@ -34,6 +48,30 @@ call_function <- function(function_name, ..., args = list(...), options = empty_
   compute__CallFunction(function_name, args, options)
 }
 
+#' List available Arrow C++ compute functions
+#'
+#' This function lists the names of all available Arrow compute functions.
+#' These can be called by passing to [call_function()], or they can be
+#' called by name with an `arrow_` prefix inside a `dplyr` verb.
+#'
+#' The resulting list describes the capabilities of your `arrow` build.
+#' Some functions, such as string and regular expression functions,
+#' require optional build-time C++ dependencies. If your `arrow` package
+#' was not compiled with those features enabled, those functions will
+#' not appear in this list.
+#'

Review comment:
       I'd suggest adding an additional paragraph here saying something like:
   >This is not a complete list of all the compute functions available in the `arrow` R package. The R package also includes Arrow translations of many base R functions and some popular functions from tidyverse packages that can be called directly on Arrow objects or inside `dplyr` verbs.



