You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/12 00:04:31 UTC
[GitHub] [arrow] nealrichardson opened a new pull request #9170: ARROW-11176: [R] Expose memory pool name and document setting it
nealrichardson opened a new pull request #9170:
URL: https://github.com/apache/arrow/pull/9170
Also:
* Exposes `SupportedBackendNames` from the C++ library (perhaps it should be renamed and include "memory" in it, was just following what the internal functions were calling it) so we can make that visible at higher layers
* Adds a general `arrow_info()` function that includes memory pool settings, build-time feature availability (compression codecs, S3), etc., which could be useful for debugging purposes.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #9170: ARROW-11176: [R] Expose memory pool name and document setting it
Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #9170:
URL: https://github.com/apache/arrow/pull/9170#issuecomment-758371255
@github-actions rebase
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ianmcook commented on a change in pull request #9170: ARROW-11176: [R] Expose memory pool name and document setting it
Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9170:
URL: https://github.com/apache/arrow/pull/9170#discussion_r556072024
##########
File path: r/R/arrow-package.R
##########
@@ -76,6 +76,39 @@ option_use_threads <- function() {
!is_false(getOption("arrow.use_threads"))
}
+#' Report information on the package's capabilities
+#'
+#' This function summarizes a number of build-time configurations and run-time
+#' settings for the Arrow package. It may be useful for diagnostics.
+#' @return A list including version information, boolean "capabilities", and
+#' statistics from Arrow's memory allocator.
+#' @export
+#' @importFrom utils packageVersion
+arrow_info <- function() {
+ opts <- options()
+ out <- list(
+ version = packageVersion("arrow"),
+ libarrow = arrow_available(),
+ options = opts[grep("^arrow.", names(opts))]
Review comment:
Escape the dot (`\\.`) to make it match only literal dots
##########
File path: r/R/arrow-package.R
##########
@@ -76,6 +76,39 @@ option_use_threads <- function() {
!is_false(getOption("arrow.use_threads"))
}
+#' Report information on the package's capabilities
+#'
+#' This function summarizes a number of build-time configurations and run-time
+#' settings for the Arrow package. It may be useful for diagnostics.
+#' @return A list including version information, boolean "capabilities", and
+#' statistics from Arrow's memory allocator.
+#' @export
+#' @importFrom utils packageVersion
+arrow_info <- function() {
+ opts <- options()
+ out <- list(
+ version = packageVersion("arrow"),
Review comment:
`packageVersion()` returns a special object with classes `package_version` and `numeric_version`. It's probably fine, but there's a chance it might print weird in some user environments. Maybe safer to wrap it in `as.character()`?
##########
File path: r/NEWS.md
##########
@@ -35,6 +35,8 @@
* Large string types can now be written to Parquet files
* The [pronouns `.data` and `.env`](https://rlang.r-lib.org/reference/tidyeval-data.html) are now fully supported in Arrow `dplyr` pipelines.
* Option `arrow.skip_nul` (default `FALSE`, as in `base::scan()`) allows conversion of Arrow string (`utf8()`) type data containing embedded nul `\0` characters to R. If set to `TRUE`, nuls will be stripped and a warning is emitted if any are found.
+* `arrow_info()` for an overview of various run-time and build-time Arrow configurations, useful for debugging
+* Set environment variable `ARROW_DEFAULT_MEMORY_POOL` before loading the Arrow package to change memory allocators. Windows packages are built with "mimalloc"; most others have "jemalloc". These are used by default if they were built, and they're generally much faster than the system malloc, but sometimes it is useful to turn them off for debugging purposes. To disable them, set `ARROW_DEFAULT_MEMORY_POOL=system`.
Review comment:
Elsewhere in `NEWS.md`, `jemalloc` and `mimalloc` appear in backticks
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nealrichardson commented on a change in pull request #9170: ARROW-11176: [R] Expose memory pool name and document setting it
Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #9170:
URL: https://github.com/apache/arrow/pull/9170#discussion_r555989724
##########
File path: cpp/src/arrow/memory_pool.cc
##########
@@ -87,6 +87,19 @@ const char* je_arrow_malloc_conf =
#endif // ARROW_JEMALLOC
namespace arrow {
+
+std::vector<std::string> SupportedBackendNames() {
Review comment:
I had tried but was failing on some namespace issues; can try again.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nealrichardson commented on a change in pull request #9170: ARROW-11176: [R] Expose memory pool name and document setting it
Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #9170:
URL: https://github.com/apache/arrow/pull/9170#discussion_r556108017
##########
File path: cpp/src/arrow/memory_pool.cc
##########
@@ -87,6 +87,19 @@ const char* je_arrow_malloc_conf =
#endif // ARROW_JEMALLOC
namespace arrow {
+
+std::vector<std::string> SupportedBackendNames() {
Review comment:
I fixed this and renamed it to `SupportedMemoryBackendNames`
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #9170: ARROW-11176: [R] Expose memory pool name and document setting it
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9170:
URL: https://github.com/apache/arrow/pull/9170#issuecomment-758313538
https://issues.apache.org/jira/browse/ARROW-11176
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nealrichardson closed pull request #9170: ARROW-11176: [R] Expose memory pool name and document setting it
Posted by GitBox <gi...@apache.org>.
nealrichardson closed pull request #9170:
URL: https://github.com/apache/arrow/pull/9170
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] ianmcook commented on pull request #9170: ARROW-11176: [R] Expose memory pool name and document setting it
Posted by GitBox <gi...@apache.org>.
ianmcook commented on pull request #9170:
URL: https://github.com/apache/arrow/pull/9170#issuecomment-759156358
LGTM 👍
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on a change in pull request #9170: ARROW-11176: [R] Expose memory pool name and document setting it
Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #9170:
URL: https://github.com/apache/arrow/pull/9170#discussion_r555946108
##########
File path: cpp/src/arrow/memory_pool.cc
##########
@@ -87,6 +87,19 @@ const char* je_arrow_malloc_conf =
#endif // ARROW_JEMALLOC
namespace arrow {
+
+std::vector<std::string> SupportedBackendNames() {
Review comment:
It would be better to reuse `SupportedBackends` below. Otherwise we risk this function being out of sync.
##########
File path: r/R/arrow-package.R
##########
@@ -76,6 +76,39 @@ option_use_threads <- function() {
!is_false(getOption("arrow.use_threads"))
}
+#' Report information on the package's capabilities
+#'
+#' This function summarizes a number of build-time configurations and run-time
+#' settings for the Arrow package. It may be useful for diagnostics.
+#' @return A list including version information, boolean "capabilities", and
+#' statistics from Arrow's memory allocator.
Review comment:
Does it make sense to include runtime statistics together with static configuration information?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #9170: ARROW-11176: [R] Expose memory pool name and document setting it
Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #9170:
URL: https://github.com/apache/arrow/pull/9170#issuecomment-759128887
I added a print method for arrow info. It looks something like this:
```
> arrow_info()
Arrow package version: 2.0.0.9000
Capabilities:
s3 TRUE
snappy TRUE
gzip TRUE
brotli TRUE
zstd TRUE
lz4 TRUE
lz4_frame TRUE
lzo FALSE
bz2 TRUE
jemalloc TRUE
mimalloc FALSE
Memory:
Allocator jemalloc
Current 0 bytes
Max -855217280 bytes
```
The trouble is that when memory allocated is >2gb, it overflows an integer in the `int64_t` to `SEXP` conversion (https://github.com/r-lib/cpp11/issues/145). I'll see if I can work around that somehow.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nealrichardson commented on a change in pull request #9170: ARROW-11176: [R] Expose memory pool name and document setting it
Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #9170:
URL: https://github.com/apache/arrow/pull/9170#discussion_r555990658
##########
File path: r/R/arrow-package.R
##########
@@ -76,6 +76,39 @@ option_use_threads <- function() {
!is_false(getOption("arrow.use_threads"))
}
+#' Report information on the package's capabilities
+#'
+#' This function summarizes a number of build-time configurations and run-time
+#' settings for the Arrow package. It may be useful for diagnostics.
+#' @return A list including version information, boolean "capabilities", and
+#' statistics from Arrow's memory allocator.
Review comment:
I don't think it makes sense to separate them; I expect that this is a function I'll ask bug reporters to run to give me all of the arrow debug information about their setup.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #9170: ARROW-11176: [R] Expose memory pool name and document setting it
Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #9170:
URL: https://github.com/apache/arrow/pull/9170#issuecomment-758301990
@pitrou your review requested for the C++ change; you don't have to have an opinion about the R code (but you may!)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nealrichardson commented on a change in pull request #9170: ARROW-11176: [R] Expose memory pool name and document setting it
Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #9170:
URL: https://github.com/apache/arrow/pull/9170#discussion_r556107770
##########
File path: r/R/arrow-package.R
##########
@@ -76,6 +76,39 @@ option_use_threads <- function() {
!is_false(getOption("arrow.use_threads"))
}
+#' Report information on the package's capabilities
+#'
+#' This function summarizes a number of build-time configurations and run-time
+#' settings for the Arrow package. It may be useful for diagnostics.
+#' @return A list including version information, boolean "capabilities", and
+#' statistics from Arrow's memory allocator.
+#' @export
+#' @importFrom utils packageVersion
+arrow_info <- function() {
+ opts <- options()
+ out <- list(
+ version = packageVersion("arrow"),
Review comment:
I think the special object is more useful because you can do `>` etc. methods with it. And the first line in its print method is as.character() so I think we're ok there, unless there's a scenario I'm missing:
```r
print.numeric_version <- function (x, ...) {
y <- as.character(x)
if (!length(y))
writeLines(gettext("<0 elements>"))
else if (any("quote" == names(list(...))))
print(ifelse(is.na(y), NA_character_, sQuote(y)), ...)
else print(ifelse(is.na(y), NA_character_, sQuote(y)), quote = FALSE,
...)
invisible(x)
}
```
What we really should do is add an S3 class to this object and make a print method for it, like `sessionInfo()`.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] nealrichardson commented on pull request #9170: ARROW-11176: [R] Expose memory pool name and document setting it
Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #9170:
URL: https://github.com/apache/arrow/pull/9170#issuecomment-759130633
Coercing to `double` works around it:
```
Memory:
Allocator jemalloc
Current 7.47 Gb
Max 11.2 Gb
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org