You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/09 03:14:30 UTC

[GitHub] [arrow] karldw opened a new pull request, #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

karldw opened a new pull request, #12849:
URL: https://github.com/apache/arrow/pull/12849

   This PR aims to address ARROW-15092 and ARROW-15608, allowing the function `create_package_with_all_dependencies()` to be run on non-linux systems.
   
   To accomplish this, the code parses the [versions.txt](https://github.com/apache/arrow/blob/master/cpp/thirdparty/versions.txt) file in R. This process is a little hairy, because the existing code in versions.txt and [download_dependencies.sh](https://github.com/apache/arrow/blob/master/cpp/thirdparty/download_dependencies.sh) depend on shell substitution and array parsing.
   
   In writing this PR, I assumed that only base R functions were available, and the format of versions.txt couldn't be changed to make things easier. I wrote some roxygen-like documentation, but didn't actually want to generate the `*.Rd` files, so just started the lines with `#` instead of `#'`.
   
   I'd be very grateful for feedback here. The first question is whether this approach makes sense at a macro level, and then whether my implementation seems reasonable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] karldw commented on pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
karldw commented on PR #12849:
URL: https://github.com/apache/arrow/pull/12849#issuecomment-1095796331

   Okay, great! @nealrichardson, just to confirm: do you want to go with the suggestion to add `pkgbuild` as a new dependency? I added it and tweaked the bash script - let's see how tests go on this new version.
   
   --------------------
   
   Specific replies:
   
   @nealrichardson:
   
   I think `readlink` without `-f` isn't useful here, so I removed it from the chain. I also had to change the syntax a little to get things working, but I might be missing some clever bash-ism.
   
   Just to repeat @assignUser's earlier comment, the use case I have in my head is that the package is downloaded on one machine, then installed on another. For that reason, I was trying not to make too many assumptions about the build capabilities on the downloading machine. But this offline build is a pretty niche demand, and it's probably okay to ask those users to make sure they have bash available when downloading.
   
   
   @assignUser:
   
   > I noticed that there are no tests for `create_package_with_all_dependencies` so that might be something we could add in this PR.
   
   Running `create_package_with_all_dependencies` requires downloading ~100MB of files, which seemed like a pretty heavy test to run every time. I added a test for `run_download_script` that skips the actual download, but checks that the requirements are in place.
   
   
   @assignUser and @wjones127, thanks for the tips!
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] assignUser commented on a diff in pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
assignUser commented on code in PR #12849:
URL: https://github.com/apache/arrow/pull/12849#discussion_r847372973


##########
r/R/install-arrow.R:
##########
@@ -137,6 +137,262 @@ reload_arrow <- function() {
   }
 }
 
+# Substitute like the bash shell
+#
+# @param one_string A length-1 character vector
+# @param possible_values A dictionary-ish set of variables that could provide
+#   values to substitute in.
+# @return `one_string`, with values substituted like bash would.
+#
+# Only supports a small subset of bash substitution patterns. May have multiple
+# bash variables in `one_string`
+# Used as a helper to parse versions.txt
+..install_substitute_like_bash <- function(one_string, possible_values) {

Review Comment:
   I think it is not necessary to use the dot-prefix for internal functions as it is done nowhere else in the package. Nice job keeping a consistent naming scheme though! 👍 



##########
r/R/install-arrow.R:
##########
@@ -137,6 +137,262 @@ reload_arrow <- function() {
   }
 }
 
+# Substitute like the bash shell
+#
+# @param one_string A length-1 character vector
+# @param possible_values A dictionary-ish set of variables that could provide
+#   values to substitute in.
+# @return `one_string`, with values substituted like bash would.
+#
+# Only supports a small subset of bash substitution patterns. May have multiple
+# bash variables in `one_string`
+# Used as a helper to parse versions.txt
+..install_substitute_like_bash <- function(one_string, possible_values) {
+  stopifnot(
+    !is.null(names(possible_values)),
+    !any(names(possible_values) == ""),
+    anyDuplicated(names(possible_values)) == 0,
+    length(one_string) == 1,
+    !anyNA(possible_values)
+  )
+  # Find the name of the version we want, something like
+  # ARROW_RAPIDJSON_BUILD_VERSION
+  # The `(//./_|:1)` is a special case to handle some bash fanciness.
+  version_regex <- "\\$\\{(ARROW_[A-Z0-9_]+?_VERSION)(//./_|:1)?\\}"
+  # Extract the matched groups. If the pattern occurs multiple times, we need
+  # all (non-overlapping) matches. stringr::str_extract_all() or
+  # base::gregexec() would be useful here, but gregexec was only introduced in
+  # R 4.1.
+  matched_substrings <- regmatches(
+    one_string,
+    gregexpr(version_regex, one_string, perl = TRUE)
+  )[[1]] # Subset [[1]] because one_string has length 1
+  # `matched_substrings` is a character vector with length equal to the number
+  # of non-overlapping matches of `version_regex` in `one_string`. `match_list`
+  # is a list (same length as `matched_substrings`), where each list element is
+  # a length-3 character vector. The first element of the vector is the value
+  # from `matched_substrings` (e.g. "${ARROW_ZZZ_VERSION//./_}"). The following
+  #  two values are the captured groups specified in `version_regex` e.g.
+  # "ARROW_ZZZ_VERSION" and "//./_".
+  match_list <- regmatches(
+    matched_substrings,
+    regexec(version_regex, matched_substrings, perl = TRUE)
+  )
+  # Small helper to take slices of match_list
+  extract_chr_by_idx <- function(lst, idx) {
+    vapply(lst, function(x) x[idx], FUN.VALUE = character(1L))
+  }
+  string_to_sub <- extract_chr_by_idx(match_list, 1L)
+  version_varnames <- extract_chr_by_idx(match_list, 2L)
+  bash_special_cases <- extract_chr_by_idx(match_list, 3L)
+  version_values <- possible_values[version_varnames]
+  version_values <- ifelse(
+    bash_special_cases == "", version_values, ifelse(
+      bash_special_cases == ":1", substring(version_values, 2), ifelse(
+        bash_special_cases == "//./_", gsub(".", "_", version_values, fixed = TRUE),
+        NA_character_ # otherwise
+      )
+    )
+  )
+  num_to_sub <- length(string_to_sub)
+  stopifnot(
+    all(version_varnames %in% names(possible_values)),
+    !anyNA(version_values),
+    num_to_sub >= 1,
+    num_to_sub < 10 # Something has gone wrong if we're doing 10+
+  )
+  out <- one_string
+  for (idx in seq_len(num_to_sub)) {
+    # not gsub in case there are duplicates
+    out <- sub(string_to_sub[idx], version_values[idx], out, fixed = TRUE)
+  }
+  out
+}
+
+# Substitute all values in the filenames and URLs of versions.txt
+#
+# @param deps_unsubstituted A list with two elements, `filenames` and `urls`
+# @param possible_values A dictionary-ish set of variables that could provide
+#   values to substitute in.
+# @return A list with two elements, `filenames` and `urls`, with values
+#   substituted into the strings like bash would.
+#
+# Used as a helper to parse versions.txt
+..install_substitute_all <- function(deps_unsubstituted, possible_values) {
+  file_substituted <- vapply(
+    deps_unsubstituted$filenames,
+    ..install_substitute_like_bash,
+    FUN.VALUE = character(1),
+    possible_values = possible_values
+  )
+  url_substituted <- vapply(
+    deps_unsubstituted$urls,
+    ..install_substitute_like_bash,
+    FUN.VALUE = character(1),
+    possible_values = possible_values
+  )
+  list(
+    filenames = unname(file_substituted),
+    urls = unname(url_substituted)
+  )
+}
+
+# Parse the version lines portion of versions.txt
+#
+# @param version_lines A character vector of lines read from versions.txt
+# @return The parsed and bash-substiuted version values
+#
+# Used as a helper to parse versions.txt
+..install_parse_version_lines <- function(version_lines) {
+  version_lines <- trimws(version_lines)
+  version_regex <- "^(ARROW_[A-Z0-9_]+_)(VERSION|SHA256_CHECKSUM)=([^=]+)$"
+  if (!all(grepl(version_regex, version_lines, perl = TRUE))) {
+    stop("Failed to parse version lines")
+  }
+  match_list <- regmatches(
+    version_lines,
+    regexec(version_regex, version_lines, perl = TRUE)
+  )
+  # Find the lines where the second regex match group is that are "VERSION" (as
+  # opposed to "SHA256_CHECKSUM")
+  version_idx <- vapply(
+    match_list,
+    function(m) m[[3]] == "VERSION",
+    FUN.VALUE = logical(1)
+  )
+  version_matches <- match_list[version_idx]
+  # Fancy indexing here is just to pull the first and second regex match out,
+  # e.g. "ARROW_RAPIDJSON_BUILD_" and "VERSION"
+  version_varnames <- vapply(
+    version_matches,
+    function(m) paste0(m[[2]], m[[3]]),
+    FUN.VALUE = character(1)
+  )
+  version_values <- vapply(version_matches,
+    function(m) m[[4]],
+    FUN.VALUE = character(1)
+  )
+  names(version_values) <- version_varnames
+  return(version_values)
+}
+
+# Parse the URL + filename array portion of versions.txt
+#
+# @param array_lines Characer vector of lines from the versions.txt file with
+#   the filename and URL array
+# @return A list with two character vectors, with names `filenames` and `urls`
+#
+# The output of this function has split out the filename and URL components,
+# but has not yet substituted in the version numbers. The output is next passed
+# to `..install_substitute_all()`
+#
+# Used as a helper to parse versions.txt
+..install_parse_dependency_array <- function(array_lines) {
+  stopifnot(
+    length(array_lines) >= 1,
+    is.character(array_lines),
+    !anyNA(array_lines)
+  )
+  array_lines <- trimws(array_lines)
+
+  # Parse the array_lines with a regex. Each line of the array is a different
+  # component, with a format like `"<component name> <filename> <url>"` (quotes
+  # included). The filename and URL include some version string that's defined
+  # earlier in the file.
+  # Regex in words:
+  # Start with `"ARROW_`, then any capital ASCII letter, number, or underscore.
+  # After a space, find anything except a space, colon, or forward slash. (No
+  # space is essential, and would be essential to parsing the array in bash.
+  # The colon and slash are just basic guards that this is a filename.) Next, a
+  # space. Then a URL, starting with https://, and including anything except a
+  # space. (This is the URL before substituting in the version sting, so normal
+  # URL parsing rules don't apply.)
+  dep_array_regex <- '^"(ARROW_[A-Z0-9_]+_URL) ([^ :/"]+) (https://[^ "]+)"$'
+  if (!all(grepl(dep_array_regex, array_lines, perl = TRUE))) {
+    stop("Cannot parse thirdparty dependency array in expected format.")
+  }
+  list(
+    filenames = gsub(dep_array_regex, "\\2", array_lines, perl = TRUE),
+    urls      = gsub(dep_array_regex, "\\3", array_lines, perl = TRUE)
+  )
+}
+
+# Parse the versions.txt file
+#
+# @param versions_file Filename pointing to versions.txt
+# @return The parsed and ready-to-use values, as a named list of vectors
+#
+#
+# The versions.txt file is included as part of the R tar file, and is here:
+# https://github.com/apache/arrow/blob/master/cpp/thirdparty/versions.txt
+#
+# Used as a helper to parse versions.txt
+..install_parse_lines <- function(versions_file) {
+  orig_lines <- readLines(versions_file)
+
+  lines <- gsub("#.*", "", orig_lines, perl = TRUE)
+  lines <- lines[lines != ""]
+
+  dep_array_start_idx <- grep("^DEPENDENCIES=\\($", lines, perl = TRUE)
+  dep_array_lines <- lines[
+    seq.int(from = dep_array_start_idx + 1, to = length(lines) - 1, by = 1)
+  ]
+  version_lines <- lines[seq.int(1, dep_array_start_idx - 1, by = 1)]
+  version_info <- ..install_parse_version_lines(version_lines)
+
+  failed_to_parse <- c(
+    anyNA(orig_lines),
+    length(orig_lines) > 1000,
+    length(lines) == 0,
+    length(dep_array_start_idx) != 1,
+    dep_array_start_idx <= 1,
+    dep_array_start_idx >= length(lines) - 3,
+    lines[length(lines)] != ")",
+    length(dep_array_lines) == 0,
+    anyNA(version_info)
+  )
+
+  if (any(failed_to_parse)) {
+    stop(

Review Comment:
   You could use `rlang::abort` here but that is very heterogenous in the package so 🤷 (and you aimed for base R which is good!) 



##########
r/R/install-arrow.R:
##########
@@ -137,6 +137,262 @@ reload_arrow <- function() {
   }
 }
 
+# Substitute like the bash shell
+#
+# @param one_string A length-1 character vector
+# @param possible_values A dictionary-ish set of variables that could provide
+#   values to substitute in.
+# @return `one_string`, with values substituted like bash would.
+#
+# Only supports a small subset of bash substitution patterns. May have multiple
+# bash variables in `one_string`
+# Used as a helper to parse versions.txt
+..install_substitute_like_bash <- function(one_string, possible_values) {
+  stopifnot(
+    !is.null(names(possible_values)),
+    !any(names(possible_values) == ""),
+    anyDuplicated(names(possible_values)) == 0,
+    length(one_string) == 1,
+    !anyNA(possible_values)
+  )
+  # Find the name of the version we want, something like
+  # ARROW_RAPIDJSON_BUILD_VERSION
+  # The `(//./_|:1)` is a special case to handle some bash fanciness.
+  version_regex <- "\\$\\{(ARROW_[A-Z0-9_]+?_VERSION)(//./_|:1)?\\}"
+  # Extract the matched groups. If the pattern occurs multiple times, we need
+  # all (non-overlapping) matches. stringr::str_extract_all() or
+  # base::gregexec() would be useful here, but gregexec was only introduced in
+  # R 4.1.
+  matched_substrings <- regmatches(
+    one_string,
+    gregexpr(version_regex, one_string, perl = TRUE)
+  )[[1]] # Subset [[1]] because one_string has length 1
+  # `matched_substrings` is a character vector with length equal to the number
+  # of non-overlapping matches of `version_regex` in `one_string`. `match_list`
+  # is a list (same length as `matched_substrings`), where each list element is
+  # a length-3 character vector. The first element of the vector is the value
+  # from `matched_substrings` (e.g. "${ARROW_ZZZ_VERSION//./_}"). The following
+  #  two values are the captured groups specified in `version_regex` e.g.
+  # "ARROW_ZZZ_VERSION" and "//./_".

Review Comment:
   👍 



##########
r/R/install-arrow.R:
##########
@@ -137,6 +137,262 @@ reload_arrow <- function() {
   }
 }
 
+# Substitute like the bash shell
+#
+# @param one_string A length-1 character vector
+# @param possible_values A dictionary-ish set of variables that could provide
+#   values to substitute in.
+# @return `one_string`, with values substituted like bash would.
+#
+# Only supports a small subset of bash substitution patterns. May have multiple
+# bash variables in `one_string`
+# Used as a helper to parse versions.txt
+..install_substitute_like_bash <- function(one_string, possible_values) {

Review Comment:
   > I wrote some roxygen-like documentation, but didn't actually want to generate the `*.Rd` files, so just started the lines with `#` instead of `#'`.
   
   You can prevent the creation of an Rd file by using `@noRd` .



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on PR #12849:
URL: https://github.com/apache/arrow/pull/12849#issuecomment-1101489333

   Not immediately relevant here but in tools/nixlibs.R, there is code in `set_thirdparty_urls()` that generates the env vars that point to the files, and there are more cases that needs special handling now: google cloud cpp (comes out as ARROW_GOOGLE_URL) and nhlomann json (comes out as ARROW_NLOHMANN_URL). These aren't used now because we haven't added GCP bindings to the R package yet. But whenever we do, we'll need to fix those urls or else this fat package won't work for them.
   
   (Feel free to ignore for now or ticket for later, just commenting because I noticed it when trying to debug why it wasn't installing before.)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
wjones127 commented on PR #12849:
URL: https://github.com/apache/arrow/pull/12849#issuecomment-1098149969

   Yeah maybe we need to download with curl instead?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] karldw commented on pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
karldw commented on PR #12849:
URL: https://github.com/apache/arrow/pull/12849#issuecomment-1100875525

   If any of the folks here want to try this out:
   ```sh
   git clone --single-branch --branch=fix-15092 --depth 1 git@github.com:karldw/arrow.git
   cd arrow/r
   make sync-cpp
   R CMD build --no-build-vignettes .
   R -e 'source("R/install-arrow.R"); create_package_with_all_dependencies(source_file = "arrow_7.0.0.9000.tar.gz")'
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on PR #12849:
URL: https://github.com/apache/arrow/pull/12849#issuecomment-1095191610

   @karldw thanks for taking a stab at this and apologies for missing your earlier ping on jira (I get lots of jira notifications). From the error message reported on the issue, it looks like the problem is the call to `readlink` in the download_dependencies.sh script. Reading the man page for that, it looks like it exists only to resolve any symlinks that might be in the `DESTDIR`. I'm not sure why exactly (ARROW-4033 last edited this line; the change was initially added in #2673 with no discussion). Before adding all of this R code, I might try editing that line to
   
   ```
   DESTDIR=$(readlink -f "${DESTDIR}" || readlink "${DESTDIR}" || ${DESTDIR})
   ```
   
   (macOS has readlink but doesn't support -f; `|| ${DESTDIR}` because it probably doesn't matter anyway).
   
   In general I think trying to download_dependencies.sh work on more platforms is the right solution, rather than hacking around it in R.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on PR #12849:
URL: https://github.com/apache/arrow/pull/12849#issuecomment-1098523529

   What about a different approach? We don't need to use the download script itself to download the files, but we do need to use versions.txt. We could use a slightly different bash script to generate R syntax that would download the files, then use R to download--no wget or curl or other system things required beyond what R already has, just need bash.
   
   To illustrate, I patched the existing download script:
   
   ```
   diff --git a/cpp/thirdparty/download_dependencies.sh b/cpp/thirdparty/download_dependencies.sh
   index 7ffffa08c..2c0447cdb 100755
   --- a/cpp/thirdparty/download_dependencies.sh
   +++ b/cpp/thirdparty/download_dependencies.sh
   @@ -30,14 +30,11 @@ else
      DESTDIR=$1
    fi
    
   -DESTDIR=$(readlink -f "${DESTDIR}")
   -
    download_dependency() {
      local url=$1
      local out=$2
    
   -  wget --quiet --continue --output-document="${out}" "${url}" || \
   -    (echo "Failed downloading ${url}" 1>&2; exit 1)
   +  echo 'download.file("'${url}'", "'${out}'")'
    }
    
    main() {
   @@ -46,7 +43,6 @@ main() {
      # Load `DEPENDENCIES` variable.
      source ${SOURCE_DIR}/versions.txt
    
   -  echo "# Environment variables for offline Arrow build"
      for ((i = 0; i < ${#DEPENDENCIES[@]}; i++)); do
        local dep_packed=${DEPENDENCIES[$i]}
    
   @@ -55,8 +51,6 @@ main() {
    
        local out=${DESTDIR}/${dep_tar_name}
        download_dependency "${dep_url}" "${out}"
   -
   -    echo "export ${dep_url_var}=${out}"
      done
    }
    ```
   
   in R we can get:
   
   ```
   > system("source ../cpp/thirdparty/download_dependencies.sh /tmp", intern=TRUE)
    [1] "download.file(\"https://github.com/abseil/abseil-cpp/archive/20210324.2.tar.gz\", \"/tmp/absl-20210324.2.tar.gz\")"                                                                 
    [2] "download.file(\"https://github.com/aws/aws-sdk-cpp/archive/1.8.133.tar.gz\", \"/tmp/aws-sdk-cpp-1.8.133.tar.gz\")"                                                                  
    [3] "download.file(\"https://github.com/awslabs/aws-checksums/archive/v0.1.12.tar.gz\", \"/tmp/aws-checksums-v0.1.12.tar.gz\")"                                                          
    [4] "download.file(\"https://github.com/awslabs/aws-c-common/archive/v0.6.9.tar.gz\", \"/tmp/aws-c-common-v0.6.9.tar.gz\")" 
   ...
   ```
   
   which you could then source.
   
   I'm also ok with just fixing the one `readlink` line and saying that the script requires bash and wget, possibly just adding that to the error message that happens if the download script errors. That's much simpler. I don't think R code and tests for whether we can determine if the current machine has bash and wget installed are worth the trouble.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #12849:
URL: https://github.com/apache/arrow/pull/12849#issuecomment-1093630332

   https://issues.apache.org/jira/browse/ARROW-15092


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] karldw commented on pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
karldw commented on PR #12849:
URL: https://github.com/apache/arrow/pull/12849#issuecomment-1097448316

   
   There are three categories of failed test here:
   
   1. Test failures I think I understand and can fix. In these first two builds, I think the issues is that the build doesn't [copy](https://github.com/apache/arrow/blob/d5db0ef78290898a6c64e4a79e8722e5cde6dfd1/r/Makefile#L42) files into `tools/` when I expected. That's fine.
       - R / AMD64 Ubuntu 20.04 R 4.1
       - R / rstudio/r-base:4.0-centos7 
       - R / rhub/debian-gcc-devel:latest (This one fails because `wget` isn't installed. Surprising, to me, but also fine.)
   2. Test failures I think aren't related to the changes I made, but please let me know if that's wrong.
       - Python / AMD64 MacOS 10.15 Python 3: Segfault in pytest.
       - C++ / AMD64 MacOS 10.15 C++: Core dump in `TestS3FS.GetFileInfoRoot`
   3. Test failures I don't understand, and could use help diagnosing. All of these fail at [`is_wget_available`](https://github.com/apache/arrow/pull/12849/files#diff-85f1f83b021c999b209e5671ed98c291287a3361ae55dcfd57e9380ebff623f2R160-R163), but I thought `bash -c 'wget -V'` would run successfully on all of these:
       - R / AMD64 Windows R 3.6 RTools 35
       - R / AMD64 Windows R 4.1 RTools 40
       - R / AMD64 Windows R devel RTools 42 
   
   I can add code to skip the test if `wget` isn't available, but first it seems important to track down what's happening in the cases I thought `wget` would be available, but isn't. I don't have a Windows system to test on -- any tips would be very welcome.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] karldw commented on a diff in pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
karldw commented on code in PR #12849:
URL: https://github.com/apache/arrow/pull/12849#discussion_r851755147


##########
r/R/install-arrow.R:
##########
@@ -209,16 +216,22 @@ create_package_with_all_dependencies <- function(dest_file = NULL, source_file =
   on.exit(unlink(untar_dir, recursive = TRUE), add = TRUE)
   utils::untar(source_file, exdir = untar_dir)
   tools_dir <- file.path(untar_dir, "arrow/tools")
-  download_dependencies_sh <- file.path(tools_dir, "cpp/thirdparty/download_dependencies.sh")
+  download_dependencies_sh <- file.path(tools_dir, "download_dependencies_R.sh")

Review Comment:
   Thanks, I had forgotten!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jonkeane closed pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
jonkeane closed pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems
URL: https://github.com/apache/arrow/pull/12849


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] karldw commented on pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
karldw commented on PR #12849:
URL: https://github.com/apache/arrow/pull/12849#issuecomment-1103860751

   > Not immediately relevant here but in tools/nixlibs.R, there is code in `set_thirdparty_urls()` that generates the env vars that point to the files, and there are more cases that needs special handling now: google cloud cpp (comes out as ARROW_GOOGLE_URL) and nhlomann json (comes out as ARROW_NLOHMANN_URL). These aren't used now because we haven't added GCP bindings to the R package yet. But whenever we do, we'll need to fix those urls or else this fat package won't work for them.
   
   One option might be to modify `download_dependencies_R.sh` to pull the actual names from versions.txt? Here's a demo that spits out the names we need in CSV format. (In reality we would want a command line flag rather than simply removing the download code.)
   
   ```diff
   @@ -44,20 +44,25 @@ download_dependency() {
      echo 'download.file("'${url}'", "'${out}'", quiet = TRUE)'
    }
    
   +print_tar_name() {
   +  local url_var=$1
   +  local tar_name=$2
   +  echo "'${url_var}','${tar_name}'"
   +}
   +
    main() {
   -  mkdir -p "${DESTDIR}"
    
      # Load `DEPENDENCIES` variable.
      source ${SOURCE_DIR}/cpp/thirdparty/versions.txt
    
   +  echo 'env_var,filename'
      for ((i = 0; i < ${#DEPENDENCIES[@]}; i++)); do
        local dep_packed=${DEPENDENCIES[$i]}
    
        # Unpack each entry of the form "$home_var $tar_out $dep_url"
        IFS=" " read -r dep_url_var dep_tar_name dep_url <<< "${dep_packed}"
    
   -    local out=${DESTDIR}/${dep_tar_name}
   -    download_dependency "${dep_url}" "${out}"
   +    print_tar_name "${dep_url_var}" "${dep_tar_name}"
      done
    }
   
   ```
   
   
   
   <details>
   
   <summary>
   Output
   </summary>
   
   ```
   env_var,filename
   'ARROW_ABSL_URL','absl-20210324.2.tar.gz'
   'ARROW_AWSSDK_URL','aws-sdk-cpp-1.8.133.tar.gz'
   'ARROW_AWS_CHECKSUMS_URL','aws-checksums-v0.1.12.tar.gz'
   'ARROW_AWS_C_COMMON_URL','aws-c-common-v0.6.9.tar.gz'
   'ARROW_AWS_C_EVENT_STREAM_URL','aws-c-event-stream-v0.1.5.tar.gz'
   'ARROW_BOOST_URL','boost-1.75.0.tar.gz'
   'ARROW_BROTLI_URL','brotli-v1.0.9.tar.gz'
   'ARROW_BZIP2_URL','bzip2-1.0.8.tar.gz'
   'ARROW_CARES_URL','cares-1.17.2.tar.gz'
   'ARROW_CRC32C_URL','crc32c-1.1.2.tar.gz'
   'ARROW_GBENCHMARK_URL','gbenchmark-v1.6.0.tar.gz'
   'ARROW_GFLAGS_URL','gflags-v2.2.2.tar.gz'
   'ARROW_GLOG_URL','glog-v0.5.0.tar.gz'
   'ARROW_GOOGLE_CLOUD_CPP_URL','google-cloud-cpp-v1.39.0.tar.gz'
   'ARROW_GRPC_URL','grpc-v1.35.0.tar.gz'
   'ARROW_GTEST_URL','gtest-1.11.0.tar.gz'
   'ARROW_JEMALLOC_URL','jemalloc-5.2.1.tar.bz2'
   'ARROW_LZ4_URL','lz4-8f61d8eb7c6979769a484cde8df61ff7c4c77765.tar.gz'
   'ARROW_MIMALLOC_URL','mimalloc-v1.7.3.tar.gz'
   'ARROW_NLOHMANN_JSON_URL','nlohmann-json-v3.10.2.tar.gz'
   'ARROW_OPENTELEMETRY_URL','opentelemetry-cpp-v1.2.0.tar.gz'
   'ARROW_OPENTELEMETRY_PROTO_URL','opentelemetry-proto-v0.11.0.tar.gz'
   'ARROW_ORC_URL','orc-1.7.3.tar.gz'
   'ARROW_PROTOBUF_URL','protobuf-v3.18.1.tar.gz'
   'ARROW_RAPIDJSON_URL','rapidjson-1a803826f1197b5e30703afe4b9c0e7dd48074f5.tar.gz'
   'ARROW_RE2_URL','re2-2021-11-01.tar.gz'
   'ARROW_SNAPPY_URL','snappy-1.1.9.tar.gz'
   'ARROW_THRIFT_URL','thrift-0.13.0.tar.gz'
   'ARROW_UTF8PROC_URL','utf8proc-v2.7.0.tar.gz'
   'ARROW_XSIMD_URL','xsimd-7d1778c3b38d63db7cec7145d939f40bc5d859d1.tar.gz'
   'ARROW_ZLIB_URL','zlib-1.2.12.tar.gz'
   'ARROW_ZSTD_URL','zstd-v1.5.1.tar.gz'
   ```
   
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #12849:
URL: https://github.com/apache/arrow/pull/12849#issuecomment-1093630345

   :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] karldw commented on a diff in pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
karldw commented on code in PR #12849:
URL: https://github.com/apache/arrow/pull/12849#discussion_r848986139


##########
r/R/install-arrow.R:
##########
@@ -137,6 +137,54 @@ reload_arrow <- function() {
   }
 }
 
+#' Run download script
+#'
+#' @param tools_dir The `tools/` directory in the package.
+#' @param skip_download Skip doing the actual download?
+#' Note: we only need bash and wget here, not a full compiler suite.
+#' These will often be missing on Windows, but are typically present on other
+#' systems.
+#' @noRd
+run_download_script <- function(tools_dir, skip_download = FALSE) {
+  if (.Platform$OS.type == "windows") {
+    if (!requireNamespace("pkgbuild", quietly = TRUE)) {
+      stop("pkgbuild is required on Windows")
+    }
+    if (!pkgbuild::has_build_tools()) {
+      stop("This script requires bash and wget. Please install Rtools or run this in WSL.")
+    }
+    # Have build tools available for the rest of this function
+    pkgbuild::local_build_tools()
+  }
+  suppressWarnings(
+    is_wget_available <- system2(
+      "bash", c("-c", "'wget -V'"),
+      stdout = FALSE, stderr = FALSE
+    ) == 0
+  )
+  if (!is_wget_available) {
+    stop("Can't run wget - will not be able to download files")
+  }
+  bash_file <- file.path(tools_dir, "cpp/thirdparty/download_dependencies.sh")
+  if (!file.exists(bash_file)) {
+    stop("Can't find expected download_dependencies.sh file.")
+  }
+  # If you change this path, also need to edit nixlibs.R
+  download_dir <- file.path(tools_dir, "thirdparty_dependencies")
+  dir.create(download_dir)
+
+  if (!skip_download) {

Review Comment:
   We don't need `skip_download` outside of testing. Do you mean temporarily redefining the `system2` function? That seems like it could lead to some hard-to-diagnose bugs if a future reader is expecting `base::system2` when they see `system2`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
wjones127 commented on PR #12849:
URL: https://github.com/apache/arrow/pull/12849#issuecomment-1097472807

   > I don't have a Windows system to test on -- any tips would be very welcome.
   
   I'll take a look tomorrow on my Windows machine. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] assignUser commented on pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
assignUser commented on PR #12849:
URL: https://github.com/apache/arrow/pull/12849#issuecomment-1095455087

   @wjones127 `create_package_with_all_dependencies` itself doesn't build the package, so rtools wouldn't necessarily be installed (windows binaries are on CRAN after all) but requiring rtools would be fine if there is a proper error message if it isn't found vs. the unclear error that we have now. Although `pkgbuild` is neither a soft or hard dependency at the moment so we would need to add that, which is a con in my opinion.
   
   In the end the R solution works everywhere and does not require new dependencies but adds a lot of code and is brittle to changes in `versions.txt`.  Where as the bash solution requires less changes/new code but (maybe) disadvantages windows users and possibly adds a new soft dependency. 🤷 
   
   I'll defer to your experience with the package and userbase @wjones127 @nealrichardson 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
wjones127 commented on PR #12849:
URL: https://github.com/apache/arrow/pull/12849#issuecomment-1095279149

   > Is there a setup agnostic way to run bash scripts on windows from R? The only possibility I see is using WSL or checking for git-bash (which is not on path by default afaik).
   
   On Windows I think we can take for granted that if a user is building a package they have already installed RTools (or should do so).
   
   It looks like pkgbuild has a function `with_build_tools()` ([docs](https://r-lib.github.io/pkgbuild/reference/has_build_tools.html)) that can temporarily add the appropriate RTools to the PATH before running a supplied script. Perhaps that is the appropriate thing to use here?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] assignUser commented on pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
assignUser commented on PR #12849:
URL: https://github.com/apache/arrow/pull/12849#issuecomment-1095219531

   > In general I think trying to download_dependencies.sh work on more platforms is the right solution, rather than hacking around it in R.
   
   Is there a setup agnostic way to run bash scripts on windows from R? The only possibility I see is using WSL or checking for git-bash (which is not on path by default afaik).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on PR #12849:
URL: https://github.com/apache/arrow/pull/12849#issuecomment-1095433531

   Yeah it is safe to assume bash these days, especially for scripts that CRAN won't run.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on PR #12849:
URL: https://github.com/apache/arrow/pull/12849#issuecomment-1095477558

   > I'll defer to your experience with the package and userbase @wjones127 @nealrichardson
   
   IMO the bash script should be fixed anyway since it's not robust. I suspect that will solve the issue here too, in which case we don't need all of this R code (though I recognize and appreciate the effort).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #12849:
URL: https://github.com/apache/arrow/pull/12849#issuecomment-1106885261

   Benchmark runs are scheduled for baseline = 20bc63a820e3d691f4484135108c73b5c2ef6746 and contender = c73870acdc775c43b00c1816f9c17e78036c8025. c73870acdc775c43b00c1816f9c17e78036c8025 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/da9f329b06b34573a719e6a661ede04e...e9bf2067d9e24ec4a9842b415f33ffd0/)
   [Finished :arrow_down:0.12% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/0ddaf17bc31c4586850558d7c0a3c531...60d783b882c84406ac1bd2e75e2a29b2/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/1e29a7793425439ab723da1dcfde6b9d...38ceace945a7474883887c2f2fcc8389/)
   [Finished :arrow_down:0.25% :arrow_up:0.04%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/9847263fe8334e0cbf57594bcd56d91c...0667cc85791542d990b1d41d7c330678/)
   Buildkite builds:
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/561| `c73870ac` ec2-t3-xlarge-us-east-2>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/549| `c73870ac` test-mac-arm>
   [Failed] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/547| `c73870ac` ursa-i9-9960x>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/559| `c73870ac` ursa-thinkcentre-m75q>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/560| `20bc63a8` ec2-t3-xlarge-us-east-2>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/548| `20bc63a8` test-mac-arm>
   [Failed] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/546| `20bc63a8` ursa-i9-9960x>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/558| `20bc63a8` ursa-thinkcentre-m75q>
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on a diff in pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
wjones127 commented on code in PR #12849:
URL: https://github.com/apache/arrow/pull/12849#discussion_r849004739


##########
r/R/install-arrow.R:
##########
@@ -137,6 +137,54 @@ reload_arrow <- function() {
   }
 }
 
+#' Run download script
+#'
+#' @param tools_dir The `tools/` directory in the package.
+#' @param skip_download Skip doing the actual download?
+#' Note: we only need bash and wget here, not a full compiler suite.
+#' These will often be missing on Windows, but are typically present on other
+#' systems.
+#' @noRd
+run_download_script <- function(tools_dir, skip_download = FALSE) {
+  if (.Platform$OS.type == "windows") {
+    if (!requireNamespace("pkgbuild", quietly = TRUE)) {
+      stop("pkgbuild is required on Windows")
+    }
+    if (!pkgbuild::has_build_tools()) {
+      stop("This script requires bash and wget. Please install Rtools or run this in WSL.")
+    }
+    # Have build tools available for the rest of this function
+    pkgbuild::local_build_tools()
+  }
+  suppressWarnings(
+    is_wget_available <- system2(
+      "bash", c("-c", "'wget -V'"),
+      stdout = FALSE, stderr = FALSE
+    ) == 0
+  )
+  if (!is_wget_available) {
+    stop("Can't run wget - will not be able to download files")
+  }
+  bash_file <- file.path(tools_dir, "cpp/thirdparty/download_dependencies.sh")
+  if (!file.exists(bash_file)) {
+    stop("Can't find expected download_dependencies.sh file.")
+  }
+  # If you change this path, also need to edit nixlibs.R
+  download_dir <- file.path(tools_dir, "thirdparty_dependencies")
+  dir.create(download_dir)
+
+  if (!skip_download) {

Review Comment:
   > Do you mean temporarily redefining the system2 function? 
   
   Yes, I mean redefining within the scope of the test, also called mocking. I think you'd have to use `mockery::stub` and not `testthat::with_mock` or mockr, since the latter don't work with functions in base. Would look something like (though I haven't tested this code):
   
   ```r
     test_that("Prep thirdparty download", {
       skip_on_cran()
       # Replaces system2 with a function that always returns 0 when called from run_download_script
       mockery::stub(run_download_script, 'system2', 0)
       tools_dir <- tempdir()
       run_download_script(tools_dir)
     })
   ```
   
   See: https://github.com/r-lib/mockery#examples
   
   This would mean adding mockery as a dev dependency. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] assignUser commented on pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
assignUser commented on PR #12849:
URL: https://github.com/apache/arrow/pull/12849#issuecomment-1097706749

   @karldw I checked and it looks like rtools40 does not come with `wget` (and it can't be installed with `pacman` either. That would explain the test failures under 3.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on a diff in pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
wjones127 commented on code in PR #12849:
URL: https://github.com/apache/arrow/pull/12849#discussion_r848772353


##########
r/R/install-arrow.R:
##########
@@ -137,6 +137,54 @@ reload_arrow <- function() {
   }
 }
 
+#' Run download script
+#'
+#' @param tools_dir The `tools/` directory in the package.
+#' @param skip_download Skip doing the actual download?
+#' Note: we only need bash and wget here, not a full compiler suite.
+#' These will often be missing on Windows, but are typically present on other
+#' systems.
+#' @noRd
+run_download_script <- function(tools_dir, skip_download = FALSE) {
+  if (.Platform$OS.type == "windows") {
+    if (!requireNamespace("pkgbuild", quietly = TRUE)) {
+      stop("pkgbuild is required on Windows")
+    }
+    if (!pkgbuild::has_build_tools()) {
+      stop("This script requires bash and wget. Please install Rtools or run this in WSL.")
+    }
+    # Have build tools available for the rest of this function
+    pkgbuild::local_build_tools()
+  }
+  suppressWarnings(
+    is_wget_available <- system2(
+      "bash", c("-c", "'wget -V'"),
+      stdout = FALSE, stderr = FALSE
+    ) == 0
+  )
+  if (!is_wget_available) {
+    stop("Can't run wget - will not be able to download files")
+  }
+  bash_file <- file.path(tools_dir, "cpp/thirdparty/download_dependencies.sh")
+  if (!file.exists(bash_file)) {
+    stop("Can't find expected download_dependencies.sh file.")
+  }
+  # If you change this path, also need to edit nixlibs.R
+  download_dir <- file.path(tools_dir, "thirdparty_dependencies")
+  dir.create(download_dir)
+
+  if (!skip_download) {

Review Comment:
   Do we need `skip_download` outside of testing? What might be better is to temporarily override `system2` within the test to always return 0 and do nothing. That would also avoid the issue of the error on missing wget in the test environment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] karldw commented on pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
karldw commented on PR #12849:
URL: https://github.com/apache/arrow/pull/12849#issuecomment-1100513434

   Thanks, that's clever!
   
   Since we're only using `versions.txt` from `cpp/thirdparty/`, I also changed the Makefile to skip the other bundled files.
   
   I also left the `readline` changes in the original `download_dependencies.sh` file -- let me know if you want it out.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on a diff in pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on code in PR #12849:
URL: https://github.com/apache/arrow/pull/12849#discussion_r852132974


##########
r/Makefile:
##########
@@ -41,7 +41,8 @@ deps:
 # we must rename .env to dotenv and then replace references to it in cpp/CMakeLists.txt
 sync-cpp:
 	cp ../NOTICE.txt inst/NOTICE.txt
-	rsync --archive --delete --exclude 'apidoc' --exclude 'build' --exclude 'build-support/boost_*' --exclude 'examples' --exclude 'src/gandiva' --exclude 'src/jni' --exclude 'src/plasma' --exclude 'src/skyhook' --exclude 'submodules' --exclude '**/*_test.cc' ../cpp tools/
+	rsync --archive --delete --exclude 'apidoc' --exclude 'build' --exclude 'build-support/boost_*' --exclude 'examples' --exclude 'src/gandiva' --exclude 'src/jni' --exclude 'src/plasma' --exclude 'src/skyhook' --exclude 'submodules' --exclude '**/*_test.cc' --exclude 'thirdparty' ../cpp tools/
+	cp -p ../cpp/thirdparty/versions.txt tools/

Review Comment:
   We need versions.txt to be inside of cpp/thirdparty because cmake expects it there, and we also need a header that is included there, so let's just keep the whole dir as before
   
   ```suggestion
   	rsync --archive --delete --exclude 'apidoc' --exclude 'build' --exclude 'build-support/boost_*' --exclude 'examples' --exclude 'src/gandiva' --exclude 'src/jni' --exclude 'src/plasma' --exclude 'src/skyhook' --exclude 'submodules' --exclude '**/*_test.cc'  ../cpp tools/
   ```



##########
r/tools/download_dependencies_R.sh:
##########
@@ -0,0 +1,64 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# This script downloads all the thirdparty dependencies as a series of tarballs
+# that can be used for offline builds, etc.
+# This script is specific for R users, using R's download facilities rather than
+# wget. The original version of this script is cpp/thirdparty/download_dependencies.sh
+# Changes from the original:
+#  * don't download the files, just emit R code to download
+#  * require a download directory
+#  * don't print env vars
+
+set -eu
+
+SOURCE_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
+
+if [ "$#" -ne 1 ]; then
+  >&2 echo "Download directory must be specified on the command line"
+  exit 1
+else
+  DESTDIR=$1
+fi
+
+download_dependency() {
+  local url=$1
+  local out=$2
+
+  echo 'download.file("'${url}'", "'${out}'", quiet = TRUE)'
+}
+
+main() {
+  mkdir -p "${DESTDIR}"
+
+  # Load `DEPENDENCIES` variable.
+  source ${SOURCE_DIR}/versions.txt

Review Comment:
   ```suggestion
     source ${SOURCE_DIR}/cpp/thirdparty/versions.txt
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on PR #12849:
URL: https://github.com/apache/arrow/pull/12849#issuecomment-1103903170

   Yeah, something like that is probably the right solution. Would you mind making a JIRA for that? I don't think we need to tackle that here, and I want to make sure this fix gets in the upcoming release.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on a diff in pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on code in PR #12849:
URL: https://github.com/apache/arrow/pull/12849#discussion_r851748313


##########
r/R/install-arrow.R:
##########
@@ -209,16 +216,22 @@ create_package_with_all_dependencies <- function(dest_file = NULL, source_file =
   on.exit(unlink(untar_dir, recursive = TRUE), add = TRUE)
   utils::untar(source_file, exdir = untar_dir)
   tools_dir <- file.path(untar_dir, "arrow/tools")
-  download_dependencies_sh <- file.path(tools_dir, "cpp/thirdparty/download_dependencies.sh")
+  download_dependencies_sh <- file.path(tools_dir, "download_dependencies_R.sh")

Review Comment:
   I'm not seeing this file; did you forget to check it in?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org