Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/25 18:24:31 UTC

[GitHub] [arrow] karldw opened a new pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

karldw opened a new pull request #11001:
URL: https://github.com/apache/arrow/pull/11001


   I took a stab at implementing the approach @nealrichardson laid out in [ARROW-12981](https://issues.apache.org/jira/browse/ARROW-12981?focusedCommentId=17400415#comment-17400415). Please let me know what you think, and if you'd like any changes!
   
   I wrote some basic tests for the `download_optional_dependencies()` helper function, but it would be good to have more comprehensive install tests. These could be something like:
   
   ```sh
   export LIBARROW_BINARY=false
   export LIBARROW_BUILD=true
   export LIBARROW_DOWNLOAD=false
   export LIBARROW_MINIMAL=false
   
   # Make sure offline, feature-light installation works
   R -e "install.packages('arrow_x.y.z.p.tar.xz')"
   R -e 'stopifnot(arrow::arrow_available(), isFALSE(arrow::arrow_info()$capabilities["parquet"]))'
   
   # Download and install the thirdparty features
   R -e "arrow::download_optional_dependencies('arrow-thirdparty')"
   source arrow-thirdparty/DEFINE_ENV_VARS.sh
   R -e "install.packages('arrow_x.y.z.p.tar.xz')"
   R -e 'stopifnot(arrow::arrow_available(), isTRUE(arrow::arrow_info()$capabilities["parquet"]))'
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-912744484


   I'll give this one last read through before merging, but I think this is good to go. Thank you for all this work + taking the journey with us as we found the best way to accomplish this.





[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r697552783



##########
File path: r/R/util.R
##########
@@ -183,3 +183,63 @@ repeat_value_as_array <- function(object, n) {
   }
   return(Scalar$create(object)$as_array(n))
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#'
+#' @return TRUE/FALSE for whether the downloads were successful
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Other files are not changed.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' Steps for an offline install with optional dependencies:
+#' - Install the `arrow` package on a computer with internet access

Review comment:
       I'm happy to move it over, and it would definitely be a smoother process than installing twice, but how do you want to handle the `thirdparty/download_dependencies.sh` and `thirdparty/versions.txt` files?







[GitHub] [arrow] nealrichardson commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r696870523



##########
File path: r/tools/nixlibs.R
##########
@@ -329,18 +288,22 @@ build_libarrow <- function(src_dir, dst_dir) {
     LDFLAGS = R_CMD_config("LDFLAGS")
   )
   env_vars <- paste0(names(env_var_list), '="', env_var_list, '"', collapse = " ")
+  # Add env variables like ARROW_S3=ON. Order doesn't matter. Depends on `download_ok`
   env_vars <- with_s3_support(env_vars)
   env_vars <- with_mimalloc(env_vars)
-  if (tolower(Sys.info()[["sysname"]]) %in% "sunos") {
-    # jemalloc doesn't seem to build on Solaris
-    # nor does thrift, so turn off parquet,
-    # and arrowExports.cpp requires parquet for dataset (ARROW-11994), so turn that off
-    # xsimd doesn't compile, so set SIMD level to NONE to skip it
-    # re2 and utf8proc do compile,
-    # but `ar` fails to build libarrow_bundled_dependencies, so turn them off
-    # so that there are no bundled deps
-    env_vars <- paste(env_vars, "ARROW_JEMALLOC=OFF ARROW_PARQUET=OFF ARROW_DATASET=OFF ARROW_WITH_RE2=OFF ARROW_WITH_UTF8PROC=OFF EXTRA_CMAKE_FLAGS=-DARROW_SIMD_LEVEL=NONE")
-  }
+  env_vars <- with_jemalloc(env_vars)

Review comment:
       If I read versions.txt correctly, I think you could get the env vars from the files like:
   
   ```r
   files <- dir(Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR"))
   toupper(sub("(.*?)-.*", "ARROW_\\1_URL", files))
   #>  [1] "ARROW_ABSL_URL"       "ARROW_AWS_URL"        "ARROW_AWS_URL"
   #>  [4] "ARROW_AWS_URL"        "ARROW_AWS_URL"        "ARROW_BOOST_URL"
   #>  [7] "ARROW_BROTLI_URL"     "ARROW_BZIP2_URL"      "ARROW_CARES_URL"
   #> [10] "ARROW_GBENCHMARK_URL" "ARROW_GFLAGS_URL"     "ARROW_GLOG_URL"
   #> [13] "ARROW_GRPC_URL"       "ARROW_GTEST_URL"      "ARROW_JEMALLOC_URL"
   #> [16] "ARROW_LZ4_URL"        "ARROW_MIMALLOC_URL"   "ARROW_ORC_URL"
   #> [19] "ARROW_PROTOBUF_URL"   "ARROW_RAPIDJSON_URL"  "ARROW_RE2_URL"
   #> [22] "ARROW_SNAPPY_URL"     "ARROW_THRIFT_URL"     "ARROW_UTF8PROC_URL"
   #> [25] "ARROW_XSIMD_URL"      "ARROW_ZLIB_URL"       "ARROW_ZSTD_URL"
   ```
   
   though the AWS ones need some special handling. I would just take whatever is in that dir and set those; don't worry about any being missing or about someone having already set an env var for one of these (that seems unlikely and is worth discouraging).
   
   Agree that the download function should just return the dir it downloaded to. Could also print a message about setting that env var before building.
   
   Solaris doesn't turn JSON off because downloading isn't the problem on Solaris, and rapidjson compiles fine. It's somehow handled differently than the other third party dependencies so it doesn't go into `libarrow_bundled_dependencies.a`, the building of which seemed to be the problem on Solaris. Though I would be fine disabling it on Solaris too, once that is an option--it just wasn't necessary when we were trying to get a passing build there.
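   
   A rough shell sketch of the file-name-to-variable mapping described above, in case it helps to see it outside R. The helper names `dep_env_var_name` and `export_dep_env_vars` are made up for illustration and are not part of the arrow package. The sketch assumes every file of interest is named `<name>-<version>...`, and it does nothing special for the AWS files: several of them map to `ARROW_AWS_URL`, so the last one wins.
   
   ```sh
   # Hypothetical helper: map a tarball name such as "thrift-0.13.0.tar.gz"
   # to the variable name "ARROW_THRIFT_URL".
   dep_env_var_name() {
     # Keep only the text before the first hyphen, e.g. "thrift".
     prefix=${1%%-*}
     # Upper-case it and wrap it as ARROW_<NAME>_URL.
     printf 'ARROW_%s_URL\n' "$(printf '%s' "$prefix" | tr '[:lower:]' '[:upper:]')"
   }
   
   # Hypothetical helper: export one variable per hyphenated file name found
   # in the directory given as $1, pointing each variable at the file's path.
   export_dep_env_vars() {
     for path in "$1"/*-*; do
       # Skip the unexpanded glob (empty directory) and anything not a file.
       [ -f "$path" ] || continue
       export "$(dep_env_var_name "$(basename "$path")")=$path"
     done
   }
   ```
   
   Under these assumptions, `export_dep_env_vars "$ARROW_THIRDPARTY_DEPENDENCY_DIR"` would be run once before the build starts.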







[GitHub] [arrow] jonkeane closed pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane closed pull request #11001:
URL: https://github.com/apache/arrow/pull/11001


   





[GitHub] [arrow] jonkeane commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r699278771



##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,66 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#' @param download_dependencies_sh location of the dependency download script,
+#' defaults to the one included with the arrow package.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is unset, `BUNDLED`, or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### Using a computer with internet access, pre-download the dependencies:
+#' * Install the `arrow` package
+#' * Run `download_optional_dependencies(my_dependencies)`
+#' * Copy the directory `my-arrow-dependencies` to the computer without internet access
+#'
+#' ### On the computer without internet access, use the pre-downloaded dependencies:
+#' * Create a environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
+#'   points to the newly copied `my_dependencies`.
+#' * Install the `arrow` package
+#'   * This installation will build from source, so `cmake` must be available
+#' * Run [arrow_info()] to check installed capabilities
+#'
+#' @examples
+#' \dontrun{
+#' download_optional_dependencies("arrow-thirdparty")
+#' list.files("arrow-thirdparty", "thrift-*") # "thrift-0.13.0.tar.gz" or similar
+#' }
+#' @export
+download_optional_dependencies <- function(
+  deps_dir = Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR"),
+  # This script is copied over from arrow/cpp/... to arrow/r/inst/...
+  download_dependencies_sh = system.file(
+    "thirdparty/download_dependencies.sh",
+    package = "arrow",
+    mustWork = TRUE

Review comment:
       To start, I've made this an argument to the function so that we can call it without installing in CI. We could also do this as an environment variable like we do for `deps_dir` (either internally or as an argument here). I don't have strong feelings one way or the other, though since this is pretty internal-use / CI-use only we might be best off not exposing this as an argument at all.







[GitHub] [arrow] jonkeane commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-909185913


   @github-actions crossbow submit -g r





[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r697645741



##########
File path: r/tests/testthat/test-install-arrow.R
##########
@@ -37,3 +37,20 @@ r_only({
     })
   })
 })
+
+
+r_only({
+  test_that("download_optional_dependencies", {
+    skip_if_offline()
+    deps_dir <- tempfile()
+    download_successful <- expect_output(
+      download_optional_dependencies(deps_dir),
+      "export ARROW_THRIFT_URL"

Review comment:
       I've deleted the test, but I'll leave this conversation open so we can talk about where to put the test in CI.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-912089761


   Revision: 2d73ac19dcfaec0127c7590ce5809e9f15e874a1
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-812](https://github.com/ursacomputing/crossbow/branches/all?query=actions-812)
   
   |Task|Status|
   |----|------|
   |test-r-offline-maximal|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-812-github-test-r-offline-maximal)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-812-github-test-r-offline-maximal)|
   |test-r-offline-minimal|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-812-azure-test-r-offline-minimal)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-812-azure-test-r-offline-minimal)|





[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r698569604



##########
File path: r/tools/nixlibs.R
##########
@@ -82,7 +91,7 @@ download_binary <- function(os = identify_os()) {
 # * `TRUE` (not case-sensitive), to try to discover your current OS, or
 # * some other string, presumably a related "distro-version" that has binaries
 #   built that work for your OS
-identify_os <- function(os = Sys.getenv("LIBARROW_BINARY", Sys.getenv("LIBARROW_DOWNLOAD"))) {
+identify_os <- function(os = Sys.getenv("LIBARROW_BINARY", Sys.getenv("TEST_OFFLINE_BUILD"))) {

Review comment:
       Good catch. I think it should actually just look at `LIBARROW_BINARY`:
   
   ```r
   identify_os <- function(os = Sys.getenv("LIBARROW_BINARY")) {
     ...
   ```
   
   It's maybe worth noting that:
   * `identify_os` won't be called at all when `TEST_OFFLINE_BUILD` is `true` (but could be called if it was set to anything else)
   * At an earlier step, `configure` sets `LIBARROW_BINARY=true` if it was unset and `NOT_CRAN` is `true`
   
   https://github.com/apache/arrow/blob/5a13cbf81ee66172b63341d20acf51efc03d0c97/r/tools/nixlibs.R#L581-L583
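   
   To make the second bullet concrete, here is a shell sketch of the defaulting rule as described in this comment. It is not the code at the link above: the function name `default_libarrow_binary` is hypothetical, and the real check may differ in details such as case handling.
   
   ```sh
   # Hypothetical sketch: default LIBARROW_BINARY to "true" only when it is
   # completely unset and NOT_CRAN is "true".
   default_libarrow_binary() {
     # ${VAR+set} expands to "set" if VAR is set (even to an empty string),
     # so -z here means "LIBARROW_BINARY is unset".
     if [ -z "${LIBARROW_BINARY+set}" ] && [ "${NOT_CRAN:-}" = "true" ]; then
       LIBARROW_BINARY=true
       export LIBARROW_BINARY
     fi
   }
   ```
   
   With a rule like this, an explicit `LIBARROW_BINARY=false` is left alone, which is why the offline tests can set it up front.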
   
   
   







[GitHub] [arrow] jonkeane commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r701504849



##########
File path: dev/tasks/r/github.linux.offline.build.yml
##########
@@ -0,0 +1,112 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# NOTE: must set "Crossbow" as name to have the badge links working in the
+# github comment reports!
+name: Crossbow
+
+on:
+  push
+
+jobs:
+  grab-dependencies:
+    name: "Download thirdparty dependencies"
+    runs-on: ubuntu-20.04
+    strategy:
+      fail-fast: false
+    env:
+      ARROW_R_DEV: "TRUE"
+      RSPM: "https://packagemanager.rstudio.com/cran/__linux__/focal/latest"
+    steps:
+      - name: Checkout Arrow
+        run: |
+          git clone --no-checkout {{ arrow.remote }} arrow
+          git -C arrow fetch -t {{ arrow.remote }} {{ arrow.branch }}
+          git -C arrow checkout FETCH_HEAD
+          git -C arrow submodule update --init --recursive
+      - name: Free Up Disk Space
+        shell: bash
+        run: arrow/ci/scripts/util_cleanup.sh
+      - name: Fetch Submodules and Tags
+        shell: bash
+        run: cd arrow && ci/scripts/util_checkout.sh
+      - uses: r-lib/actions/setup-r@v1
+      - name: Pull Arrow dependencies
+        run: |
+          cd arrow/r
+          # This is `make build`, but with no vignettes and not running `make doc`
+          cp ../NOTICE.txt inst/NOTICE.txt
+          rsync --archive --delete ../cpp tools/
+          cp -p ../.env tools/
+          cp -p ../NOTICE.txt tools/
+          cp -p ../LICENSE.txt tools/
+          R CMD build --no-build-vignettes --no-manual .
+          built_tar=$(ls -1 arrow*.tar.gz | head -n 1)
+          R -e "source('R/install-arrow.R'); create_package_with_all_dependencies(dest_file = 'arrow_with_deps.tar.gz', source_file = \"${built_tar}\")"
+        shell: bash
+      - name: Upload the third party dependency artifacts
+        uses: actions/upload-artifact@v2
+        with:
+          name: thirdparty_deps
+          path: arrow/r/arrow_with_deps.tar.gz
+
+  intall-offline:
+    name: "Install offline"
+    needs: [grab-dependencies]
+    runs-on: ubuntu-20.04
+    strategy:
+      fail-fast: false
+    env:
+      ARROW_R_DEV: "TRUE"
+      RSPM: "https://packagemanager.rstudio.com/cran/__linux__/focal/latest"
+    steps:
+      - name: Checkout Arrow
+        run: |
+          git clone --no-checkout {{ arrow.remote }} arrow
+          git -C arrow fetch -t {{ arrow.remote }} {{ arrow.branch }}
+          git -C arrow checkout FETCH_HEAD
+          git -C arrow submodule update --init --recursive
+      - uses: r-lib/actions/setup-r@v1
+      - name: Download artifacts
+        uses: actions/download-artifact@v2
+        with:
+          name: thirdparty_deps
+          path: arrow/r/arrow_with_deps.tar.gz

Review comment:
       ```suggestion
             path: arrow/r/
   ```
   
   I'm not certain this will work, but I'm going to try.







[GitHub] [arrow] github-actions[bot] commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-909186696


   Revision: 2410d5574db9c03d41c7faf70480c8189bcf69c9
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-805](https://github.com/ursacomputing/crossbow/branches/all?query=actions-805)
   
   |Task|Status|
   |----|------|
   |conda-linux-gcc-py36-cpu-r40|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-conda-linux-gcc-py36-cpu-r40)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-conda-linux-gcc-py36-cpu-r40)|
   |conda-linux-gcc-py37-cpu-r41|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-conda-linux-gcc-py37-cpu-r41)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-conda-linux-gcc-py37-cpu-r41)|
   |conda-osx-clang-py36-r40|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-conda-osx-clang-py36-r40)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-conda-osx-clang-py36-r40)|
   |conda-osx-clang-py37-r41|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-conda-osx-clang-py37-r41)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-conda-osx-clang-py37-r41)|
   |conda-win-vs2017-py36-r40|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-conda-win-vs2017-py36-r40)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-conda-win-vs2017-py36-r40)|
   |conda-win-vs2017-py37-r41|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-conda-win-vs2017-py37-r41)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-conda-win-vs2017-py37-r41)|
   |homebrew-r-autobrew|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-805-github-homebrew-r-autobrew)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-805-github-homebrew-r-autobrew)|
   |test-r-depsource-auto|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-test-r-depsource-auto)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-test-r-depsource-auto)|
   |test-r-depsource-system|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-805-github-test-r-depsource-system)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-805-github-test-r-depsource-system)|
   |test-r-devdocs|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-805-github-test-r-devdocs)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-805-github-test-r-devdocs)|
   |test-r-gcc-11|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-805-github-test-r-gcc-11)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-805-github-test-r-gcc-11)|
   |test-r-install-local|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-805-github-test-r-install-local)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-805-github-test-r-install-local)|
   |test-r-linux-as-cran|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-805-github-test-r-linux-as-cran)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-805-github-test-r-linux-as-cran)|
   |test-r-linux-rchk|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-805-github-test-r-linux-rchk)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-805-github-test-r-linux-rchk)|
   |test-r-linux-valgrind|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-test-r-linux-valgrind)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-test-r-linux-valgrind)|
   |test-r-minimal-build|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-test-r-minimal-build)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-test-r-minimal-build)|
   |test-r-offline-maximal|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-805-github-test-r-offline-maximal)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-805-github-test-r-offline-maximal)|
   |test-r-offline-minimal|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-test-r-offline-minimal)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-test-r-offline-minimal)|
   |test-r-rhub-debian-gcc-devel-lto-latest|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-test-r-rhub-debian-gcc-devel-lto-latest)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-test-r-rhub-debian-gcc-devel-lto-latest)|
   |test-r-rhub-ubuntu-gcc-release-latest|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-test-r-rhub-ubuntu-gcc-release-latest)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-test-r-rhub-ubuntu-gcc-release-latest)|
   |test-r-rocker-r-base-latest|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-test-r-rocker-r-base-latest)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-test-r-rocker-r-base-latest)|
   |test-r-rstudio-r-base-3.6-bionic|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-test-r-rstudio-r-base-3.6-bionic)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-test-r-rstudio-r-base-3.6-bionic)|
   |test-r-rstudio-r-base-3.6-centos7-devtoolset-8|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-test-r-rstudio-r-base-3.6-centos7-devtoolset-8)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-test-r-rstudio-r-base-3.6-centos7-devtoolset-8)|
   |test-r-rstudio-r-base-3.6-centos8|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-test-r-rstudio-r-base-3.6-centos8)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-test-r-rstudio-r-base-3.6-centos8)|
   |test-r-rstudio-r-base-3.6-opensuse15|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-test-r-rstudio-r-base-3.6-opensuse15)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-test-r-rstudio-r-base-3.6-opensuse15)|
   |test-r-rstudio-r-base-3.6-opensuse42|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-test-r-rstudio-r-base-3.6-opensuse42)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-test-r-rstudio-r-base-3.6-opensuse42)|
   |test-r-ubuntu-21.04|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-805-github-test-r-ubuntu-21.04)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-805-github-test-r-ubuntu-21.04)|
   |test-r-version-compatibility|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-805-github-test-r-version-compatibility)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-805-github-test-r-version-compatibility)|
   |test-r-versions|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-805-github-test-r-versions)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-805-github-test-r-versions)|
   |test-ubuntu-18.04-r-sanitizer|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-test-ubuntu-18.04-r-sanitizer)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-test-ubuntu-18.04-r-sanitizer)|





[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r696771438



##########
File path: r/inst/build_arrow_static.sh
##########
@@ -59,7 +59,7 @@ ${CMAKE} -DARROW_BOOST_USE_SHARED=OFF \
     -DARROW_FILESYSTEM=ON \
     -DARROW_JEMALLOC=${ARROW_JEMALLOC:-$ARROW_DEFAULT_PARAM} \
     -DARROW_MIMALLOC=${ARROW_MIMALLOC:-ON} \
-    -DARROW_JSON=ON \
+    -DARROW_JSON=${ARROW_JSON:-ON} \

Review comment:
       Done! ARROW-13768
   Should I leave this change as it is, or revert to `-DARROW_JSON=ON`?
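   
   For context on what the changed line does: `${ARROW_JSON:-ON}` is ordinary shell default expansion, so the flag stays `ON` unless the caller sets `ARROW_JSON` to a non-empty value. A small sketch, where the function name `json_flag` is made up for illustration and is not in `build_arrow_static.sh`:
   
   ```sh
   # Hypothetical helper that prints the cmake flag the changed line produces.
   json_flag() {
     # ${VAR:-default} falls back to "ON" when ARROW_JSON is unset OR empty.
     printf '%s\n' "-DARROW_JSON=${ARROW_JSON:-ON}"
   }
   ```
   
   So an unset or empty `ARROW_JSON` keeps the previous behavior (`-DARROW_JSON=ON`), while `ARROW_JSON=OFF` yields `-DARROW_JSON=OFF`.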







[GitHub] [arrow] jonkeane commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-908805878


   @github-actions crossbow submit test-r-offline-minimal test-r-offline-maximal 





[GitHub] [arrow] nealrichardson commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r696708048



##########
File path: r/tools/nixlibs.R
##########
@@ -209,75 +222,21 @@ find_available_binary <- function(os) {
   os
 }
 
-download_source <- function() {
-  tf1 <- tempfile()
-  src_dir <- tempfile()
-
-  # Given VERSION as x.y.z.p
-  p <- package_version(VERSION)[1, 4]
-  if (is.na(p) || p < 1000) {
-    # This is either just x.y.z or it has a small (R-only) patch version
-    # Download from the official Apache release, dropping the p
-    VERSION <- as.character(package_version(VERSION)[1, -4])
-    if (apache_download(VERSION, tf1)) {
-      untar(tf1, exdir = src_dir)
-      unlink(tf1)
-      src_dir <- paste0(src_dir, "/apache-arrow-", VERSION, "/cpp")
-    }
-  } else if (p != 9000) {
-    # This is a custom dev version (x.y.z.9999) or a nightly (x.y.z.20210505)
-    # (Don't try to download on the default dev .9000 version)
-    if (nightly_download(VERSION, tf1)) {
-      unzip(tf1, exdir = src_dir)
-      unlink(tf1)
-      src_dir <- paste0(src_dir, "/cpp")
-    }
-  }
-
-  if (dir.exists(src_dir)) {
-    cat("*** Successfully retrieved C++ source\n")
-    options(.arrow.cleanup = c(getOption(".arrow.cleanup"), src_dir))
-    # These scripts need to be executable
-    system(
-      sprintf("chmod 755 %s/build-support/*.sh", src_dir),
-      ignore.stdout = quietly, ignore.stderr = quietly
-    )
-    return(src_dir)
-  } else {
-    return(NULL)
-  }
-}
-
-nightly_download <- function(version, destfile) {
-  source_url <- paste0(arrow_repo, "src/arrow-", version, ".zip")
-  try_download(source_url, destfile)
-}
-
-apache_download <- function(version, destfile, n_mirrors = 3) {
-  apache_path <- paste0("arrow/arrow-", version, "/apache-arrow-", version, ".tar.gz")
-  apache_urls <- c(
-    # This returns a different mirror each time
-    rep("https://www.apache.org/dyn/closer.lua?action=download&filename=", n_mirrors),
-    "https://downloads.apache.org/" # The backup
+find_local_source <- function() {
+  # We'll take the first of these that exists
+  # The first case probably occurs if we're in the arrow git repo
+  # The second probably occurs if we're installing the arrow R package
+  cpp_dir_options <- c(
+    Sys.getenv("ARROW_SOURCE_HOME", ".."),
+    "tools/cpp"
   )
-  downloaded <- FALSE
-  for (u in apache_urls) {
-    downloaded <- try_download(paste0(u, apache_path), destfile)
-    if (downloaded) {
-      break
-    }
-  }
-  downloaded
-}
-
-find_local_source <- function(arrow_home = Sys.getenv("ARROW_SOURCE_HOME", "..")) {
-  if (file.exists(paste0(arrow_home, "/cpp/src/arrow/api.h"))) {
-    # We're in a git checkout of arrow, so we can build it
-    cat("*** Found local C++ source\n")
-    return(paste0(arrow_home, "/cpp"))
-  } else {
+  valid_cpp_dir <- file.exists(file.path(cpp_dir_options, "src/arrow/api.h"))
+  if (!any(valid_cpp_dir)) {
     return(NULL)
   }
+  cpp_dir <- cpp_dir_options[valid_cpp_dir][1]
+  cat(paste0("*** Found local C++ source:\n    '", cpp_dir, "'\n"))
+  cpp_dir

Review comment:
       I think the intent reads more clearly this way
   
   ```suggestion
     for (cpp_dir in cpp_dir_options) {
       if (file.exists(file.path(cpp_dir, "cpp/src/arrow/api.h"))) {
         cat(paste0("*** Found local C++ source: '", cpp_dir, "'\n"))
         return(cpp_dir)
       }
     }
     NULL
   ```
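
   A minimal sketch of the first-match lookup in the suggestion above, run against a throwaway directory tree. The helper name `first_dir_with` and the temporary layout are hypothetical and not part of `nixlibs.R`. Which marker path is correct (`src/arrow/api.h` or `cpp/src/arrow/api.h`) depends on what each candidate directory points at, so it is passed in explicitly here.

   ```r
   # Hypothetical helper: return the first candidate directory that contains
   # `marker`, or NULL if none does.
   first_dir_with <- function(candidates, marker) {
     for (d in candidates) {
       if (file.exists(file.path(d, marker))) {
         return(d)
       }
     }
     NULL
   }

   # Throwaway layout standing in for tools/cpp (an assumption for the demo)
   root <- tempfile("arrow-src-")
   cpp_dir <- file.path(root, "tools", "cpp")
   dir.create(file.path(cpp_dir, "src", "arrow"), recursive = TRUE)
   file.create(file.path(cpp_dir, "src", "arrow", "api.h"))

   # The first candidate does not exist, so the second one is returned
   candidates <- c(file.path(root, "missing"), cpp_dir)
   found <- first_dir_with(candidates, "src/arrow/api.h")
   none <- first_dir_with(file.path(root, "missing"), "src/arrow/api.h")
   unlink(root, recursive = TRUE)
   ```

   The loop stops at the first hit, so the order of the candidates encodes their priority.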







[GitHub] [arrow] github-actions[bot] commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-908806136


   Revision: 479b054f7549d1265dfec308e46aabc79844fee0
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-804](https://github.com/ursacomputing/crossbow/branches/all?query=actions-804)
   
   |Task|Status|
   |----|------|
   |test-r-offline-maximal|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-804-github-test-r-offline-maximal)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-804-github-test-r-offline-maximal)|
   |test-r-offline-minimal|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-804-azure-test-r-offline-minimal)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-804-azure-test-r-offline-minimal)|





[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r698758582



##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,68 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### On a computer with internet access:
+#' - Install the `arrow` package
+#' - Run this function
+#' - Copy the saved dependency files to the computer without internet access
+#'
+#' ### On the computer without internet access:
+#' - Create an environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
+#'   points to the newly copied folder of dependency files.
+#' - Install the `arrow` package
+#' - Run [arrow_info()] to check installed capabilities
+#'
+#' @examples
+#' \dontrun{
+#' download_optional_dependencies("arrow-thirdparty")
+#' list.files("arrow-thirdparty", "thrift-*") # "thrift-0.13.0.tar.gz" or similar
+#' }
+#' @export
+download_optional_dependencies <- function(deps_dir = NULL) {
+  # This script is copied over from arrow/cpp/... to arrow/r/inst/...
+  download_dependencies_sh <- system.file(
+    "thirdparty/download_dependencies.sh",
+    package = "arrow",
+    mustWork = TRUE
+  )
+  if (is.null(deps_dir) && Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR") != "") {
+    deps_dir <- Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")
+  }
+
+  dir.create(deps_dir, showWarnings = FALSE, recursive = TRUE)
+  # Run download_dependencies.sh
+  cat(paste0("*** Downloading optional dependencies to ", deps_dir, "\n"))
+  return_status <- system2(download_dependencies_sh,
+    args = deps_dir, stdout = FALSE, stderr = FALSE
+  )
+  if (isTRUE(return_status == 0)) {
+    cat(paste0(
+      "**** Set environment variable on offline machine and re-build arrow:\n",

Review comment:
       As I'm thinking about what to write, I feel like I'm just duplicating the help text. What about this message instead? (Or no message at all.)
   ```
   **** Download successful to <directory>
        See ?download_optional_dependencies for more details.
   ```







[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r697026914



##########
File path: r/tools/nixlibs.R
##########
@@ -415,10 +389,134 @@ cmake_version <- function(cmd = "cmake") {
   )
 }
 
+turn_off_thirdparty_features <- function(env_vars) {
+
+  # Because these are done as environment variables (as opposed to build flags),
+  # setting these to "OFF" overrides any previous setting. We don't need to
+  # check the existing value.
+  turn_off <- c(
+    "ARROW_MIMALLOC=OFF",
+    "ARROW_JEMALLOC=OFF",
+    "ARROW_PARQUET=OFF", # depends on thrift
+    "ARROW_DATASET=OFF", # depends on parquet
+    "ARROW_S3=OFF",
+    "ARROW_WITH_BROTLI=OFF",
+    "ARROW_WITH_BZ2=OFF",
+    "ARROW_WITH_LZ4=OFF",
+    "ARROW_WITH_SNAPPY=OFF",
+    "ARROW_WITH_ZLIB=OFF",
+    "ARROW_WITH_ZSTD=OFF",
+    "ARROW_WITH_RE2=OFF",
+    "ARROW_WITH_UTF8PROC=OFF",
+    # NOTE: this code sets the environment variable ARROW_JSON to "OFF", but
+    # that setting will *not* be honored by build_arrow_static.sh until
+    # ARROW-13768 is resolved.
+    "ARROW_JSON=OFF",
+    # The syntax to turn off XSIMD is different.
+    'EXTRA_CMAKE_FLAGS="-DARROW_SIMD_LEVEL=NONE"'
+  )
+  if (Sys.getenv("EXTRA_CMAKE_FLAGS") != "") {
+    # Error rather than overwriting EXTRA_CMAKE_FLAGS
+    # (Correctly inserting the flag into an existing quoted string is tricky)
+    stop("Sorry, setting EXTRA_CMAKE_FLAGS is not supported at this time.")
+  }
+  paste(env_vars, paste(turn_off, collapse = " "))
+}
+
+set_thirdparty_urls <- function(env_vars) {
+  deps_dir <- Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")
+  files <- list.files(deps_dir, full.names = FALSE)
+  if (length(files) == 0) {
+    # This will be true if the variable is unset, if it's set but the directory
+    # doesn't exist, or if it exists but is empty.
+    return(env_vars)
+  }
+  dep_names <- c(
+    "absl", # not used; seems to be a dependency of gRPC
+    "aws-sdk-cpp",
+    "aws-checksums",
+    "aws-c-common",
+    "aws-c-event-stream",
+    "boost",
+    "brotli",
+    "bzip2",
+    "cares", # not used; "a dependency of gRPC"
+    "gbenchmark", # not used; "Google benchmark, for testing"
+    "gflags", # not used; "for command line utilities (formerly Googleflags)"
+    "glog", # not used; "for logging"
+    "grpc", # not used; "for remote procedure calls"
+    "gtest", # not used; "Googletest, for testing"
+    "jemalloc",
+    "lz4",
+    "mimalloc",
+    "orc", # not used; "for Apache ORC format support"
+    "protobuf", # not used; "Google Protocol Buffers, for data serialization"
+    "rapidjson",
+    "re2",
+    "snappy",
+    "thrift",
+    "utf8proc",
+    "xsimd",
+    "zlib",
+    "zstd"
+  )
+  dep_regex <- paste0("^(", paste(dep_names, collapse = "|"), ").*")
+  # If there were extra files in the folder (not matching our regex) drop them.
+  files <- files[grepl(dep_regex, files, perl = TRUE)]
+  # Convert e.g. "thrift-0.13.0.tar.gz" to ARROW_THRIFT_URL
+  # Note that if there's no file called thrift*, we won't add
+  # ARROW_THRIFT_URL to env_vars.
+  url_env_varname <- sub(dep_regex, "ARROW_\\1_URL", files, perl = TRUE)
+  url_env_varname <- toupper(gsub("-", "_", url_env_varname, fixed = TRUE))
+  # Special case: ARROW_AWSSDK_URL for aws-sdk-cpp-<version>.tar.gz
+  url_env_varname <- sub("ARROW_AWS_SDK_CPP_URL", "ARROW_AWSSDK_URL", url_env_varname, fixed = TRUE)

Review comment:
       This isn't the clearest approach. Let me know if you'd prefer a different one!
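
   To make the mapping concrete, here is a small worked example of the same regex steps. The dependency list is shortened and the filenames and version numbers are made up for illustration; real names come from `download_dependencies.sh`.

   ```r
   # Reduced list for illustration; the diff above uses the full set of names.
   dep_names <- c("aws-sdk-cpp", "aws-c-common", "thrift", "zstd")
   # Example filenames (hypothetical versions), plus one unrelated file.
   files <- c(
     "thrift-0.13.0.tar.gz",
     "aws-sdk-cpp-1.8.133.tar.gz",
     "aws-c-common-v0.5.10.tar.gz",
     "README.txt"
   )

   dep_regex <- paste0("^(", paste(dep_names, collapse = "|"), ").*")
   # Drop files that don't start with a known dependency name
   files <- files[grepl(dep_regex, files, perl = TRUE)]
   # Keep only the captured dependency name and wrap it as ARROW_<name>_URL
   url_env_varname <- sub(dep_regex, "ARROW_\\1_URL", files, perl = TRUE)
   url_env_varname <- toupper(gsub("-", "_", url_env_varname, fixed = TRUE))
   # Special case: aws-sdk-cpp maps to ARROW_AWSSDK_URL
   url_env_varname <- sub(
     "ARROW_AWS_SDK_CPP_URL", "ARROW_AWSSDK_URL", url_env_varname,
     fixed = TRUE
   )
   print(url_env_varname)
   ```

   With these inputs, `README.txt` is dropped and the three remaining files map to `ARROW_THRIFT_URL`, `ARROW_AWSSDK_URL`, and `ARROW_AWS_C_COMMON_URL`.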







[GitHub] [arrow] karldw commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-909734070


   > edit: are you sure you want that? the next line checks if `(!dir.exists())` and returns
   
   My goal was to figure out if we're building using the downloaded dependencies by checking if the `download` folder exists in `tools/cpp/thirdparty`. Whether or not we're building the downloaded dependencies, I thought the `tools/cpp/thirdparty` folder should always exist, since it gets copied in by `make build`. This `stopifnot` was a way to check that assumption. (Put differently, testing if `tools/cpp/thirdparty/download` is missing doesn't tell me much if `tools/cpp/thirdparty` is also missing.)
   
   Does that seem reasonable? If you have a cleaner way, let me know!
   
   The Windows builds were failing because I hadn't documented all of my arguments. Hopefully they pass now.
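
   The two-level check described above could be sketched roughly as follows. The helper name `thirdparty_state` is hypothetical, and returning `NA` for a missing parent directory (instead of stopping) is an assumption made for this sketch, not what the PR does.

   ```r
   # Hypothetical helper: classify a thirdparty directory.
   # "downloaded"     -> <thirdparty_dir>/download exists
   # "not downloaded" -> only <thirdparty_dir> exists
   # NA               -> the parent is missing, so the absence of download/
   #                     tells us nothing
   thirdparty_state <- function(thirdparty_dir) {
     if (!dir.exists(thirdparty_dir)) {
       return(NA_character_)
     }
     if (dir.exists(file.path(thirdparty_dir, "download"))) {
       "downloaded"
     } else {
       "not downloaded"
     }
   }

   # Exercise all three cases against a temporary directory
   root <- tempfile("thirdparty-")
   state_missing <- thirdparty_state(root)
   dir.create(root)
   state_empty <- thirdparty_state(root)
   dir.create(file.path(root, "download"))
   state_downloaded <- thirdparty_state(root)
   unlink(root, recursive = TRUE)
   ```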





[GitHub] [arrow] nealrichardson commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-910265945


   > I thought the `tools/cpp/thirdparty` folder should always exist, since it gets copied in by `make build`.
   
   But `tools/cpp` is .gitignored, so that check may (or will) fail in a git checkout, as it does on CI.
   
   I'll read over the latest iteration and see if I can suggest an alternative.





[GitHub] [arrow] nealrichardson commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r698674825



##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,68 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### On a computer with internet access:
+#' - Install the `arrow` package

Review comment:
       ```suggestion
   #' - If you don't already have the `arrow` package installed, get this function by
   #' `source("https://raw.githubusercontent.com/apache/arrow/master/r/R/install-arrow.R")`
   ```

##########
File path: r/tools/nixlibs.R
##########
@@ -373,7 +374,15 @@ ensure_cmake <- function() {
     )
     cmake_tar <- tempfile()
     cmake_dir <- tempfile()
-    try_download(cmake_binary_url, cmake_tar)
+    download_successful <- try_download(cmake_binary_url, cmake_tar)
+    if (!download_successful) {
+      cat(paste0(
+        "*** cmake was not found locally and download failed.\n",
+        "    Make sure cmake is installed and available on your PATH\n",
+        "    (or download '", cmake_binary_url,
+        "' and define the CMAKE environment variable).\n"
+      ))

Review comment:
       ```suggestion
         cat(paste0(
           "*** cmake was not found locally and download failed.\n",
           "    Make sure cmake >= 3.10 is installed and available on your PATH,\n",
           "    or download ", cmake_binary_url, "\n",
           "    and define the CMAKE environment variable.\n"
         ))
   ```

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,68 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### On a computer with internet access:
+#' - Install the `arrow` package
+#' - Run this function
+#' - Copy the saved dependency files to the computer without internet access
+#'
+#' ### On the computer without internet access:
+#' - Create an environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
+#'   points to the newly copied folder of dependency files.
+#' - Install the `arrow` package
+#' - Run [arrow_info()] to check installed capabilities
+#'
+#' @examples
+#' \dontrun{
+#' download_optional_dependencies("arrow-thirdparty")
+#' list.files("arrow-thirdparty", "thrift-*") # "thrift-0.13.0.tar.gz" or similar
+#' }
+#' @export
+download_optional_dependencies <- function(deps_dir = NULL) {
+  # This script is copied over from arrow/cpp/... to arrow/r/inst/...
+  download_dependencies_sh <- system.file(
+    "thirdparty/download_dependencies.sh",
+    package = "arrow",
+    mustWork = TRUE
+  )
+  if (is.null(deps_dir) && Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR") != "") {
+    deps_dir <- Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")
+  }
+
+  dir.create(deps_dir, showWarnings = FALSE, recursive = TRUE)
+  # Run download_dependencies.sh
+  cat(paste0("*** Downloading optional dependencies to ", deps_dir, "\n"))
+  return_status <- system2(download_dependencies_sh,
+    args = deps_dir, stdout = FALSE, stderr = FALSE
+  )
+  if (isTRUE(return_status == 0)) {
+    cat(paste0(
+      "**** Set environment variable on offline machine and re-build arrow:\n",

Review comment:
       Should this message also tell you to copy the directory to the other machine?

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,68 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### On a computer with internet access:
+#' - Install the `arrow` package
+#' - Run this function
+#' - Copy the saved dependency files to the computer without internet access
+#'
+#' ### On the computer without internet access:
+#' - Create an environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
+#'   points to the newly copied folder of dependency files.
+#' - Install the `arrow` package
+#' - Run [arrow_info()] to check installed capabilities
+#'
+#' @examples
+#' \dontrun{
+#' download_optional_dependencies("arrow-thirdparty")
+#' list.files("arrow-thirdparty", "thrift-*") # "thrift-0.13.0.tar.gz" or similar
+#' }
+#' @export
+download_optional_dependencies <- function(deps_dir = NULL) {
+  # This script is copied over from arrow/cpp/... to arrow/r/inst/...
+  download_dependencies_sh <- system.file(
+    "thirdparty/download_dependencies.sh",
+    package = "arrow",
+    mustWork = TRUE
+  )
+  if (is.null(deps_dir) && Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR") != "") {
+    deps_dir <- Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")
+  }

Review comment:
       ```suggestion
   download_optional_dependencies <- function(deps_dir = Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")) {
     # This script is copied over from arrow/cpp/... to arrow/r/inst/...
     download_dependencies_sh <- system.file(
       "thirdparty/download_dependencies.sh",
       package = "arrow",
       mustWork = TRUE
     )
   ```
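
   One thing to keep in mind with this default: `Sys.getenv()` returns `""` for an unset variable, so the function body would still need to handle an empty `deps_dir`. A minimal sketch of just the argument handling, with a hypothetical `resolve_deps_dir()` standing in for the real function (nothing is downloaded here):

   ```r
   # Hypothetical stand-in for the argument handling only.
   resolve_deps_dir <- function(deps_dir = Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")) {
     # Sys.getenv() returns "" when the variable is unset, so treat "" like a
     # missing argument and fail early with a clear message.
     if (!nzchar(deps_dir)) {
       stop("Provide deps_dir or set ARROW_THIRDPARTY_DEPENDENCY_DIR")
     }
     deps_dir
   }

   # The default is evaluated at call time, so it tracks the environment
   Sys.setenv(ARROW_THIRDPARTY_DEPENDENCY_DIR = "arrow-thirdparty")
   from_env <- resolve_deps_dir()
   explicit <- resolve_deps_dir("other-dir")
   Sys.unsetenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")
   unset_fails <- inherits(try(resolve_deps_dir(), silent = TRUE), "try-error")
   ```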

##########
File path: r/vignettes/install.Rmd
##########
@@ -304,10 +316,12 @@ By default, these are all unset. All boolean variables are case-insensitive.
   won't look for Arrow libraries on your system and instead will look to download/build them.
   Use this if you have a version mismatch between installed system libraries
   and the version of the R package you're installing.
-* `LIBARROW_DOWNLOAD`: Unless set to `false`, the build script
-  will attempt to download C++ binary or source bundles.
+* `TEST_OFFLINE_BUILD`: Unless set to `true`, the build script
+  will download prebuilt C++ binary or third-party source bundles as necessary.
   If you're in a checkout of the `apache/arrow` git repository

Review comment:
       ```suggestion
   ```

##########
File path: r/vignettes/install.Rmd
##########
@@ -102,6 +102,14 @@ satisfy C++ dependencies.
 
 > Note that, unlike packages like `tensorflow`, `blogdown`, and others that require external dependencies, you do not need to run `install_arrow()` after a successful `arrow` installation.
 
+The `install-arrow.R` file also includes the `download_optional_dependencies()`
+function. Normally, when installing on a computer with internet access, the
+build process will download third-party dependencies as needed. This function
+provides a way to download them in advance. Relevant environment variables are
+`ARROW_THIRDPARTY_DEPENDENCY_DIR` for the directory of downloaded dependencies
+and `TEST_OFFLINE_BUILD` to force the build process not to download.

Review comment:
       I don't think we should document this in this vignette. Users shouldn't need to worry about this env var; it's for our own testing.

##########
File path: r/vignettes/install.Rmd
##########
@@ -304,10 +316,12 @@ By default, these are all unset. All boolean variables are case-insensitive.
   won't look for Arrow libraries on your system and instead will look to download/build them.
   Use this if you have a version mismatch between installed system libraries
   and the version of the R package you're installing.
-* `LIBARROW_DOWNLOAD`: Unless set to `false`, the build script
-  will attempt to download C++ binary or source bundles.
+* `TEST_OFFLINE_BUILD`: Unless set to `true`, the build script
+  will download prebuilt C++ binary or third-party source bundles as necessary.
   If you're in a checkout of the `apache/arrow` git repository
-  and want to build the C++ library from the local source, make this `false`.
+  and want to build the C++ library from the local source, make this `false` or
+  not set. If building the C++ library from source with cmake unavailable, cmake

Review comment:
       ```suggestion
     If building the C++ library from source with cmake unavailable, cmake
   ```

##########
File path: r/tools/nixlibs.R
##########
@@ -29,17 +29,8 @@ if (getRversion() < 3.4 && is.null(getOption("download.file.method"))) {
 options(.arrow.cleanup = character()) # To collect dirs to rm on exit
 on.exit(unlink(getOption(".arrow.cleanup")))
 
+

Review comment:
       ```suggestion
   ```

##########
File path: r/tools/nixlibs.R
##########
@@ -320,33 +300,54 @@ build_libarrow <- function(src_dir, dst_dir) {
     BUILD_DIR = build_dir,
     DEST_DIR = dst_dir,
     CMAKE = cmake,
+    # EXTRA_CMAKE_FLAGS will often be "", but it's convenient later to have it defined

Review comment:
       Why?

##########
File path: r/vignettes/install.Rmd
##########
@@ -102,6 +102,14 @@ satisfy C++ dependencies.
 
 > Note that, unlike packages like `tensorflow`, `blogdown`, and others that require external dependencies, you do not need to run `install_arrow()` after a successful `arrow` installation.
 
+The `install-arrow.R` file also includes the `download_optional_dependencies()`
+function. Normally, when installing on a computer with internet access, the
+build process will download third-party dependencies as needed. This function
+provides a way to download them in advance. Relevant environment variables are

Review comment:
       These sentences should probably mention the offline/airgapped server use case and how you'd use it. 

##########
File path: r/tools/nixlibs.R
##########
@@ -413,66 +422,144 @@ cmake_version <- function(cmd = "cmake") {
   )
 }
 
-with_s3_support <- function(env_vars) {
-  arrow_s3 <- toupper(Sys.getenv("ARROW_S3")) == "ON" || tolower(Sys.getenv("LIBARROW_MINIMAL")) == "false"
+turn_off_thirdparty_features <- function(env_var_list) {
+  # Because these are done as environment variables (as opposed to build flags),
+  # setting these to "OFF" overrides any previous setting. We don't need to
+  # check the existing value.
+  turn_off <- c(
+    "ARROW_MIMALLOC" = "OFF",
+    "ARROW_JEMALLOC" = "OFF",
+    "ARROW_PARQUET" = "OFF", # depends on thrift
+    "ARROW_DATASET" = "OFF", # depends on parquet
+    "ARROW_S3" = "OFF",
+    "ARROW_WITH_BROTLI" = "OFF",
+    "ARROW_WITH_BZ2" = "OFF",
+    "ARROW_WITH_LZ4" = "OFF",
+    "ARROW_WITH_SNAPPY" = "OFF",
+    "ARROW_WITH_ZLIB" = "OFF",
+    "ARROW_WITH_ZSTD" = "OFF",
+    "ARROW_WITH_RE2" = "OFF",
+    "ARROW_WITH_UTF8PROC" = "OFF",
+    # NOTE: this code sets the environment variable ARROW_JSON to "OFF", but
+    # that setting will *not* be honored by build_arrow_static.sh until
+    # ARROW-13768 is resolved.
+    "ARROW_JSON" = "OFF",
+    # The syntax to turn off XSIMD is different.
+    # Pull existing value of EXTRA_CMAKE_FLAGS first (must be defined)
+    "EXTRA_CMAKE_FLAGS" = paste(
+      env_var_list[["EXTRA_CMAKE_FLAGS"]],
+      "-DARROW_SIMD_LEVEL=NONE -DARROW_RUNTIME_SIMD_LEVEL=NONE"
+    )
+  )
+  # Create a new env_var_list, with the values of turn_off set.
+  # replace() also adds new values if they didn't exist before
+  replace(env_var_list, names(turn_off), turn_off)
+}
+
+set_thirdparty_urls <- function(env_var_list) {
+  # This function is run in most typical cases -- when download_ok is TRUE *or*
+  # ARROW_THIRDPARTY_DEPENDENCY_DIR is set. It does *not* check if existing
+  # *_SOURCE_URL variables are set. (It is also run whenever ARROW_DEPENDENCY_SOURCE
+  # is "SYSTEM", but doesn't affect the build in that case.)
+  deps_dir <- Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")
+  if (deps_dir == "") {
+    return(env_var_list)
+  }
+  files <- list.files(deps_dir, full.names = FALSE)
+  if (length(files) == 0) {
+    # This will be true if the directory doesn't exist, or if it exists but is empty.
+    # Here the build will continue, but will likely fail when the downloads are
+    # unavailable. The user will end up with the arrow-without-arrow package.
+    cat(paste0(
+      "*** Error: ARROW_THIRDPARTY_DEPENDENCY_DIR was set but has no files.\n",

Review comment:
       ```suggestion
         "*** Warning: ARROW_THIRDPARTY_DEPENDENCY_DIR was set but has no files.\n",
   ```
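
   For reference, a small sketch of how `replace()` behaves on a named character vector, as used in `turn_off_thirdparty_features()` in the hunk above. The input values are made up. It also shows why `EXTRA_CMAKE_FLAGS` has to be present in the input: `[[` by name errors on a missing element.

   ```r
   # Made-up starting environment; EXTRA_CMAKE_FLAGS is defined but empty
   env_var_list <- c(ARROW_S3 = "ON", EXTRA_CMAKE_FLAGS = "")
   turn_off <- c(
     ARROW_S3 = "OFF",
     ARROW_PARQUET = "OFF",
     EXTRA_CMAKE_FLAGS = paste(
       env_var_list[["EXTRA_CMAKE_FLAGS"]],
       "-DARROW_SIMD_LEVEL=NONE"
     )
   )
   # Existing names are overwritten in place; new names are appended
   updated <- replace(env_var_list, names(turn_off), turn_off)
   print(updated)

   # `[[` by name fails when the element is missing from an atomic vector
   missing_errors <- inherits(
     try(c(ARROW_S3 = "ON")[["EXTRA_CMAKE_FLAGS"]], silent = TRUE),
     "try-error"
   )
   ```

   Note that `paste()` with an empty first argument leaves a leading space in the flag string, which should be harmless once the value is passed through the shell.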

##########
File path: r/tools/nixlibs.R
##########
@@ -52,6 +43,24 @@ try_download <- function(from_url, to_file) {
   !inherits(status, "try-error") && status == 0
 }
 
+build_ok <- !env_is("LIBARROW_BUILD", "false")
+# But binary defaults to not OK
+binary_ok <- !identical(tolower(Sys.getenv("LIBARROW_BINARY", "false")), "false")
+# For local debugging, set ARROW_R_DEV=TRUE to make this script print more
+
+quietly <- !env_is("ARROW_R_DEV", "true") # try_download uses quietly global
+# * download_ok, build_ok: Use prebuilt binary, if found, otherwise try to build
+# * !download_ok, build_ok: Build with local git checkout, if available, or
+#   sources included in r/tools/cpp/. Optional dependencies are not included,
+#   and will not be automatically downloaded.
+#   cmake will still be downloaded if necessary
+#   https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+# * download_ok, !build_ok: Only use prebuilt binary, if found
+# * neither: Get the arrow-without-arrow package
+# Download and build are OK unless you say not to (or can't access github)
+download_ok <- !env_is("TEST_OFFLINE_BUILD", "true") && try_download("https://github.com", tempfile())
+
+

Review comment:
       ```suggestion
   # For local debugging, set ARROW_R_DEV=TRUE to make this script print more
   quietly <- !env_is("ARROW_R_DEV", "true")
   
   # Default is build from source, not download a binary
   build_ok <- !env_is("LIBARROW_BUILD", "false")
   binary_ok <- !identical(tolower(Sys.getenv("LIBARROW_BINARY", "false")), "false")
   
   # Check if we're doing an offline build.
   # (Note that cmake will still be downloaded if necessary
   #  https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds)
   download_ok <- !env_is("TEST_OFFLINE_BUILD", "true") && try_download("https://github.com", tempfile())
   
   ```
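   
   For anyone checking the defaults, the two flags in the suggestion can be exercised on their own. This is only a sketch: the `env_is()` below is an assumed stand-in (a case-insensitive comparison against `Sys.getenv()`), not necessarily the helper defined in `nixlibs.R`, and `build_allowed()` / `binary_allowed()` are hypothetical wrappers added for illustration.
   
   ```r
   # Assumed stand-in for env_is(): TRUE when the environment variable
   # equals `value`, ignoring case.
   env_is <- function(var, value) identical(tolower(Sys.getenv(var)), value)
   
   # Building from source is allowed unless LIBARROW_BUILD is explicitly "false".
   build_allowed <- function() !env_is("LIBARROW_BUILD", "false")
   
   # Binaries are opt-in: an unset variable falls back to "false".
   binary_allowed <- function() {
     !identical(tolower(Sys.getenv("LIBARROW_BINARY", "false")), "false")
   }
   
   # With neither variable set: build from source, no binary.
   Sys.unsetenv(c("LIBARROW_BUILD", "LIBARROW_BINARY"))
   defaults <- c(build = build_allowed(), binary = binary_allowed())
   
   # Any value other than "false" (in any case) opts in to a binary.
   Sys.setenv(LIBARROW_BUILD = "FALSE", LIBARROW_BINARY = "ubuntu-18.04")
   overridden <- c(build = build_allowed(), binary = binary_allowed())
   ```
   
   The point of the asymmetry is that building is opt-out while binaries are opt-in, which is why only `LIBARROW_BINARY` passes a fallback to `Sys.getenv()`.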

##########
File path: r/vignettes/install.Rmd
##########
@@ -343,6 +357,7 @@ By default, these are all unset. All boolean variables are case-insensitive.
 * `CMAKE`: When building the C++ library from source, you can specify a
   `/path/to/cmake` to use a different version than whatever is found on the `$PATH`
 
+

Review comment:
       ```suggestion
   ```

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,68 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### On a computer with internet access:
+#' - Install the `arrow` package

Review comment:
       Oh, I guess you're also relying on the package installation to deliver the `download_dependencies.sh` script and the `versions.txt` file?

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,68 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### On a computer with internet access:
+#' - Install the `arrow` package

Review comment:
       Yeah, that makes sense. I was hoping to avoid the sound of "to install arrow, first install arrow".

##########
File path: r/vignettes/install.Rmd
##########
@@ -285,17 +309,28 @@ setting `ARROW_WITH_ZSTD=OFF` to build without `zstd`; or (3) uninstalling
 the conflicting `zstd`.
 See discussion [here](https://issues.apache.org/jira/browse/ARROW-8556).
 
+* Offline installation fails when dependencies haven't been downloaded to
+`ARROW_THIRDPARTY_DEPENDENCY_DIR`. The package currently depends on the
+third-party project RapidJSON. See `?download_optional_dependencies`.
+See discussion [here](https://issues.apache.org/jira/browse/ARROW-13768) on

Review comment:
       We should just solve this rather than document the exception, IMO

##########
File path: r/vignettes/install.Rmd
##########
@@ -342,6 +373,15 @@ By default, these are all unset. All boolean variables are case-insensitive.
   The directory will be created if it does not exist.
 * `CMAKE`: When building the C++ library from source, you can specify a
   `/path/to/cmake` to use a different version than whatever is found on the `$PATH`
+* `ARROW_THIRDPARTY_DEPENDENCY_DIR`: Directory with downloaded third-party
+  dependency files. Run `download_optional_dependencies(my-dir)` to download.
+* `TEST_OFFLINE_BUILD`: When set to `true`, the build script will not download

Review comment:
       A better place for this would be the `developing.Rmd` vignette. We have another env var, `TEST_R_WITHOUT_LIBARROW`, that could be documented there too; like this one, it's not something a package user would ever want to set.







[GitHub] [arrow] nealrichardson edited a comment on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson edited a comment on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-909640938


   > Tests are currently having errors because of this line:
   > 
   > https://github.com/apache/arrow/blob/6daff455ad1e4c5ac4c84bda5711bdb5c30b6156/r/tools/nixlibs.R#L466
   > 
   > That directory (`tools/cpp/thirdparty`) would exist if `make build` had been run. Any suggestions?
   
   I can investigate later, though that wouldn't explain why the Windows builds are failing, since that script doesn't get called there.
   
   Edit: are you sure you want that? The next line checks `if (!dir.exists())` and returns.





[GitHub] [arrow] jonkeane commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-912154998


   @github-actions crossbow submit test-r-offline-maximal





[GitHub] [arrow] karldw commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-909581290


   A couple of things:
   
   * I merged here (because I wanted the fixes from ARROW-13776), but let me know if you want to rebase for a cleaner series of commits.
   * I removed `download_optional_dependencies` but didn't update `test-r-offline-maximal`
   * Let me know if you want a different name than `create_package_with_all_dependencies`





[GitHub] [arrow] karldw commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-907645394


   @github-actions crossbow submit -g test-r-offline-minimal





[GitHub] [arrow] nealrichardson commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r701097959



##########
File path: r/tools/nixlibs.R
##########
@@ -52,6 +42,24 @@ try_download <- function(from_url, to_file) {
   !inherits(status, "try-error") && status == 0
 }
 
+# For local debugging, set ARROW_R_DEV=TRUE to make this script print more
+quietly <- !env_is("ARROW_R_DEV", "true")
+
+# Default is build from source, not download a binary
+build_ok <- !env_is("LIBARROW_BUILD", "false")
+binary_ok <- !(env_is("LIBARROW_BINARY", "false") || env_is("LIBARROW_BINARY", ""))
+
+# Check if we're doing an offline build.
+# (Note that cmake will still be downloaded if necessary
+#  https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds)
+download_ok <- !env_is("TEST_OFFLINE_BUILD", "true") && try_download("https://github.com", tempfile())
+
+# This path, within the tar file, might exist if
+# create_package_with_all_dependencies() was run. Otherwise, it won't, but
+# tools/cpp/thirdparty/ still will.

Review comment:
       `tools/cpp/thirdparty/` isn't guaranteed to exist in a git checkout
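   
   One way to tolerate a missing directory, sketched under the assumption that an empty result is an acceptable signal for "nothing was pre-downloaded". `list_dependency_files()` is a hypothetical helper for illustration, not something in this PR.
   
   ```r
   # Hypothetical guard: return an empty vector when the directory is
   # missing, rather than stopping on an assertion about its parent.
   list_dependency_files <- function(deps_dir) {
     if (!dir.exists(deps_dir)) {
       return(character(0))
     }
     list.files(deps_dir, full.names = FALSE)
   }
   
   # A path that does not exist yields character(0) and no error.
   missing_dir <- file.path(tempdir(), "no-such-thirdparty-dir")
   found <- list_dependency_files(missing_dir)
   ```
   
   That keeps the git-checkout case and the tarball case on the same code path.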

##########
File path: r/tools/nixlibs.R
##########
@@ -413,66 +421,137 @@ cmake_version <- function(cmd = "cmake") {
   )
 }
 
-with_s3_support <- function(env_vars) {
-  arrow_s3 <- toupper(Sys.getenv("ARROW_S3")) == "ON" || tolower(Sys.getenv("LIBARROW_MINIMAL")) == "false"
+turn_off_thirdparty_features <- function(env_var_list) {
+  # Because these are done as environment variables (as opposed to build flags),
+  # setting these to "OFF" overrides any previous setting. We don't need to
+  # check the existing value.
+  turn_off <- c(
+    "ARROW_MIMALLOC" = "OFF",
+    "ARROW_JEMALLOC" = "OFF",
+    "ARROW_PARQUET" = "OFF", # depends on thrift
+    "ARROW_DATASET" = "OFF", # depends on parquet
+    "ARROW_S3" = "OFF",
+    "ARROW_WITH_BROTLI" = "OFF",
+    "ARROW_WITH_BZ2" = "OFF",
+    "ARROW_WITH_LZ4" = "OFF",
+    "ARROW_WITH_SNAPPY" = "OFF",
+    "ARROW_WITH_ZLIB" = "OFF",
+    "ARROW_WITH_ZSTD" = "OFF",
+    "ARROW_WITH_RE2" = "OFF",
+    "ARROW_WITH_UTF8PROC" = "OFF",
+    # NOTE: this code sets the environment variable ARROW_JSON to "OFF", but
+    # that setting will *not* be honored by build_arrow_static.sh until
+    # ARROW-13768 is resolved.
+    "ARROW_JSON" = "OFF",
+    # The syntax to turn off XSIMD is different.
+    # Pull existing value of EXTRA_CMAKE_FLAGS first (must be defined)
+    "EXTRA_CMAKE_FLAGS" = paste(
+      env_var_list[["EXTRA_CMAKE_FLAGS"]],
+      "-DARROW_SIMD_LEVEL=NONE -DARROW_RUNTIME_SIMD_LEVEL=NONE"
+    )
+  )
+  # Create a new env_var_list, with the values of turn_off set.
+  # replace() also adds new values if they didn't exist before
+  replace(env_var_list, names(turn_off), turn_off)
+}
+
+set_thirdparty_urls <- function(env_var_list) {
+  # This function does *not* check if existing *_SOURCE_URL variables are set.
+  # The directory tools/cpp/thirdparty/download is created by
+  # create_package_with_all_dependencies() and saved in the tar file.
+  # In all other cases, where we're not installing from that offline tar file,
+  # that directory won't exist, but tools/cpp/thirdparty/ still should.
+  # Test tools/cpp/thirdparty to avoid false negatives.
+  deps_dir <- thirdparty_dependency_dir # defined at the top
+  stopifnot(dir.exists(dirname(thirdparty_dependency_dir)))
+  if (!dir.exists(deps_dir)) {
+    return(env_var_list)
+  }
+  files <- list.files(deps_dir, full.names = FALSE)
+  url_env_varname <- toupper(sub("(.*?)-.*", "ARROW_\\1_URL", files))
+  # Special handling for the aws dependencies, which have extra `-`
+  aws <- grepl("^aws", files)
+  url_env_varname[aws] <- sub(
+    "AWS_SDK_CPP", "AWSSDK",
+    gsub(
+      "-", "_",
+      sub(
+        "(AWS.*)-.*", "ARROW_\\1_URL",
+        toupper(files[aws])
+      )
+    )
+  )
+  full_filenames <- file.path(normalizePath(deps_dir), files)

Review comment:
       ```suggestion
     full_filenames <- file.path(normalizePath(thirdparty_dependency_dir), files)
   ```
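   
   As a sanity check on the filename-to-variable mapping in this hunk, here is the same pair of regexes pulled into a standalone function. The function name `url_env_varnames()` and the example filenames are assumptions for illustration; the real names come from whatever `download_dependencies.sh` writes.
   
   ```r
   # Same regexes as set_thirdparty_urls(), isolated so the mapping can be
   # checked without a build. The wrapper itself is hypothetical.
   url_env_varnames <- function(files) {
     # Everything before the first "-" becomes the dependency name.
     out <- toupper(sub("(.*?)-.*", "ARROW_\\1_URL", files))
     # aws-* names contain extra "-", so split at the last "-" instead,
     # then turn the remaining dashes into underscores.
     aws <- grepl("^aws", files)
     out[aws] <- sub(
       "AWS_SDK_CPP", "AWSSDK",
       gsub("-", "_", sub("(AWS.*)-.*", "ARROW_\\1_URL", toupper(files[aws])))
     )
     out
   }
   
   # Illustrative filenames only; versions are made up.
   example_files <- c(
     "snappy-1.1.8.tar.gz",
     "aws-sdk-cpp-1.8.133.tar.gz",
     "aws-c-common-v0.5.10.tar.gz"
   )
   varnames <- url_env_varnames(example_files)
   ```
   
   The aws branch relies on the greedy `(AWS.*)` consuming up to the last dash, so a version string that itself contains a dash would break the mapping.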

##########
File path: r/tools/nixlibs.R
##########
@@ -320,33 +299,54 @@ build_libarrow <- function(src_dir, dst_dir) {
     BUILD_DIR = build_dir,
     DEST_DIR = dst_dir,
     CMAKE = cmake,
+    # EXTRA_CMAKE_FLAGS will often be "", but it's convenient later to have it defined
+    EXTRA_CMAKE_FLAGS = Sys.getenv("EXTRA_CMAKE_FLAGS"),
     # Make sure we build with the same compiler settings that R is using
     CC = R_CMD_config("CC"),
     CXX = paste(R_CMD_config("CXX11"), R_CMD_config("CXX11STD")),
     # CXXFLAGS = R_CMD_config("CXX11FLAGS"), # We don't want the same debug symbols
     LDFLAGS = R_CMD_config("LDFLAGS")
   )
-  env_vars <- paste0(names(env_var_list), '="', env_var_list, '"', collapse = " ")
-  env_vars <- with_s3_support(env_vars)
-  env_vars <- with_mimalloc(env_vars)
-  if (tolower(Sys.info()[["sysname"]]) %in% "sunos") {
-    # jemalloc doesn't seem to build on Solaris
-    # nor does thrift, so turn off parquet,
-    # and arrowExports.cpp requires parquet for dataset (ARROW-11994), so turn that off
-    # xsimd doesn't compile, so set SIMD level to NONE to skip it
-    # re2 and utf8proc do compile,
-    # but `ar` fails to build libarrow_bundled_dependencies, so turn them off
-    # so that there are no bundled deps
-    env_vars <- paste(env_vars, "ARROW_JEMALLOC=OFF ARROW_PARQUET=OFF ARROW_DATASET=OFF ARROW_WITH_RE2=OFF ARROW_WITH_UTF8PROC=OFF EXTRA_CMAKE_FLAGS=-DARROW_SIMD_LEVEL=NONE")
+  env_var_list <- with_s3_support(env_var_list)
+  env_var_list <- with_mimalloc(env_var_list)
+  # turn_off_thirdparty_features() needs to happen after with_mimalloc() and
+  # with_s3_support(), since those might turn features ON.
+  thirdparty_deps_unavailable <- !download_ok &&
+    !dir.exists(thirdparty_dependency_dir) &&
+    !env_is("ARROW_DEPENDENCY_SOURCE", "system")
+  if (is_solaris()) {
+    # Note that JSON support does work on Solaris, but will be turned off with
+    # the rest of the thirdparty dependencies (when ARROW-13768 is resolved and
+    # JSON can be turned off at all). All other dependencies don't compile
+    # (e.g. thrift, jemalloc, and xsimd) or do compile but `ar` fails to build
+    # libarrow_bundled_dependencies (e.g. re2 and utf8proc).
+    env_var_list <- turn_off_thirdparty_features(env_var_list)
+  } else if (thirdparty_deps_unavailable) {
+    cat(paste0(
+      "*** Building C++ library from source, but downloading thirdparty dependencies\n",
+      "    is not possible, so this build will turn off all thirdparty features.\n",
+      "    See install vignette for details:\n",
+      "    https://cran.r-project.org/web/packages/arrow/vignettes/install.html\n"
+    ))
+    env_var_list <- turn_off_thirdparty_features(env_var_list)
+  } else {
+    # If thirdparty_dependency_dir exists, the *_SOURCE_URL env vars

Review comment:
       How about this?
   
   ```suggestion
     } else if (dir.exists(thirdparty_dependency_dir)) {
       # Add the *_SOURCE_URL env vars
   ```
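   
   To make the resulting branch order concrete, here is a sketch with the conditions reduced to plain logicals. `thirdparty_plan()` and its return labels are hypothetical; they only mirror the structure of the `if` / `else if` chain with the suggestion applied.
   
   ```r
   # Hypothetical mirror of the branch structure, with inputs as logicals
   # so the ordering can be checked without running a build.
   thirdparty_plan <- function(solaris, download_ok, deps_dir_exists, system_deps) {
     if (solaris) {
       "turn_off"    # features that do not build on Solaris
     } else if (!download_ok && !deps_dir_exists && !system_deps) {
       "turn_off"    # offline with nothing pre-downloaded
     } else if (deps_dir_exists) {
       "set_urls"    # point the *_SOURCE_URL env vars at local files
     } else {
       "unchanged"   # let cmake download or use system libraries
     }
   }
   
   offline_with_tarball <- thirdparty_plan(
     solaris = FALSE, download_ok = FALSE,
     deps_dir_exists = TRUE, system_deps = FALSE
   )
   ```
   
   With the explicit `dir.exists()` test in the last branch, the URL-setting function no longer needs its own early return.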

##########
File path: r/tools/nixlibs.R
##########
@@ -413,66 +421,137 @@ cmake_version <- function(cmd = "cmake") {
   )
 }
 
-with_s3_support <- function(env_vars) {
-  arrow_s3 <- toupper(Sys.getenv("ARROW_S3")) == "ON" || tolower(Sys.getenv("LIBARROW_MINIMAL")) == "false"
+turn_off_thirdparty_features <- function(env_var_list) {
+  # Because these are done as environment variables (as opposed to build flags),
+  # setting these to "OFF" overrides any previous setting. We don't need to
+  # check the existing value.
+  turn_off <- c(
+    "ARROW_MIMALLOC" = "OFF",
+    "ARROW_JEMALLOC" = "OFF",
+    "ARROW_PARQUET" = "OFF", # depends on thrift
+    "ARROW_DATASET" = "OFF", # depends on parquet
+    "ARROW_S3" = "OFF",
+    "ARROW_WITH_BROTLI" = "OFF",
+    "ARROW_WITH_BZ2" = "OFF",
+    "ARROW_WITH_LZ4" = "OFF",
+    "ARROW_WITH_SNAPPY" = "OFF",
+    "ARROW_WITH_ZLIB" = "OFF",
+    "ARROW_WITH_ZSTD" = "OFF",
+    "ARROW_WITH_RE2" = "OFF",
+    "ARROW_WITH_UTF8PROC" = "OFF",
+    # NOTE: this code sets the environment variable ARROW_JSON to "OFF", but
+    # that setting will *not* be honored by build_arrow_static.sh until
+    # ARROW-13768 is resolved.
+    "ARROW_JSON" = "OFF",
+    # The syntax to turn off XSIMD is different.
+    # Pull existing value of EXTRA_CMAKE_FLAGS first (must be defined)
+    "EXTRA_CMAKE_FLAGS" = paste(
+      env_var_list[["EXTRA_CMAKE_FLAGS"]],
+      "-DARROW_SIMD_LEVEL=NONE -DARROW_RUNTIME_SIMD_LEVEL=NONE"
+    )
+  )
+  # Create a new env_var_list, with the values of turn_off set.
+  # replace() also adds new values if they didn't exist before
+  replace(env_var_list, names(turn_off), turn_off)
+}
+
+set_thirdparty_urls <- function(env_var_list) {
+  # This function does *not* check if existing *_SOURCE_URL variables are set.
+  # The directory tools/cpp/thirdparty/download is created by
+  # create_package_with_all_dependencies() and saved in the tar file.
+  # In all other cases, where we're not installing from that offline tar file,
+  # that directory won't exist, but tools/cpp/thirdparty/ still should.
+  # Test tools/cpp/thirdparty to avoid false negatives.
+  deps_dir <- thirdparty_dependency_dir # defined at the top
+  stopifnot(dir.exists(dirname(thirdparty_dependency_dir)))
+  if (!dir.exists(deps_dir)) {
+    return(env_var_list)
+  }
+  files <- list.files(deps_dir, full.names = FALSE)

Review comment:
       ```suggestion
     files <- list.files(thirdparty_dependency_dir, full.names = FALSE)
   ```

##########
File path: r/tools/nixlibs.R
##########
@@ -413,66 +421,137 @@ cmake_version <- function(cmd = "cmake") {
   )
 }
 
-with_s3_support <- function(env_vars) {
-  arrow_s3 <- toupper(Sys.getenv("ARROW_S3")) == "ON" || tolower(Sys.getenv("LIBARROW_MINIMAL")) == "false"
+turn_off_thirdparty_features <- function(env_var_list) {
+  # Because these are done as environment variables (as opposed to build flags),
+  # setting these to "OFF" overrides any previous setting. We don't need to
+  # check the existing value.
+  turn_off <- c(
+    "ARROW_MIMALLOC" = "OFF",
+    "ARROW_JEMALLOC" = "OFF",
+    "ARROW_PARQUET" = "OFF", # depends on thrift
+    "ARROW_DATASET" = "OFF", # depends on parquet
+    "ARROW_S3" = "OFF",
+    "ARROW_WITH_BROTLI" = "OFF",
+    "ARROW_WITH_BZ2" = "OFF",
+    "ARROW_WITH_LZ4" = "OFF",
+    "ARROW_WITH_SNAPPY" = "OFF",
+    "ARROW_WITH_ZLIB" = "OFF",
+    "ARROW_WITH_ZSTD" = "OFF",
+    "ARROW_WITH_RE2" = "OFF",
+    "ARROW_WITH_UTF8PROC" = "OFF",
+    # NOTE: this code sets the environment variable ARROW_JSON to "OFF", but
+    # that setting will *not* be honored by build_arrow_static.sh until
+    # ARROW-13768 is resolved.
+    "ARROW_JSON" = "OFF",
+    # The syntax to turn off XSIMD is different.
+    # Pull existing value of EXTRA_CMAKE_FLAGS first (must be defined)
+    "EXTRA_CMAKE_FLAGS" = paste(
+      env_var_list[["EXTRA_CMAKE_FLAGS"]],
+      "-DARROW_SIMD_LEVEL=NONE -DARROW_RUNTIME_SIMD_LEVEL=NONE"
+    )
+  )
+  # Create a new env_var_list, with the values of turn_off set.
+  # replace() also adds new values if they didn't exist before
+  replace(env_var_list, names(turn_off), turn_off)
+}
+
+set_thirdparty_urls <- function(env_var_list) {
+  # This function does *not* check if existing *_SOURCE_URL variables are set.
+  # The directory tools/cpp/thirdparty/download is created by
+  # create_package_with_all_dependencies() and saved in the tar file.
+  # In all other cases, where we're not installing from that offline tar file,
+  # that directory won't exist, but tools/cpp/thirdparty/ still should.
+  # Test tools/cpp/thirdparty to avoid false negatives.
+  deps_dir <- thirdparty_dependency_dir # defined at the top
+  stopifnot(dir.exists(dirname(thirdparty_dependency_dir)))
+  if (!dir.exists(deps_dir)) {
+    return(env_var_list)
+  }
+  files <- list.files(deps_dir, full.names = FALSE)
+  url_env_varname <- toupper(sub("(.*?)-.*", "ARROW_\\1_URL", files))
+  # Special handling for the aws dependencies, which have extra `-`
+  aws <- grepl("^aws", files)
+  url_env_varname[aws] <- sub(
+    "AWS_SDK_CPP", "AWSSDK",
+    gsub(
+      "-", "_",
+      sub(
+        "(AWS.*)-.*", "ARROW_\\1_URL",
+        toupper(files[aws])
+      )
+    )
+  )
+  full_filenames <- file.path(normalizePath(deps_dir), files)
+
+  env_var_list <- replace(env_var_list, url_env_varname, full_filenames)
+  if (env_is("ARROW_R_DEV", "true")) {

Review comment:
       ```suggestion
     if (!quietly) {
   ```

##########
File path: r/vignettes/developing.Rmd
##########
@@ -107,6 +107,7 @@ You can choose to build and then install the Arrow library into a user-defined d
 
 It is recommended that you install the arrow library to a user-level directory to be used in development. This is so that the development version you are using doesn't overwrite a released version of Arrow you may have installed. You are also able to have more than one version of the Arrow library to link to with this approach (by using different `ARROW_HOME` directories for the different versions). This approach also matches the recommendations for other Arrow bindings like [Python](http://arrow.apache.org/docs/developers/python.html).
 
+

Review comment:
       ```suggestion
   ```

##########
File path: r/vignettes/install.Rmd
##########
@@ -102,6 +102,42 @@ satisfy C++ dependencies.
 
 > Note that, unlike packages like `tensorflow`, `blogdown`, and others that require external dependencies, you do not need to run `install_arrow()` after a successful `arrow` installation.
 
+The `install-arrow.R` file also includes the `create_package_with_all_dependencies()`
+function. Normally, when installing on a computer with internet access, the
+build process will download third-party dependencies as needed.
+This function provides a way to download them in advance.
+Doing so may be useful when installing Arrow on a computer without internet access.
+Note that Arrow _can_ be installed on a computer without internet access, but
+many useful features will be disabled, as they depend on third-party components.
+More precisely, `arrow::arrow_info()$capabilities` will be `FALSE` for every
+capability.
+One approach to add more capabilities in an offline install is to prepare a
+package with pre-downloaded dependencies. The
+`create_package_with_all_dependencies()` function does this preparation.
+
+### Using a computer with internet access, pre-download the dependencies:
+* Install the `arrow` package
+* Run `create_package_with_all_dependencies("my_arrow_pkg.tar.gz")`
+* Copy the newly created `my_arrow_pkg.tar.gz` to the computer without internet access
+
+### On the computer without internet access, install the prepared package:
+* Install the `arrow` package from the copied file (`install.packages("my_arrow_pkg.tar.gz")`)
+  * This installation will build from source, so `cmake` must be available
+* Run `arrow_info()` to check installed capabilities
+
+
+### Using a computer with internet access, pre-download the dependencies:
+* Install the `arrow` package
+* Run `download_optional_dependencies(my_dependencies)`
+* Copy the directory `my-arrow-dependencies` to the computer without internet access
+
+### On the computer without internet access, use the pre-downloaded dependencies:
+* Create an environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
+  points to the newly copied `my_dependencies`.
+* Install the `arrow` package
+  * This installation will build from source, so `cmake` must be available
+* Run `arrow_info()` to check installed capabilities
+

Review comment:
       This is stale, right?
   
   ```suggestion
   ```

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,91 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Create an install package with all thirdparty dependencies
+#'
+#' @param outfile File path for the new tar.gz package. Defaults to
+#' `arrow_V.V.V_with_deps.tar.gz` in the current directory (`V.V.V` is the version)
+#' @param package_source File path for the input tar.gz package. Defaults to
+#' downloading from CRAN.

Review comment:
       Technically it will download from wherever `options(repos)` says, which might not be CRAN (like, you could do this with our nightly package repository too).
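   
   A quick way to see which repositories would actually be consulted, without downloading anything. `effective_repos()` is a hypothetical helper and the nightly URL is a placeholder, not the real repository address.
   
   ```r
   # Hypothetical helper: list the repositories a download would consult.
   # "@CRAN@" is R's placeholder for "no CRAN mirror chosen yet".
   effective_repos <- function(repos = getOption("repos")) {
     repos[repos != "@CRAN@"]
   }
   
   # Placeholder URLs, for illustration only.
   old <- options(repos = c(
     nightly = "https://example.org/arrow-nightly",
     CRAN = "https://cloud.r-project.org"
   ))
   consulted <- effective_repos()
   options(old) # restore the previous setting
   ```
   
   So the docs could say "downloads from the repositories in `getOption("repos")`" rather than naming CRAN.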

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,91 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Create an install package with all thirdparty dependencies
+#'
+#' @param outfile File path for the new tar.gz package. Defaults to
+#' `arrow_V.V.V_with_deps.tar.gz` in the current directory (`V.V.V` is the version)
+#' @param package_source File path for the input tar.gz package. Defaults to
+#' downloading from CRAN.
+#' @param quietly boolean, default `TRUE`. If `FALSE`, narrate progress.
+#' @return The full path to `outfile`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download the required dependencies for you.
+#' These downloaded dependencies are only used in the build if
+#' `ARROW_DEPENDENCY_SOURCE` is unset, `BUNDLED`, or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### Using a computer with internet access, pre-download the dependencies:
+#' * Install the `arrow` package
+#' * Run `create_package_with_all_dependencies("my_arrow_pkg.tar.gz")`
+#' * Copy the newly created `my_arrow_pkg.tar.gz` to the computer without internet access
+#'
+#' ### On the computer without internet access, install the prepared package:
+#' * Install the `arrow` package from the copied file (`install.packages("my_arrow_pkg.tar.gz")`)
+#'   * This installation will build from source, so `cmake` must be available
+#' * Run [arrow_info()] to check installed capabilities
+#'
+#'
+#' @examples
+#' \dontrun{
+#' new_pkg <- create_package_with_all_dependencies()
+#' # Note: this works when run in the same R session, but it's meant to be
+#' # copied to a different computer.
+#' install.packages(new_pkg, dependencies = c("Depends", "Imports", "LinkingTo"))
+#' }
+#' @export
+create_package_with_all_dependencies <- function(outfile = NULL, package_source = NULL, quietly = TRUE) {

Review comment:
       Any reason we need `quietly` as an argument here (other than to make it quiet by default)? Seems like you could achieve the same with `suppressMessages()`.
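   
   Roughly what that would look like for a caller, assuming the function narrates with `message()`. `build_package()` here is a hypothetical stand-in, not the function in this PR.
   
   ```r
   # Hypothetical stand-in for a function that narrates via message().
   build_package <- function() {
     message("Downloading dependencies...")
     message("Repacking tarball...")
     invisible("arrow_with_deps.tar.gz")
   }
   
   # Callers who want silence wrap the call; no `quietly` argument needed.
   path <- suppressMessages(build_package())
   ```
   
   This only works if progress is reported through `message()` (or another condition), not `cat()`.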

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,91 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Create an install package with all thirdparty dependencies
+#'
+#' @param outfile File path for the new tar.gz package. Defaults to
+#' `arrow_V.V.V_with_deps.tar.gz` in the current directory (`V.V.V` is the version)
+#' @param package_source File path for the input tar.gz package. Defaults to
+#' downloading from CRAN.
+#' @param quietly boolean, default `TRUE`. If `FALSE`, narrate progress.
+#' @return The full path to `outfile`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download the required dependencies for you.
+#' These downloaded dependencies are only used in the build if
+#' `ARROW_DEPENDENCY_SOURCE` is unset, `BUNDLED`, or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### Using a computer with internet access, pre-download the dependencies:
+#' * Install the `arrow` package
+#' * Run `create_package_with_all_dependencies("my_arrow_pkg.tar.gz")`
+#' * Copy the newly created `my_arrow_pkg.tar.gz` to the computer without internet access
+#'
+#' ### On the computer without internet access, install the prepared package:
+#' * Install the `arrow` package from the copied file (`install.packages("my_arrow_pkg.tar.gz")`)
+#'   * This installation will build from source, so `cmake` must be available
+#' * Run [arrow_info()] to check installed capabilities
+#'
+#'
+#' @examples
+#' \dontrun{
+#' new_pkg <- create_package_with_all_dependencies()
+#' # Note: this works when run in the same R session, but it's meant to be
+#' # copied to a different computer.
+#' install.packages(new_pkg, dependencies = c("Depends", "Imports", "LinkingTo"))
+#' }
+#' @export
+create_package_with_all_dependencies <- function(outfile = NULL, package_source = NULL, quietly = TRUE) {

Review comment:
       Also, what do you think about a signature like this? Inputs before outputs, and make clear that both arguments are the same kind of thing (a string file path).
   
   ```
   create_package_with_all_dependencies <- function(source_file = NULL, dest_file = NULL) {
   ```

##########
File path: r/tools/nixlibs.R
##########
@@ -52,6 +42,24 @@ try_download <- function(from_url, to_file) {
   !inherits(status, "try-error") && status == 0
 }
 
+# For local debugging, set ARROW_R_DEV=TRUE to make this script print more
+quietly <- !env_is("ARROW_R_DEV", "true")
+
+# Default is build from source, not download a binary
+build_ok <- !env_is("LIBARROW_BUILD", "false")
+binary_ok <- !(env_is("LIBARROW_BINARY", "false") || env_is("LIBARROW_BINARY", ""))
+
+# Check if we're doing an offline build.
+# (Note that cmake will still be downloaded if necessary
+#  https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds)
+download_ok <- !env_is("TEST_OFFLINE_BUILD", "true") && try_download("https://github.com", tempfile())
+
+# This path, within the tar file, might exist if
+# create_package_with_all_dependencies() was run. Otherwise, it won't, but
+# tools/cpp/thirdparty/ still will.
+thirdparty_dependency_dir <- "tools/cpp/thirdparty/download"

Review comment:
       Is there any value in allowing this to be outside of the tarball still, like `Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR", "tools/cpp/thirdparty/download")`?
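   
   For reference, the fallback behaves like this. `resolve_deps_dir()` is a hypothetical wrapper and `/tmp/arrow-deps` is just an example path.
   
   ```r
   # Hypothetical resolver: prefer the environment variable, fall back to
   # the path inside the source package when it is unset.
   resolve_deps_dir <- function() {
     Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR", "tools/cpp/thirdparty/download")
   }
   
   Sys.unsetenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")
   default_dir <- resolve_deps_dir()
   
   Sys.setenv(ARROW_THIRDPARTY_DEPENDENCY_DIR = "/tmp/arrow-deps")
   custom_dir <- resolve_deps_dir()
   ```
   
   The second argument to `Sys.getenv()` is only used when the variable is unset, so existing tarball installs would be unaffected.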

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,91 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Create an install package with all thirdparty dependencies
+#'
+#' @param outfile File path for the new tar.gz package. Defaults to
+#' `arrow_V.V.V_with_deps.tar.gz` in the current directory (`V.V.V` is the version)
+#' @param package_source File path for the input tar.gz package. Defaults to
+#' downloading from CRAN.
+#' @param quietly boolean, default `TRUE`. If `FALSE`, narrate progress.
+#' @return The full path to `outfile`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download the required dependencies for you.
+#' These downloaded dependencies are only used in the build if
+#' `ARROW_DEPENDENCY_SOURCE` is unset, `BUNDLED`, or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### Using a computer with internet access, pre-download the dependencies:
+#' * Install the `arrow` package

Review comment:
       Since we're not expecting things inside `inst/` anymore, you could also source(github_url/install-arrow.R) now, right?

##########
File path: dev/tasks/r/github.linux.offline.build.yml
##########
@@ -0,0 +1,111 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# NOTE: must set "Crossbow" as name to have the badge links working in the
+# github comment reports!
+name: Crossbow
+
+on:
+  push
+
+jobs:
+  grab-dependencies:
+    name: "Download thirdparty dependencies"
+    runs-on: ubuntu-20.04
+    strategy:
+      fail-fast: false
+    env:
+      ARROW_R_DEV: "TRUE"
+      RSPM: "https://packagemanager.rstudio.com/cran/__linux__/focal/latest"
+    steps:
+      - name: Checkout Arrow
+        run: |
+          git clone --no-checkout {{ arrow.remote }} arrow
+          git -C arrow fetch -t {{ arrow.remote }} {{ arrow.branch }}
+          git -C arrow checkout FETCH_HEAD
+          git -C arrow submodule update --init --recursive
+      - name: Free Up Disk Space
+        shell: bash
+        run: arrow/ci/scripts/util_cleanup.sh
+      - name: Fetch Submodules and Tags
+        shell: bash
+        run: cd arrow && ci/scripts/util_checkout.sh
+      - uses: r-lib/actions/setup-r@v1
+      - name: Pull Arrow dependencies
+        run: |
+          cd arrow/r
+          # copy the two files we will need
+          # TODO: allow manually specifying `download_dependencies.sh` in `download_optional_dependencies()` then we won't need to install
+          mkdir -p inst/thirdparty/
+          cp -p ../cpp/thirdparty/download_dependencies.sh inst/thirdparty/
+          cp -p ../cpp/thirdparty/versions.txt inst/thirdparty/
+          mkdir thirdparty_deps
+          R -e 'source("R/install-arrow.R"); download_optional_dependencies("thirdparty_deps", download_dependencies_sh = "./inst/thirdparty/download_dependencies.sh")'

Review comment:
       Need to update these CI jobs still




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r702308381



##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,93 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Create a source bundle that includes all thirdparty dependencies
+#'
+#' @param dest_file File path for the new tar.gz package. Defaults to
+#' `arrow_V.V.V_with_deps.tar.gz` in the current directory (`V.V.V` is the version)
+#' @param source_file File path for the input tar.gz package. Defaults to
+#' downloading the package from CRAN (or whatever you have set as the first in
+#' `getOption("repos")`)
+#' @return The full path to `dest_file`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download the required dependencies for you.
+#' These downloaded dependencies are only used in the build if
+#' `ARROW_DEPENDENCY_SOURCE` is unset, `BUNDLED`, or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'

Review comment:
       While we're at it, should we mention other binary platforms?
   
   ```
   #' Note: If you're using binary packages, e.g. from RStudio Package Manager on
   #' Linux or the standard CRAN binaries on Windows or Mac, you shouldn't need to
   #' use this function. You can download the appropriate binary from your package
   #' repository, and transfer that to the offline computer.
   #' If you still want to make a source bundle with this function, make sure to
   #' set the first repo in `options("repos")` to be a mirror that contains source
   #' packages (that is: something other than the RSPM binary mirror URLs).
   #' Any OS can create the source bundle, but it cannot be installed on Windows.
   #' (Instead, use a standard Windows binary package.)
   ```







[GitHub] [arrow] nealrichardson commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r698748482



##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,68 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### On a computer with internet access:
+#' - Install the `arrow` package

Review comment:
       Yeah that makes sense. I was hoping to avoid the sound of "to install arrow, first install arrow". 







[GitHub] [arrow] karldw commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-912694072


   @github-actions crossbow submit test-r-offline-minimal





[GitHub] [arrow] nealrichardson commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r696808171



##########
File path: r/inst/build_arrow_static.sh
##########
@@ -59,7 +59,7 @@ ${CMAKE} -DARROW_BOOST_USE_SHARED=OFF \
     -DARROW_FILESYSTEM=ON \
     -DARROW_JEMALLOC=${ARROW_JEMALLOC:-$ARROW_DEFAULT_PARAM} \
     -DARROW_MIMALLOC=${ARROW_MIMALLOC:-ON} \
-    -DARROW_JSON=ON \
+    -DARROW_JSON=${ARROW_JSON:-ON} \

Review comment:
       Great, thanks. I'd revert here and do that change in ARROW-13768. If you turn ARROW_JSON=OFF now, the build will fail (r/src/json.cpp won't compile). If you want to do the full offline build test in this PR, you will probably need to do ARROW-13768 first.
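   
   As a rough illustration (not part of the PR), the `${ARROW_JSON:-ON}` form in the diff above is ordinary POSIX parameter expansion: it yields the value of `ARROW_JSON` when that variable is set and non-empty, and `ON` otherwise. The helper name `json_flag` is hypothetical and exists only for this sketch.
   
   ```shell
   # Hypothetical helper: print the CMake flag the way the changed line
   # in build_arrow_static.sh would expand it.
   json_flag() {
     printf '%s\n' "-DARROW_JSON=${ARROW_JSON:-ON}"
   }

   unset ARROW_JSON
   json_flag            # unset: falls back to ON, prints -DARROW_JSON=ON

   ARROW_JSON=OFF
   json_flag            # set by the caller: prints -DARROW_JSON=OFF
   ```
   
   With the changed line, an exported `ARROW_JSON=OFF` would reach CMake unchanged, which is the case the comment above says will currently break the build.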







[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r696787790



##########
File path: r/tools/nixlibs.R
##########
@@ -329,18 +288,22 @@ build_libarrow <- function(src_dir, dst_dir) {
     LDFLAGS = R_CMD_config("LDFLAGS")
   )
   env_vars <- paste0(names(env_var_list), '="', env_var_list, '"', collapse = " ")
+  # Add env variables like ARROW_S3=ON. Order doesn't matter. Depends on `download_ok`
   env_vars <- with_s3_support(env_vars)
   env_vars <- with_mimalloc(env_vars)
-  if (tolower(Sys.info()[["sysname"]]) %in% "sunos") {
-    # jemalloc doesn't seem to build on Solaris
-    # nor does thrift, so turn off parquet,
-    # and arrowExports.cpp requires parquet for dataset (ARROW-11994), so turn that off
-    # xsimd doesn't compile, so set SIMD level to NONE to skip it
-    # re2 and utf8proc do compile,
-    # but `ar` fails to build libarrow_bundled_dependencies, so turn them off
-    # so that there are no bundled deps
-    env_vars <- paste(env_vars, "ARROW_JEMALLOC=OFF ARROW_PARQUET=OFF ARROW_DATASET=OFF ARROW_WITH_RE2=OFF ARROW_WITH_UTF8PROC=OFF EXTRA_CMAKE_FLAGS=-DARROW_SIMD_LEVEL=NONE")
-  }
+  env_vars <- with_jemalloc(env_vars)

Review comment:
       Also, if we're just going to use the directory, I can simplify `download_optional_dependencies()` a bit, since it's no longer necessary to keep the `export *_SOURCE_URL` output.







[GitHub] [arrow] karldw commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-907393462


   Looking through the logs, I'm still downloading XSIMD when `TEST_OFFLINE_BUILD` is true and `ARROW_THIRDPARTY_DEPENDENCY_DIR` isn't set. The `ARROW_SIMD_LEVEL` setting is getting picked up, but somehow that doesn't translate to not using XSIMD.
   
   ```
   /usr/bin/cmake -DARROW_BOOST_USE_SHARED=OFF -DARROW_BUILD_TESTS=OFF 
   -DARROW_BUILD_SHARED=OFF -DARROW_BUILD_STATIC=ON -DARROW_COMPUTE=ON 
   -DARROW_CSV=ON -DARROW_DATASET=OFF -DARROW_DEPENDENCY_SOURCE=BUNDLED 
   -DAWSSDK_SOURCE= -DARROW_FILESYSTEM=ON -DARROW_JEMALLOC=OFF 
   -DARROW_MIMALLOC=OFF -DARROW_JSON=ON -DARROW_PARQUET=OFF 
   -DARROW_S3=OFF -DARROW_WITH_BROTLI=OFF -DARROW_WITH_BZ2=OFF 
   -DARROW_WITH_LZ4=OFF -DARROW_WITH_RE2=OFF -DARROW_WITH_SNAPPY=OFF 
   -DARROW_WITH_UTF8PROC=OFF -DARROW_WITH_ZLIB=OFF -DARROW_WITH_ZSTD=OFF 
   -DARROW_VERBOSE_THIRDPARTY_BUILD=OFF -DCMAKE_BUILD_TYPE=Release 
   -DCMAKE_INSTALL_LIBDIR=lib 
   -DCMAKE_INSTALL_PREFIX=/tmp/Rtmpt7emWm/R.INSTALL27343f31d8de/arrow/libarrow/arrow-5.0.0.9000
   -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON 
   -DCMAKE_FIND_PACKAGE_NO_PACKAGE_REGISTRY=ON -DCMAKE_UNITY_BUILD=ON 
   -DARROW_SIMD_LEVEL=NONE 
   -G 'Unix Makefiles' /tmp/Rtmpt7emWm/R.INSTALL27343f31d8de/arrow/tools/cpp
   
   Then later:
   --   ARROW_SIMD_LEVEL=NONE [default=NONE|SSE4_2|AVX2|AVX512]
   --       Compile-time SIMD optimization level
   --   ARROW_RUNTIME_SIMD_LEVEL=MAX [default=NONE|SSE4_2|AVX2|AVX512|MAX]
   --       Max runtime SIMD optimization level
   ```





[GitHub] [arrow] karldw edited a comment on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw edited a comment on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-909368665


   > An idea came to me last night: what if we had a utility that would make a "fat" package, like:
   > 
   > ```r
   > function(source_package) {
   >   untar(source_package)
   >   system("inst/download_script.sh tools/thirdparty")
   >   tar()
   > }
   > ```
   > 
   > then you would just copy that arrow_x.y.z.tar.gz and install it, no need to copy other files and set env vars.
   
   I like that! I can take a try at it.
   
   * Do you want to include all of the downloaded files (87 MB), or just the ones an R build could possibly use (55 MB)? 
     * edit: Currently including all
   * Do you still want to be able to run `download_optional_dependencies` from within an installed R package? If not, we can use the copy of `download_dependencies.sh` that's in `tools/cpp/thirdparty/` (we're currently making another copy for `inst/` so it's available at runtime).
     * edit: Deleted the function and these copies of those two files
   * Should we bundle cmake while we're at it? This might be convenient, but is a bit of scope creep.
     * edit: I didn't, but let me know if you think I should.
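   
   To make the "fat" package idea concrete, here is a minimal shell sketch, not the PR's implementation. It assumes the dependency archives have already been downloaded into a local directory, that the source tarball unpacks to a single top-level `arrow` directory, and that the build looks in `tools/cpp/thirdparty/download` (the path used elsewhere in this PR). The function name `make_fat_package` and its arguments are hypothetical.
   
   ```shell
   # Sketch only: repackage a source tarball together with a directory of
   # pre-downloaded dependency archives. All names are hypothetical.
   make_fat_package() {
     src_tarball=$1   # input, e.g. arrow_x.y.z.tar.gz
     deps_dir=$2      # directory that already holds the downloaded archives
     out_tarball=$3   # output, e.g. arrow_x.y.z_with_deps.tar.gz

     work_dir=$(mktemp -d) || return 1
     tar -xzf "$src_tarball" -C "$work_dir" || return 1

     # Assumption: the tarball unpacks to a single top-level "arrow" directory.
     mkdir -p "$work_dir/arrow/tools/cpp/thirdparty/download" || return 1
     cp -p "$deps_dir"/* "$work_dir/arrow/tools/cpp/thirdparty/download/" || return 1

     tar -czf "$out_tarball" -C "$work_dir" arrow || return 1
     rm -rf "$work_dir"
   }

   # Example use (file names are placeholders):
   # make_fat_package arrow_x.y.z.tar.gz thirdparty_deps arrow_x.y.z_with_deps.tar.gz
   ```
   
   The sketch leaves the temporary directory behind if a step fails; a real helper would probably clean up on error as well.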
   





[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r696982967



##########
File path: r/tools/nixlibs.R
##########
@@ -209,75 +222,21 @@ find_available_binary <- function(os) {
   os
 }
 
-download_source <- function() {
-  tf1 <- tempfile()
-  src_dir <- tempfile()
-
-  # Given VERSION as x.y.z.p
-  p <- package_version(VERSION)[1, 4]
-  if (is.na(p) || p < 1000) {
-    # This is either just x.y.z or it has a small (R-only) patch version
-    # Download from the official Apache release, dropping the p
-    VERSION <- as.character(package_version(VERSION)[1, -4])
-    if (apache_download(VERSION, tf1)) {
-      untar(tf1, exdir = src_dir)
-      unlink(tf1)
-      src_dir <- paste0(src_dir, "/apache-arrow-", VERSION, "/cpp")
-    }
-  } else if (p != 9000) {
-    # This is a custom dev version (x.y.z.9999) or a nightly (x.y.z.20210505)
-    # (Don't try to download on the default dev .9000 version)
-    if (nightly_download(VERSION, tf1)) {
-      unzip(tf1, exdir = src_dir)
-      unlink(tf1)
-      src_dir <- paste0(src_dir, "/cpp")
-    }
-  }
-
-  if (dir.exists(src_dir)) {
-    cat("*** Successfully retrieved C++ source\n")
-    options(.arrow.cleanup = c(getOption(".arrow.cleanup"), src_dir))
-    # These scripts need to be executable
-    system(
-      sprintf("chmod 755 %s/build-support/*.sh", src_dir),
-      ignore.stdout = quietly, ignore.stderr = quietly
-    )
-    return(src_dir)
-  } else {
-    return(NULL)
-  }
-}
-
-nightly_download <- function(version, destfile) {
-  source_url <- paste0(arrow_repo, "src/arrow-", version, ".zip")
-  try_download(source_url, destfile)
-}
-
-apache_download <- function(version, destfile, n_mirrors = 3) {
-  apache_path <- paste0("arrow/arrow-", version, "/apache-arrow-", version, ".tar.gz")
-  apache_urls <- c(
-    # This returns a different mirror each time
-    rep("https://www.apache.org/dyn/closer.lua?action=download&filename=", n_mirrors),
-    "https://downloads.apache.org/" # The backup
+find_local_source <- function() {
+  # We'll take the first of these that exists
+  # The first case probably occurs if we're in the arrow git repo
+  # The second probably occurs if we're installing the arrow R package
+  cpp_dir_options <- c(
+    Sys.getenv("ARROW_SOURCE_HOME", ".."),
+    "tools/cpp"

Review comment:
       Thanks for catching that!







[GitHub] [arrow] github-actions[bot] commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-908362125


   Revision: 5a13cbf81ee66172b63341d20acf51efc03d0c97
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-802](https://github.com/ursacomputing/crossbow/branches/all?query=actions-802)
   
   |Task|Status|
   |----|------|
   |test-r-offline-minimal|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-802-azure-test-r-offline-minimal)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-802-azure-test-r-offline-minimal)|





[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r698742494



##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,68 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### On a computer with internet access:
+#' - Install the `arrow` package

Review comment:
       Yes, unless you can think of a better way! As @jonkeane [pointed out](https://github.com/apache/arrow/pull/11001/#discussion_r698528943), it's possible to download those files from github, but protecting against version mismatch (between what's needed by `tools/cpp/` and what's listed in github's `versions.txt`) could be challenging.
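   
   One simple way to guard against that kind of mismatch, sketched here purely as an assumption and not as something the PR does, is to compare the `versions.txt` shipped under `tools/cpp/` with whichever copy the download script was driven by and refuse to continue if they differ. The function name `versions_match` is hypothetical.
   
   ```shell
   # Hypothetical check: succeed only when two copies of versions.txt
   # are byte-for-byte identical.
   versions_match() {
     cmp -s "$1" "$2"
   }

   # Example use (paths are placeholders):
   # versions_match tools/cpp/thirdparty/versions.txt /path/to/other/versions.txt \
   #   || echo "versions.txt differs; downloaded archives may not match" >&2
   ```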







[GitHub] [arrow] jonkeane commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-912615851


   @github-actions crossbow submit -g r
   
   I'm running the full suite again since we're close and I want to make sure this didn't (accidentally) do anything to our other builds.





[GitHub] [arrow] nealrichardson commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-909640938


   > Tests are currently having errors because of this line:
   > 
   > https://github.com/apache/arrow/blob/6daff455ad1e4c5ac4c84bda5711bdb5c30b6156/r/tools/nixlibs.R#L466
   > 
   > That directory (`tools/cpp/thirdparty`) would exist if `make build` had been run. Any suggestions?
   
   I can investigate later, though that wouldn't explain why the Windows builds are failing, since that script doesn't get called there.





[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r701391590



##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,91 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Create an install package with all thirdparty dependencies
+#'
+#' @param outfile File path for the new tar.gz package. Defaults to
+#' `arrow_V.V.V_with_deps.tar.gz` in the current directory (`V.V.V` is the version)
+#' @param package_source File path for the input tar.gz package. Defaults to
+#' downloading from CRAN.
+#' @param quietly boolean, default `TRUE`. If `FALSE`, narrate progress.
+#' @return The full path to `outfile`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download the required dependencies for you.
+#' These downloaded dependencies are only used in the build if
+#' `ARROW_DEPENDENCY_SOURCE` is unset, `BUNDLED`, or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### Using a computer with internet access, pre-download the dependencies:
+#' * Install the `arrow` package

Review comment:
       Yep!







[GitHub] [arrow] nealrichardson commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r696723923



##########
File path: r/inst/build_arrow_static.sh
##########
@@ -59,7 +59,7 @@ ${CMAKE} -DARROW_BOOST_USE_SHARED=OFF \
     -DARROW_FILESYSTEM=ON \
     -DARROW_JEMALLOC=${ARROW_JEMALLOC:-$ARROW_DEFAULT_PARAM} \
     -DARROW_MIMALLOC=${ARROW_MIMALLOC:-ON} \
-    -DARROW_JSON=ON \
+    -DARROW_JSON=${ARROW_JSON:-ON} \

Review comment:
       Can we split this out to a separate JIRA? It's more involved than this because we'll have to conditionally build some of the bindings like we do with dataset and parquet, and we'll have to conditionally skip tests. See ARROW-11735 for a model.







[GitHub] [arrow] nealrichardson commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-909318657









[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r697645186



##########
File path: r/tools/nixlibs.R
##########
@@ -413,10 +392,114 @@ cmake_version <- function(cmd = "cmake") {
   )
 }
 
+turn_off_thirdparty_features <- function(env_vars) {
+  # Because these are done as environment variables (as opposed to build flags),
+  # setting these to "OFF" overrides any previous setting. We don't need to
+  # check the existing value.
+  turn_off <- c(
+    "ARROW_MIMALLOC=OFF",
+    "ARROW_JEMALLOC=OFF",
+    "ARROW_PARQUET=OFF", # depends on thrift
+    "ARROW_DATASET=OFF", # depends on parquet
+    "ARROW_S3=OFF",
+    "ARROW_WITH_BROTLI=OFF",
+    "ARROW_WITH_BZ2=OFF",
+    "ARROW_WITH_LZ4=OFF",
+    "ARROW_WITH_SNAPPY=OFF",
+    "ARROW_WITH_ZLIB=OFF",
+    "ARROW_WITH_ZSTD=OFF",
+    "ARROW_WITH_RE2=OFF",
+    "ARROW_WITH_UTF8PROC=OFF",
+    # NOTE: this code sets the environment variable ARROW_JSON to "OFF", but
+    # that setting will *not* be honored by build_arrow_static.sh until
+    # ARROW-13768 is resolved.
+    "ARROW_JSON=OFF",
+    # The syntax to turn off XSIMD is different.
+    'EXTRA_CMAKE_FLAGS="-DARROW_SIMD_LEVEL=NONE"'
+  )
+  if (Sys.getenv("EXTRA_CMAKE_FLAGS") != "") {
+    # Error rather than overwriting EXTRA_CMAKE_FLAGS
+    # (Correctly inserting the flag into an existing quoted string is tricky)
+    stop("Sorry, setting EXTRA_CMAKE_FLAGS is not supported at this time.")
+  }
+  paste(env_vars, paste(turn_off, collapse = " "))
+}
+
+set_thirdparty_urls <- function(env_vars) {
+  deps_dir <- Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")
+  files <- list.files(deps_dir, full.names = FALSE)
+  if (length(files) == 0) {
+    # This will be true if the variable is unset, if it's set but the directory
+    # doesn't exist, or if it exists but is empty.
+    return(env_vars)
+  }
+  dep_names <- c(
+    "absl", # not used; seems to be a dependency of gRPC
+    "aws-sdk-cpp",
+    "aws-checksums",
+    "aws-c-common",
+    "aws-c-event-stream",
+    "boost",
+    "brotli",
+    "bzip2",
+    "cares", # not used; "a dependency of gRPC"
+    "gbenchmark", # not used; "Google benchmark, for testing"
+    "gflags", # not used; "for command line utilities (formerly Googleflags)"
+    "glog", # not used; "for logging"
+    "grpc", # not used; "for remote procedure calls"
+    "gtest", # not used; "Googletest, for testing"
+    "jemalloc",
+    "lz4",
+    "mimalloc",
+    "orc", # not used; "for Apache ORC format support"
+    "protobuf", # not used; "Google Protocol Buffers, for data serialization"
+    "rapidjson",
+    "re2",
+    "snappy",
+    "thrift",
+    "utf8proc",
+    "xsimd",
+    "zlib",
+    "zstd"
+  )
+  dep_regex <- paste0("^(", paste(dep_names, collapse = "|"), ").*")
+  # If there were extra files in the folder (not matching our regex) drop them.
+  files <- files[grepl(dep_regex, files, perl = TRUE)]
+  # Convert e.g. "thrift-0.13.0.tar.gz" to ARROW_THRIFT_URL
+  # Note that if there's no file called thrift*, we won't add
+  # ARROW_THRIFT_URL to env_vars.
+  url_env_varname <- sub(dep_regex, "ARROW_\\1_URL", files, perl = TRUE)
+  url_env_varname <- toupper(gsub("-", "_", url_env_varname, fixed = TRUE))
+  # Special case: ARROW_AWSSDK_URL for aws-sdk-cpp-<version>.tar.gz
+  url_env_varname <- sub("ARROW_AWS_SDK_CPP_URL", "ARROW_AWSSDK_URL", url_env_varname, fixed = TRUE)
+  if (anyDuplicated(url_env_varname)) {
+    warning("Unexpected files in ", deps_dir,
+      "\nDo you have multiple copies of a dependency?",
+      call. = FALSE
+    )
+    return(env_vars)
+  }

Review comment:
       Great!
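   
   For readers following the quoted `set_thirdparty_urls()` hunk, the filename-to-variable mapping can be sketched in shell roughly as follows. This is an illustrative port under assumptions, not the PR's code: the name `dep_url_var` is hypothetical, and only a subset of the dependency names from the hunk is listed.
   
   ```shell
   # Hypothetical helper: map a dependency archive name such as
   # "thrift-0.13.0.tar.gz" to the environment variable name ARROW_THRIFT_URL.
   dep_url_var() {
     file=$1
     # A subset of the dependency names from the quoted R code.
     for name in aws-sdk-cpp aws-checksums aws-c-common aws-c-event-stream \
                 boost brotli bzip2 jemalloc lz4 mimalloc rapidjson re2 \
                 snappy thrift utf8proc xsimd zlib zstd; do
       case "$file" in
         "$name"*)
           # Upper-case the name and turn "-" into "_".
           var=$(printf 'ARROW_%s_URL' "$name" | tr '[:lower:]' '[:upper:]' | tr '-' '_')
           # Special case, as in the quoted code: aws-sdk-cpp uses ARROW_AWSSDK_URL.
           if [ "$var" = "ARROW_AWS_SDK_CPP_URL" ]; then
             var="ARROW_AWSSDK_URL"
           fi
           printf '%s\n' "$var"
           return 0
           ;;
       esac
     done
     return 1   # not a recognised dependency file
   }

   dep_url_var thrift-0.13.0.tar.gz   # prints ARROW_THRIFT_URL
   ```
   
   Files that match none of the listed names produce no output and a non-zero status, which mirrors the way the quoted code drops files that do not match its regex.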







[GitHub] [arrow] jonkeane commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r698472411



##########
File path: r/tools/nixlibs.R
##########
@@ -82,7 +91,7 @@ download_binary <- function(os = identify_os()) {
 # * `TRUE` (not case-sensitive), to try to discover your current OS, or
 # * some other string, presumably a related "distro-version" that has binaries
 #   built that work for your OS
-identify_os <- function(os = Sys.getenv("LIBARROW_BINARY", Sys.getenv("LIBARROW_DOWNLOAD"))) {
+identify_os <- function(os = Sys.getenv("LIBARROW_BINARY", Sys.getenv("TEST_OFFLINE_BUILD"))) {

Review comment:
       If I'm following the logic here correctly, if `LIBARROW_BINARY` is unset, this will only attempt to identify the OS when `TEST_OFFLINE_BUILD` is `TRUE`. Is that what we want here?
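   
   To spell out the fallback being questioned, here is a small shell analogue of the nested `Sys.getenv()` default, offered as a sketch only. It assumes `Sys.getenv(x, unset)` falls back only when `x` is unset (as on Unix-alikes), which corresponds to the colon-less `${VAR-default}` form. The function name `os_default` is hypothetical.
   
   ```shell
   # Hypothetical analogue of
   #   Sys.getenv("LIBARROW_BINARY", Sys.getenv("TEST_OFFLINE_BUILD"))
   # `${VAR-default}` (no colon) falls back only when VAR is unset.
   os_default() {
     printf '%s\n' "${LIBARROW_BINARY-${TEST_OFFLINE_BUILD-}}"
   }

   unset LIBARROW_BINARY
   TEST_OFFLINE_BUILD=true
   os_default    # prints "true": the value comes from TEST_OFFLINE_BUILD

   LIBARROW_BINARY=false
   os_default    # prints "false": LIBARROW_BINARY takes precedence once set
   ```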

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,64 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' Steps for an offline install with optional dependencies:
+#' - Install the `arrow` package on a computer with internet access
+#' - Run this function
+#' - Copy the saved dependency files to a computer without internet access
+#' - Create an environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
+#'   points to the folder.
+#' - Install the `arrow` package on the computer without internet access
+#' - Run [arrow_info()] to check installed capabilities

Review comment:
       Super minor, but this is an attempt to clarify which steps happen on which machine. We could also use subheadings if the parentheticals are too clunky, since the first two steps happen on one computer and the rest on the other.
   
   ```suggestion
   #' - Install the `arrow` package (on a computer with internet access)
   #' - Run this function (on a computer with internet access)
   #' - Copy the saved dependency files to the computer without internet access
   #' - Create an environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
   #'   points to the folder. (on the computer without internet access)
   #' - Install the `arrow` package (on the computer without internet access)
   #' - Run [arrow_info()] to check installed capabilities (on the computer without internet access)
   ```







[GitHub] [arrow] nealrichardson commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-909597602


   > * I merged here (because I wanted the fixes from [ARROW-13776](https://issues.apache.org/jira/browse/ARROW-13776)), but let me know if you want to rebase for a cleaner series of commits.
   
   We generally rebase, but it's fine here; it doesn't look like the diff got messed up. We'll squash-merge in the end, so it won't matter.





[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r699636743



##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,66 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#' @param download_dependencies_sh location of the dependency download script,
+#' defaults to the one included with the arrow package.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is unset, `BUNDLED`, or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### Using a computer with internet access, pre-download the dependencies:
+#' * Install the `arrow` package
+#' * Run `download_optional_dependencies("my_dependencies")`
+#' * Copy the directory `my_dependencies` to the computer without internet access
+#'
+#' ### On the computer without internet access, use the pre-downloaded dependencies:
+#' * Create an environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
+#'   points to the newly copied `my_dependencies`.
+#' * Install the `arrow` package
+#'   * This installation will build from source, so `cmake` must be available
+#' * Run [arrow_info()] to check installed capabilities
+#'
+#' @examples
+#' \dontrun{
+#' download_optional_dependencies("arrow-thirdparty")
+#' list.files("arrow-thirdparty", "thrift-*") # "thrift-0.13.0.tar.gz" or similar
+#' }
+#' @export
+download_optional_dependencies <- function(
+  deps_dir = Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR"),
+  # This script is copied over from arrow/cpp/... to arrow/r/inst/...
+  download_dependencies_sh = system.file(
+    "thirdparty/download_dependencies.sh",
+    package = "arrow",
+    mustWork = TRUE

Review comment:
       I removed this whole function in favor of Neal's suggested approach.







[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r702304607



##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,91 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Create an install package with all thirdparty dependencies
+#'
+#' @param dest_file File path for the new tar.gz package. Defaults to
+#' `arrow_V.V.V_with_deps.tar.gz` in the current directory (`V.V.V` is the version)
+#' @param source_file File path for the input tar.gz package. Defaults to
+#' downloading the package.

Review comment:
       Ah, I see. Thanks! I hadn't realized there were different types of RSPM repos, and I looked at a source-repo URL (`https://packagemanager.rstudio.com/cran/__linux__/focal/latest`). 
   
   As the function is now, with a binary package it will fail when it tries to create `tools/thirdparty_dependencies/`, since the `tools/` directory doesn't exist. Do you want to make any changes?
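
   One possible change, sketched below under assumptions: check for `tools/` up front and fail with a clearer message. Both `looks_like_source_package()` and `add_dependency_dir()` are hypothetical names, and `pkg_dir` is assumed to be the directory produced by untarring the downloaded package.

   ```r
   # Hypothetical guard: a source tarball ships tools/, a binary build does not
   # (as described in the comment above).
   looks_like_source_package <- function(pkg_dir) {
     dir.exists(file.path(pkg_dir, "tools"))
   }

   # Hypothetical wrapper around the directory-creation step.
   add_dependency_dir <- function(pkg_dir) {
     if (!looks_like_source_package(pkg_dir)) {
       stop(
         "'", pkg_dir, "' has no tools/ directory; is this a binary package?",
         call. = FALSE
       )
     }
     deps_dir <- file.path(pkg_dir, "tools", "thirdparty_dependencies")
     # showWarnings = FALSE keeps a re-run quiet if the directory already exists
     dir.create(deps_dir, showWarnings = FALSE)
     invisible(deps_dir)
   }
   ```

   Failing early this way would turn an opaque `dir.create()` warning into an error that names the likely cause.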







[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r702957028



##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,93 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Create a source bundle that includes all thirdparty dependencies
+#'
+#' @param dest_file File path for the new tar.gz package. Defaults to
+#' `arrow_V.V.V_with_deps.tar.gz` in the current directory (`V.V.V` is the version)
+#' @param source_file File path for the input tar.gz package. Defaults to
+#' downloading the package from CRAN (or whatever you have set as the first in
+#' `getOption("repos")`)
+#' @return The full path to `dest_file`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download the required dependencies for you.
+#' These downloaded dependencies are only used in the build if
+#' `ARROW_DEPENDENCY_SOURCE` is unset, `BUNDLED`, or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'

Review comment:
       Thanks!







[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r696125528



##########
File path: r/tools/nixlibs.R
##########
@@ -52,6 +43,21 @@ try_download <- function(from_url, to_file) {
   !inherits(status, "try-error") && status == 0
 }
 
+quietly <- !env_is("ARROW_R_DEV", "true") # try_download uses quietly global
+# * download_ok, build_ok: Use prebuilt binary, if found, otherwise try to build
+# * no download, build_ok: Build with local git checkout, if available, or
+#   sources included in r/tools/cpp/. Optional dependencies are not included,
+#   and will not be automatically downloaded.
+#   https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+# * download_ok, no build: Only use prebuilt binary, if found
+# * neither: Get the arrow-without-arrow package
+# Download and build are OK unless you say not to (or can't access github)
+download_ok <- (!env_is("LIBARROW_DOWNLOAD", "false")) && try_download("https://github.com", tempfile())
+build_ok <- !env_is("LIBARROW_BUILD", "false")
+# But binary defaults to not OK
+binary_ok <- !identical(tolower(Sys.getenv("LIBARROW_BINARY", "false")), "false")
+# For local debugging, set ARROW_R_DEV=TRUE to make this script print more

Review comment:
       Just to confirm my understanding, is this what you meant?
   https://github.com/apache/arrow/pull/11001/files#diff-935746c34b16289a07b0d9bf7642dbd268b18059b6187f7cdec7c464be47a3deR46-R62
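
   For readers without the diff open, the defaults in the hunk above can be sketched as follows. This is an illustration only: the definition of `env_is()` is assumed (a case-insensitive comparison), `install_flags()` is a hypothetical wrapper, and the connectivity probe via `try_download()` is deliberately left out so the sketch has no network side effects.

   ```r
   # Assumed definition: compare an environment variable case-insensitively.
   env_is <- function(var, value) identical(tolower(Sys.getenv(var)), value)

   # Hypothetical wrapper so the flags can be inspected as one value.
   install_flags <- function() {
     list(
       # Building is OK unless explicitly disabled
       build_ok = !env_is("LIBARROW_BUILD", "false"),
       # Binary download is NOT OK unless explicitly enabled
       binary_ok = !identical(tolower(Sys.getenv("LIBARROW_BINARY", "false")), "false"),
       # Downloading is allowed unless explicitly disabled (network check omitted)
       download_allowed = !env_is("LIBARROW_DOWNLOAD", "false")
     )
   }

   Sys.unsetenv(c("LIBARROW_BUILD", "LIBARROW_BINARY", "LIBARROW_DOWNLOAD"))
   defaults <- install_flags()

   # Values are matched case-insensitively, so "False" disables the build.
   Sys.setenv(LIBARROW_BINARY = "TRUE", LIBARROW_BUILD = "False")
   overridden <- install_flags()

   Sys.unsetenv(c("LIBARROW_BUILD", "LIBARROW_BINARY"))
   ```

   The asymmetry is the point: build and download are opt-out, while the prebuilt binary is opt-in.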







[GitHub] [arrow] jonkeane commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r698900338



##########
File path: dev/tasks/tasks.yml
##########
@@ -1033,6 +1033,14 @@ tasks:
       flags: '-e ARROW_SOURCE_HOME="/arrow" -e FORCE_BUNDLED_BUILD=TRUE -e LIBARROW_BUILD=TRUE -e ARROW_DEPENDENCY_SOURCE=SYSTEM'
       image: ubuntu-r-only-r
 
+  test-r-offline-minimal:
+      ci: azure
+      template: r/azure.linux.yml
+      params:
+        r_org: rocker
+        r_image: r-base
+        r_tag: latest
+        flags: '-e TEST_OFFLINE_BUILD=true'

Review comment:
       Azure is fine for this one. TBH, I picked GitHub Actions for the maximal build out of convenience, since we already have a model there that has dependent steps. But our CI system (AKA crossbow) is designed to be spread across a number of systems like this, so it's totally fine to use two different services for these two jobs.







[GitHub] [arrow] karldw commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-912747988


   Thanks! I learned a bunch doing this.
   
   I had a couple minor questions, following up on comments from @nealrichardson:
   
   1. Should I swap the argument order for `create_package_with_all_dependencies`?
   2. Should `create_package_with_all_dependencies` check `ARROW_THIRDPARTY_DEPENDENCY_DIR`?





[GitHub] [arrow] nealrichardson commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-906502794


   > In the latest commit, I removed `LIBARROW_DOWNLOAD` and added `TEST_OFFLINE_BUILD`. Does that seem right to you?
   
   I think so, but we'll know for sure once we set up CI.
   
   > 
   > I wasn't positive I got the logic right in this section of `configure`:
   > https://github.com/apache/arrow/pull/11001/files#diff-089697faebdb7820ca629a2bb316b878cc0ba18a5bfb0b60996f8dbcd1fa11e7L133-L140
   
   Yes, looks right, I just suggested a further simplification now that we can.
   
   As for CI, there will be an arrow-r-nightly change needed in order to do the rsync etc. that you added to r/Makefile, but the regular CI we want will be in arrow, in our "crossbow" nightly and on-demand builds. There's a bunch of yaml that configures templates [here](https://github.com/apache/arrow/blob/master/dev/tasks/tasks.yml#L980-L1121), if you want to take a stab at it. @jonkeane is back from vacation next week and can help with setting that up too.
   
   Also, I just want to reiterate: this is great, thank you very much for taking the initiative on this.
   





[GitHub] [arrow] karldw removed a comment on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw removed a comment on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-912693125


   @github-actions crossbow submit -g test-r-offline-minimal





[GitHub] [arrow] github-actions[bot] commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-905770415


   https://issues.apache.org/jira/browse/ARROW-12981





[GitHub] [arrow] karldw commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-910610904


   One option would be to add a new `make` rule that wraps `R CMD INSTALL`, so that the install goes through `make` instead of calling `R CMD INSTALL` directly. Here's an example in knitr: https://github.com/yihui/knitr/blob/master/Makefile





[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r701398208



##########
File path: r/tools/nixlibs.R
##########
@@ -52,6 +42,24 @@ try_download <- function(from_url, to_file) {
   !inherits(status, "try-error") && status == 0
 }
 
+# For local debugging, set ARROW_R_DEV=TRUE to make this script print more
+quietly <- !env_is("ARROW_R_DEV", "true")
+
+# Default is build from source, not download a binary
+build_ok <- !env_is("LIBARROW_BUILD", "false")
+binary_ok <- !(env_is("LIBARROW_BINARY", "false") || env_is("LIBARROW_BINARY", ""))
+
+# Check if we're doing an offline build.
+# (Note that cmake will still be downloaded if necessary
+#  https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds)
+download_ok <- !env_is("TEST_OFFLINE_BUILD", "true") && try_download("https://github.com", tempfile())
+
+# This path, within the tar file, might exist if
+# create_package_with_all_dependencies() was run. Otherwise, it won't, but
+# tools/cpp/thirdparty/ still will.
+thirdparty_dependency_dir <- "tools/cpp/thirdparty/download"

Review comment:
       Sure, I guess it could make updating an offline system easier. If the dependencies haven't changed, you could reuse the same downloaded files without re-running `create_package_with_all_dependencies()`. (If dependencies have changed, `cmake` will error because the checksums won't match.)
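
   If reuse is the goal, a pre-check along these lines could tell a user whether an old download is still usable before `cmake` complains. This is only a sketch: `file_matches_checksum()` is a hypothetical helper, and `tools::md5sum()` is used as a stand-in because it ships with R; the hash that the build actually verifies may be a different algorithm.

   ```r
   # Hypothetical helper: TRUE when a previously downloaded file still matches
   # the checksum recorded for it.
   file_matches_checksum <- function(path, expected_md5) {
     # tools::md5sum() returns a named vector, so drop the name before comparing
     file.exists(path) && identical(unname(tools::md5sum(path)), expected_md5)
   }

   # Simulate a dependency archive that was downloaded earlier.
   old_download <- tempfile(fileext = ".tar.gz")
   writeLines("dependency v1", old_download)
   recorded <- unname(tools::md5sum(old_download))

   unchanged <- file_matches_checksum(old_download, recorded)

   # Simulate the dependency changing upstream.
   writeLines("dependency v2", old_download)
   changed <- file_matches_checksum(old_download, recorded)
   ```

   Here `unchanged` reports a match and `changed` reports a mismatch, which is the case where the bundle would need to be regenerated.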







[GitHub] [arrow] nealrichardson commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r699297589



##########
File path: r/vignettes/install.Rmd
##########
@@ -285,17 +309,28 @@ setting `ARROW_WITH_ZSTD=OFF` to build without `zstd`; or (3) uninstalling
 the conflicting `zstd`.
 See discussion [here](https://issues.apache.org/jira/browse/ARROW-8556).
 
+* Offline installation fails when dependencies haven't been downloaded to
+`ARROW_THIRDPARTY_DEPENDENCY_DIR`. The package currently depends on the
+third-party project RapidJSON. See `?download_optional_dependencies`.
+See discussion [here](https://issues.apache.org/jira/browse/ARROW-13768) on

Review comment:
       We should just solve this rather than document the exception, IMO

##########
File path: r/vignettes/install.Rmd
##########
@@ -342,6 +373,15 @@ By default, these are all unset. All boolean variables are case-insensitive.
   The directory will be created if it does not exist.
 * `CMAKE`: When building the C++ library from source, you can specify a
   `/path/to/cmake` to use a different version than whatever is found on the `$PATH`
+* `ARROW_THIRDPARTY_DEPENDENCY_DIR`: Directory with downloaded third-party
+  dependency files. Run `download_optional_dependencies("my-dir")` to download.
+* `TEST_OFFLINE_BUILD`: When set to `true`, the build script will not download

Review comment:
       A better place for this would be the developing.Rmd vignette. We have another env var, `TEST_R_WITHOUT_LIBARROW`, that could be documented there too; like this one, it's not something a package user would ever want to set.







[GitHub] [arrow] nealrichardson commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r696708048



##########
File path: r/tools/nixlibs.R
##########
@@ -209,75 +222,21 @@ find_available_binary <- function(os) {
   os
 }
 
-download_source <- function() {
-  tf1 <- tempfile()
-  src_dir <- tempfile()
-
-  # Given VERSION as x.y.z.p
-  p <- package_version(VERSION)[1, 4]
-  if (is.na(p) || p < 1000) {
-    # This is either just x.y.z or it has a small (R-only) patch version
-    # Download from the official Apache release, dropping the p
-    VERSION <- as.character(package_version(VERSION)[1, -4])
-    if (apache_download(VERSION, tf1)) {
-      untar(tf1, exdir = src_dir)
-      unlink(tf1)
-      src_dir <- paste0(src_dir, "/apache-arrow-", VERSION, "/cpp")
-    }
-  } else if (p != 9000) {
-    # This is a custom dev version (x.y.z.9999) or a nightly (x.y.z.20210505)
-    # (Don't try to download on the default dev .9000 version)
-    if (nightly_download(VERSION, tf1)) {
-      unzip(tf1, exdir = src_dir)
-      unlink(tf1)
-      src_dir <- paste0(src_dir, "/cpp")
-    }
-  }
-
-  if (dir.exists(src_dir)) {
-    cat("*** Successfully retrieved C++ source\n")
-    options(.arrow.cleanup = c(getOption(".arrow.cleanup"), src_dir))
-    # These scripts need to be executable
-    system(
-      sprintf("chmod 755 %s/build-support/*.sh", src_dir),
-      ignore.stdout = quietly, ignore.stderr = quietly
-    )
-    return(src_dir)
-  } else {
-    return(NULL)
-  }
-}
-
-nightly_download <- function(version, destfile) {
-  source_url <- paste0(arrow_repo, "src/arrow-", version, ".zip")
-  try_download(source_url, destfile)
-}
-
-apache_download <- function(version, destfile, n_mirrors = 3) {
-  apache_path <- paste0("arrow/arrow-", version, "/apache-arrow-", version, ".tar.gz")
-  apache_urls <- c(
-    # This returns a different mirror each time
-    rep("https://www.apache.org/dyn/closer.lua?action=download&filename=", n_mirrors),
-    "https://downloads.apache.org/" # The backup
+find_local_source <- function() {
+  # We'll take the first of these that exists
+  # The first case probably occurs if we're in the arrow git repo
+  # The second probably occurs if we're installing the arrow R package
+  cpp_dir_options <- c(
+    Sys.getenv("ARROW_SOURCE_HOME", ".."),
+    "tools/cpp"
   )
-  downloaded <- FALSE
-  for (u in apache_urls) {
-    downloaded <- try_download(paste0(u, apache_path), destfile)
-    if (downloaded) {
-      break
-    }
-  }
-  downloaded
-}
-
-find_local_source <- function(arrow_home = Sys.getenv("ARROW_SOURCE_HOME", "..")) {
-  if (file.exists(paste0(arrow_home, "/cpp/src/arrow/api.h"))) {
-    # We're in a git checkout of arrow, so we can build it
-    cat("*** Found local C++ source\n")
-    return(paste0(arrow_home, "/cpp"))
-  } else {
+  valid_cpp_dir <- file.exists(file.path(cpp_dir_options, "src/arrow/api.h"))
+  if (!any(valid_cpp_dir)) {
     return(NULL)
   }
+  cpp_dir <- cpp_dir_options[valid_cpp_dir][1]
+  cat(paste0("*** Found local C++ source:\n    '", cpp_dir, "'\n"))
+  cpp_dir

Review comment:
       I think the intent reads more clearly this way
   
   ```suggestion
     for (cpp_dir in cpp_dir_options) {
       if (file.exists(file.path(cpp_dir, "cpp/src/arrow/api.h"))) {
         cat(paste0("*** Found local C++ source: '", paste0(cpp_dir, "/cpp"), "'\n"))
         return(paste0(cpp_dir, "/cpp"))
       }
     }
     NULL
   ```
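
   The same "first candidate that contains a marker file wins" pattern can be tried in isolation with a throwaway directory layout. The sketch below is not the suggested code itself: `first_cpp_dir()` and its `marker` argument are hypothetical, and the marker path is passed in rather than hard-coded so the behaviour is easy to test.

   ```r
   # Hypothetical, parameterised version of the loop in the suggestion above.
   first_cpp_dir <- function(candidates, marker = "src/arrow/api.h") {
     for (candidate in candidates) {
       # Return as soon as one candidate contains the marker header
       if (file.exists(file.path(candidate, marker))) {
         return(candidate)
       }
     }
     # No candidate matched
     NULL
   }

   # Fake layout: only the second candidate contains the marker header.
   root <- tempfile()
   good <- file.path(root, "tools", "cpp")
   dir.create(file.path(good, "src", "arrow"), recursive = TRUE)
   file.create(file.path(good, "src", "arrow", "api.h"))

   found <- first_cpp_dir(c(file.path(root, "missing"), good))
   none <- first_cpp_dir(file.path(root, "missing"))
   ```

   An early `return()` inside the loop keeps the precedence order visible in the code, which is the readability argument made above.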







[GitHub] [arrow] github-actions[bot] commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-912694522


   Revision: ec726d6b0bc1ad4562faec3196e954ca75b1b7b0
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-818](https://github.com/ursacomputing/crossbow/branches/all?query=actions-818)
   
   |Task|Status|
   |----|------|
   |test-r-offline-minimal|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-818-azure-test-r-offline-minimal)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-818-azure-test-r-offline-minimal)|





[GitHub] [arrow] nealrichardson commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-909318657


   I thought rchk was passing? Rebase to fix?





[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r701413656



##########
File path: r/tools/nixlibs.R
##########
@@ -52,6 +42,24 @@ try_download <- function(from_url, to_file) {
   !inherits(status, "try-error") && status == 0
 }
 
+# For local debugging, set ARROW_R_DEV=TRUE to make this script print more
+quietly <- !env_is("ARROW_R_DEV", "true")
+
+# Default is build from source, not download a binary
+build_ok <- !env_is("LIBARROW_BUILD", "false")
+binary_ok <- !(env_is("LIBARROW_BINARY", "false") || env_is("LIBARROW_BINARY", ""))
+
+# Check if we're doing an offline build.
+# (Note that cmake will still be downloaded if necessary
+#  https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds)
+download_ok <- !env_is("TEST_OFFLINE_BUILD", "true") && try_download("https://github.com", tempfile())
+
+# This path, within the tar file, might exist if
+# create_package_with_all_dependencies() was run. Otherwise, it won't, but
+# tools/cpp/thirdparty/ still will.

Review comment:
       I was worried about a case where `nixlibs.R` is being run with the wrong working directory, so `tools/` and all the sub-folders wouldn't be there. Thinking about this more, it's unlikely, and this function isn't the right place to check.










[GitHub] [arrow] jonkeane commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r702291722



##########
File path: r/tools/nixlibs.R
##########
@@ -413,66 +423,129 @@ cmake_version <- function(cmd = "cmake") {
   )
 }
 
-with_s3_support <- function(env_vars) {
-  arrow_s3 <- toupper(Sys.getenv("ARROW_S3")) == "ON" || tolower(Sys.getenv("LIBARROW_MINIMAL")) == "false"
+turn_off_thirdparty_features <- function(env_var_list) {
+  # Because these are done as environment variables (as opposed to build flags),
+  # setting these to "OFF" overrides any previous setting. We don't need to
+  # check the existing value.
+  turn_off <- c(
+    "ARROW_MIMALLOC" = "OFF",
+    "ARROW_JEMALLOC" = "OFF",
+    "ARROW_PARQUET" = "OFF", # depends on thrift
+    "ARROW_DATASET" = "OFF", # depends on parquet
+    "ARROW_S3" = "OFF",
+    "ARROW_WITH_BROTLI" = "OFF",
+    "ARROW_WITH_BZ2" = "OFF",
+    "ARROW_WITH_LZ4" = "OFF",
+    "ARROW_WITH_SNAPPY" = "OFF",
+    "ARROW_WITH_ZLIB" = "OFF",
+    "ARROW_WITH_ZSTD" = "OFF",
+    "ARROW_WITH_RE2" = "OFF",
+    "ARROW_WITH_UTF8PROC" = "OFF",
+    # NOTE: this code sets the environment variable ARROW_JSON to "OFF", but
+    # that setting will *not* be honored by build_arrow_static.sh until
+    # ARROW-13768 is resolved.
+    "ARROW_JSON" = "OFF",
+    # The syntax to turn off XSIMD is different.
+    # Pull existing value of EXTRA_CMAKE_FLAGS first (must be defined)
+    "EXTRA_CMAKE_FLAGS" = paste(
+      env_var_list[["EXTRA_CMAKE_FLAGS"]],
+      "-DARROW_SIMD_LEVEL=NONE -DARROW_RUNTIME_SIMD_LEVEL=NONE"
+    )
+  )
+  # Create a new env_var_list, with the values of turn_off set.
+  # replace() also adds new values if they didn't exist before
+  replace(env_var_list, names(turn_off), turn_off)
+}
+
+set_thirdparty_urls <- function(env_var_list) {
+  # This function does *not* check if existing *_SOURCE_URL variables are set.
+  # The directory tools/thirdparty_dependencies is created by
+  # create_package_with_all_dependencies() and saved in the tar file.
+  files <- list.files(thirdparty_dependency_dir, full.names = FALSE)
+  url_env_varname <- toupper(sub("(.*?)-.*", "ARROW_\\1_URL", files))
+  # Special handling for the aws dependencies, which have extra `-`
+  aws <- grepl("^aws", files)
+  url_env_varname[aws] <- sub(
+    "AWS_SDK_CPP", "AWSSDK",
+    gsub(
+      "-", "_",
+      sub(
+        "(AWS.*)-.*", "ARROW_\\1_URL",
+        toupper(files[aws])
+      )
+    )
+  )
+  full_filenames <- file.path(normalizePath(thirdparty_dependency_dir), files)
+
+  env_var_list <- replace(env_var_list, url_env_varname, full_filenames)
+  if (!quietly) {
+    env_var_list <- replace(env_var_list, "ARROW_VERBOSE_THIRDPARTY_BUILD", "ON")
+  }
+  env_var_list
+}
+
+with_mimalloc <- function(env_var_list) {
+  arrow_mimalloc <- env_is("ARROW_MIMALLOC", "on") || env_is("LIBARROW_MINIMAL", "false")
+  if (arrow_mimalloc) {

Review comment:
       OH, that's even better! Thanks
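   
   For readers following the hunk above, here is a minimal sketch of how the `sub()`/`gsub()` calls in `set_thirdparty_urls()` turn archive file names into `ARROW_*_URL` variable names. The file names and versions are illustrative assumptions, not the actual contents of the download directory, and `url_env_varname_for()` is a hypothetical wrapper that does not appear in the PR.
   
   ```r
   # Hypothetical helper wrapping the name-mangling logic quoted in the diff.
   url_env_varname_for <- function(files) {
     # Generic case: everything before the first "-" becomes the dependency name.
     varname <- toupper(sub("(.*?)-.*", "ARROW_\\1_URL", files))
     # AWS archives contain extra "-" characters, so strip only the trailing
     # version component and convert the remaining "-" to "_".
     aws <- grepl("^aws", files)
     varname[aws] <- sub(
       "AWS_SDK_CPP", "AWSSDK",
       gsub("-", "_", sub("(AWS.*)-.*", "ARROW_\\1_URL", toupper(files[aws])))
     )
     varname
   }
   
   # Assumed example file names (versions are made up for illustration).
   files <- c(
     "thrift-0.13.0.tar.gz",
     "aws-sdk-cpp-1.8.133.tar.gz",
     "aws-c-common-v0.5.10.tar.gz"
   )
   print(url_env_varname_for(files))
   ```
   
   Each resulting name would then be paired with the full path of the matching archive, as the `replace(env_var_list, url_env_varname, full_filenames)` call in the hunk does.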







[GitHub] [arrow] nealrichardson commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r699297589



##########
File path: r/vignettes/install.Rmd
##########
@@ -285,17 +309,28 @@ setting `ARROW_WITH_ZSTD=OFF` to build without `zstd`; or (3) uninstalling
 the conflicting `zstd`.
 See discussion [here](https://issues.apache.org/jira/browse/ARROW-8556).
 
+* Offline installation fails when dependencies haven't been downloaded to
+`ARROW_THIRDPARTY_DEPENDENCY_DIR`. The package currently depends on the
+third-party project RapidJSON. See `?download_optional_dependencies`.
+See discussion [here](https://issues.apache.org/jira/browse/ARROW-13768) on

Review comment:
       We should just solve this rather than document the exception, IMO

##########
File path: r/vignettes/install.Rmd
##########
@@ -342,6 +373,15 @@ By default, these are all unset. All boolean variables are case-insensitive.
   The directory will be created if it does not exist.
 * `CMAKE`: When building the C++ library from source, you can specify a
   `/path/to/cmake` to use a different version than whatever is found on the `$PATH`
+* `ARROW_THIRDPARTY_DEPENDENCY_DIR`: Directory with downloaded third-party
+  dependency files. Run `download_optional_dependencies("my-dir")` to download.
+* `TEST_OFFLINE_BUILD`: When set to `true`, the build script will not download

Review comment:
       A better place for this would be the developing.Rmd vignette. We have another env var, TEST_R_WITHOUT_LIBARROW, that could be documented there too; like this one, it's not something a package user would ever want to set.







[GitHub] [arrow] jonkeane removed a comment on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane removed a comment on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-912593253


   @github-actions crossbow submit test-r-offline-maximal





[GitHub] [arrow] jonkeane commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-908361477









[GitHub] [arrow] karldw commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-912693125


   @github-actions crossbow submit -g test-r-offline-minimal





[GitHub] [arrow] jonkeane commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-909258531


   Oh, I like the idea of creating a fat package and installing that, instead of having to provide the dependencies separately yourself and set an env var. Internally, we could use much of the code here to do that as well.
   
   My only (small) concern is making sure it's not easy to confuse which tar is which (though I guess if it's going onto a system without internet, it will become clear pretty quickly during install that it's not working).





[GitHub] [arrow] jonkeane commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-912543648


   Yeah, looking at the testthat artifact that was uploaded, I realized the workflow was not quite right.





[GitHub] [arrow] jonkeane commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-908391424


   The build succeeded, though I notice there *is* still a download step listed:
   
   https://dev.azure.com/ursacomputing/crossbow/_build/results?buildId=10731&view=logs&j=0da5d1d9-276d-5173-c4c4-9d4d4ed14fdb&t=6c939d89-0d1a-51f2-8b30-091a7a82e98c&l=364
   
   I'm digging into this to see if I can figure out what's triggering it





[GitHub] [arrow] jonkeane commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-908361477


   @github-actions crossbow submit test-r-offline-minimal





[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r698591911



##########
File path: r/R/util.R
##########
@@ -183,3 +183,63 @@ repeat_value_as_array <- function(object, n) {
   }
   return(Scalar$create(object)$as_array(n))
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#'
+#' @return TRUE/FALSE for whether the downloads were successful
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Other files are not changed.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' Steps for an offline install with optional dependencies:
+#' - Install the `arrow` package on a computer with internet access

Review comment:
       Sounds good!







[GitHub] [arrow] jonkeane commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r699278771



##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,66 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#' @param download_dependencies_sh location of the dependency download script,
+#' defaults to the one included with the arrow package.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is unset, `BUNDLED`, or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### Using a computer with internet access, pre-download the dependencies:
+#' * Install the `arrow` package
+#' * Run `download_optional_dependencies("my_dependencies")`
+#' * Copy the directory `my_dependencies` to the computer without internet access
+#'
+#' ### On the computer without internet access, use the pre-downloaded dependencies:
+#' * Create an environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
+#'   points to the newly copied `my_dependencies`.
+#' * Install the `arrow` package
+#'   * This installation will build from source, so `cmake` must be available
+#' * Run [arrow_info()] to check installed capabilities
+#'
+#' @examples
+#' \dontrun{
+#' download_optional_dependencies("arrow-thirdparty")
+#' list.files("arrow-thirdparty", "thrift-*") # "thrift-0.13.0.tar.gz" or similar
+#' }
+#' @export
+download_optional_dependencies <- function(
+  deps_dir = Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR"),
+  # This script is copied over from arrow/cpp/... to arrow/r/inst/...
+  download_dependencies_sh = system.file(
+    "thirdparty/download_dependencies.sh",
+    package = "arrow",
+    mustWork = TRUE

Review comment:
       To start, I've made this an argument to the function so that we can call it without installing in CI. We could also do this as an environment variable like we do for `deps_dir` (either internally or as an argument here). I don't have strong feelings one way or the other, though since this is pretty internal-use / CI-use only we might be best off not exposing this as an argument at all.
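   
   As a rough illustration of the environment-variable alternative mentioned here, the script location could be resolved as sketched below. Both the variable name `ARROW_DOWNLOAD_DEPENDENCIES_SH` and the helper `resolve_download_script()` are assumptions made for this example; neither is defined by the package or the PR.
   
   ```r
   # Hypothetical resolver: prefer an environment variable, fall back to a default.
   # ARROW_DOWNLOAD_DEPENDENCIES_SH is an assumed name, not one the package defines.
   resolve_download_script <- function(default = "thirdparty/download_dependencies.sh") {
     Sys.getenv("ARROW_DOWNLOAD_DEPENDENCIES_SH", unset = default)
   }
   
   # With the variable unset, the default path is returned.
   Sys.unsetenv("ARROW_DOWNLOAD_DEPENDENCIES_SH")
   print(resolve_download_script())
   
   # In CI, the variable could point at a copy outside the installed package.
   Sys.setenv(ARROW_DOWNLOAD_DEPENDENCIES_SH = "/tmp/ci/download_dependencies.sh")
   print(resolve_download_script())
   Sys.unsetenv("ARROW_DOWNLOAD_DEPENDENCIES_SH")
   ```
   
   This keeps the override out of the exported function's signature, which is the trade-off the comment weighs.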







[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r696984954



##########
File path: r/tools/nixlibs.R
##########
@@ -209,75 +222,21 @@ find_available_binary <- function(os) {
   os
 }
 
-download_source <- function() {
-  tf1 <- tempfile()
-  src_dir <- tempfile()
-
-  # Given VERSION as x.y.z.p
-  p <- package_version(VERSION)[1, 4]
-  if (is.na(p) || p < 1000) {
-    # This is either just x.y.z or it has a small (R-only) patch version
-    # Download from the official Apache release, dropping the p
-    VERSION <- as.character(package_version(VERSION)[1, -4])
-    if (apache_download(VERSION, tf1)) {
-      untar(tf1, exdir = src_dir)
-      unlink(tf1)
-      src_dir <- paste0(src_dir, "/apache-arrow-", VERSION, "/cpp")
-    }
-  } else if (p != 9000) {
-    # This is a custom dev version (x.y.z.9999) or a nightly (x.y.z.20210505)
-    # (Don't try to download on the default dev .9000 version)
-    if (nightly_download(VERSION, tf1)) {
-      unzip(tf1, exdir = src_dir)
-      unlink(tf1)
-      src_dir <- paste0(src_dir, "/cpp")
-    }
-  }
-
-  if (dir.exists(src_dir)) {
-    cat("*** Successfully retrieved C++ source\n")
-    options(.arrow.cleanup = c(getOption(".arrow.cleanup"), src_dir))
-    # These scripts need to be executable
-    system(
-      sprintf("chmod 755 %s/build-support/*.sh", src_dir),
-      ignore.stdout = quietly, ignore.stderr = quietly
-    )
-    return(src_dir)
-  } else {
-    return(NULL)
-  }
-}
-
-nightly_download <- function(version, destfile) {
-  source_url <- paste0(arrow_repo, "src/arrow-", version, ".zip")
-  try_download(source_url, destfile)
-}
-
-apache_download <- function(version, destfile, n_mirrors = 3) {
-  apache_path <- paste0("arrow/arrow-", version, "/apache-arrow-", version, ".tar.gz")
-  apache_urls <- c(
-    # This returns a different mirror each time
-    rep("https://www.apache.org/dyn/closer.lua?action=download&filename=", n_mirrors),
-    "https://downloads.apache.org/" # The backup
+find_local_source <- function() {
+  # We'll take the first of these that exists
+  # The first case probably occurs if we're in the arrow git repo
+  # The second probably occurs if we're installing the arrow R package
+  cpp_dir_options <- c(
+    Sys.getenv("ARROW_SOURCE_HOME", ".."),
+    "tools/cpp"

Review comment:
       Maybe easier still to change things to:
   
   ```r
   cpp_dir_options <- c(
     file.path(Sys.getenv("ARROW_SOURCE_HOME", ".."), "cpp"),
     "tools/cpp"
   )
   ```
   which makes the following loop a little cleaner.
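   
   A minimal sketch of what the cleaner loop could look like once every candidate already ends in `cpp`. The helper name `first_existing_dir()` is hypothetical and only stands in for whatever loop the PR actually uses.
   
   ```r
   # Hypothetical helper: return the first candidate directory that exists,
   # or NULL when none of them do.
   first_existing_dir <- function(candidates) {
     for (candidate in candidates) {
       if (dir.exists(candidate)) {
         return(candidate)
       }
     }
     NULL
   }
   
   cpp_dir_options <- c(
     file.path(Sys.getenv("ARROW_SOURCE_HOME", ".."), "cpp"),
     "tools/cpp"
   )
   print(first_existing_dir(cpp_dir_options))
   ```
   
   Because both entries point directly at a `cpp` directory, the loop body does not need a per-case `file.path()` adjustment.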
   







[GitHub] [arrow] nealrichardson commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r701973727



##########
File path: r/tools/nixlibs.R
##########
@@ -320,33 +301,54 @@ build_libarrow <- function(src_dir, dst_dir) {
     BUILD_DIR = build_dir,
     DEST_DIR = dst_dir,
     CMAKE = cmake,
+    # EXTRA_CMAKE_FLAGS will often be "", but it's convenient later to have it defined
+    EXTRA_CMAKE_FLAGS = Sys.getenv("EXTRA_CMAKE_FLAGS"),
     # Make sure we build with the same compiler settings that R is using
     CC = R_CMD_config("CC"),
     CXX = paste(R_CMD_config("CXX11"), R_CMD_config("CXX11STD")),
     # CXXFLAGS = R_CMD_config("CXX11FLAGS"), # We don't want the same debug symbols
     LDFLAGS = R_CMD_config("LDFLAGS")
   )
-  env_vars <- paste0(names(env_var_list), '="', env_var_list, '"', collapse = " ")
-  env_vars <- with_s3_support(env_vars)
-  env_vars <- with_mimalloc(env_vars)
-  if (tolower(Sys.info()[["sysname"]]) %in% "sunos") {
-    # jemalloc doesn't seem to build on Solaris
-    # nor does thrift, so turn off parquet,
-    # and arrowExports.cpp requires parquet for dataset (ARROW-11994), so turn that off
-    # xsimd doesn't compile, so set SIMD level to NONE to skip it
-    # re2 and utf8proc do compile,
-    # but `ar` fails to build libarrow_bundled_dependencies, so turn them off
-    # so that there are no bundled deps
-    env_vars <- paste(env_vars, "ARROW_JEMALLOC=OFF ARROW_PARQUET=OFF ARROW_DATASET=OFF ARROW_WITH_RE2=OFF ARROW_WITH_UTF8PROC=OFF EXTRA_CMAKE_FLAGS=-DARROW_SIMD_LEVEL=NONE")
+  env_var_list <- with_s3_support(env_var_list)
+  env_var_list <- with_mimalloc(env_var_list)
+  # turn_off_thirdparty_features() needs to happen after with_mimalloc() and
+  # with_s3_support(), since those might turn features ON.
+  thirdparty_deps_unavailable <- !download_ok &&
+    !dir.exists(thirdparty_dependency_dir) &&
+    !env_is("ARROW_DEPENDENCY_SOURCE", "system")
+  if (is_solaris()) {
+    # Note that JSON support does work on Solaris, but will be turned off with
+    # the rest of the thirdparty dependencies (when ARROW-13768 is resolved and

Review comment:
       Rebase, if you haven't already, and then you can delete this parenthetical
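   
   The hunk above moves from pasting strings together to passing a named list through helpers that use `replace()`. Here is a small sketch of that pattern, assuming illustrative values rather than the real build settings: `replace()` overrides entries that exist and appends ones that do not, and the list can still be flattened into `NAME="value"` pairs the way the removed `paste0()` line did.
   
   ```r
   # Illustrative starting values (assumed, not the real build configuration).
   env_var_list <- list(ARROW_S3 = "ON", EXTRA_CMAKE_FLAGS = "")
   
   # Settings to force, as a named character vector.
   turn_off <- c(ARROW_S3 = "OFF", ARROW_PARQUET = "OFF")
   
   # Existing names are overwritten; new names are appended at the end.
   updated <- replace(env_var_list, names(turn_off), turn_off)
   
   # Flatten to a single string of NAME="value" pairs for a shell command.
   env_vars <- paste0(names(updated), '="', updated, '"', collapse = " ")
   print(env_vars)
   ```
   
   Keeping the settings in a list until the last moment is what lets a later step override an earlier one without parsing a string.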







[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r701404485



##########
File path: r/tools/nixlibs.R
##########
@@ -52,6 +42,24 @@ try_download <- function(from_url, to_file) {
   !inherits(status, "try-error") && status == 0
 }
 
+# For local debugging, set ARROW_R_DEV=TRUE to make this script print more
+quietly <- !env_is("ARROW_R_DEV", "true")
+
+# Default is build from source, not download a binary
+build_ok <- !env_is("LIBARROW_BUILD", "false")
+binary_ok <- !(env_is("LIBARROW_BINARY", "false") || env_is("LIBARROW_BINARY", ""))
+
+# Check if we're doing an offline build.
+# (Note that cmake will still be downloaded if necessary
+#  https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds)
+download_ok <- !env_is("TEST_OFFLINE_BUILD", "true") && try_download("https://github.com", tempfile())
+
+# This path, within the tar file, might exist if
+# create_package_with_all_dependencies() was run. Otherwise, it won't, but
+# tools/cpp/thirdparty/ still will.
+thirdparty_dependency_dir <- "tools/cpp/thirdparty/download"

Review comment:
       Should `create_package_with_all_dependencies` also check `ARROW_THIRDPARTY_DEPENDENCY_DIR` before downloading?
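   
   The flags in this hunk rely on `env_is()`, which is called but not defined in the lines shown. The sketch below assumes it does a case-insensitive comparison against a lower-case value; that one-line definition is an assumption for illustration, and the environment values are made up.
   
   ```r
   # Assumed definition: compare an environment variable to `value`, ignoring case.
   env_is <- function(var, value) identical(tolower(Sys.getenv(var)), value)
   
   # Example settings: building is disabled, and LIBARROW_BINARY is empty.
   Sys.setenv(LIBARROW_BUILD = "FALSE", LIBARROW_BINARY = "")
   
   # Building from source is allowed unless explicitly set to "false".
   build_ok <- !env_is("LIBARROW_BUILD", "false")
   # A binary download is allowed only when the variable is neither "false" nor empty.
   binary_ok <- !(env_is("LIBARROW_BINARY", "false") || env_is("LIBARROW_BINARY", ""))
   
   print(c(build_ok = build_ok, binary_ok = binary_ok))
   ```
   
   Under this reading, building from source is opt-out while downloading a binary is opt-in, which matches the "Default is build from source" comment in the hunk.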







[GitHub] [arrow] github-actions[bot] commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-912155618


   Revision: 130683aee3e41060d0ba54d746d28ea30337ef7d
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-813](https://github.com/ursacomputing/crossbow/branches/all?query=actions-813)
   
   |Task|Status|
   |----|------|
   |test-r-offline-maximal|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-813-github-test-r-offline-maximal)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-813-github-test-r-offline-maximal)|





[GitHub] [arrow] github-actions[bot] commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-909186696


   Revision: 2410d5574db9c03d41c7faf70480c8189bcf69c9
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-805](https://github.com/ursacomputing/crossbow/branches/all?query=actions-805)
   
   |Task|Status|
   |----|------|
   |conda-linux-gcc-py36-cpu-r40|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-conda-linux-gcc-py36-cpu-r40)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-conda-linux-gcc-py36-cpu-r40)|
   |conda-linux-gcc-py37-cpu-r41|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-conda-linux-gcc-py37-cpu-r41)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-conda-linux-gcc-py37-cpu-r41)|
   |conda-osx-clang-py36-r40|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-conda-osx-clang-py36-r40)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-conda-osx-clang-py36-r40)|
   |conda-osx-clang-py37-r41|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-conda-osx-clang-py37-r41)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-conda-osx-clang-py37-r41)|
   |conda-win-vs2017-py36-r40|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-conda-win-vs2017-py36-r40)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-conda-win-vs2017-py36-r40)|
   |conda-win-vs2017-py37-r41|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-conda-win-vs2017-py37-r41)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-conda-win-vs2017-py37-r41)|
   |homebrew-r-autobrew|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-805-github-homebrew-r-autobrew)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-805-github-homebrew-r-autobrew)|
   |test-r-depsource-auto|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-test-r-depsource-auto)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-test-r-depsource-auto)|
   |test-r-depsource-system|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-805-github-test-r-depsource-system)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-805-github-test-r-depsource-system)|
   |test-r-devdocs|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-805-github-test-r-devdocs)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-805-github-test-r-devdocs)|
   |test-r-gcc-11|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-805-github-test-r-gcc-11)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-805-github-test-r-gcc-11)|
   |test-r-install-local|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-805-github-test-r-install-local)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-805-github-test-r-install-local)|
   |test-r-linux-as-cran|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-805-github-test-r-linux-as-cran)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-805-github-test-r-linux-as-cran)|
   |test-r-linux-rchk|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-805-github-test-r-linux-rchk)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-805-github-test-r-linux-rchk)|
   |test-r-linux-valgrind|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-test-r-linux-valgrind)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-test-r-linux-valgrind)|
   |test-r-minimal-build|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-test-r-minimal-build)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-test-r-minimal-build)|
   |test-r-offline-maximal|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-805-github-test-r-offline-maximal)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-805-github-test-r-offline-maximal)|
   |test-r-offline-minimal|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-test-r-offline-minimal)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-test-r-offline-minimal)|
   |test-r-rhub-debian-gcc-devel-lto-latest|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-test-r-rhub-debian-gcc-devel-lto-latest)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-test-r-rhub-debian-gcc-devel-lto-latest)|
   |test-r-rhub-ubuntu-gcc-release-latest|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-test-r-rhub-ubuntu-gcc-release-latest)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-test-r-rhub-ubuntu-gcc-release-latest)|
   |test-r-rocker-r-base-latest|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-test-r-rocker-r-base-latest)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-test-r-rocker-r-base-latest)|
   |test-r-rstudio-r-base-3.6-bionic|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-test-r-rstudio-r-base-3.6-bionic)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-test-r-rstudio-r-base-3.6-bionic)|
   |test-r-rstudio-r-base-3.6-centos7-devtoolset-8|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-test-r-rstudio-r-base-3.6-centos7-devtoolset-8)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-test-r-rstudio-r-base-3.6-centos7-devtoolset-8)|
   |test-r-rstudio-r-base-3.6-centos8|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-test-r-rstudio-r-base-3.6-centos8)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-test-r-rstudio-r-base-3.6-centos8)|
   |test-r-rstudio-r-base-3.6-opensuse15|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-test-r-rstudio-r-base-3.6-opensuse15)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-test-r-rstudio-r-base-3.6-opensuse15)|
   |test-r-rstudio-r-base-3.6-opensuse42|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-test-r-rstudio-r-base-3.6-opensuse42)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-test-r-rstudio-r-base-3.6-opensuse42)|
   |test-r-ubuntu-21.04|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-805-github-test-r-ubuntu-21.04)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-805-github-test-r-ubuntu-21.04)|
   |test-r-version-compatibility|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-805-github-test-r-version-compatibility)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-805-github-test-r-version-compatibility)|
   |test-r-versions|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-805-github-test-r-versions)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-805-github-test-r-versions)|
   |test-ubuntu-18.04-r-sanitizer|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-805-azure-test-ubuntu-18.04-r-sanitizer)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-805-azure-test-ubuntu-18.04-r-sanitizer)|





[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r698866936



##########
File path: dev/tasks/tasks.yml
##########
@@ -1033,6 +1033,14 @@ tasks:
       flags: '-e ARROW_SOURCE_HOME="/arrow" -e FORCE_BUNDLED_BUILD=TRUE -e LIBARROW_BUILD=TRUE -e ARROW_DEPENDENCY_SOURCE=SYSTEM'
       image: ubuntu-r-only-r
 
+  test-r-offline-minimal:
+      ci: azure
+      template: r/azure.linux.yml
+      params:
+        r_org: rocker
+        r_image: r-base
+        r_tag: latest
+        flags: '-e TEST_OFFLINE_BUILD=true'

Review comment:
       I put this one on Azure because the one above was on Azure, but feel free to change it to a different platform.







[GitHub] [arrow] karldw commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-912222264


   That build is "passing", but it looks like it didn't actually run any tests.
   ```r
   if(tools::testInstalledPackage("arrow") != 0L) stop("There was a test failure.")
   #> Testing examples for package ‘arrow’
   #> no parsed files found
   #> Running specific tests for package ‘arrow’
   #>   Running ‘testthat.R’
   #> Warning message:
   #> no examples found for package ‘arrow’ 
   ```





[GitHub] [arrow] jonkeane commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-909185913









[GitHub] [arrow] jonkeane commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-909315840









[GitHub] [arrow] nealrichardson commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-909596512


   > * Do you want to include all of the downloaded files (87 MB), or just the ones an R build could possibly use (55 MB)?
   >   
   >   * edit: Currently including all
   
   OK with me; we can revisit later if you or someone else needs a thinner fat build.
   
   > * Do you still want to be able to run `download_optional_dependencies` from within an installed R package? If not, we can use the copy of `download_dependencies.sh` that's in `tools/cpp/thirdparty/` (we're currently making another copy for `inst/` so it's available at runtime).
   >   
   >   * edit: Deleted the function and these copies of those two files
   
   👍 
   
   > * Should we bundle cmake while we're at it? This might be convenient, but is a bit of scope creep.
   >   
   >   * edit: I didn't, but let me know if you think I should.
   
   No, I don't think we should bundle cmake; you can get that from a package manager easily enough, or just download one more file. The trouble (or one of the troubles) with the C++ dependencies is that you can't get many of them as system dependencies, so we have to build them ourselves.
   
   





[GitHub] [arrow] nealrichardson commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r696697733



##########
File path: r/configure
##########
@@ -130,14 +130,8 @@ else
       fi
     else
       # Set some default values/backwards compatibility
-      if [ "${LIBARROW_DOWNLOAD}" = "" ] && [ "${NOT_CRAN}" != "" ]; then
-        LIBARROW_DOWNLOAD=$NOT_CRAN; export LIBARROW_DOWNLOAD
-      fi
-      if [ "${LIBARROW_BINARY}" = "" ] && [ "${LIBARROW_DOWNLOAD}" != "" ]; then
-        LIBARROW_BINARY=$LIBARROW_DOWNLOAD; export LIBARROW_BINARY
-      fi
-      if [ "${LIBARROW_MINIMAL}" = "" ] && [ "${LIBARROW_DOWNLOAD}" = "true" ]; then
-        LIBARROW_MINIMAL=false; export LIBARROW_MINIMAL
+      if [ "${LIBARROW_BINARY}" = "" ] && [ "${NOT_CRAN}" = "true" ]; then
+        LIBARROW_BINARY=true; export LIBARROW_BINARY
       fi
       if [ "${LIBARROW_MINIMAL}" = "" ] && [ "${NOT_CRAN}" = "true" ]; then
         LIBARROW_MINIMAL=false; export LIBARROW_MINIMAL

Review comment:
       Let's simplify this a little further:
   
   ```suggestion
         if [ "${NOT_CRAN}" = "true" ]; then
           if [ "${LIBARROW_BINARY}" = "" ]; then
             LIBARROW_BINARY=true; export LIBARROW_BINARY
           fi
           if [ "${LIBARROW_MINIMAL}" = "" ]; then
             LIBARROW_MINIMAL=false; export LIBARROW_MINIMAL
   ```
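   
   For reference, a minimal self-contained sketch of how the simplified defaults would behave once the enclosing `if` is closed. The function name `set_libarrow_defaults` is hypothetical and only wraps the logic so it can be exercised on its own; the `:-` expansions are an addition so the sketch also runs under `set -u`.
   
   ```sh
   # Sketch only: NOT_CRAN=true fills in defaults, but never overrides a value
   # the user has already set.
   set_libarrow_defaults() {
     if [ "${NOT_CRAN:-}" = "true" ]; then
       if [ "${LIBARROW_BINARY:-}" = "" ]; then
         LIBARROW_BINARY=true; export LIBARROW_BINARY
       fi
       if [ "${LIBARROW_MINIMAL:-}" = "" ]; then
         LIBARROW_MINIMAL=false; export LIBARROW_MINIMAL
       fi
     fi
   }
   ```
   
   With `NOT_CRAN` unset or set to anything other than `true`, both variables are left untouched.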







[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r698576926



##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,64 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' Steps for an offline install with optional dependencies:
+#' - Install the `arrow` package on a computer with internet access
+#' - Run this function
+#' - Copy the saved dependency files to a computer without internet access
+#' - Create a environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
+#'   points to the folder.
+#' - Install the `arrow` package on the computer without internet access
+#' - Run [arrow_info()] to check installed capabilities

Review comment:
       Sure! Subheadings seem like a great idea. Something like this?
   
   ```r
   #' ## Steps for an offline install with optional dependencies:
   #'
   #' ### On a computer with internet access:
   #' - Install the `arrow` package
   #' - Run this function
   #' - Copy the saved dependency files to the computer without internet access
   #'
   #' ### On the computer without internet access:
   #' - Create an environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
   #'   points to the newly copied folder of dependency files.
   #' - Install the `arrow` package
   #' - Run [arrow_info()] to check installed capabilities
   ```
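   
   As a rough illustration of the second half of those steps, here is a small pre-flight check one could run on the offline machine before installing. It is only a sketch: `check_thirdparty_dir` is a hypothetical helper, not part of the package, and it assumes nothing beyond the `ARROW_THIRDPARTY_DEPENDENCY_DIR` variable named above.
   
   ```sh
   # Hypothetical helper: succeed only if ARROW_THIRDPARTY_DEPENDENCY_DIR names
   # an existing, non-empty directory.
   check_thirdparty_dir() {
     dir="${ARROW_THIRDPARTY_DEPENDENCY_DIR:-}"
     if [ -z "$dir" ]; then
       echo "ARROW_THIRDPARTY_DEPENDENCY_DIR is not set" >&2
       return 1
     fi
     if [ ! -d "$dir" ]; then
       echo "$dir is not a directory" >&2
       return 1
     fi
     if [ -z "$(ls -A "$dir")" ]; then
       echo "$dir is empty" >&2
       return 1
     fi
   }
   ```
   
   Running it right after exporting the variable would catch a mistyped path before the (slow) source build starts.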
   







[GitHub] [arrow] github-actions[bot] commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-908362125









[GitHub] [arrow] jonkeane commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r698472411



##########
File path: r/tools/nixlibs.R
##########
@@ -82,7 +91,7 @@ download_binary <- function(os = identify_os()) {
 # * `TRUE` (not case-sensitive), to try to discover your current OS, or
 # * some other string, presumably a related "distro-version" that has binaries
 #   built that work for your OS
-identify_os <- function(os = Sys.getenv("LIBARROW_BINARY", Sys.getenv("LIBARROW_DOWNLOAD"))) {
+identify_os <- function(os = Sys.getenv("LIBARROW_BINARY", Sys.getenv("TEST_OFFLINE_BUILD"))) {

Review comment:
       If I'm following the logic here correctly, if `LIBARROW_BINARY` is unset, this will only attempt to identify the OS when `TEST_OFFLINE_BUILD` is `TRUE`. Is that what we want here?
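   
   To spell out the fallback I mean, here is a shell analogue of that default argument (a sketch, not the R code itself; `resolve_os_default` is a hypothetical name). `${VAR-default}` without the colon falls back only when the variable is unset, which is how I understand `Sys.getenv(x, unset)` to behave.
   
   ```sh
   # Hypothetical helper mirroring the new default of identify_os():
   # use LIBARROW_BINARY when it is set, otherwise TEST_OFFLINE_BUILD,
   # otherwise the empty string.
   resolve_os_default() {
     printf '%s\n' "${LIBARROW_BINARY-${TEST_OFFLINE_BUILD-}}"
   }
   ```
   
   With `LIBARROW_BINARY` unset, the result is non-empty only when `TEST_OFFLINE_BUILD` is set, which is the case I'm asking about.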

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,64 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' Steps for an offline install with optional dependencies:
+#' - Install the `arrow` package on a computer with internet access
+#' - Run this function
+#' - Copy the saved dependency files to a computer without internet access
+#' - Create a environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
+#'   points to the folder.
+#' - Install the `arrow` package on the computer without internet access
+#' - Run [arrow_info()] to check installed capabilities

Review comment:
       Super minor, but an attempt to clarify which steps happen on which machines. We could also make subheadings if the parentheticals are too clunky since it's the first two steps on one computer and the rest on the other.
   
   ```suggestion
   #' - Install the `arrow` package (on a computer with internet access)
   #' - Run this function (on a computer with internet access)
   #' - Copy the saved dependency files to the computer without internet access
   #' - Create an environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
   #'   points to the folder. (on the computer without internet access)
   #' - Install the `arrow` package (on the computer without internet access)
   #' - Run [arrow_info()] to check installed capabilities (on the computer without internet access)
   ```

##########
File path: r/R/util.R
##########
@@ -183,3 +183,63 @@ repeat_value_as_array <- function(object, n) {
   }
   return(Scalar$create(object)$as_array(n))
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#'
+#' @return TRUE/FALSE for whether the downloads were successful
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Other files are not changed.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' Steps for an offline install with optional dependencies:
+#' - Install the `arrow` package on a computer with internet access

Review comment:
       For the purposes of our CI job, we will have the checkout available, so we could copy them over in that process. But for people trying to install with the script, that's an issue. We could attempt to grab those files from GitHub if they can't be found with `system.file()`, but that opens up another can of worms: making sure we're grabbing the right version of those files for the install to work.
   
   I don't think it's the end of the world to require the double installation until we find a better solution.

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,64 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' Steps for an offline install with optional dependencies:
+#' - Install the `arrow` package on a computer with internet access
+#' - Run this function
+#' - Copy the saved dependency files to a computer without internet access
+#' - Create a environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
+#'   points to the folder.
+#' - Install the `arrow` package on the computer without internet access
+#' - Run [arrow_info()] to check installed capabilities

Review comment:
       That looks great

##########
File path: dev/tasks/tasks.yml
##########
@@ -1033,6 +1033,14 @@ tasks:
       flags: '-e ARROW_SOURCE_HOME="/arrow" -e FORCE_BUNDLED_BUILD=TRUE -e LIBARROW_BUILD=TRUE -e ARROW_DEPENDENCY_SOURCE=SYSTEM'
       image: ubuntu-r-only-r
 
+  test-r-offline-minimal:
+      ci: azure
+      template: r/azure.linux.yml
+      params:
+        r_org: rocker
+        r_image: r-base
+        r_tag: latest
+        flags: '-e TEST_OFFLINE_BUILD=true'

Review comment:
       Azure is fine for this one. TBH, I picked GitHub Actions for the maximal build out of convenience, since we already have a model that has dependent steps. But our CI system (AKA crossbow) is designed to be spread across a number of services like this, so it's totally fine to use two different services for these two jobs.

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,66 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#' @param download_dependencies_sh location of the dependency download script,
+#' defaults to the one included with the arrow package.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is unset, `BUNDLED`, or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### Using a computer with internet access, pre-download the dependencies:
+#' * Install the `arrow` package
+#' * Run `download_optional_dependencies(my_dependencies)`
+#' * Copy the directory `my-arrow-dependencies` to the computer without internet access
+#'
+#' ### On the computer without internet access, use the pre-downloaded dependencies:
+#' * Create a environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
+#'   points to the newly copied `my_dependencies`.
+#' * Install the `arrow` package
+#'   * This installation will build from source, so `cmake` must be available
+#' * Run [arrow_info()] to check installed capabilities
+#'
+#' @examples
+#' \dontrun{
+#' download_optional_dependencies("arrow-thirdparty")
+#' list.files("arrow-thirdparty", "thrift-*") # "thrift-0.13.0.tar.gz" or similar
+#' }
+#' @export
+download_optional_dependencies <- function(
+  deps_dir = Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR"),
+  # This script is copied over from arrow/cpp/... to arrow/r/inst/...
+  download_dependencies_sh = system.file(
+    "thirdparty/download_dependencies.sh",
+    package = "arrow",
+    mustWork = TRUE

Review comment:
       To start, I've made this an argument to the function so that we can call it without installing in CI. We could also do this as an environment variable like we do for `deps_dir` (either internally or as an argument here). I don't have strong feelings one way or the other, though since this is pretty internal-use / CI-use only we might be best off not exposing this as an argument at all.







[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r697644910



##########
File path: r/tools/nixlibs.R
##########
@@ -329,24 +290,34 @@ build_libarrow <- function(src_dir, dst_dir) {
   env_vars <- paste0(names(env_var_list), '="', env_var_list, '"', collapse = " ")
   env_vars <- with_s3_support(env_vars)
   env_vars <- with_mimalloc(env_vars)
-  if (tolower(Sys.info()[["sysname"]]) %in% "sunos") {
-    # jemalloc doesn't seem to build on Solaris
-    # nor does thrift, so turn off parquet,
-    # and arrowExports.cpp requires parquet for dataset (ARROW-11994), so turn that off
-    # xsimd doesn't compile, so set SIMD level to NONE to skip it
-    # re2 and utf8proc do compile,
-    # but `ar` fails to build libarrow_bundled_dependencies, so turn them off
-    # so that there are no bundled deps
-    env_vars <- paste(env_vars, "ARROW_JEMALLOC=OFF ARROW_PARQUET=OFF ARROW_DATASET=OFF ARROW_WITH_RE2=OFF ARROW_WITH_UTF8PROC=OFF EXTRA_CMAKE_FLAGS=-DARROW_SIMD_LEVEL=NONE")
+  # turn_off_thirdparty_features() needs to happen after with_mimalloc() and
+  # with_s3_support(), since those might turn features ON.

Review comment:
       Working with `env_var_list` is definitely easier. 
   
   Rather than have `with_mimalloc` and `with_s3_support` each check for download status, it strikes me as cleaner to have them do their own specific compiler/dependency checks before the `thirdparty_deps_unavailable` check. Then, if those features (and everything else) end up not being downloadable, they're set to OFF again.
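   
   A rough shell sketch of that ordering, for illustration only. `download_ok`, `thirdparty_dependency_dir`, and `thirdparty_deps_unavailable` mirror names used in `nixlibs.R`; `configure_features` and the `ARROW_S3` / `ARROW_MIMALLOC` variable names are stand-ins I'm assuming here, and the real logic lives in R.
   
   ```sh
   # True when third-party sources can be neither downloaded, found in a local
   # directory, nor replaced by system libraries.
   thirdparty_deps_unavailable() {
     dep_source=$(printf '%s' "${ARROW_DEPENDENCY_SOURCE:-}" | tr '[:upper:]' '[:lower:]')
     [ "${download_ok:-false}" != "true" ] &&
       [ ! -d "${thirdparty_dependency_dir:-}" ] &&
       [ "$dep_source" != "system" ]
   }

   configure_features() {
     # Stand-ins for with_s3_support() and with_mimalloc(): assume their own
     # compiler/dependency checks passed and switched the features ON.
     ARROW_S3=ON
     ARROW_MIMALLOC=ON
     # One final check switches everything OFF again if nothing can be fetched.
     if thirdparty_deps_unavailable; then
       ARROW_S3=OFF
       ARROW_MIMALLOC=OFF
     fi
   }
   ```
   
   The point of the ordering is that the feature-specific helpers never need to know about download status; only the last step does.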







[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r702000356



##########
File path: r/tools/nixlibs.R
##########
@@ -320,33 +301,54 @@ build_libarrow <- function(src_dir, dst_dir) {
     BUILD_DIR = build_dir,
     DEST_DIR = dst_dir,
     CMAKE = cmake,
+    # EXTRA_CMAKE_FLAGS will often be "", but it's convenient later to have it defined
+    EXTRA_CMAKE_FLAGS = Sys.getenv("EXTRA_CMAKE_FLAGS"),
     # Make sure we build with the same compiler settings that R is using
     CC = R_CMD_config("CC"),
     CXX = paste(R_CMD_config("CXX11"), R_CMD_config("CXX11STD")),
     # CXXFLAGS = R_CMD_config("CXX11FLAGS"), # We don't want the same debug symbols
     LDFLAGS = R_CMD_config("LDFLAGS")
   )
-  env_vars <- paste0(names(env_var_list), '="', env_var_list, '"', collapse = " ")
-  env_vars <- with_s3_support(env_vars)
-  env_vars <- with_mimalloc(env_vars)
-  if (tolower(Sys.info()[["sysname"]]) %in% "sunos") {
-    # jemalloc doesn't seem to build on Solaris
-    # nor does thrift, so turn off parquet,
-    # and arrowExports.cpp requires parquet for dataset (ARROW-11994), so turn that off
-    # xsimd doesn't compile, so set SIMD level to NONE to skip it
-    # re2 and utf8proc do compile,
-    # but `ar` fails to build libarrow_bundled_dependencies, so turn them off
-    # so that there are no bundled deps
-    env_vars <- paste(env_vars, "ARROW_JEMALLOC=OFF ARROW_PARQUET=OFF ARROW_DATASET=OFF ARROW_WITH_RE2=OFF ARROW_WITH_UTF8PROC=OFF EXTRA_CMAKE_FLAGS=-DARROW_SIMD_LEVEL=NONE")
+  env_var_list <- with_s3_support(env_var_list)
+  env_var_list <- with_mimalloc(env_var_list)
+  # turn_off_thirdparty_features() needs to happen after with_mimalloc() and
+  # with_s3_support(), since those might turn features ON.
+  thirdparty_deps_unavailable <- !download_ok &&
+    !dir.exists(thirdparty_dependency_dir) &&
+    !env_is("ARROW_DEPENDENCY_SOURCE", "system")
+  if (is_solaris()) {
+    # Note that JSON support does work on Solaris, but will be turned off with
+    # the rest of the thirdparty dependencies (when ARROW-13768 is resolved and

Review comment:
       Thanks -- I haven't pulled in the changes from ARROW-13768 yet. (We might re-run `test-r-offline-minimal` once that's done, since it will start setting `ARROW_JSON=OFF`).
   
   I forget how this goes exactly, but I think rebasing when I've already [merged](https://github.com/apache/arrow/pull/11001/commits/b560f0b08ea0fcff110d48caa19b6c9cfe4d3066) generates a weird series of coauthored commits. Do you mind if I merge again, rather than rebasing?







[GitHub] [arrow] jonkeane commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r702120622



##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,91 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Create an install package with all thirdparty dependencies

Review comment:
       ```suggestion
   #' Create a source bundle that includes all thirdparty dependencies
   ```

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,91 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Create an install package with all thirdparty dependencies
+#'
+#' @param dest_file File path for the new tar.gz package. Defaults to
+#' `arrow_V.V.V_with_deps.tar.gz` in the current directory (`V.V.V` is the version)
+#' @param source_file File path for the input tar.gz package. Defaults to
+#' downloading the package.
+#' @return The full path to `dest_file`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download the required dependencies for you.
+#' These downloaded dependencies are only used in the build if
+#' `ARROW_DEPENDENCY_SOURCE` is unset, `BUNDLED`, or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### Using a computer with internet access, pre-download the dependencies:
+#' * Install the `arrow` package _or_ run
+#'   `source("https://raw.githubusercontent.com/apache/arrow/master/r/R/install-arrow.R")`
+#' * Run `create_package_with_all_dependencies("my_arrow_pkg.tar.gz")`
+#' * Copy the newly created `my_arrow_pkg.tar.gz` to the computer without internet access
+#'
+#' ### On the computer without internet access, install the prepared package:
+#' * Install the `arrow` package from the copied file
+#'   * `install.packages("my_arrow_pkg.tar.gz", dependencies = c("Depends", "Imports", "LinkingTo"))`
+#'   * This installation will build from source, so `cmake` must be available
+#' * Run [arrow_info()] to check installed capabilities
+#'
+#'
+#' @examples
+#' \dontrun{
+#' new_pkg <- create_package_with_all_dependencies()
+#' # Note: this works when run in the same R session, but it's meant to be
+#' # copied to a different computer.
+#' install.packages(new_pkg, dependencies = c("Depends", "Imports", "LinkingTo"))
+#' }
+#' @export
+create_package_with_all_dependencies <- function(dest_file = NULL, source_file = NULL) {
+  if (is.null(source_file)) {
+    pkg_download_dir <- tempfile()
+    dir.create(pkg_download_dir)
+    on.exit(unlink(pkg_download_dir, recursive = TRUE), add = TRUE)
+    downloaded <- utils::download.packages("arrow", destdir = pkg_download_dir, type = "source")

Review comment:
       This is very minor, but do we want a message here saying that we are downloading the file?

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,91 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Create an install package with all thirdparty dependencies
+#'
+#' @param dest_file File path for the new tar.gz package. Defaults to
+#' `arrow_V.V.V_with_deps.tar.gz` in the current directory (`V.V.V` is the version)
+#' @param source_file File path for the input tar.gz package. Defaults to
+#' downloading the package.

Review comment:
       ```suggestion
   #' @param source_file File path for the input tar.gz package. Defaults to
   #' downloading the package from CRAN (or whichever repository is listed first in `getOption("repos")`).
   ```
   
   In adding this clarification, I realized that if someone has set RStudio Package Manager as their first repo, this might do funny things (though they would be getting a binary, which should have *most* of everything built already, so the next steps would either be ignored or not work). Maybe we "just" need to document that here and tell people who are doing that to use the binary they get from there.

##########
File path: r/vignettes/install.Rmd
##########
@@ -102,6 +102,37 @@ satisfy C++ dependencies.
 
 > Note that, unlike packages like `tensorflow`, `blogdown`, and others that require external dependencies, you do not need to run `install_arrow()` after a successful `arrow` installation.
 
+The `install-arrow.R` file also includes the `create_package_with_all_dependencies()`
+function. Normally, when installing on a computer with internet access, the
+build process will download third-party dependencies as needed.
+This function provides a way to download them in advance.
+Doing so may be useful when installing Arrow on a computer without internet access.
+Note that Arrow _can_ be installed on a computer without internet access, but

Review comment:
       ```suggestion
   Note that Arrow _can_ be installed on a computer without internet access without doing this, but
   ```

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,91 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Create an install package with all thirdparty dependencies
+#'
+#' @param dest_file File path for the new tar.gz package. Defaults to
+#' `arrow_V.V.V_with_deps.tar.gz` in the current directory (`V.V.V` is the version)
+#' @param source_file File path for the input tar.gz package. Defaults to
+#' downloading the package.
+#' @return The full path to `dest_file`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download the required dependencies for you.
+#' These downloaded dependencies are only used in the build if
+#' `ARROW_DEPENDENCY_SOURCE` is unset, `BUNDLED`, or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### Using a computer with internet access, pre-download the dependencies:
+#' * Install the `arrow` package _or_ run
+#'   `source("https://raw.githubusercontent.com/apache/arrow/master/r/R/install-arrow.R")`
+#' * Run `create_package_with_all_dependencies("my_arrow_pkg.tar.gz")`
+#' * Copy the newly created `my_arrow_pkg.tar.gz` to the computer without internet access
+#'
+#' ### On the computer without internet access, install the prepared package:
+#' * Install the `arrow` package from the copied file
+#'   * `install.packages("my_arrow_pkg.tar.gz", dependencies = c("Depends", "Imports", "LinkingTo"))`
+#'   * This installation will build from source, so `cmake` must be available
+#' * Run [arrow_info()] to check installed capabilities
+#'
+#'
+#' @examples
+#' \dontrun{
+#' new_pkg <- create_package_with_all_dependencies()
+#' # Note: this works when run in the same R session, but it's meant to be
+#' # copied to a different computer.
+#' install.packages(new_pkg, dependencies = c("Depends", "Imports", "LinkingTo"))
+#' }
+#' @export
+create_package_with_all_dependencies <- function(dest_file = NULL, source_file = NULL) {

Review comment:
       I'm fine with the order these are in. Generally I like inputs before outputs like Neal mentioned, but you're right that for most people `source_file` will be left blank.

##########
File path: r/tools/nixlibs.R
##########
@@ -413,66 +423,129 @@ cmake_version <- function(cmd = "cmake") {
   )
 }
 
-with_s3_support <- function(env_vars) {
-  arrow_s3 <- toupper(Sys.getenv("ARROW_S3")) == "ON" || tolower(Sys.getenv("LIBARROW_MINIMAL")) == "false"
+turn_off_thirdparty_features <- function(env_var_list) {
+  # Because these are done as environment variables (as opposed to build flags),
+  # setting these to "OFF" overrides any previous setting. We don't need to
+  # check the existing value.
+  turn_off <- c(
+    "ARROW_MIMALLOC" = "OFF",
+    "ARROW_JEMALLOC" = "OFF",
+    "ARROW_PARQUET" = "OFF", # depends on thrift
+    "ARROW_DATASET" = "OFF", # depends on parquet
+    "ARROW_S3" = "OFF",
+    "ARROW_WITH_BROTLI" = "OFF",
+    "ARROW_WITH_BZ2" = "OFF",
+    "ARROW_WITH_LZ4" = "OFF",
+    "ARROW_WITH_SNAPPY" = "OFF",
+    "ARROW_WITH_ZLIB" = "OFF",
+    "ARROW_WITH_ZSTD" = "OFF",
+    "ARROW_WITH_RE2" = "OFF",
+    "ARROW_WITH_UTF8PROC" = "OFF",
+    # NOTE: this code sets the environment variable ARROW_JSON to "OFF", but
+    # that setting will *not* be honored by build_arrow_static.sh until
+    # ARROW-13768 is resolved.

Review comment:
       ```suggestion
   ```
   
   ARROW-13768 is resolved, so we can remove this, yeah?

##########
File path: r/tools/nixlibs.R
##########
@@ -413,66 +423,129 @@ cmake_version <- function(cmd = "cmake") {
   )
 }
 
-with_s3_support <- function(env_vars) {
-  arrow_s3 <- toupper(Sys.getenv("ARROW_S3")) == "ON" || tolower(Sys.getenv("LIBARROW_MINIMAL")) == "false"
+turn_off_thirdparty_features <- function(env_var_list) {
+  # Because these are done as environment variables (as opposed to build flags),
+  # setting these to "OFF" overrides any previous setting. We don't need to
+  # check the existing value.
+  turn_off <- c(
+    "ARROW_MIMALLOC" = "OFF",
+    "ARROW_JEMALLOC" = "OFF",
+    "ARROW_PARQUET" = "OFF", # depends on thrift
+    "ARROW_DATASET" = "OFF", # depends on parquet
+    "ARROW_S3" = "OFF",
+    "ARROW_WITH_BROTLI" = "OFF",
+    "ARROW_WITH_BZ2" = "OFF",
+    "ARROW_WITH_LZ4" = "OFF",
+    "ARROW_WITH_SNAPPY" = "OFF",
+    "ARROW_WITH_ZLIB" = "OFF",
+    "ARROW_WITH_ZSTD" = "OFF",
+    "ARROW_WITH_RE2" = "OFF",
+    "ARROW_WITH_UTF8PROC" = "OFF",
+    # NOTE: this code sets the environment variable ARROW_JSON to "OFF", but
+    # that setting will *not* be honored by build_arrow_static.sh until
+    # ARROW-13768 is resolved.
+    "ARROW_JSON" = "OFF",
+    # The syntax to turn off XSIMD is different.
+    # Pull existing value of EXTRA_CMAKE_FLAGS first (must be defined)
+    "EXTRA_CMAKE_FLAGS" = paste(
+      env_var_list[["EXTRA_CMAKE_FLAGS"]],
+      "-DARROW_SIMD_LEVEL=NONE -DARROW_RUNTIME_SIMD_LEVEL=NONE"
+    )
+  )
+  # Create a new env_var_list, with the values of turn_off set.
+  # replace() also adds new values if they didn't exist before
+  replace(env_var_list, names(turn_off), turn_off)
+}
+
+set_thirdparty_urls <- function(env_var_list) {
+  # This function does *not* check if existing *_SOURCE_URL variables are set.
+  # The directory tools/thirdparty_dependencies is created by
+  # create_package_with_all_dependencies() and saved in the tar file.
+  files <- list.files(thirdparty_dependency_dir, full.names = FALSE)
+  url_env_varname <- toupper(sub("(.*?)-.*", "ARROW_\\1_URL", files))
+  # Special handling for the aws dependencies, which have extra `-`
+  aws <- grepl("^aws", files)
+  url_env_varname[aws] <- sub(
+    "AWS_SDK_CPP", "AWSSDK",
+    gsub(
+      "-", "_",
+      sub(
+        "(AWS.*)-.*", "ARROW_\\1_URL",
+        toupper(files[aws])
+      )
+    )
+  )
+  full_filenames <- file.path(normalizePath(thirdparty_dependency_dir), files)
+
+  env_var_list <- replace(env_var_list, url_env_varname, full_filenames)
+  if (!quietly) {
+    env_var_list <- replace(env_var_list, "ARROW_VERBOSE_THIRDPARTY_BUILD", "ON")
+  }
+  env_var_list
+}
+
+with_mimalloc <- function(env_var_list) {
+  arrow_mimalloc <- env_is("ARROW_MIMALLOC", "on") || env_is("LIBARROW_MINIMAL", "false")
+  if (arrow_mimalloc) {

Review comment:
       ```suggestion
     # but if ARROW_MIMALLOC=OFF explicitly, we are definitely off, so override
     if (env_is("ARROW_MIMALLOC", "off")) {
       arrow_mimalloc <- FALSE
     }
     if (arrow_mimalloc) {
   ```
   
   This wasn't in the original, but like S3 below it, we want to be able to do `LIBARROW_MINIMAL=FALSE ARROW_MIMALLOC=OFF` and have everything on but mimalloc off. And while we're moving this code around, we might as well fix this too.
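   
   A minimal sketch of the precedence described above, not the code in this PR: an explicit `ARROW_MIMALLOC=OFF` should win over `LIBARROW_MINIMAL=false`. Here `env_is()` is a local stand-in for the helper of the same name in `nixlibs.R` (its exact definition is assumed), and `feature_wanted()` is a hypothetical name used only for illustration.
   
   ```r
   # Stand-in for the env_is() helper used in nixlibs.R (assumed behavior:
   # case-insensitive comparison of an environment variable with a value).
   env_is <- function(var, value) {
     identical(tolower(Sys.getenv(var)), tolower(value))
   }

   # Hypothetical helper: a feature is wanted if it is switched on explicitly
   # or if LIBARROW_MINIMAL=false, unless it is switched off explicitly.
   feature_wanted <- function(feature_var) {
     wanted <- env_is(feature_var, "on") || env_is("LIBARROW_MINIMAL", "false")
     if (env_is(feature_var, "off")) {
       wanted <- FALSE
     }
     wanted
   }

   Sys.setenv(LIBARROW_MINIMAL = "FALSE", ARROW_MIMALLOC = "OFF", ARROW_S3 = "")
   mimalloc_on <- feature_wanted("ARROW_MIMALLOC") # explicit OFF overrides
   s3_on <- feature_wanted("ARROW_S3") # empty, so LIBARROW_MINIMAL decides
   ```
   
   With these settings `mimalloc_on` is `FALSE` and `s3_on` is `TRUE`, which is the "everything on but mimalloc off" behavior asked for.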







[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r696784693



##########
File path: r/tools/nixlibs.R
##########
@@ -329,18 +288,22 @@ build_libarrow <- function(src_dir, dst_dir) {
     LDFLAGS = R_CMD_config("LDFLAGS")
   )
   env_vars <- paste0(names(env_var_list), '="', env_var_list, '"', collapse = " ")
+  # Add env variables like ARROW_S3=ON. Order doesn't matter. Depends on `download_ok`
   env_vars <- with_s3_support(env_vars)
   env_vars <- with_mimalloc(env_vars)
-  if (tolower(Sys.info()[["sysname"]]) %in% "sunos") {
-    # jemalloc doesn't seem to build on Solaris
-    # nor does thrift, so turn off parquet,
-    # and arrowExports.cpp requires parquet for dataset (ARROW-11994), so turn that off
-    # xsimd doesn't compile, so set SIMD level to NONE to skip it
-    # re2 and utf8proc do compile,
-    # but `ar` fails to build libarrow_bundled_dependencies, so turn them off
-    # so that there are no bundled deps
-    env_vars <- paste(env_vars, "ARROW_JEMALLOC=OFF ARROW_PARQUET=OFF ARROW_DATASET=OFF ARROW_WITH_RE2=OFF ARROW_WITH_UTF8PROC=OFF EXTRA_CMAKE_FLAGS=-DARROW_SIMD_LEVEL=NONE")
-  }
+  env_vars <- with_jemalloc(env_vars)

Review comment:
       > it kinda feels like we're reimplementing cmake here, and there's lots of subtleties that we can get wrong
   
   Yeah....
   
   -----------------
   
   It makes sense to just point to the directory! A couple of thoughts about implementation:
   - Should this have an Arrow-specific name, like `ARROW_THIRDPARTY_DEPENDENCIES`?
   - The files in that directory have names like `<component>-<version>.tar.gz`. To create the `*_SOURCE_URL`, I'll search for the component name and raise an error if I find 0 or >1 matches?
   - If `*_SOURCE_URL` is set already, I'll use that value instead of looking in the directory
   
   I noticed Solaris doesn't turn JSON off. I'm not really sure how things work there.
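   
   A rough sketch of the lookup proposed in the bullets above, under stated assumptions: `find_component_file()` and its arguments are hypothetical names, and the file names and versions in the example are made up for illustration rather than taken from the real dependency list.
   
   ```r
   # Hypothetical helper: find the single tarball for `component` among `files`
   # (named like <component>-<version>.tar.gz). If `existing_url` is already
   # set, it takes precedence and the directory listing is not consulted.
   find_component_file <- function(component, files, existing_url = "") {
     if (nzchar(existing_url)) {
       return(existing_url)
     }
     # Fixed-string prefix match on "<component>-"
     matches <- files[startsWith(files, paste0(component, "-"))]
     if (length(matches) != 1) {
       stop(
         "Expected exactly one file for '", component, "', found ", length(matches),
         call. = FALSE
       )
     }
     matches
   }

   # Illustrative file names only
   files <- c("brotli-v1.0.9.tar.gz", "snappy-1.1.8.tar.gz", "zstd-v1.5.0.tar.gz")
   snappy_file <- find_component_file("snappy", files)
   preset <- find_component_file("snappy", files, existing_url = "/opt/deps/snappy.tar.gz")
   missing_err <- tryCatch(
     find_component_file("thrift", files),
     error = function(e) conditionMessage(e)
   )
   ```
   
   A plain prefix match like this would be ambiguous for components whose names themselves contain `-` (the aws ones), so those would need more specific handling.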







[GitHub] [arrow] nealrichardson commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r696042759



##########
File path: r/tools/nixlibs.R
##########
@@ -271,13 +281,18 @@ apache_download <- function(version, destfile, n_mirrors = 3) {
 }
 
 find_local_source <- function(arrow_home = Sys.getenv("ARROW_SOURCE_HOME", "..")) {
+  cpp_dir <- NULL
   if (file.exists(paste0(arrow_home, "/cpp/src/arrow/api.h"))) {
     # We're in a git checkout of arrow, so we can build it
-    cat("*** Found local C++ source\n")
-    return(paste0(arrow_home, "/cpp"))
-  } else {
-    return(NULL)
+    cpp_dir <- paste0(arrow_home, "/cpp")
+  } else if (file.exists("tools/cpp/src/arrow/api.h")) {

Review comment:
       Instead of the if/else if pattern, you could call find_local_source() twice, first with no args (to get the default), then `find_local_source("tools")`. Or you could assume `arrow_home` is a vector of paths to try and iterate over it. 
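   
   A minimal sketch of the second option (treating `arrow_home` as a vector of paths to try), assuming the default candidates are `ARROW_SOURCE_HOME` followed by `tools`. The throwaway directory layout at the end exists only to exercise the function.
   
   ```r
   # Sketch: treat `arrow_home` as a vector of candidate directories and return
   # the cpp/ directory of the first one that contains the Arrow C++ headers.
   find_local_source <- function(arrow_home = c(Sys.getenv("ARROW_SOURCE_HOME", ".."), "tools")) {
     for (home in arrow_home) {
       if (file.exists(file.path(home, "cpp/src/arrow/api.h"))) {
         cat("*** Found local C++ source:", home, "\n")
         return(file.path(home, "cpp"))
       }
     }
     NULL
   }

   # Try it against a throwaway directory layout
   fake_home <- tempfile()
   dir.create(file.path(fake_home, "cpp/src/arrow"), recursive = TRUE)
   file.create(file.path(fake_home, "cpp/src/arrow/api.h"))
   found <- find_local_source(c(tempfile(), fake_home))
   not_found <- find_local_source(tempfile())
   ```
   
   The first candidate that has `cpp/src/arrow/api.h` wins, so the order of the vector expresses the preference between a git checkout and the bundled `tools/cpp` copy.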

##########
File path: r/tools/nixlibs.R
##########
@@ -435,22 +558,177 @@ with_s3_support <- function(env_vars) {
       cat("**** S3 support requires version >= 1.0.2 of openssl-devel (rpm), libssl-dev (deb), or openssl (brew); building with ARROW_S3=OFF\n")
       arrow_s3 <- FALSE
     }
+    download_unavailable <- remote_download_unavailable(c(
+      "ARROW_AWSSDK_URL",
+      "ARROW_AWS_C_COMMON_URL",
+      "ARROW_AWS_CHECKSUMS_URL",
+      "ARROW_AWS_C_EVENT_STREAM_URL"
+    ))
+    if (download_unavailable) {
+      cat(paste(
+        "**** S3 dependencies need to be downloaded, but can't be.",
+        "See ?arrow::download_optional_dependencies.",
+        "Building with ARROW_S3=OFF\n"
+      ))
+      arrow_s3 <- FALSE
+    }
   }
   paste(env_vars, ifelse(arrow_s3, "ARROW_S3=ON", "ARROW_S3=OFF"))
 }
 
-with_mimalloc <- function(env_vars) {
-  arrow_mimalloc <- toupper(Sys.getenv("ARROW_MIMALLOC")) == "ON" || tolower(Sys.getenv("LIBARROW_MINIMAL")) == "false"
-  if (arrow_mimalloc) {
-    # User wants mimalloc. If they're using gcc, let's make sure the version is >= 4.9
-    if (isTRUE(cmake_gcc_version(env_vars) < "4.9")) {
-      cat("**** mimalloc support not available for gcc < 4.9; building with ARROW_MIMALLOC=OFF\n")
-      arrow_mimalloc <- FALSE
+# Compression features: brotli, bz2, lz4, snappy, zlib, zstd
+with_brotli <- function(env_vars) {
+  arrow_brotli <- is_feature_requested("ARROW_WITH_BROTLI")
+  if (arrow_brotli) {
+    download_unavailable <- remote_download_unavailable("ARROW_BROTLI_URL")
+    if (download_unavailable) {
+      cat("**** brotli requested but cannot be downloaded. Setting ARROW_WITH_BROTLI=OFF\n")
+      arrow_brotli <- FALSE
     }
   }
-  paste(env_vars, ifelse(arrow_mimalloc, "ARROW_MIMALLOC=ON", "ARROW_MIMALLOC=OFF"))
+  paste(env_vars, ifelse(arrow_brotli, "ARROW_WITH_BROTLI=ON", "ARROW_WITH_BROTLI=OFF"))
+}
+
+with_bz2 <- function(env_vars) {
+  arrow_brotli <- is_feature_requested("ARROW_WITH_BZ2")

Review comment:
       This function says brotli but should be bz2

##########
File path: r/tools/nixlibs.R
##########
@@ -52,6 +43,21 @@ try_download <- function(from_url, to_file) {
   !inherits(status, "try-error") && status == 0
 }
 
+quietly <- !env_is("ARROW_R_DEV", "true") # try_download uses quietly global
+# * download_ok, build_ok: Use prebuilt binary, if found, otherwise try to build
+# * no download, build_ok: Build with local git checkout, if available, or
+#   sources included in r/tools/cpp/. Optional dependencies are not included,
+#   and will not be automatically downloaded.
+#   https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+# * download_ok, no build: Only use prebuilt binary, if found
+# * neither: Get the arrow-without-arrow package
+# Download and build are OK unless you say not to (or can't access github)
+download_ok <- (!env_is("LIBARROW_DOWNLOAD", "false")) && try_download("https://github.com", tempfile())
+build_ok <- !env_is("LIBARROW_BUILD", "false")
+# But binary defaults to not OK
+binary_ok <- !identical(tolower(Sys.getenv("LIBARROW_BINARY", "false")), "false")
+# For local debugging, set ARROW_R_DEV=TRUE to make this script print more

Review comment:
       This comment goes above L46

##########
File path: r/tools/nixlibs.R
##########
@@ -52,6 +43,21 @@ try_download <- function(from_url, to_file) {
   !inherits(status, "try-error") && status == 0
 }
 
+quietly <- !env_is("ARROW_R_DEV", "true") # try_download uses quietly global
+# * download_ok, build_ok: Use prebuilt binary, if found, otherwise try to build
+# * no download, build_ok: Build with local git checkout, if available, or
+#   sources included in r/tools/cpp/. Optional dependencies are not included,
+#   and will not be automatically downloaded.
+#   https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+# * download_ok, no build: Only use prebuilt binary, if found
+# * neither: Get the arrow-without-arrow package
+# Download and build are OK unless you say not to (or can't access github)
+download_ok <- (!env_is("LIBARROW_DOWNLOAD", "false")) && try_download("https://github.com", tempfile())

Review comment:
       I think we want to remove the `LIBARROW_DOWNLOAD` env var altogether, and in fact not ever download Arrow C++ source (which is what this variable governs currently). Third party dependencies should always download if you're not offline, I think, as they do now.
   
   (This is why the github actions failed: the linux jobs set LIBARROW_DOWNLOAD: false in docker-compose.yml so that they're forced to use the local C++ checkout, but now they fail to download `cmake` because you extended the offline checks there too.)
   
   We probably should replace this with a new `TEST_OFFLINE_BUILD` variable that allows us to turn off downloading in order to simulate the offline build, and have CI jobs that test both with and without the `download_optional_dependencies()` call.
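   
   One way the proposed `TEST_OFFLINE_BUILD` switch could look, as a sketch only: `download_allowed()` and its `probe` argument are hypothetical, and `env_is()` is a local stand-in for the helper in `nixlibs.R`. The probe is injected so the logic can be exercised without network access; in `nixlibs.R` that role is played by `try_download()`.
   
   ```r
   # Stand-in for the env_is() helper in nixlibs.R (definition assumed)
   env_is <- function(var, value) identical(tolower(Sys.getenv(var)), tolower(value))

   # Hypothetical: downloads are allowed unless TEST_OFFLINE_BUILD=true or the
   # connectivity probe fails.
   download_allowed <- function(probe = function() TRUE) {
     !env_is("TEST_OFFLINE_BUILD", "true") && isTRUE(probe())
   }

   Sys.setenv(TEST_OFFLINE_BUILD = "true")
   offline_forced <- download_allowed()
   Sys.setenv(TEST_OFFLINE_BUILD = "false")
   online <- download_allowed()
   no_network <- download_allowed(probe = function() FALSE)
   ```
   
   Only the `online` case allows downloads; forcing the variable to `true` simulates an offline machine even when the network is reachable.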

##########
File path: r/tools/nixlibs.R
##########
@@ -503,12 +781,10 @@ if (!file.exists(paste0(dst_dir, "/include/arrow/api.h"))) {
     unlink(bin_file)
   } else if (build_ok) {
     # (2) Find source and build it
-    if (download_ok) {
+    src_dir <- find_local_source()
+    if (is.null(src_dir) && download_ok) {

Review comment:
       I would delete `download_source()`--if we're bundling the C++ source in the R package, then we should never need to download it.







[GitHub] [arrow] karldw commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-914439085


   Thank you both for all your help and patience getting this across the finish line!





[GitHub] [arrow] github-actions[bot] commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-912541996


   Revision: 13d8c4e0c8f5819ef73095603ac8d005d0d86db9
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-814](https://github.com/ursacomputing/crossbow/branches/all?query=actions-814)
   
   |Task|Status|
   |----|------|
   |test-r-offline-maximal|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-814-github-test-r-offline-maximal)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-814-github-test-r-offline-maximal)|





[GitHub] [arrow] jonkeane commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r702292342



##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,91 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Create an install package with all thirdparty dependencies
+#'
+#' @param dest_file File path for the new tar.gz package. Defaults to
+#' `arrow_V.V.V_with_deps.tar.gz` in the current directory (`V.V.V` is the version)
+#' @param source_file File path for the input tar.gz package. Defaults to
+#' downloading the package.

Review comment:
       I agree that in the case of RSPM, people will (hopefully) "just" grab the binary and move that to their offline server.
   
   I had to double check this to confirm it was doing what I thought: The way that RSPM works is that one requests `"source"` packages but from a specific URL which replies with binaries (since CRAN doesn't host binary linux packages). If one used the source repo URL from RSPM (https://packagemanager.rstudio.com/all/latest) one will get a source package just fine.
   
   With a (binary) RSPM url in `repos`:
   ```
   > utils::download.packages("arrow", destdir = "./", type = "source", repos = "https://packagemanager.rstudio.com/all/__linux__/bionic/latest")
   trying URL 'https://packagemanager.rstudio.com/all/__linux__/bionic/latest/src/contrib/arrow_5.0.0.tar.gz'
   Content type 'binary/octet-stream' length 21862325 bytes (20.8 MB)
   ==================================================
   downloaded 20.8 MB
   
        [,1]    [,2]                   
   [1,] "arrow" ".//arrow_5.0.0.tar.gz"
   > 
   > # includes arrow/libs/arrow.so, no inst, etc.
   > untar("arrow_5.0.0.tar.gz", list = TRUE)
    [1] "arrow/DESCRIPTION"           "arrow/INDEX"                
    [3] "arrow/Meta/"                 "arrow/Meta/Rd.rds"          
    [5] "arrow/Meta/features.rds"     "arrow/Meta/hsearch.rds"     
    [7] "arrow/Meta/links.rds"        "arrow/Meta/nsInfo.rds"      
    [9] "arrow/Meta/package.rds"      "arrow/Meta/vignette.rds"    
   [11] "arrow/NAMESPACE"             "arrow/NEWS.md"              
   [13] "arrow/NOTICE.txt"            "arrow/R/"                   
   [15] "arrow/R/arrow"               "arrow/R/arrow.rdb"          
   [17] "arrow/R/arrow.rdx"           "arrow/build_arrow_static.sh"
   [19] "arrow/demo_flight_server.py" "arrow/doc/"                 
   [21] "arrow/doc/arrow.Rmd"         "arrow/doc/arrow.html"       
   [23] "arrow/doc/dataset.R"         "arrow/doc/dataset.Rmd"      
   [25] "arrow/doc/dataset.html"      "arrow/doc/developing.R"     
   [27] "arrow/doc/developing.Rmd"    "arrow/doc/developing.html"  
   [29] "arrow/doc/flight.Rmd"        "arrow/doc/flight.html"      
   [31] "arrow/doc/fs.Rmd"            "arrow/doc/fs.html"          
   [33] "arrow/doc/index.html"        "arrow/doc/install.Rmd"      
   [35] "arrow/doc/install.html"      "arrow/doc/python.Rmd"       
   [37] "arrow/doc/python.html"       "arrow/help/"                
   [39] "arrow/help/AnIndex"          "arrow/help/aliases.rds"     
   [41] "arrow/help/arrow.rdb"        "arrow/help/arrow.rdx"       
   [43] "arrow/help/paths.rds"        "arrow/html/"                
   [45] "arrow/html/00Index.html"     "arrow/html/R.css"           
   [47] "arrow/libs/"                 "arrow/libs/arrow.so"        
   [49] "arrow/v0.7.1.parquet"       
   > 
   > # clean up
   > unlink("./arrow_5.0.0.tar.gz")
   ```
   
   With a standard CRAN repo url in `repos`:
   ```
   > # with another CRAN mirror
   > utils::download.packages("arrow", destdir = "./", type = "source", repos = "https://cloud.r-project.org/")
   trying URL 'https://cloud.r-project.org/src/contrib/arrow_5.0.0.tar.gz'
   Content type 'application/x-gzip' length 463913 bytes (453 KB)
   ==================================================
   downloaded 453 KB
   
        [,1]    [,2]                   
   [1,] "arrow" ".//arrow_5.0.0.tar.gz"
   > 
   > # a standard source bundle
   > untar("arrow_5.0.0.tar.gz", list = TRUE)
     [1] "arrow/"                                                                  
     [2] "arrow/NAMESPACE"                                                         
     [3] "arrow/tools/"                                                            
     [4] "arrow/tools/autobrew"                                                    
     [5] "arrow/tools/nixlibs.R"                                                   
     [6] "arrow/tools/winlibs.R"                                                   
     [7] "arrow/tools/ubsan.supp"                                                  
     [8] "arrow/README.md"                                                         
     [9] "arrow/man/"                                                              
    ...                                                    
    [99] "arrow/DESCRIPTION"                                                       
   [100] "arrow/build/"                                                            
   [101] "arrow/build/vignette.rds"                                                
   [102] "arrow/tests/"                                                            
   [103] "arrow/tests/testthat/"                                                   
   ...                                                 
   [176] "arrow/src/"                                                              
   [177] "arrow/src/altrep.cpp"                                                    
   [178] "arrow/src/compute.cpp"                                                   
   ...                                                  
   [219] "arrow/vignettes/"                                                        
   ...                                                
   [227] "arrow/configure.win"                                                     
   [228] "arrow/R/"                                                                
   ...                                                      
   [284] "arrow/NEWS.md"                                                           
   [285] "arrow/MD5"                                                               
   [286] "arrow/inst/"                                                             
   [287] "arrow/inst/NOTICE.txt"                                                   
   [288] "arrow/inst/doc/"                                                         
   [289] "arrow/inst/doc/flight.html"                                              
   [290] "arrow/inst/doc/install.html"                                             
   [291] "arrow/inst/doc/developing.html"                                          
   [292] "arrow/inst/doc/arrow.html"                                               
   [293] "arrow/inst/doc/fs.html"                                                  
   [294] "arrow/inst/doc/developing.Rmd"                                           
   [295] "arrow/inst/doc/dataset.html"                                             
   [296] "arrow/inst/doc/python.html"                                              
   [297] "arrow/inst/doc/dataset.R"                                                
   [298] "arrow/inst/doc/dataset.Rmd"                                              
   [299] "arrow/inst/doc/install.Rmd"                                              
   [300] "arrow/inst/doc/flight.Rmd"                                               
   [301] "arrow/inst/doc/python.Rmd"                                               
   [302] "arrow/inst/doc/arrow.Rmd"                                                
   [303] "arrow/inst/doc/developing.R"                                             
   [304] "arrow/inst/doc/fs.Rmd"                                                   
   [305] "arrow/inst/demo_flight_server.py"                                        
   [306] "arrow/inst/v0.7.1.parquet"                                               
   [307] "arrow/inst/build_arrow_static.sh"                                        
   [308] "arrow/cleanup"                                                           
   [309] "arrow/configure"
   ```
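   
   Based on the two listings above, a check along these lines could tell the cases apart. This is a sketch under assumptions, not something in the PR: `looks_like_source_package()` is a hypothetical name, and the heuristic (a `src/` directory present, no `libs/` directory) is inferred only from the output shown here.
   
   ```r
   # Hypothetical check: a binary tarball has compiled libs/ but no src/;
   # a source tarball has src/ and no libs/.
   looks_like_source_package <- function(tar_listing, pkg = "arrow") {
     has_src <- any(startsWith(tar_listing, paste0(pkg, "/src/")))
     has_libs <- any(startsWith(tar_listing, paste0(pkg, "/libs/")))
     has_src && !has_libs
   }

   # Abbreviated versions of the two listings
   binary_listing <- c("arrow/DESCRIPTION", "arrow/Meta/", "arrow/libs/", "arrow/libs/arrow.so")
   source_listing <- c("arrow/DESCRIPTION", "arrow/src/", "arrow/src/altrep.cpp", "arrow/configure")
   is_src_binary <- looks_like_source_package(binary_listing)
   is_src_source <- looks_like_source_package(source_listing)
   ```
   
   In practice `tar_listing` would come from `untar(file, list = TRUE)`, as in the sessions above.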







[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r701391056



##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,91 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Create an install package with all thirdparty dependencies
+#'
+#' @param outfile File path for the new tar.gz package. Defaults to
+#' `arrow_V.V.V_with_deps.tar.gz` in the current directory (`V.V.V` is the version)
+#' @param package_source File path for the input tar.gz package. Defaults to
+#' downloading from CRAN.
+#' @param quietly boolean, default `TRUE`. If `FALSE`, narrate progress.
+#' @return The full path to `outfile`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download the required dependencies for you.
+#' These downloaded dependencies are only used in the build if
+#' `ARROW_DEPENDENCY_SOURCE` is unset, `BUNDLED`, or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### Using a computer with internet access, pre-download the dependencies:
+#' * Install the `arrow` package
+#' * Run `create_package_with_all_dependencies("my_arrow_pkg.tar.gz")`
+#' * Copy the newly created `my_arrow_pkg.tar.gz` to the computer without internet access
+#'
+#' ### On the computer without internet access, install the prepared package:
+#' * Install the `arrow` package from the copied file (`install.packages("my_arrow_pkg.tar.gz")`)
+#'   * This installation will build from source, so `cmake` must be available
+#' * Run [arrow_info()] to check installed capabilities
+#'
+#'
+#' @examples
+#' \dontrun{
+#' new_pkg <- create_package_with_all_dependencies()
+#' # Note: this works when run in the same R session, but it's meant to be
+#' # copied to a different computer.
+#' install.packages(new_pkg, dependencies = c("Depends", "Imports", "LinkingTo"))
+#' }
+#' @export
+create_package_with_all_dependencies <- function(outfile = NULL, package_source = NULL, quietly = TRUE) {

Review comment:
       No particular reason we need `quietly` here. I'll take it out.
   
   `source_file` and `dest_file` are definitely better argument names. I did outputs first because I was thinking the input file would rarely be specified by a user. I thought it would mainly be useful for testing (or other cases where you wanted a development version, not something from the CRAN/nightly repo).







[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r696982101



##########
File path: r/inst/build_arrow_static.sh
##########
@@ -59,7 +59,7 @@ ${CMAKE} -DARROW_BOOST_USE_SHARED=OFF \
     -DARROW_FILESYSTEM=ON \
     -DARROW_JEMALLOC=${ARROW_JEMALLOC:-$ARROW_DEFAULT_PARAM} \
     -DARROW_MIMALLOC=${ARROW_MIMALLOC:-ON} \
-    -DARROW_JSON=ON \
+    -DARROW_JSON=${ARROW_JSON:-ON} \

Review comment:
       Great -- I'll revert this for now







[GitHub] [arrow] github-actions[bot] commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-912594670


   Revision: 939cb87a1cd7774054e6aca7a8d87a184cc663ae
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-816](https://github.com/ursacomputing/crossbow/branches/all?query=actions-816)
   
   |Task|Status|
   |----|------|
   |test-r-offline-maximal|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-816-github-test-r-offline-maximal)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-816-github-test-r-offline-maximal)|





[GitHub] [arrow] github-actions[bot] commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-912616447


   Revision: 6bf7b8515a69c35d6e3ca76d120711c8307117fd
   
   Submitted crossbow builds: [ursacomputing/crossbow @ actions-817](https://github.com/ursacomputing/crossbow/branches/all?query=actions-817)
   
   |Task|Status|
   |----|------|
   |conda-linux-gcc-py36-cpu-r40|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-817-azure-conda-linux-gcc-py36-cpu-r40)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-817-azure-conda-linux-gcc-py36-cpu-r40)|
   |conda-linux-gcc-py37-cpu-r41|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-817-azure-conda-linux-gcc-py37-cpu-r41)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-817-azure-conda-linux-gcc-py37-cpu-r41)|
   |conda-osx-clang-py36-r40|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-817-azure-conda-osx-clang-py36-r40)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-817-azure-conda-osx-clang-py36-r40)|
   |conda-osx-clang-py37-r41|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-817-azure-conda-osx-clang-py37-r41)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-817-azure-conda-osx-clang-py37-r41)|
   |conda-win-vs2017-py36-r40|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-817-azure-conda-win-vs2017-py36-r40)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-817-azure-conda-win-vs2017-py36-r40)|
   |conda-win-vs2017-py37-r41|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-817-azure-conda-win-vs2017-py37-r41)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-817-azure-conda-win-vs2017-py37-r41)|
   |homebrew-r-autobrew|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-817-github-homebrew-r-autobrew)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-817-github-homebrew-r-autobrew)|
   |test-r-depsource-auto|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-817-azure-test-r-depsource-auto)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-817-azure-test-r-depsource-auto)|
   |test-r-depsource-system|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-817-github-test-r-depsource-system)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-817-github-test-r-depsource-system)|
   |test-r-devdocs|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-817-github-test-r-devdocs)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-817-github-test-r-devdocs)|
   |test-r-gcc-11|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-817-github-test-r-gcc-11)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-817-github-test-r-gcc-11)|
   |test-r-install-local|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-817-github-test-r-install-local)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-817-github-test-r-install-local)|
   |test-r-linux-as-cran|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-817-github-test-r-linux-as-cran)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-817-github-test-r-linux-as-cran)|
   |test-r-linux-rchk|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-817-github-test-r-linux-rchk)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-817-github-test-r-linux-rchk)|
   |test-r-linux-valgrind|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-817-azure-test-r-linux-valgrind)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-817-azure-test-r-linux-valgrind)|
   |test-r-minimal-build|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-817-azure-test-r-minimal-build)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-817-azure-test-r-minimal-build)|
   |test-r-offline-maximal|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-817-github-test-r-offline-maximal)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-817-github-test-r-offline-maximal)|
   |test-r-offline-minimal|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-817-azure-test-r-offline-minimal)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-817-azure-test-r-offline-minimal)|
   |test-r-rhub-debian-gcc-devel-lto-latest|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-817-azure-test-r-rhub-debian-gcc-devel-lto-latest)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-817-azure-test-r-rhub-debian-gcc-devel-lto-latest)|
   |test-r-rhub-ubuntu-gcc-release-latest|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-817-azure-test-r-rhub-ubuntu-gcc-release-latest)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-817-azure-test-r-rhub-ubuntu-gcc-release-latest)|
   |test-r-rocker-r-base-latest|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-817-azure-test-r-rocker-r-base-latest)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-817-azure-test-r-rocker-r-base-latest)|
   |test-r-rstudio-r-base-3.6-bionic|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-817-azure-test-r-rstudio-r-base-3.6-bionic)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-817-azure-test-r-rstudio-r-base-3.6-bionic)|
   |test-r-rstudio-r-base-3.6-centos7-devtoolset-8|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-817-azure-test-r-rstudio-r-base-3.6-centos7-devtoolset-8)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-817-azure-test-r-rstudio-r-base-3.6-centos7-devtoolset-8)|
   |test-r-rstudio-r-base-3.6-centos8|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-817-azure-test-r-rstudio-r-base-3.6-centos8)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-817-azure-test-r-rstudio-r-base-3.6-centos8)|
   |test-r-rstudio-r-base-3.6-opensuse15|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-817-azure-test-r-rstudio-r-base-3.6-opensuse15)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-817-azure-test-r-rstudio-r-base-3.6-opensuse15)|
   |test-r-rstudio-r-base-3.6-opensuse42|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-817-azure-test-r-rstudio-r-base-3.6-opensuse42)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-817-azure-test-r-rstudio-r-base-3.6-opensuse42)|
   |test-r-ubuntu-21.04|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-817-github-test-r-ubuntu-21.04)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-817-github-test-r-ubuntu-21.04)|
   |test-r-version-compatibility|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-817-github-test-r-version-compatibility)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-817-github-test-r-version-compatibility)|
   |test-r-versions|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-817-github-test-r-versions)](https://github.com/ursacomputing/crossbow/actions?query=branch:actions-817-github-test-r-versions)|
   |test-ubuntu-18.04-r-sanitizer|[![Azure](https://dev.azure.com/ursacomputing/crossbow/_apis/build/status/ursacomputing.crossbow?branchName=actions-817-azure-test-ubuntu-18.04-r-sanitizer)](https://dev.azure.com/ursacomputing/crossbow/_build/latest?definitionId=1&branchName=actions-817-azure-test-ubuntu-18.04-r-sanitizer)|








[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r698784232



##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,68 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### On a computer with internet access:
+#' - Install the `arrow` package

Review comment:
       Yeah.... fair







[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r697695071



##########
File path: dev/tasks/tasks.yml
##########
@@ -1033,6 +1033,14 @@ tasks:
       flags: '-e ARROW_SOURCE_HOME="/arrow" -e FORCE_BUNDLED_BUILD=TRUE -e LIBARROW_BUILD=TRUE -e ARROW_DEPENDENCY_SOURCE=SYSTEM'
       image: ubuntu-r-only-r
 
+  test-r-offline-minimal:
+      ci: azure
+      template: r/azure.linux.yml
+      params:
+        r_org: rocker
+        r_image: r-base
+        r_tag: latest
+        flags: '-e TEST_OFFLINE_BUILD=true'

Review comment:
       @jonkeane, when you're back and have time to look at this, it'd be great to have suggestions on the CI. I tried to add one task here, with the goal of testing the feature-light offline build.
   
   I wasn't sure if that task is correct, or how to go about testing the feature-rich offline build. Currently that build requires two installs: one to bring in the `download_optional_dependencies` function and another to build with those dependencies downloaded. @nealrichardson's comment [here](https://github.com/apache/arrow/pull/11001/#discussion_r697443624) might let us avoid the first install.
   
   







[GitHub] [arrow] nealrichardson commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r696722328



##########
File path: r/tools/nixlibs.R
##########
@@ -329,18 +288,22 @@ build_libarrow <- function(src_dir, dst_dir) {
     LDFLAGS = R_CMD_config("LDFLAGS")
   )
   env_vars <- paste0(names(env_var_list), '="', env_var_list, '"', collapse = " ")
+  # Add env variables like ARROW_S3=ON. Order doesn't matter. Depends on `download_ok`
   env_vars <- with_s3_support(env_vars)
   env_vars <- with_mimalloc(env_vars)
-  if (tolower(Sys.info()[["sysname"]]) %in% "sunos") {
-    # jemalloc doesn't seem to build on Solaris
-    # nor does thrift, so turn off parquet,
-    # and arrowExports.cpp requires parquet for dataset (ARROW-11994), so turn that off
-    # xsimd doesn't compile, so set SIMD level to NONE to skip it
-    # re2 and utf8proc do compile,
-    # but `ar` fails to build libarrow_bundled_dependencies, so turn them off
-    # so that there are no bundled deps
-    env_vars <- paste(env_vars, "ARROW_JEMALLOC=OFF ARROW_PARQUET=OFF ARROW_DATASET=OFF ARROW_WITH_RE2=OFF ARROW_WITH_UTF8PROC=OFF EXTRA_CMAKE_FLAGS=-DARROW_SIMD_LEVEL=NONE")
-  }
+  env_vars <- with_jemalloc(env_vars)

Review comment:
       This is cool, but it kinda feels like we're reimplementing cmake here, and there are lots of subtleties that we can get wrong. I wonder if there's a simpler approach:
   
   * Instead of checking for all of the `*_SOURCE_URL`s in env vars, we could add a single env var like `THIRDPARTY_DEPENDENCY_DIR`, and build all those source URLs in this script if that is set. 
   * Simplify the build configuration logic here: if `!download_ok && !dir.exists(Sys.getenv("THIRDPARTY_DEPENDENCY_DIR")) && Sys.getenv("ARROW_DEPENDENCY_SOURCE") != "SYSTEM"`, turn everything off (like the Solaris case here, plus any others you've identified)
   
   Later, if/when ARROW-8155 happens, we can make some of those "OFFs" become optionally on if the system has them.
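   
   A minimal sketch of the simplified check proposed above, written as a standalone R function. The name `should_disable_thirdparty()` and its arguments are hypothetical (they are not part of `nixlibs.R`); `download_ok` stands in for the existing flag of that name, and the environment variable names are the ones suggested in this comment.
   
   ```r
   # Hypothetical helper: returns TRUE when every third-party feature should
   # be turned off, i.e. we cannot download, no directory of pre-downloaded
   # sources exists, and we were not told to use system libraries.
   should_disable_thirdparty <- function(download_ok,
                                         deps_dir = Sys.getenv("THIRDPARTY_DEPENDENCY_DIR"),
                                         dep_source = Sys.getenv("ARROW_DEPENDENCY_SOURCE")) {
     # dir.exists("") is FALSE, so an unset variable counts as "no directory"
     !download_ok && !dir.exists(deps_dir) && dep_source != "SYSTEM"
   }
   ```
   
   A caller could then wrap the "turn everything off" step in `if (should_disable_thirdparty(download_ok)) { ... }`.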







[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r697635592



##########
File path: r/tools/nixlibs.R
##########
@@ -415,10 +389,134 @@ cmake_version <- function(cmd = "cmake") {
   )
 }
 
+turn_off_thirdparty_features <- function(env_vars) {
+
+  # Because these are done as environment variables (as opposed to build flags),
+  # setting these to "OFF" overrides any previous setting. We don't need to
+  # check the existing value.
+  turn_off <- c(
+    "ARROW_MIMALLOC=OFF",
+    "ARROW_JEMALLOC=OFF",
+    "ARROW_PARQUET=OFF", # depends on thrift
+    "ARROW_DATASET=OFF", # depends on parquet
+    "ARROW_S3=OFF",
+    "ARROW_WITH_BROTLI=OFF",
+    "ARROW_WITH_BZ2=OFF",
+    "ARROW_WITH_LZ4=OFF",
+    "ARROW_WITH_SNAPPY=OFF",
+    "ARROW_WITH_ZLIB=OFF",
+    "ARROW_WITH_ZSTD=OFF",
+    "ARROW_WITH_RE2=OFF",
+    "ARROW_WITH_UTF8PROC=OFF",
+    # NOTE: this code sets the environment variable ARROW_JSON to "OFF", but
+    # that setting will *not* be honored by build_arrow_static.sh until
+    # ARROW-13768 is resolved.
+    "ARROW_JSON=OFF",
+    # The syntax to turn off XSIMD is different.
+    'EXTRA_CMAKE_FLAGS="-DARROW_SIMD_LEVEL=NONE"'
+  )
+  if (Sys.getenv("EXTRA_CMAKE_FLAGS") != "") {
+    # Error rather than overwriting EXTRA_CMAKE_FLAGS
+    # (Correctly inserting the flag into an existing quoted string is tricky)
+    stop("Sorry, setting EXTRA_CMAKE_FLAGS is not supported at this time.")
+  }
+  paste(env_vars, paste(turn_off, collapse = " "))
+}
+
+set_thirdparty_urls <- function(env_vars) {
+  deps_dir <- Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")
+  files <- list.files(deps_dir, full.names = FALSE)
+  if (length(files) == 0) {
+    # This will be true if the variable is unset, if it's set but the directory
+    # doesn't exist, or if it exists but is empty.
+    return(env_vars)
+  }
+  dep_names <- c(
+    "absl", # not used; seems to be a dependency of gRPC
+    "aws-sdk-cpp",
+    "aws-checksums",
+    "aws-c-common",
+    "aws-c-event-stream",
+    "boost",
+    "brotli",
+    "bzip2",
+    "cares", # not used; "a dependency of gRPC"
+    "gbenchmark", # not used; "Google benchmark, for testing"
+    "gflags", # not used; "for command line utilities (formerly Googleflags)"
+    "glog", # not used; "for logging"
+    "grpc", # not used; "for remote procedure calls"
+    "gtest", # not used; "Googletest, for testing"
+    "jemalloc",
+    "lz4",
+    "mimalloc",
+    "orc", # not used; "for Apache ORC format support"
+    "protobuf", # not used; "Google Protocol Buffers, for data serialization"
+    "rapidjson",
+    "re2",
+    "snappy",
+    "thrift",
+    "utf8proc",
+    "xsimd",
+    "zlib",
+    "zstd"
+  )
+  dep_regex <- paste0("^(", paste(dep_names, collapse = "|"), ").*")
+  # If there were extra files in the folder (not matching our regex) drop them.
+  files <- files[grepl(dep_regex, files, perl = TRUE)]
+  # Convert e.g. "thrift-0.13.0.tar.gz" to ARROW_THRIFT_URL
+  # Note that if there's no file called thrift*, we won't add
+  # ARROW_THRIFT_URL to env_vars.
+  url_env_varname <- sub(dep_regex, "ARROW_\\1_URL", files, perl = TRUE)
+  url_env_varname <- toupper(gsub("-", "_", url_env_varname, fixed = TRUE))
+  # Special case: ARROW_AWSSDK_URL for aws-sdk-cpp-<version>.tar.gz
+  url_env_varname <- sub("ARROW_AWS_SDK_CPP_URL", "ARROW_AWSSDK_URL", url_env_varname, fixed = TRUE)

Review comment:
       Resolving in favor of the version you wrote.
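   
   For reference, a small worked example of how the filename-to-variable conversion in the diff above behaves. It reuses the same `sub()`/`gsub()` calls on a shortened `dep_names` list; the file names are illustrative (`x.y.z` is a placeholder version and `README.txt` is a made-up stray file), so treat this as a sketch rather than the code under review.
   
   ```r
   # Shortened list of dependency names, taken from the diff above
   dep_names <- c("aws-sdk-cpp", "aws-checksums", "thrift", "zstd")
   dep_regex <- paste0("^(", paste(dep_names, collapse = "|"), ").*")
   
   # Illustrative directory listing; README.txt should be dropped
   files <- c("thrift-0.13.0.tar.gz", "aws-sdk-cpp-x.y.z.tar.gz", "README.txt")
   files <- files[grepl(dep_regex, files, perl = TRUE)]
   
   # "thrift-0.13.0.tar.gz" becomes "ARROW_thrift_URL", then "ARROW_THRIFT_URL"
   url_env_varname <- sub(dep_regex, "ARROW_\\1_URL", files, perl = TRUE)
   url_env_varname <- toupper(gsub("-", "_", url_env_varname, fixed = TRUE))
   # Special case: aws-sdk-cpp maps to ARROW_AWSSDK_URL
   url_env_varname <- sub("ARROW_AWS_SDK_CPP_URL", "ARROW_AWSSDK_URL", url_env_varname, fixed = TRUE)
   
   print(url_env_varname)
   # [1] "ARROW_THRIFT_URL" "ARROW_AWSSDK_URL"
   ```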







[GitHub] [arrow] karldw edited a comment on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw edited a comment on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-909368665









[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r701391352



##########
File path: r/vignettes/install.Rmd
##########
@@ -102,6 +102,42 @@ satisfy C++ dependencies.
 
 > Note that, unlike packages like `tensorflow`, `blogdown`, and others that require external dependencies, you do not need to run `install_arrow()` after a successful `arrow` installation.
 
+The `install-arrow.R` file also includes the `create_package_with_all_dependencies()`
+function. Normally, when installing on a computer with internet access, the
+build process will download third-party dependencies as needed.
+This function provides a way to download them in advance.
+Doing so may be useful when installing Arrow on a computer without internet access.
+Note that Arrow _can_ be installed on a computer without internet access, but
+many useful features will be disabled, as they depend on third-party components.
+More precisely, `arrow::arrow_info()$capabilities` will be `FALSE` for every
+capability.
+One approach to add more capabilities in an offline install is to prepare a
+package with pre-downloaded dependencies. The
+`create_package_with_all_dependencies()` function does this preparation.
+
+### Using a computer with internet access, pre-download the dependencies:
+* Install the `arrow` package
+* Run `create_package_with_all_dependencies("my_arrow_pkg.tar.gz")`
+* Copy the newly created `my_arrow_pkg.tar.gz` to the computer without internet access
+
+### On the computer without internet access, install the prepared package:
+* Install the `arrow` package from the copied file (`install.packages("my_arrow_pkg.tar.gz")`)
+  * This installation will build from source, so `cmake` must be available
+* Run `arrow_info()` to check installed capabilities
+
+
+### Using a computer with internet access, pre-download the dependencies:
+* Install the `arrow` package
+* Run `download_optional_dependencies("my_dependencies")`
+* Copy the directory `my_dependencies` to the computer without internet access
+
+### On the computer without internet access, use the pre-downloaded dependencies:
+* Create an environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
+  points to the newly copied `my_dependencies`.
+* Install the `arrow` package
+  * This installation will build from source, so `cmake` must be available
+* Run `arrow_info()` to check installed capabilities
+

Review comment:
       Yes, thanks







[GitHub] [arrow] karldw commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-912089234


   @github-actions crossbow submit test-r-offline-minimal test-r-offline-maximal





[GitHub] [arrow] karldw commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-908454699


   
   > One option for the two-stage testing process would be to do something similar to [our version compatibility tests](https://github.com/apache/arrow/blob/master/dev/tasks/r/github.linux.version.compatibility.yml). There we use Github Actions with two jobs, one to write files (here, download the dependencies) and one to read them/test against them (here, install offline). I'm also happy to toss something together for this if you would like + are ok with me sending a couple of commits to your branch.
   
   That would be great!
   
   
   > I also added a few minor comments that I noticed as I read through the PR to see what's been going on so far. I will echo @nealrichardson's comments that this is a really fantastic PR, in an area of Arrow that is not particularly well documented for contributors, thank you!
   
   Thank you both! I appreciate all the time you've spent providing review and feedback!
   





[GitHub] [arrow] karldw commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-909368665


   > An idea came to me last night: what if we had a utility that would make a "fat" package, like:
   > 
   > ```r
   > function(source_package) {
   >   untar(source_package)
   >   system("inst/download_script.sh tools/thirdparty")
   >   tar()
   > }
   > ```
   > 
   > then you would just copy that arrow_x.y.z.tar.gz and install it, no need to copy other files and set env vars.
   
   I like that! I can take a try at it.
   
   * Do you want to include all of the downloaded files (87 MB), or just the ones an R build could possibly use (55 MB)? 
   * Do you still want to be able to run `download_optional_dependencies` from within an installed R package? If not, we can use the copy of `download_dependencies.sh` that's in `tools/cpp/thirdparty/` (we're currently making another copy for `inst/` so it's available at runtime).





[GitHub] [arrow] jonkeane commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-908812580


   @github-actions autotune 





[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r702058563



##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,91 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Create an install package with all thirdparty dependencies
+#'
+#' @param outfile File path for the new tar.gz package. Defaults to
+#' `arrow_V.V.V_with_deps.tar.gz` in the current directory (`V.V.V` is the version)
+#' @param package_source File path for the input tar.gz package. Defaults to
+#' downloading from CRAN.
+#' @param quietly boolean, default `TRUE`. If `FALSE`, narrate progress.
+#' @return The full path to `outfile`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download the required dependencies for you.
+#' These downloaded dependencies are only used in the build if
+#' `ARROW_DEPENDENCY_SOURCE` is unset, `BUNDLED`, or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### Using a computer with internet access, pre-download the dependencies:
+#' * Install the `arrow` package
+#' * Run `create_package_with_all_dependencies("my_arrow_pkg.tar.gz")`
+#' * Copy the newly created `my_arrow_pkg.tar.gz` to the computer without internet access
+#'
+#' ### On the computer without internet access, install the prepared package:
+#' * Install the `arrow` package from the copied file (`install.packages("my_arrow_pkg.tar.gz")`)
+#'   * This installation will build from source, so `cmake` must be available
+#' * Run [arrow_info()] to check installed capabilities
+#'
+#'
+#' @examples
+#' \dontrun{
+#' new_pkg <- create_package_with_all_dependencies()
+#' # Note: this works when run in the same R session, but it's meant to be
+#' # copied to a different computer.
+#' install.packages(new_pkg, dependencies = c("Depends", "Imports", "LinkingTo"))
+#' }
+#' @export
+create_package_with_all_dependencies <- function(outfile = NULL, package_source = NULL, quietly = TRUE) {

Review comment:
       Does that seem reasonable? I'm happy to swap the argument order if you'd prefer.







[GitHub] [arrow] nealrichardson commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-907658348


   > Looking through the logs, I'm still downloading XSIMD when `TEST_OFFLINE_BUILD` is true and `ARROW_THIRDPARTY_DEPENDENCY_DIR` isn't set. The `ARROW_SIMD_LEVEL` setting is getting picked up, but somehow that doesn't translate to not using XSIMD.
   > 
   
   Looking at https://github.com/apache/arrow/blob/master/cpp/cmake_modules/ThirdpartyToolchain.cmake#L1929-L1935, it looks like you have to set the RUNTIME level to NONE also.





[GitHub] [arrow] jonkeane commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-912593756


   @github-actions crossbow submit test-r-offline-maximal





[GitHub] [arrow] karldw commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-905793851


   These builds are failing because they set `LIBARROW_DOWNLOAD` to `false` and they need to download cmake, but my changes block downloading cmake when `LIBARROW_DOWNLOAD` is `false` (or when github.com can't be reached). Should I allow cmake to be downloaded here and assume offline builds have cmake installed?





[GitHub] [arrow] nealrichardson commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r702047773



##########
File path: r/tools/nixlibs.R
##########
@@ -320,33 +301,54 @@ build_libarrow <- function(src_dir, dst_dir) {
     BUILD_DIR = build_dir,
     DEST_DIR = dst_dir,
     CMAKE = cmake,
+    # EXTRA_CMAKE_FLAGS will often be "", but it's convenient later to have it defined
+    EXTRA_CMAKE_FLAGS = Sys.getenv("EXTRA_CMAKE_FLAGS"),
     # Make sure we build with the same compiler settings that R is using
     CC = R_CMD_config("CC"),
     CXX = paste(R_CMD_config("CXX11"), R_CMD_config("CXX11STD")),
     # CXXFLAGS = R_CMD_config("CXX11FLAGS"), # We don't want the same debug symbols
     LDFLAGS = R_CMD_config("LDFLAGS")
   )
-  env_vars <- paste0(names(env_var_list), '="', env_var_list, '"', collapse = " ")
-  env_vars <- with_s3_support(env_vars)
-  env_vars <- with_mimalloc(env_vars)
-  if (tolower(Sys.info()[["sysname"]]) %in% "sunos") {
-    # jemalloc doesn't seem to build on Solaris
-    # nor does thrift, so turn off parquet,
-    # and arrowExports.cpp requires parquet for dataset (ARROW-11994), so turn that off
-    # xsimd doesn't compile, so set SIMD level to NONE to skip it
-    # re2 and utf8proc do compile,
-    # but `ar` fails to build libarrow_bundled_dependencies, so turn them off
-    # so that there are no bundled deps
-    env_vars <- paste(env_vars, "ARROW_JEMALLOC=OFF ARROW_PARQUET=OFF ARROW_DATASET=OFF ARROW_WITH_RE2=OFF ARROW_WITH_UTF8PROC=OFF EXTRA_CMAKE_FLAGS=-DARROW_SIMD_LEVEL=NONE")
+  env_var_list <- with_s3_support(env_var_list)
+  env_var_list <- with_mimalloc(env_var_list)
+  # turn_off_thirdparty_features() needs to happen after with_mimalloc() and
+  # with_s3_support(), since those might turn features ON.
+  thirdparty_deps_unavailable <- !download_ok &&
+    !dir.exists(thirdparty_dependency_dir) &&
+    !env_is("ARROW_DEPENDENCY_SOURCE", "system")
+  if (is_solaris()) {
+    # Note that JSON support does work on Solaris, but will be turned off with
+    # the rest of the thirdparty dependencies (when ARROW-13768 is resolved and

Review comment:
       Sure, whatever you want, and if it messes things up, we can rebase -i and sort it out after.







[GitHub] [arrow] karldw edited a comment on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw edited a comment on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-909368665


   > An idea came to me last night: what if we had a utility that would make a "fat" package, like:
   > 
   > ```r
   > function(source_package) {
   >   untar(source_package)
   >   system("inst/download_script.sh tools/thirdparty")
   >   tar()
   > }
   > ```
   > 
   > then you would just copy that arrow_x.y.z.tar.gz and install it, no need to copy other files and set env vars.
   
   I like that! I can take a try at it.
   
   * Do you want to include all of the downloaded files (87 MB), or just the ones an R build could possibly use (55 MB)? 
   * Do you still want to be able to run `download_optional_dependencies` from within an installed R package? If not, we can use the copy of `download_dependencies.sh` that's in `tools/cpp/thirdparty/` (we're currently making another copy for `inst/` so it's available at runtime).
   * Should we bundle cmake while we're at it? This might be convenient, but is a bit of scope creep.
   
    (edit: add third bullet)
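
   To make the "fat" package idea concrete, here is a minimal sketch rather than a proposed implementation. The names `make_fat_package()` and `download_fun` are hypothetical, as is the assumption that the source tarball unpacks to a top-level `arrow/` directory with the dependencies going into `tools/thirdparty` (the path used in the quoted snippet). The downloader is passed in as a callback so the sketch can run without network access.

   ```r
   # Sketch only: `download_fun` is a hypothetical callback that fills a
   # directory with the third-party archives (for example by running
   # download_dependencies.sh). It is injected so this can be tried offline.
   make_fat_package <- function(source_package, outfile, download_fun) {
     # Resolve outfile to an absolute path before changing directories
     outfile <- file.path(normalizePath(dirname(outfile)), basename(outfile))
     work_dir <- tempfile("arrow-fat-")
     dir.create(work_dir)
     on.exit(unlink(work_dir, recursive = TRUE), add = TRUE)

     # Unpack the source package (assumed to contain a top-level "arrow/" dir)
     utils::untar(source_package, exdir = work_dir)
     deps_dir <- file.path(work_dir, "arrow", "tools", "thirdparty")
     dir.create(deps_dir, recursive = TRUE, showWarnings = FALSE)
     download_fun(deps_dir)

     # Re-pack from inside work_dir so paths in the archive stay relative
     old_wd <- setwd(work_dir)
     tryCatch(
       utils::tar(outfile, files = "arrow", compression = "gzip", tar = "internal"),
       finally = setwd(old_wd)
     )
     invisible(outfile)
   }
   ```

   With something along these lines, the offline machine would only need the one tarball and no extra environment variables, which is the point of the suggestion.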





[GitHub] [arrow] jonkeane commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r698528943



##########
File path: r/R/util.R
##########
@@ -183,3 +183,63 @@ repeat_value_as_array <- function(object, n) {
   }
   return(Scalar$create(object)$as_array(n))
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#'
+#' @return TRUE/FALSE for whether the downloads were successful
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Other files are not changed.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' Steps for an offline install with optional dependencies:
+#' - Install the `arrow` package on a computer with internet access

Review comment:
       For the purposes of our CI job, we will have the checkout available so could copy them over in that process. But for people trying to install with the script, that's an issue. We could attempt to grab those files from github if they aren't findable with `system.file()`, but that opens up another can of worms to make sure we're grabbing the right version of those files for the install to work. 
   
   I don't think it's the end of the world to require the double installation until we find a better solution.
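
   For illustration, the lookup described above could be sketched as follows. This is an assumption-laden sketch: `find_download_script()` and its `fallback` argument are hypothetical names, and the fallback stands in for whatever alternative location (a git checkout, a version-pinned download) might eventually be chosen. It relies only on the documented behavior that `system.file()` returns `""` when nothing is found.

   ```r
   # Hypothetical helper: look for the script in the installed package first,
   # then fall back to a caller-supplied path (e.g. from a local checkout).
   find_download_script <- function(package = "arrow", fallback = NULL) {
     path <- system.file("thirdparty/download_dependencies.sh", package = package)
     if (nzchar(path)) {
       return(path)
     }
     if (!is.null(fallback) && file.exists(fallback)) {
       return(fallback)
     }
     stop("download_dependencies.sh not found; install the arrow package first")
   }
   ```

   Keeping the fallback explicit avoids silently fetching a script whose version does not match the package being installed, which is the can of worms mentioned above.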







[GitHub] [arrow] jonkeane commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r698579859



##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,64 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' Steps for an offline install with optional dependencies:
+#' - Install the `arrow` package on a computer with internet access
+#' - Run this function
+#' - Copy the saved dependency files to a computer without internet access
+#' - Create an environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
+#'   points to the folder.
+#' - Install the `arrow` package on the computer without internet access
+#' - Run [arrow_info()] to check installed capabilities

Review comment:
       That looks great







[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r697568448



##########
File path: r/tools/nixlibs.R
##########
@@ -413,10 +392,114 @@ cmake_version <- function(cmd = "cmake") {
   )
 }
 
+turn_off_thirdparty_features <- function(env_vars) {
+  # Because these are done as environment variables (as opposed to build flags),
+  # setting these to "OFF" overrides any previous setting. We don't need to
+  # check the existing value.
+  turn_off <- c(
+    "ARROW_MIMALLOC=OFF",
+    "ARROW_JEMALLOC=OFF",
+    "ARROW_PARQUET=OFF", # depends on thrift
+    "ARROW_DATASET=OFF", # depends on parquet
+    "ARROW_S3=OFF",
+    "ARROW_WITH_BROTLI=OFF",
+    "ARROW_WITH_BZ2=OFF",
+    "ARROW_WITH_LZ4=OFF",
+    "ARROW_WITH_SNAPPY=OFF",
+    "ARROW_WITH_ZLIB=OFF",
+    "ARROW_WITH_ZSTD=OFF",
+    "ARROW_WITH_RE2=OFF",
+    "ARROW_WITH_UTF8PROC=OFF",
+    # NOTE: this code sets the environment variable ARROW_JSON to "OFF", but
+    # that setting will *not* be honored by build_arrow_static.sh until
+    # ARROW-13768 is resolved.
+    "ARROW_JSON=OFF",
+    # The syntax to turn off XSIMD is different.
+    'EXTRA_CMAKE_FLAGS="-DARROW_SIMD_LEVEL=NONE"'
+  )
+  if (Sys.getenv("EXTRA_CMAKE_FLAGS") != "") {
+    # Error rather than overwriting EXTRA_CMAKE_FLAGS
+    # (Correctly inserting the flag into an existing quoted string is tricky)
+    stop("Sorry, setting EXTRA_CMAKE_FLAGS is not supported at this time.")
+  }
+  paste(env_vars, paste(turn_off, collapse = " "))
+}
+
+set_thirdparty_urls <- function(env_vars) {
+  deps_dir <- Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")
+  files <- list.files(deps_dir, full.names = FALSE)
+  if (length(files) == 0) {
+    # This will be true if the variable is unset, if it's set but the directory

Review comment:
       Sure!







[GitHub] [arrow] karldw commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-908454699


   
   > One option for the two-stage testing process would be to do something similar to [our version compatibility tests](https://github.com/apache/arrow/blob/master/dev/tasks/r/github.linux.version.compatibility.yml). There we use Github Actions with two jobs, one to write files (here, download the dependencies) and one to read them/test against them (here, install offline). I'm also happy to toss something together for this if you would like + are ok with me sending a couple of commits to your branch.
   
   That would be great!
   
   
   > I also added a few minor comments that I noticed as I read through the PR to see what's been going so far. I will echo @nealrichardson comments that this is a really fantastic PR, in an area of Arrow that is not particularly well documented for contributors, thank you!
   
   Thank you both! I appreciate all the time you've spent providing review and feedback!
   





[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r696121898



##########
File path: r/tools/nixlibs.R
##########
@@ -52,6 +43,21 @@ try_download <- function(from_url, to_file) {
   !inherits(status, "try-error") && status == 0
 }
 
+quietly <- !env_is("ARROW_R_DEV", "true") # try_download uses quietly global
+# * download_ok, build_ok: Use prebuilt binary, if found, otherwise try to build
+# * no download, build_ok: Build with local git checkout, if available, or
+#   sources included in r/tools/cpp/. Optional dependencies are not included,
+#   and will not be automatically downloaded.
+#   https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+# * download_ok, no build: Only use prebuilt binary, if found
+# * neither: Get the arrow-without-arrow package
+# Download and build are OK unless you say not to (or can't access github)
+download_ok <- (!env_is("LIBARROW_DOWNLOAD", "false")) && try_download("https://github.com", tempfile())

Review comment:
       
   > CI jobs that test both with and without the `download_optional_dependencies()` call.
   
   
   Is the right place for those this file?
   https://github.com/ursa-labs/arrow-r-nightly/blob/master/.github/workflows/build-and-test-all.yml
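
   For readers following the hunk above: it relies on an `env_is()` helper that is not shown in this excerpt. A minimal sketch, assuming the helper does a case-insensitive comparison against the environment variable (the definition below is an assumption, not copied from `nixlibs.R`):

   ```r
   # Assumed definition: TRUE when the variable, lowercased, equals `value`
   env_is <- function(var, value) identical(tolower(Sys.getenv(var)), value)

   Sys.setenv(LIBARROW_DOWNLOAD = "FALSE")
   opted_out <- env_is("LIBARROW_DOWNLOAD", "false") # comparison ignores case

   Sys.unsetenv("LIBARROW_DOWNLOAD")
   default_case <- env_is("LIBARROW_DOWNLOAD", "false") # unset reads as ""
   ```

   Under this reading, `!env_is("LIBARROW_DOWNLOAD", "false")` is true unless the user explicitly opts out, which matches the "OK unless you say not to" comment in the diff.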







[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r698750855



##########
File path: r/tools/nixlibs.R
##########
@@ -320,33 +300,54 @@ build_libarrow <- function(src_dir, dst_dir) {
     BUILD_DIR = build_dir,
     DEST_DIR = dst_dir,
     CMAKE = cmake,
+    # EXTRA_CMAKE_FLAGS will often be "", but it's convenient later to have it defined

Review comment:
       Later in the code I have:
   https://github.com/apache/arrow/blob/98b5601f94ff0f0caf240c6e1b914d4e8e49f98e/r/tools/nixlibs.R#L447-L452
   
   If we don't add `EXTRA_CMAKE_FLAGS` to the vector, that section could instead be
   ```r
       # The syntax to turn off XSIMD is different.
       # Pull existing value of EXTRA_CMAKE_FLAGS first (must be defined)
       "EXTRA_CMAKE_FLAGS" = paste(
         Sys.getenv("EXTRA_CMAKE_FLAGS"),
         "-DARROW_SIMD_LEVEL=NONE -DARROW_RUNTIME_SIMD_LEVEL=NONE"
       )
   ```
   
   I did it the first way to have the `EXTRA_CMAKE_FLAGS` collected at the same time as the other existing build flags, but if you think the second way is cleaner, I'm happy to change it.
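
   To show why having `EXTRA_CMAKE_FLAGS` defined up front is convenient, here is a small self-contained sketch. The values are made up for illustration; only the variable names and the `paste0()` serialization come from the diff. It shows that `[[` on a named character vector errors for a missing name, and that `replace()` both overwrites existing entries and appends new ones.

   ```r
   # Illustrative values only
   env_var_list <- c(CMAKE = "cmake", EXTRA_CMAKE_FLAGS = "", ARROW_S3 = "ON")

   # Looking up a name that is absent from a character vector is an error,
   # so this lookup is only safe because EXTRA_CMAKE_FLAGS is always present.
   existing_flags <- env_var_list[["EXTRA_CMAKE_FLAGS"]]

   turn_off <- c(
     ARROW_S3 = "OFF",
     ARROW_PARQUET = "OFF",
     EXTRA_CMAKE_FLAGS = trimws(paste(
       existing_flags,
       "-DARROW_SIMD_LEVEL=NONE -DARROW_RUNTIME_SIMD_LEVEL=NONE"
     ))
   )

   # replace() overwrites matching names and appends names that are new
   updated <- replace(env_var_list, names(turn_off), turn_off)

   # Serialize to the NAME="value" form passed to the build script
   env_string <- paste0(names(updated), '="', updated, '"', collapse = " ")
   ```

   The second way in the comment above reads the value from `Sys.getenv()` instead, which never errors, so the trade-off is mainly about where the flags are collected.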







[GitHub] [arrow] nealrichardson commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r698674825



##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,68 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### On a computer with internet access:
+#' - Install the `arrow` package

Review comment:
       ```suggestion
   #' - If you don't already have the `arrow` package installed, get this function by
   #' `source("https://raw.githubusercontent.com/apache/arrow/master/r/R/install-arrow.R")`
   ```

##########
File path: r/tools/nixlibs.R
##########
@@ -373,7 +374,15 @@ ensure_cmake <- function() {
     )
     cmake_tar <- tempfile()
     cmake_dir <- tempfile()
-    try_download(cmake_binary_url, cmake_tar)
+    download_successful <- try_download(cmake_binary_url, cmake_tar)
+    if (!download_successful) {
+      cat(paste0(
+        "*** cmake was not found locally and download failed.\n",
+        "    Make sure cmake is installed and available on your PATH\n",
+        "    (or download '", cmake_binary_url,
+        "' and define the CMAKE environment variable).\n"
+      ))

Review comment:
       ```suggestion
         cat(paste0(
           "*** cmake was not found locally and download failed.\n",
           "    Make sure cmake >= 3.10 is installed and available on your PATH,\n",
           "    or download ", cmake_binary_url, "\n",
           "    and define the CMAKE environment variable.\n"
         ))
   ```
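
   Since the suggested message names a minimum version, a check against it might look like the sketch below. The helper names `parse_cmake_version()` and `cmake_is_new_enough()` are hypothetical and are not the `cmake_version()` function in `nixlibs.R`; the input is assumed to be the first line of `cmake --version` output, and `3.10` is taken from the message above.

   ```r
   # Hypothetical helpers: extract "X.Y" or "X.Y.Z" from version output
   parse_cmake_version <- function(version_output) {
     first_line <- version_output[1]
     pattern <- "[0-9]+\\.[0-9]+(\\.[0-9]+)?"
     found <- regmatches(first_line, regexpr(pattern, first_line))
     if (length(found) == 0) {
       return(NULL)
     }
     numeric_version(found)
   }

   cmake_is_new_enough <- function(version_output, minimum = "3.10") {
     found <- parse_cmake_version(version_output)
     # numeric_version compares component-wise, so 3.9 sorts below 3.10
     !is.null(found) && found >= minimum
   }
   ```

   Comparing as `numeric_version` rather than as strings or decimals matters here, because `3.9` is older than `3.10` even though it is larger as a number.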

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,68 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### On a computer with internet access:
+#' - Install the `arrow` package
+#' - Run this function
+#' - Copy the saved dependency files to the computer without internet access
+#'
+#' ### On the computer without internet access:
+#' - Create an environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
+#'   points to the newly copied folder of dependency files.
+#' - Install the `arrow` package
+#' - Run [arrow_info()] to check installed capabilities
+#'
+#' @examples
+#' \dontrun{
+#' download_optional_dependencies("arrow-thirdparty")
+#' list.files("arrow-thirdparty", "^thrift-") # "thrift-0.13.0.tar.gz" or similar
+#' }
+#' @export
+download_optional_dependencies <- function(deps_dir = NULL) {
+  # This script is copied over from arrow/cpp/... to arrow/r/inst/...
+  download_dependencies_sh <- system.file(
+    "thirdparty/download_dependencies.sh",
+    package = "arrow",
+    mustWork = TRUE
+  )
+  if (is.null(deps_dir) && Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR") != "") {
+    deps_dir <- Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")
+  }
+
+  dir.create(deps_dir, showWarnings = FALSE, recursive = TRUE)
+  # Run download_dependencies.sh
+  cat(paste0("*** Downloading optional dependencies to ", deps_dir, "\n"))
+  return_status <- system2(download_dependencies_sh,
+    args = deps_dir, stdout = FALSE, stderr = FALSE
+  )
+  if (isTRUE(return_status == 0)) {
+    cat(paste0(
+      "**** Set environment variable on offline machine and re-build arrow:\n",

Review comment:
       Should this message also tell you to copy the directory to the other machine?

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,68 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### On a computer with internet access:
+#' - Install the `arrow` package
+#' - Run this function
+#' - Copy the saved dependency files to the computer without internet access
+#'
+#' ### On the computer without internet access:
+#' - Create an environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
+#'   points to the newly copied folder of dependency files.
+#' - Install the `arrow` package
+#' - Run [arrow_info()] to check installed capabilities
+#'
+#' @examples
+#' \dontrun{
+#' download_optional_dependencies("arrow-thirdparty")
+#' list.files("arrow-thirdparty", "^thrift-") # "thrift-0.13.0.tar.gz" or similar
+#' }
+#' @export
+download_optional_dependencies <- function(deps_dir = NULL) {
+  # This script is copied over from arrow/cpp/... to arrow/r/inst/...
+  download_dependencies_sh <- system.file(
+    "thirdparty/download_dependencies.sh",
+    package = "arrow",
+    mustWork = TRUE
+  )
+  if (is.null(deps_dir) && Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR") != "") {
+    deps_dir <- Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")
+  }

Review comment:
       ```suggestion
   download_optional_dependencies <- function(deps_dir = Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")) {
     # This script is copied over from arrow/cpp/... to arrow/r/inst/...
     download_dependencies_sh <- system.file(
       "thirdparty/download_dependencies.sh",
       package = "arrow",
       mustWork = TRUE
     )
   ```
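
   One detail worth keeping in mind with this default: `Sys.getenv()` returns `""` for an unset variable, so the function body would need to handle the empty string rather than `NULL`. A minimal sketch of that check, using a hypothetical helper name `resolve_deps_dir()` that is not part of the PR:

   ```r
   # Hypothetical helper: validate and create the dependency directory
   resolve_deps_dir <- function(deps_dir = Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")) {
     # An unset environment variable arrives here as "", not NULL
     if (is.null(deps_dir) || !nzchar(deps_dir)) {
       stop("Provide `deps_dir` or set ARROW_THIRDPARTY_DEPENDENCY_DIR")
     }
     dir.create(deps_dir, showWarnings = FALSE, recursive = TRUE)
     invisible(deps_dir)
   }
   ```

   Failing early with a clear message seems preferable to letting `dir.create("")` fail quietly, though the exact wording is of course open.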

##########
File path: r/vignettes/install.Rmd
##########
@@ -304,10 +316,12 @@ By default, these are all unset. All boolean variables are case-insensitive.
   won't look for Arrow libraries on your system and instead will look to download/build them.
   Use this if you have a version mismatch between installed system libraries
   and the version of the R package you're installing.
-* `LIBARROW_DOWNLOAD`: Unless set to `false`, the build script
-  will attempt to download C++ binary or source bundles.
+* `TEST_OFFLINE_BUILD`: Unless set to `true`, the build script
+  will download prebuilt C++ binary or third-party source bundles as necessary.
   If you're in a checkout of the `apache/arrow` git repository

Review comment:
       ```suggestion
   ```

##########
File path: r/vignettes/install.Rmd
##########
@@ -102,6 +102,14 @@ satisfy C++ dependencies.
 
 > Note that, unlike packages like `tensorflow`, `blogdown`, and others that require external dependencies, you do not need to run `install_arrow()` after a successful `arrow` installation.
 
+The `install-arrow.R` file also includes the `download_optional_dependencies()`
+function. Normally, when installing on a computer with internet access, the
+build process will download third-party dependencies as needed. This function
+provides a way to download them in advance. Relevant environment variables are
+`ARROW_THIRDPARTY_DEPENDENCY_DIR` for the directory of downloaded dependencies
+and `TEST_OFFLINE_BUILD` to force the build process not to download.

Review comment:
       I don't think we should document this in this vignette--users should not worry about this env var, it's for us for testing

##########
File path: r/vignettes/install.Rmd
##########
@@ -304,10 +316,12 @@ By default, these are all unset. All boolean variables are case-insensitive.
   won't look for Arrow libraries on your system and instead will look to download/build them.
   Use this if you have a version mismatch between installed system libraries
   and the version of the R package you're installing.
-* `LIBARROW_DOWNLOAD`: Unless set to `false`, the build script
-  will attempt to download C++ binary or source bundles.
+* `TEST_OFFLINE_BUILD`: Unless set to `true`, the build script
+  will download prebuilt C++ binary or third-party source bundles as necessary.
   If you're in a checkout of the `apache/arrow` git repository
-  and want to build the C++ library from the local source, make this `false`.
+  and want to build the C++ library from the local source, make this `false` or
+  not set. If building the C++ library from source with cmake unavailable, cmake

Review comment:
       ```suggestion
     If building the C++ library from source with cmake unavailable, cmake
   ```

##########
File path: r/tools/nixlibs.R
##########
@@ -29,17 +29,8 @@ if (getRversion() < 3.4 && is.null(getOption("download.file.method"))) {
 options(.arrow.cleanup = character()) # To collect dirs to rm on exit
 on.exit(unlink(getOption(".arrow.cleanup")))
 
+

Review comment:
       ```suggestion
   ```

##########
File path: r/tools/nixlibs.R
##########
@@ -320,33 +300,54 @@ build_libarrow <- function(src_dir, dst_dir) {
     BUILD_DIR = build_dir,
     DEST_DIR = dst_dir,
     CMAKE = cmake,
+    # EXTRA_CMAKE_FLAGS will often be "", but it's convenient later to have it defined

Review comment:
       Why?

##########
File path: r/vignettes/install.Rmd
##########
@@ -102,6 +102,14 @@ satisfy C++ dependencies.
 
 > Note that, unlike packages like `tensorflow`, `blogdown`, and others that require external dependencies, you do not need to run `install_arrow()` after a successful `arrow` installation.
 
+The `install-arrow.R` file also includes the `download_optional_dependencies()`
+function. Normally, when installing on a computer with internet access, the
+build process will download third-party dependencies as needed. This function
+provides a way to download them in advance. Relevant environment variables are

Review comment:
       These sentences should probably mention the offline/airgapped server use case and how you'd use it. 

##########
File path: r/tools/nixlibs.R
##########
@@ -413,66 +422,144 @@ cmake_version <- function(cmd = "cmake") {
   )
 }
 
-with_s3_support <- function(env_vars) {
-  arrow_s3 <- toupper(Sys.getenv("ARROW_S3")) == "ON" || tolower(Sys.getenv("LIBARROW_MINIMAL")) == "false"
+turn_off_thirdparty_features <- function(env_var_list) {
+  # Because these are done as environment variables (as opposed to build flags),
+  # setting these to "OFF" overrides any previous setting. We don't need to
+  # check the existing value.
+  turn_off <- c(
+    "ARROW_MIMALLOC" = "OFF",
+    "ARROW_JEMALLOC" = "OFF",
+    "ARROW_PARQUET" = "OFF", # depends on thrift
+    "ARROW_DATASET" = "OFF", # depends on parquet
+    "ARROW_S3" = "OFF",
+    "ARROW_WITH_BROTLI" = "OFF",
+    "ARROW_WITH_BZ2" = "OFF",
+    "ARROW_WITH_LZ4" = "OFF",
+    "ARROW_WITH_SNAPPY" = "OFF",
+    "ARROW_WITH_ZLIB" = "OFF",
+    "ARROW_WITH_ZSTD" = "OFF",
+    "ARROW_WITH_RE2" = "OFF",
+    "ARROW_WITH_UTF8PROC" = "OFF",
+    # NOTE: this code sets the environment variable ARROW_JSON to "OFF", but
+    # that setting will *not* be honored by build_arrow_static.sh until
+    # ARROW-13768 is resolved.
+    "ARROW_JSON" = "OFF",
+    # The syntax to turn off XSIMD is different.
+    # Pull existing value of EXTRA_CMAKE_FLAGS first (must be defined)
+    "EXTRA_CMAKE_FLAGS" = paste(
+      env_var_list[["EXTRA_CMAKE_FLAGS"]],
+      "-DARROW_SIMD_LEVEL=NONE -DARROW_RUNTIME_SIMD_LEVEL=NONE"
+    )
+  )
+  # Create a new env_var_list, with the values of turn_off set.
+  # replace() also adds new values if they didn't exist before
+  replace(env_var_list, names(turn_off), turn_off)
+}
+
+set_thirdparty_urls <- function(env_var_list) {
+  # This function is run in most typical cases -- when download_ok is TRUE *or*
+  # ARROW_THIRDPARTY_DEPENDENCY_DIR is set. It does *not* check if existing
+  # *_SOURCE_URL variables are set. (It is also run whenever ARROW_DEPENDENCY_SOURCE
+  # is "SYSTEM", but doesn't affect the build in that case.)
+  deps_dir <- Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")
+  if (deps_dir == "") {
+    return(env_var_list)
+  }
+  files <- list.files(deps_dir, full.names = FALSE)
+  if (length(files) == 0) {
+    # This will be true if the directory doesn't exist, or if it exists but is empty.
+    # Here the build will continue, but will likely fail when the downloads are
+    # unavailable. The user will end up with the arrow-without-arrow package.
+    cat(paste0(
+      "*** Error: ARROW_THIRDPARTY_DEPENDENCY_DIR was set but has no files.\n",

Review comment:
       ```suggestion
         "*** Warning: ARROW_THIRDPARTY_DEPENDENCY_DIR was set but has no files.\n",
   ```
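
A minimal shell sketch of the check being discussed, assuming the same rule as the R code above: an unset variable means the feature is not in use, while a missing or empty directory only produces a warning and lets the build continue. The function name `warn_if_no_thirdparty_files` is hypothetical and is not part of the PR.

```sh
# Hypothetical helper (not in the PR): report on the directory that
# ARROW_THIRDPARTY_DEPENDENCY_DIR points to, without failing the build.
warn_if_no_thirdparty_files() {
  deps_dir=$1
  # Unset or empty value: the offline-dependency feature is not in use.
  if [ -z "$deps_dir" ]; then
    return 0
  fi
  # `ls -A` prints nothing for an empty directory and fails for a missing
  # one, so both cases leave $files empty (similar to list.files() in R
  # returning an empty vector in both cases).
  files=$(ls -A "$deps_dir" 2>/dev/null || true)
  if [ -z "$files" ]; then
    echo "*** Warning: ARROW_THIRDPARTY_DEPENDENCY_DIR was set but has no files."
    return 0
  fi
  echo "*** Found thirdparty files in $deps_dir"
}
```

Returning 0 in the warning branch reflects the point of the suggestion: the build keeps going and may still succeed with fewer features.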

##########
File path: r/tools/nixlibs.R
##########
@@ -52,6 +43,24 @@ try_download <- function(from_url, to_file) {
   !inherits(status, "try-error") && status == 0
 }
 
+build_ok <- !env_is("LIBARROW_BUILD", "false")
+# But binary defaults to not OK
+binary_ok <- !identical(tolower(Sys.getenv("LIBARROW_BINARY", "false")), "false")
+# For local debugging, set ARROW_R_DEV=TRUE to make this script print more
+
+quietly <- !env_is("ARROW_R_DEV", "true") # try_download uses quietly global
+# * download_ok, build_ok: Use prebuilt binary, if found, otherwise try to build
+# * !download_ok, build_ok: Build with local git checkout, if available, or
+#   sources included in r/tools/cpp/. Optional dependencies are not included,
+#   and will not be automatically downloaded.
+#   cmake will still be downloaded if necessary
+#   https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+# * download_ok, !build_ok: Only use prebuilt binary, if found
+# * neither: Get the arrow-without-arrow package
+# Download and build are OK unless you say not to (or can't access github)
+download_ok <- !env_is("TEST_OFFLINE_BUILD", "true") && try_download("https://github.com", tempfile())
+
+

Review comment:
       ```suggestion
   # For local debugging, set ARROW_R_DEV=TRUE to make this script print more
   quietly <- !env_is("ARROW_R_DEV", "true")
   
   # Default is build from source, not download a binary
   build_ok <- !env_is("LIBARROW_BUILD", "false")
   binary_ok <- !identical(tolower(Sys.getenv("LIBARROW_BINARY", "false")), "false")
   
   # Check if we're doing an offline build.
   # (Note that cmake will still be downloaded if necessary
   #  https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds)
   download_ok <- !env_is("TEST_OFFLINE_BUILD", "true") && try_download("https://github.com", tempfile())
   
   ```
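
The four `download_ok` / `build_ok` combinations listed in the comments of the hunk above can be sketched in shell as follows. This is an illustration only: the real logic lives in R in `nixlibs.R`, the `build_plan` function is hypothetical, and this `env_is` takes a value rather than a variable name and is assumed to compare case-insensitively.

```sh
# Case-insensitive comparison of a value against a lowercase target
# (assumed to match the spirit of env_is() in nixlibs.R).
env_is() {
  # $1: current value of the variable, $2: lowercase value to compare with
  [ "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" = "$2" ]
}

# Print which path each combination would take.
# $1: value of TEST_OFFLINE_BUILD, $2: value of LIBARROW_BUILD,
# $3: "true" if the network is reachable. The real script probes
# https://github.com; here the result is passed in so nothing is downloaded.
build_plan() {
  test_offline=$1
  libarrow_build=$2
  online=$3
  download_ok=true
  build_ok=true
  if env_is "$test_offline" "true" || [ "$online" != "true" ]; then
    download_ok=false
  fi
  if env_is "$libarrow_build" "false"; then
    build_ok=false
  fi
  case "$download_ok,$build_ok" in
    true,true)   echo "prebuilt binary if found, otherwise build" ;;
    false,true)  echo "build from local sources without optional dependencies" ;;
    true,false)  echo "prebuilt binary only" ;;
    false,false) echo "arrow-without-arrow" ;;
  esac
}
```

For example, `build_plan "TRUE" "" true` takes the offline branch even though the network is reachable, which is the case `TEST_OFFLINE_BUILD` exists to exercise.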

##########
File path: r/vignettes/install.Rmd
##########
@@ -343,6 +357,7 @@ By default, these are all unset. All boolean variables are case-insensitive.
 * `CMAKE`: When building the C++ library from source, you can specify a
   `/path/to/cmake` to use a different version than whatever is found on the `$PATH`
 
+

Review comment:
       ```suggestion
   ```

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,68 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### On a computer with internet access:
+#' - Install the `arrow` package

Review comment:
       Oh, I guess you're also relying on the package installation to deliver the download_dependencies.sh and versions.txt scripts?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] karldw commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-905889478


   In the latest commit, I removed `LIBARROW_DOWNLOAD` and added `TEST_OFFLINE_BUILD`. Does that seem right to you?
   
   I wasn't positive I got the logic right in this section of `configure`:
   https://github.com/apache/arrow/pull/11001/files#diff-089697faebdb7820ca629a2bb316b878cc0ba18a5bfb0b60996f8dbcd1fa11e7L133-L140





[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r696991841



##########
File path: r/vignettes/install.Rmd
##########
@@ -303,10 +307,12 @@ By default, these are all unset. All boolean variables are case-insensitive.
   won't look for Arrow libraries on your system and instead will look to download/build them.
   Use this if you have a version mismatch between installed system libraries
   and the version of the R package you're installing.
-* `LIBARROW_DOWNLOAD`: Unless set to `false`, the build script
-  will attempt to download C++ binary or source bundles.
+* `TEST_OFFLINE_BUILD`: Unless set to `true`, the build script

Review comment:
       Should this env var also be prefixed? e.g. `ARROW_TEST_OFFLINE_BUILD`?







[GitHub] [arrow] jonkeane commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-912541351


   @github-actions crossbow submit test-r-offline-maximal





[GitHub] [arrow] jonkeane commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-912595955


   Aaah, actually the original configuration was fine, though I've adjusted the "Dump test logs" step to always run so that it's easier to confirm without downloading the artifacts.
   
   This took a [bit of RTFM](https://rdrr.io/r/tools/testInstalledPackage.html), but the output of `tools::testInstalledPackage()` (run against arrow) is placed in the `arrow-tests` directory under the working directory. If the tests had not run, the output file would not exist or would be blank, but we do see output from it, which shows that the tests are running. When this run finishes, I will double-check that we're not seeing a bunch of skips for various features that are disabled.
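
The reasoning above (no output file, or a blank one, means the tests did not run) could be turned into an explicit CI check. The following is a sketch under that assumption; the helper name `tests_produced_output` is hypothetical and is not part of the workflow in this PR.

```sh
# Hypothetical CI helper: succeed only if the given test output directory
# exists and contains at least one non-empty regular file.
tests_produced_output() {
  out_dir=$1
  [ -d "$out_dir" ] || return 1
  # `-size +0` matches files larger than zero blocks, i.e. non-empty files;
  # `grep -q .` succeeds if find printed at least one path.
  find "$out_dir" -type f -size +0 | grep -q .
}
```

A step could then run something like `tests_produced_output arrow-tests || exit 1` after the tests, so that a silently skipped test run fails the job instead of passing.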





[GitHub] [arrow] jonkeane commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-912612653


   OK, this looks good, though I'm seeing [that S3 was disabled](https://github.com/ursacomputing/crossbow/runs/3507264567?check_suite_focus=true#step:6:20). I'll make sure we've got the dependencies installed on the host so that we can catch that too.
   
   Here are the skips:
   ```
   ══ Skipped tests ═══════════════════════════════════════════════════════════════
   • ARROW-11090 (date/datetime arithmetic) (1)
   • ARROW-12632: ExecuteScalarExpression cannot Execute non-scalar expression (1)
   • ARROW-13364 (1)
   • ARROW-13691 - na.rm not yet implemented for VarianceOptions (2)
   • ARROW-13799: factor() should error but instead we get a string error message in its place (1)
   • Arrow C++ not built with s3 (4)
   • Flight server is not running (1)
   • Implement more aggressive implicit casting for scalars (ARROW-11402) (1)
   • Ingest_POSIXct only implemented for REALSXP (1)
   • Minio is not running (1)
   • Need halffloat support: https://issues.apache.org/jira/browse/ARROW-3802 (1)
   • Need to substitute in user defined function too (1)
   • RE2 does not support backreferences in pattern (https://github.com/google/re2/issues/101) (1)
   • Sorting by only a single timestamp column fails (ARROW-12087) (1)
   • TODO: (if anyone uses RangeEquals) (1)
   • Table with 0 cols doesn't know how many rows it should have (2)
   • These tests are flaking: https://github.com/duckdb/duckdb/issues/2100 (1)
   • This OS either does not support changing languages to fr or it caches translations (1)
   • Work around masking of data type functions (ARROW-12322) (1)
   • count() is not a generic so we have to get here through summarize() (1)
   • empty test (1)
   • environment variable ARROW_LARGE_MEMORY_TESTS (2)
   • https://issues.apache.org/jira/browse/ARROW-7653 (1)
   • packageVersion("stringr") > "1.4.0" is not TRUE (1)
   • test now faulty - code no longer gives error & outputs a empty tibble (1)
   ```





[GitHub] [arrow] jonkeane commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-908412461


   Ah, never mind. I see now that the rapidjson download is related to [ARROW-13768](https://issues.apache.org/jira/browse/ARROW-13768), which would need to be resolved before we can disable that download entirely.





[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r701505602



##########
File path: dev/tasks/r/github.linux.offline.build.yml
##########
@@ -0,0 +1,112 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+# NOTE: must set "Crossbow" as name to have the badge links working in the
+# github comment reports!
+name: Crossbow
+
+on:
+  push
+
+jobs:
+  grab-dependencies:
+    name: "Download thirdparty dependencies"
+    runs-on: ubuntu-20.04
+    strategy:
+      fail-fast: false
+    env:
+      ARROW_R_DEV: "TRUE"
+      RSPM: "https://packagemanager.rstudio.com/cran/__linux__/focal/latest"
+    steps:
+      - name: Checkout Arrow
+        run: |
+          git clone --no-checkout {{ arrow.remote }} arrow
+          git -C arrow fetch -t {{ arrow.remote }} {{ arrow.branch }}
+          git -C arrow checkout FETCH_HEAD
+          git -C arrow submodule update --init --recursive
+      - name: Free Up Disk Space
+        shell: bash
+        run: arrow/ci/scripts/util_cleanup.sh
+      - name: Fetch Submodules and Tags
+        shell: bash
+        run: cd arrow && ci/scripts/util_checkout.sh
+      - uses: r-lib/actions/setup-r@v1
+      - name: Pull Arrow dependencies
+        run: |
+          cd arrow/r
+          # This is `make build`, but with no vignettes and not running `make doc`
+          cp ../NOTICE.txt inst/NOTICE.txt
+          rsync --archive --delete ../cpp tools/
+          cp -p ../.env tools/
+          cp -p ../NOTICE.txt tools/
+          cp -p ../LICENSE.txt tools/
+          R CMD build --no-build-vignettes --no-manual .
+          built_tar=$(ls -1 arrow*.tar.gz | head -n 1)
+          R -e "source('R/install-arrow.R'); create_package_with_all_dependencies(dest_file = 'arrow_with_deps.tar.gz', source_file = \"${built_tar}\")"
+        shell: bash
+      - name: Upload the third party dependency artifacts
+        uses: actions/upload-artifact@v2
+        with:
+          name: thirdparty_deps
+          path: arrow/r/arrow_with_deps.tar.gz
+
+  intall-offline:
+    name: "Install offline"
+    needs: [grab-dependencies]
+    runs-on: ubuntu-20.04
+    strategy:
+      fail-fast: false
+    env:
+      ARROW_R_DEV: "TRUE"
+      RSPM: "https://packagemanager.rstudio.com/cran/__linux__/focal/latest"
+    steps:
+      - name: Checkout Arrow
+        run: |
+          git clone --no-checkout {{ arrow.remote }} arrow
+          git -C arrow fetch -t {{ arrow.remote }} {{ arrow.branch }}
+          git -C arrow checkout FETCH_HEAD
+          git -C arrow submodule update --init --recursive
+      - uses: r-lib/actions/setup-r@v1
+      - name: Download artifacts
+        uses: actions/download-artifact@v2
+        with:
+          name: thirdparty_deps
+          path: arrow/r/arrow_with_deps.tar.gz

Review comment:
       I think the issue is `setwd(untar_dir)` should be `setwd(file.path(untar_dir, "arrow"))` in `create_package_with_all_dependencies`
   







[GitHub] [arrow] jonkeane commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-912593253


   @github-actions crossbow submit test-r-offline-maximal





[GitHub] [arrow] nealrichardson commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r696709923



##########
File path: r/tools/nixlibs.R
##########
@@ -209,75 +222,21 @@ find_available_binary <- function(os) {
   os
 }
 
-download_source <- function() {
-  tf1 <- tempfile()
-  src_dir <- tempfile()
-
-  # Given VERSION as x.y.z.p
-  p <- package_version(VERSION)[1, 4]
-  if (is.na(p) || p < 1000) {
-    # This is either just x.y.z or it has a small (R-only) patch version
-    # Download from the official Apache release, dropping the p
-    VERSION <- as.character(package_version(VERSION)[1, -4])
-    if (apache_download(VERSION, tf1)) {
-      untar(tf1, exdir = src_dir)
-      unlink(tf1)
-      src_dir <- paste0(src_dir, "/apache-arrow-", VERSION, "/cpp")
-    }
-  } else if (p != 9000) {
-    # This is a custom dev version (x.y.z.9999) or a nightly (x.y.z.20210505)
-    # (Don't try to download on the default dev .9000 version)
-    if (nightly_download(VERSION, tf1)) {
-      unzip(tf1, exdir = src_dir)
-      unlink(tf1)
-      src_dir <- paste0(src_dir, "/cpp")
-    }
-  }
-
-  if (dir.exists(src_dir)) {
-    cat("*** Successfully retrieved C++ source\n")
-    options(.arrow.cleanup = c(getOption(".arrow.cleanup"), src_dir))
-    # These scripts need to be executable
-    system(
-      sprintf("chmod 755 %s/build-support/*.sh", src_dir),
-      ignore.stdout = quietly, ignore.stderr = quietly
-    )
-    return(src_dir)
-  } else {
-    return(NULL)
-  }
-}
-
-nightly_download <- function(version, destfile) {
-  source_url <- paste0(arrow_repo, "src/arrow-", version, ".zip")
-  try_download(source_url, destfile)
-}
-
-apache_download <- function(version, destfile, n_mirrors = 3) {
-  apache_path <- paste0("arrow/arrow-", version, "/apache-arrow-", version, ".tar.gz")
-  apache_urls <- c(
-    # This returns a different mirror each time
-    rep("https://www.apache.org/dyn/closer.lua?action=download&filename=", n_mirrors),
-    "https://downloads.apache.org/" # The backup
+find_local_source <- function() {
+  # We'll take the first of these that exists
+  # The first case probably occurs if we're in the arrow git repo
+  # The second probably occurs if we're installing the arrow R package
+  cpp_dir_options <- c(
+    Sys.getenv("ARROW_SOURCE_HOME", ".."),
+    "tools/cpp"

Review comment:
       Since the other one doesn't have `cpp` in the path, this one shouldn't either. We could change it so that we expect the env var to point directly to the cpp dir instead of the apache/arrow top level; that's probably safe to do because I can't imagine anyone is using it, but technically it would be a breaking change.
   
   ```suggestion
       "tools"
   ```
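
With the suggestion applied, both candidates name a directory that contains `cpp/`, and the first one that exists wins. A shell sketch of that lookup, assuming the `cpp` suffix is appended in one place (the actual function is written in R; this version only illustrates the "first existing candidate" rule):

```sh
# Sketch: print the first candidate whose cpp/ subdirectory exists.
# Note: "${ARROW_SOURCE_HOME:-..}" also falls back to ".." when the variable
# is set but empty, which may differ slightly from Sys.getenv() in R.
find_local_source() {
  for candidate in "${ARROW_SOURCE_HOME:-..}" "tools"; do
    if [ -d "$candidate/cpp" ]; then
      echo "$candidate/cpp"
      return 0
    fi
  done
  # Nothing found: the caller decides what to do next.
  return 1
}
```

Keeping the two entries symmetric means the env var continues to point at the top level of an apache/arrow checkout, so the breaking change mentioned above is avoided.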







[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r702201942



##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,91 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Create an install package with all thirdparty dependencies
+#'
+#' @param dest_file File path for the new tar.gz package. Defaults to
+#' `arrow_V.V.V_with_deps.tar.gz` in the current directory (`V.V.V` is the version)
+#' @param source_file File path for the input tar.gz package. Defaults to
+#' downloading the package.

Review comment:
       I think (hope) things would still work as expected with RStudio Package Manager. I specified `type = "source"` in `download.packages`, and it seems like RStudio Package Manager can give a standard source package. 
   
   I'm happy to document the RStudio Package Manager case, though I imagine most people would just install arrow, have the binary package just work, and not go digging for this function.







[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r698569604



##########
File path: r/tools/nixlibs.R
##########
@@ -82,7 +91,7 @@ download_binary <- function(os = identify_os()) {
 # * `TRUE` (not case-sensitive), to try to discover your current OS, or
 # * some other string, presumably a related "distro-version" that has binaries
 #   built that work for your OS
-identify_os <- function(os = Sys.getenv("LIBARROW_BINARY", Sys.getenv("LIBARROW_DOWNLOAD"))) {
+identify_os <- function(os = Sys.getenv("LIBARROW_BINARY", Sys.getenv("TEST_OFFLINE_BUILD"))) {

Review comment:
       Good catch. I think it should actually just look at `LIBARROW_BINARY`:
   
   ```r
   identify_os <- function(os = Sys.getenv("LIBARROW_BINARY")) {
     ...
   ```
   
   It's maybe worth noting that:
   * `identify_os` won't be called at all when `TEST_OFFLINE_BUILD` is `true` (but could be called if it was set to anything else)
   * At an earlier step, `configure` sets `LIBARROW_BINARY=true` if it was unset and `NOT_CRAN` is `true`
   
   https://github.com/apache/arrow/blob/5a13cbf81ee66172b63341d20acf51efc03d0c97/r/tools/nixlibs.R#L581-L583
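
The defaulting rule in the second bullet can be sketched in shell as below. This is an assumption-laden illustration, not the linked code: the function name `default_libarrow_binary` is hypothetical, and `NOT_CRAN` is compared as the exact lowercase string `true` here, which the real script may handle differently.

```sh
# Sketch: default LIBARROW_BINARY to "true" only when it is unset (or empty)
# and NOT_CRAN is "true"; otherwise leave whatever the user set.
# $1: current value of LIBARROW_BINARY, $2: current value of NOT_CRAN
default_libarrow_binary() {
  libarrow_binary=$1
  not_cran=$2
  if [ -z "$libarrow_binary" ] && [ "$not_cran" = "true" ]; then
    libarrow_binary=true
  fi
  echo "$libarrow_binary"
}
```

Because an explicit value is never overwritten, `identify_os` reading only `LIBARROW_BINARY` would still see a distro-version string such as one the user supplied.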
   
   
   

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,64 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' Steps for an offline install with optional dependencies:
+#' - Install the `arrow` package on a computer with internet access
+#' - Run this function
+#' - Copy the saved dependency files to a computer without internet access
+#' - Create a environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
+#'   points to the folder.
+#' - Install the `arrow` package on the computer without internet access
+#' - Run [arrow_info()] to check installed capabilities

Review comment:
       Sure! Subheadings seem like a great idea. Something like this?
   
   ```r
   #' ## Steps for an offline install with optional dependencies:
   #'
   #' ### On a computer with internet access:
   #' - Install the `arrow` package
   #' - Run this function
   #' - Copy the saved dependency files to the computer without internet access
   #'
   #' ### On the computer without internet access:
   #' - Create an environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
   #'   points to the newly copied folder of dependency files.
   #' - Install the `arrow` package
   #' - Run [arrow_info()] to check installed capabilities
   ```
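
From the command line, the two halves of these steps might look like the sketch below. It assumes the `download_optional_dependencies()` function and the `ARROW_THIRDPARTY_DEPENDENCY_DIR` variable as they exist at this point in the PR; the wrapper function names and the tarball argument are hypothetical placeholders.

```sh
# On the computer with internet access (arrow already installed):
# download the optional dependencies into the given directory.
fetch_thirdparty_deps() {
  deps_dir=$1
  Rscript -e "arrow::download_optional_dependencies('$deps_dir')"
}

# On the computer without internet access, after copying the directory over:
# point the build at the copied files, install from the source tarball,
# then print the installed capabilities.
install_arrow_offline() {
  deps_dir=$1
  arrow_tarball=$2
  ARROW_THIRDPARTY_DEPENDENCY_DIR="$deps_dir" R CMD INSTALL "$arrow_tarball"
  Rscript -e 'arrow::arrow_info()'
}
```

Setting the variable only for the `R CMD INSTALL` call keeps it out of the rest of the session, which is one reasonable choice; exporting it in a profile would work as well.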
   

##########
File path: r/R/util.R
##########
@@ -183,3 +183,63 @@ repeat_value_as_array <- function(object, n) {
   }
   return(Scalar$create(object)$as_array(n))
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#'
+#' @return TRUE/FALSE for whether the downloads were successful
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Other files are not changed.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' Steps for an offline install with optional dependencies:
+#' - Install the `arrow` package on a computer with internet access

Review comment:
       Sounds good!

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,68 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### On a computer with internet access:
+#' - Install the `arrow` package

Review comment:
       Yes, unless you can think of a better way! As @jonkeane [pointed out](https://github.com/apache/arrow/pull/11001/#discussion_r698528943), it's possible to download those files from GitHub, but protecting against version mismatch (between what's needed by `tools/cpp/` and what's listed in GitHub's `versions.txt`) could be challenging.

##########
File path: r/tools/nixlibs.R
##########
@@ -320,33 +300,54 @@ build_libarrow <- function(src_dir, dst_dir) {
     BUILD_DIR = build_dir,
     DEST_DIR = dst_dir,
     CMAKE = cmake,
+    # EXTRA_CMAKE_FLAGS will often be "", but it's convenient later to have it defined

Review comment:
       Later in the code I have:
   https://github.com/apache/arrow/blob/98b5601f94ff0f0caf240c6e1b914d4e8e49f98e/r/tools/nixlibs.R#L447-L452
   
   If we don't add `EXTRA_CMAKE_FLAGS` to the vector, that section could instead be
   ```r
       # The syntax to turn off XSIMD is different.
       # Pull existing value of EXTRA_CMAKE_FLAGS first (must be defined)
       "EXTRA_CMAKE_FLAGS" = paste(
         Sys.getenv("EXTRA_CMAKE_FLAGS"),
         "-DARROW_SIMD_LEVEL=NONE -DARROW_RUNTIME_SIMD_LEVEL=NONE"
       )
   ```
   
   I did it the first way to have the `EXTRA_CMAKE_FLAGS` collected at the same time as the other existing build flags, but if you think the second way is cleaner, I'm happy to change it.
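
Either way, the underlying operation is appending two flags to a value that is often empty. A shell sketch of that append, with a hypothetical function name, shows why an empty starting value is harmless: the result merely carries a leading space, as `paste()` would also produce in R.

```sh
# Sketch: append the SIMD-disabling flags to an existing (possibly empty)
# EXTRA_CMAKE_FLAGS value passed as $1.
append_simd_off_flags() {
  existing=$1
  printf '%s\n' "$existing -DARROW_SIMD_LEVEL=NONE -DARROW_RUNTIME_SIMD_LEVEL=NONE"
}
```

For instance, `EXTRA_CMAKE_FLAGS=$(append_simd_off_flags "${EXTRA_CMAKE_FLAGS:-}")` preserves any flags the user already supplied.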

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,68 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### On a computer with internet access:
+#' - Install the `arrow` package
+#' - Run this function
+#' - Copy the saved dependency files to the computer with internet access
+#'
+#' ### On the computer without internet access:
+#' - Create a environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
+#'   points to the newly copied folder of dependency files.
+#' - Install the `arrow` package
+#' - Run [arrow_info()] to check installed capabilities
+#'
+#' @examples
+#' \dontrun{
+#' download_optional_dependencies("arrow-thirdparty")
+#' list.files("arrow-thirdparty", "thrift-*") # "thrift-0.13.0.tar.gz" or similar
+#' }
+#' @export
+download_optional_dependencies <- function(deps_dir = NULL) {
+  # This script is copied over from arrow/cpp/... to arrow/r/inst/...
+  download_dependencies_sh <- system.file(
+    "thirdparty/download_dependencies.sh",
+    package = "arrow",
+    mustWork = TRUE
+  )
+  if (is.null(deps_dir) && Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR") != "") {
+    deps_dir <- Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")
+  }
+
+  dir.create(deps_dir, showWarnings = FALSE, recursive = TRUE)
+  # Run download_dependencies.sh
+  cat(paste0("*** Downloading optional dependencies to ", deps_dir, "\n"))
+  return_status <- system2(download_dependencies_sh,
+    args = deps_dir, stdout = FALSE, stderr = FALSE
+  )
+  if (isTRUE(return_status == 0)) {
+    cat(paste0(
+      "**** Set environment variable on offline machine and re-build arrow:\n",

Review comment:
       As I'm thinking about what to write, I feel like I'm just duplicating the help text. What about this message instead? (Or no message at all.)
   ```
   **** Download successful to <directory>
        See ?download_optional_dependencies for more details.
   ```

##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,68 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#' Defaults to the value of `ARROW_THIRDPARTY_DEPENDENCY_DIR`, if that
+#' environment variable is set.
+#'
+#' @return `deps_dir`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Do not put other files in this directory.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' ## Steps for an offline install with optional dependencies:
+#'
+#' ### On a computer with internet access:
+#' - Install the `arrow` package

Review comment:
       Yeah.... fair

##########
File path: r/vignettes/install.Rmd
##########
@@ -102,6 +102,14 @@ satisfy C++ dependencies.
 
 > Note that, unlike packages like `tensorflow`, `blogdown`, and others that require external dependencies, you do not need to run `install_arrow()` after a successful `arrow` installation.
 
+The `install-arrow.R` file also includes the `download_optional_dependencies()`
+function. Normally, when installing on a computer with internet access, the
+build process will download third-party dependencies as needed. This function
+provides a way to download them in advance. Relevant environment variables are
+`ARROW_THIRDPARTY_DEPENDENCY_DIR` for the directory of downloaded dependencies
+and `TEST_OFFLINE_BUILD` to force the build process not to download.

Review comment:
       Should I also remove it from the summary at the end of this vignette? It seems helpful to mention it somewhere, but I could also move the comment to the Developing vignette.

##########
File path: dev/tasks/tasks.yml
##########
@@ -1033,6 +1033,14 @@ tasks:
       flags: '-e ARROW_SOURCE_HOME="/arrow" -e FORCE_BUNDLED_BUILD=TRUE -e LIBARROW_BUILD=TRUE -e ARROW_DEPENDENCY_SOURCE=SYSTEM'
       image: ubuntu-r-only-r
 
+  test-r-offline-minimal:
+      ci: azure
+      template: r/azure.linux.yml
+      params:
+        r_org: rocker
+        r_image: r-base
+        r_tag: latest
+        flags: '-e TEST_OFFLINE_BUILD=true'

Review comment:
       I put this one on Azure because the one above was on Azure, but feel free to change to a different platform.







[GitHub] [arrow] karldw commented on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-909608550


   Tests are currently having errors because of this line:
   
   https://github.com/apache/arrow/blob/6daff455ad1e4c5ac4c84bda5711bdb5c30b6156/r/tools/nixlibs.R#L466
   
   That directory (`tools/cpp/thirdparty`) would exist if `make build` had been run. Any suggestions?





[GitHub] [arrow] jonkeane commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r702899374



##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,93 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Create a source bundle that includes all thirdparty dependencies
+#'
+#' @param dest_file File path for the new tar.gz package. Defaults to
+#' `arrow_V.V.V_with_deps.tar.gz` in the current directory (`V.V.V` is the version)
+#' @param source_file File path for the input tar.gz package. Defaults to
+#' downloading the package from CRAN (or whatever you have set as the first in
+#' `getOption("repos")`)
+#' @return The full path to `dest_file`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download the required dependencies for you.
+#' These downloaded dependencies are only used in the build if
+#' `ARROW_DEPENDENCY_SOURCE` is unset, `BUNDLED`, or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'

Review comment:
       Sure, I'm a little less worried about those since they will either just work (on macOS) or have other issues (Windows). But adding them for completeness is good. I've rearranged the wording a bit so that the more general part comes first ("if you've got access to full-featured binaries, use those!") and the special note about RSPM/Linux comes second (since that's the only platform where `type="source"` isn't respected).
   
   ```
   #' If you're using binary packages you shouldn't need to use this function. You 
   #' should download the appropriate binary from your package repository, transfer 
   #' that to the offline computer, and install that. Any OS can create the source 
   #' bundle, but it cannot be installed on Windows. (Instead, use a standard 
   #' Windows binary package.)
   #'
   #' Note if you're using RStudio Package Manager on Linux: If you still want to 
   #' make a source bundle with this function, make sure to set the first repo in 
   #' `options("repos")` to be a mirror that contains source packages (that is: 
   #' something other than the RSPM binary mirror URLs). 
   ```
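   
   As a rough illustration of that last point, the repo order could be adjusted before creating the bundle along these lines. This is only a sketch: `put_source_repo_first()` and both URLs are assumptions made for the example, not part of this PR.
   
   ```r
   # Hypothetical helper: return `repos` with a source-capable mirror first.
   put_source_repo_first <- function(repos,
                                     source_repo = "https://cloud.r-project.org") {
     # Drop any existing copy of the mirror so it is not listed twice
     repos <- repos[repos != source_repo]
     c(CRAN = source_repo, repos)
   }
   
   # Example: a binary mirror is currently first (the URL is illustrative)
   repos <- c(RSPM = "https://packagemanager.example.com/cran/__linux__/focal/latest")
   new_repos <- put_source_repo_first(repos)
   # options(repos = new_repos)  # uncomment to apply in the current session
   ```
   
   Returning a new vector instead of calling `options()` directly keeps the helper free of side effects, so the caller decides when to change the session.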







[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r702205745



##########
File path: r/tools/nixlibs.R
##########
@@ -413,66 +423,129 @@ cmake_version <- function(cmd = "cmake") {
   )
 }
 
-with_s3_support <- function(env_vars) {
-  arrow_s3 <- toupper(Sys.getenv("ARROW_S3")) == "ON" || tolower(Sys.getenv("LIBARROW_MINIMAL")) == "false"
+turn_off_thirdparty_features <- function(env_var_list) {
+  # Because these are done as environment variables (as opposed to build flags),
+  # setting these to "OFF" overrides any previous setting. We don't need to
+  # check the existing value.
+  turn_off <- c(
+    "ARROW_MIMALLOC" = "OFF",
+    "ARROW_JEMALLOC" = "OFF",
+    "ARROW_PARQUET" = "OFF", # depends on thrift
+    "ARROW_DATASET" = "OFF", # depends on parquet
+    "ARROW_S3" = "OFF",
+    "ARROW_WITH_BROTLI" = "OFF",
+    "ARROW_WITH_BZ2" = "OFF",
+    "ARROW_WITH_LZ4" = "OFF",
+    "ARROW_WITH_SNAPPY" = "OFF",
+    "ARROW_WITH_ZLIB" = "OFF",
+    "ARROW_WITH_ZSTD" = "OFF",
+    "ARROW_WITH_RE2" = "OFF",
+    "ARROW_WITH_UTF8PROC" = "OFF",
+    # NOTE: this code sets the environment variable ARROW_JSON to "OFF", but
+    # that setting will *not* be honored by build_arrow_static.sh until
+    # ARROW-13768 is resolved.
+    "ARROW_JSON" = "OFF",
+    # The syntax to turn off XSIMD is different.
+    # Pull existing value of EXTRA_CMAKE_FLAGS first (must be defined)
+    "EXTRA_CMAKE_FLAGS" = paste(
+      env_var_list[["EXTRA_CMAKE_FLAGS"]],
+      "-DARROW_SIMD_LEVEL=NONE -DARROW_RUNTIME_SIMD_LEVEL=NONE"
+    )
+  )
+  # Create a new env_var_list, with the values of turn_off set.
+  # replace() also adds new values if they didn't exist before
+  replace(env_var_list, names(turn_off), turn_off)
+}
+
+set_thirdparty_urls <- function(env_var_list) {
+  # This function does *not* check if existing *_SOURCE_URL variables are set.
+  # The directory tools/thirdparty_dependencies is created by
+  # create_package_with_all_dependencies() and saved in the tar file.
+  files <- list.files(thirdparty_dependency_dir, full.names = FALSE)
+  url_env_varname <- toupper(sub("(.*?)-.*", "ARROW_\\1_URL", files))
+  # Special handling for the aws dependencies, which have extra `-`
+  aws <- grepl("^aws", files)
+  url_env_varname[aws] <- sub(
+    "AWS_SDK_CPP", "AWSSDK",
+    gsub(
+      "-", "_",
+      sub(
+        "(AWS.*)-.*", "ARROW_\\1_URL",
+        toupper(files[aws])
+      )
+    )
+  )
+  full_filenames <- file.path(normalizePath(thirdparty_dependency_dir), files)
+
+  env_var_list <- replace(env_var_list, url_env_varname, full_filenames)
+  if (!quietly) {
+    env_var_list <- replace(env_var_list, "ARROW_VERBOSE_THIRDPARTY_BUILD", "ON")
+  }
+  env_var_list
+}
+
+with_mimalloc <- function(env_var_list) {
+  arrow_mimalloc <- env_is("ARROW_MIMALLOC", "on") || env_is("LIBARROW_MINIMAL", "false")
+  if (arrow_mimalloc) {

Review comment:
       I added a small helper, `is_feature_requested()`, to do this consistently
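   
   For readers following along, the helper might look roughly like the sketch below. This is an assumption about its shape, not the code in the PR: `env_is()` is re-defined here only so the example is self-contained, the `default` argument is hypothetical, and letting an explicit "OFF" win over `LIBARROW_MINIMAL=false` is a design choice of the sketch.
   
   ```r
   # Case-insensitive check of an environment variable (a stand-in for the
   # env_is() used in nixlibs.R, re-defined so the sketch runs on its own).
   env_is <- function(var, value) identical(tolower(Sys.getenv(var)), value)
   
   # A feature is requested if its own variable is "ON", not requested if it is
   # "OFF", and otherwise follows the default (a non-minimal build).
   is_feature_requested <- function(env_varname,
                                    default = env_is("LIBARROW_MINIMAL", "false")) {
     if (env_is(env_varname, "off")) {
       FALSE
     } else if (env_is(env_varname, "on")) {
       TRUE
     } else {
       default
     }
   }
   
   # Usage: resolve a flag, then record it in the named list of env vars.
   # replace() updates an existing element or adds it if it is missing.
   Sys.setenv(ARROW_MIMALLOC = "ON")
   env_var_list <- list(ARROW_S3 = "OFF")
   if (is_feature_requested("ARROW_MIMALLOC")) {
     env_var_list <- replace(env_var_list, "ARROW_MIMALLOC", "ON")
   }
   ```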







[GitHub] [arrow] nealrichardson edited a comment on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson edited a comment on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-909640938


   > Tests are currently having errors because of this line:
   > 
   > https://github.com/apache/arrow/blob/6daff455ad1e4c5ac4c84bda5711bdb5c30b6156/r/tools/nixlibs.R#L466
   > 
   > That directory (`tools/cpp/thirdparty`) would exist if `make build` had been run. Any suggestions?
   
   I can investigate later, though that wouldn't explain why the Windows builds are failing, since that script doesn't get called there.
   
   Edit: are you sure you want that? The next line checks if (!dir.exists()) and returns.





[GitHub] [arrow] jonkeane commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r702305430



##########
File path: r/R/install-arrow.R
##########
@@ -137,3 +136,93 @@ reload_arrow <- function() {
     message("Please restart R to use the 'arrow' package.")
   }
 }
+
+
+#' Create a source bundle that includes all thirdparty dependencies
+#'
+#' @param dest_file File path for the new tar.gz package. Defaults to
+#' `arrow_V.V.V_with_deps.tar.gz` in the current directory (`V.V.V` is the version)
+#' @param source_file File path for the input tar.gz package. Defaults to
+#' downloading the package from CRAN (or whatever you have set as the first in
+#' `getOption("repos")`)
+#' @return The full path to `dest_file`, invisibly
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download the required dependencies for you.
+#' These downloaded dependencies are only used in the build if
+#' `ARROW_DEPENDENCY_SOURCE` is unset, `BUNDLED`, or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'

Review comment:
       ```suggestion
   #'
   #' Note: If you're using RStudio Package Manager to download binary packages on
   #' Linux you shouldn't need to use this function. You can download the appropriate
   #' binary from RStudio Package Manager, and transfer that to the offline computer.
   #' If you still do want to make a source bundle with this function, make sure to
   #' set the first repo in `options("repos")` to be a mirror that contains source
   #' packages (that is: something other than the RStudio Package Manager binary 
   #' mirror URLs). 
   #'
   ```
   
   How about adding this to the docs? We could sniff for `libarrow.so` in the downloaded package, but that is fragile, and this is a fairly unusual workflow.







[GitHub] [arrow] karldw edited a comment on pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw edited a comment on pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#issuecomment-912747988


   Thanks! I learned a bunch doing this.
   
   I had a couple minor questions, following up on comments from @nealrichardson:
   
   1. Should I swap the argument order for `create_package_with_all_dependencies`?
   2. Should `create_package_with_all_dependencies` check `ARROW_THIRDPARTY_DEPENDENCY_DIR`?
   3. Before this is finalized, do you want to change any of the new names I've made up? (edit)





[GitHub] [arrow] nealrichardson commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r697443624



##########
File path: r/R/util.R
##########
@@ -183,3 +183,63 @@ repeat_value_as_array <- function(object, n) {
   }
   return(Scalar$create(object)$as_array(n))
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#'
+#' @return TRUE/FALSE for whether the downloads were successful
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Other files are not changed.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' Steps for an offline install with optional dependencies:
+#' - Install the `arrow` package on a computer with internet access

Review comment:
       I'd put this function in `install-arrow.R` and then recommend that you can just source that file (like we note for `install_arrow()`), no installation required.

##########
File path: r/R/util.R
##########
@@ -183,3 +183,63 @@ repeat_value_as_array <- function(object, n) {
   }
   return(Scalar$create(object)$as_array(n))
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#'
+#' @return TRUE/FALSE for whether the downloads were successful
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Other files are not changed.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' Steps for an offline install with optional dependencies:
+#' - Install the `arrow` package on a computer with internet access
+#' - Run this function
+#' - Copy the saved dependency files to a computer without internet access
+#' - Create an environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
+#'   points to the folder.
+#' - Install the `arrow` package on the computer without internet access
+#' - Run [arrow_info()] to check installed capabilities
+#'
+#' @examples
+#' \dontrun{
+#' download_optional_dependencies("arrow-thirdparty")
+#' list.files("arrow-thirdparty", "thrift-*") # "thrift-0.13.0.tar.gz" or similar
+#' }
+#' @export
+download_optional_dependencies <- function(deps_dir) {
+  # This script is copied over from arrow/cpp/... to arrow/r/tools/cpp/...

Review comment:
       ```suggestion
     # This script is copied over from arrow/cpp/... to arrow/r/inst/...
   ```

##########
File path: r/R/util.R
##########
@@ -183,3 +183,63 @@ repeat_value_as_array <- function(object, n) {
   }
   return(Scalar$create(object)$as_array(n))
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#'
+#' @return TRUE/FALSE for whether the downloads were successful
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Other files are not changed.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' Steps for an offline install with optional dependencies:
+#' - Install the `arrow` package on a computer with internet access
+#' - Run this function
+#' - Copy the saved dependency files to a computer without internet access
+#' - Create an environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
+#'   points to the folder.
+#' - Install the `arrow` package on the computer without internet access
+#' - Run [arrow_info()] to check installed capabilities
+#'
+#' @examples
+#' \dontrun{
+#' download_optional_dependencies("arrow-thirdparty")
+#' list.files("arrow-thirdparty", "thrift-*") # "thrift-0.13.0.tar.gz" or similar
+#' }
+#' @export
+download_optional_dependencies <- function(deps_dir) {
+  # This script is copied over from arrow/cpp/... to arrow/r/tools/cpp/...
+  download_dependencies_sh <- system.file(
+    "thirdparty/download_dependencies.sh",
+    package = "arrow",
+    mustWork = TRUE
+  )
+  # Make sure the directory is sort of reasonable before creating it
+  deps_dir <- trimws(deps_dir)
+  stopifnot(nchar(deps_dir) >= 1)

Review comment:
       I don't think you need this: `dir.create()` seems to validate enough: 
   
   ```
   > dir.create(4)
   Error in dir.create(4) : invalid 'path' argument
   > dir.create(NULL)
   Error in dir.create(NULL) : invalid 'path' argument
   > dir.create(c("a", "b"))
   Error in dir.create(c("a", "b")) : invalid 'path' argument
   ```
   
   ```suggestion
   ```

##########
File path: r/tools/nixlibs.R
##########
@@ -329,24 +290,34 @@ build_libarrow <- function(src_dir, dst_dir) {
   env_vars <- paste0(names(env_var_list), '="', env_var_list, '"', collapse = " ")
   env_vars <- with_s3_support(env_vars)
   env_vars <- with_mimalloc(env_vars)
-  if (tolower(Sys.info()[["sysname"]]) %in% "sunos") {
-    # jemalloc doesn't seem to build on Solaris
-    # nor does thrift, so turn off parquet,
-    # and arrowExports.cpp requires parquet for dataset (ARROW-11994), so turn that off
-    # xsimd doesn't compile, so set SIMD level to NONE to skip it
-    # re2 and utf8proc do compile,
-    # but `ar` fails to build libarrow_bundled_dependencies, so turn them off
-    # so that there are no bundled deps
-    env_vars <- paste(env_vars, "ARROW_JEMALLOC=OFF ARROW_PARQUET=OFF ARROW_DATASET=OFF ARROW_WITH_RE2=OFF ARROW_WITH_UTF8PROC=OFF EXTRA_CMAKE_FLAGS=-DARROW_SIMD_LEVEL=NONE")
+  # turn_off_thirdparty_features() needs to happen after with_mimalloc() and
+  # with_s3_support(), since those might turn features ON.
+  thirdparty_deps_unavailable <- !download_ok &&
+    !dir.exists(Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")) &&
+    !env_is("ARROW_DEPENDENCY_SOURCE", "system")
+  if (thirdparty_deps_unavailable || is_solaris()) {
+    # Note that JSON support does work on Solaris, but will be turned off with
+    # the rest of the thirdparty dependencies (when ARROW-13768 is resolved and
+    # JSON can be turned off at all). All other dependencies don't compile
+    # (e.g. thrift, jemalloc, and xsimd) or do compile but `ar` fails to build
+    # libarrow_bundled_dependencies (e.g. re2 and utf8proc).
+    env_vars <- turn_off_thirdparty_features(env_vars)

Review comment:
       How about adding a message pointing the user to how to handle thirdparty deps if offline?
   
   ```suggestion
     if (is_solaris()) {
       # Note that JSON support does work on Solaris, but will be turned off with
       # the rest of the thirdparty dependencies (when ARROW-13768 is resolved and
       # JSON can be turned off at all). All other dependencies don't compile
       # (e.g. thrift, jemalloc, and xsimd) or do compile but `ar` fails to build
       # libarrow_bundled_dependencies (e.g. re2 and utf8proc).
       env_vars <- turn_off_thirdparty_features(env_vars)
     } else if (thirdparty_deps_unavailable) {
       cat("*** Something something we're offline so building without many deps/features; see vignette\n")
       env_vars <- turn_off_thirdparty_features(env_vars)
     }
   ```

##########
File path: r/tools/nixlibs.R
##########
@@ -329,24 +290,34 @@ build_libarrow <- function(src_dir, dst_dir) {
   env_vars <- paste0(names(env_var_list), '="', env_var_list, '"', collapse = " ")
   env_vars <- with_s3_support(env_vars)
   env_vars <- with_mimalloc(env_vars)
-  if (tolower(Sys.info()[["sysname"]]) %in% "sunos") {
-    # jemalloc doesn't seem to build on Solaris
-    # nor does thrift, so turn off parquet,
-    # and arrowExports.cpp requires parquet for dataset (ARROW-11994), so turn that off
-    # xsimd doesn't compile, so set SIMD level to NONE to skip it
-    # re2 and utf8proc do compile,
-    # but `ar` fails to build libarrow_bundled_dependencies, so turn them off
-    # so that there are no bundled deps
-    env_vars <- paste(env_vars, "ARROW_JEMALLOC=OFF ARROW_PARQUET=OFF ARROW_DATASET=OFF ARROW_WITH_RE2=OFF ARROW_WITH_UTF8PROC=OFF EXTRA_CMAKE_FLAGS=-DARROW_SIMD_LEVEL=NONE")
+  # turn_off_thirdparty_features() needs to happen after with_mimalloc() and
+  # with_s3_support(), since those might turn features ON.
+  thirdparty_deps_unavailable <- !download_ok &&
+    !dir.exists(Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")) &&
+    !env_is("ARROW_DEPENDENCY_SOURCE", "system")
+  if (thirdparty_deps_unavailable || is_solaris()) {
+    # Note that JSON support does work on Solaris, but will be turned off with
+    # the rest of the thirdparty dependencies (when ARROW-13768 is resolved and
+    # JSON can be turned off at all). All other dependencies don't compile
+    # (e.g. thrift, jemalloc, and xsimd) or do compile but `ar` fails to build
+    # libarrow_bundled_dependencies (e.g. re2 and utf8proc).
+    env_vars <- turn_off_thirdparty_features(env_vars)
   }
+  # If $ARROW_THIRDPARTY_DEPENDENCY_DIR has files, add their *_SOURCE_URL env vars
+  env_vars <- set_thirdparty_urls(env_vars)
+
   cat("**** arrow", ifelse(quietly, "", paste("with", env_vars)), "\n")
   status <- suppressWarnings(system(
     paste(env_vars, "inst/build_arrow_static.sh"),
     ignore.stdout = quietly, ignore.stderr = quietly
   ))
   if (status != 0) {
     # It failed :(
-    cat("**** Error building Arrow C++. Re-run with ARROW_R_DEV=true for debug information.\n")
+    cat(

Review comment:
       Good call 👍 

##########
File path: r/tools/nixlibs.R
##########
@@ -413,10 +392,114 @@ cmake_version <- function(cmd = "cmake") {
   )
 }
 
+turn_off_thirdparty_features <- function(env_vars) {
+  # Because these are done as environment variables (as opposed to build flags),
+  # setting these to "OFF" overrides any previous setting. We don't need to
+  # check the existing value.
+  turn_off <- c(
+    "ARROW_MIMALLOC=OFF",
+    "ARROW_JEMALLOC=OFF",
+    "ARROW_PARQUET=OFF", # depends on thrift
+    "ARROW_DATASET=OFF", # depends on parquet
+    "ARROW_S3=OFF",
+    "ARROW_WITH_BROTLI=OFF",
+    "ARROW_WITH_BZ2=OFF",
+    "ARROW_WITH_LZ4=OFF",
+    "ARROW_WITH_SNAPPY=OFF",
+    "ARROW_WITH_ZLIB=OFF",
+    "ARROW_WITH_ZSTD=OFF",
+    "ARROW_WITH_RE2=OFF",
+    "ARROW_WITH_UTF8PROC=OFF",
+    # NOTE: this code sets the environment variable ARROW_JSON to "OFF", but
+    # that setting will *not* be honored by build_arrow_static.sh until
+    # ARROW-13768 is resolved.
+    "ARROW_JSON=OFF",
+    # The syntax to turn off XSIMD is different.
+    'EXTRA_CMAKE_FLAGS="-DARROW_SIMD_LEVEL=NONE"'
+  )
+  if (Sys.getenv("EXTRA_CMAKE_FLAGS") != "") {
+    # Error rather than overwriting EXTRA_CMAKE_FLAGS
+    # (Correctly inserting the flag into an existing quoted string is tricky)
+    stop("Sorry, setting EXTRA_CMAKE_FLAGS is not supported at this time.")
+  }
+  paste(env_vars, paste(turn_off, collapse = " "))
+}
+
+set_thirdparty_urls <- function(env_vars) {
+  deps_dir <- Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")
+  files <- list.files(deps_dir, full.names = FALSE)
+  if (length(files) == 0) {
+    # This will be true if the variable is unset, if it's set but the directory
+    # doesn't exist, or if it exists but is empty.
+    return(env_vars)
+  }
+  dep_names <- c(
+    "absl", # not used; seems to be a dependency of gRPC
+    "aws-sdk-cpp",
+    "aws-checksums",
+    "aws-c-common",
+    "aws-c-event-stream",
+    "boost",
+    "brotli",
+    "bzip2",
+    "cares", # not used; "a dependency of gRPC"
+    "gbenchmark", # not used; "Google benchmark, for testing"
+    "gflags", # not used; "for command line utilities (formerly Googleflags)"
+    "glog", # not used; "for logging"
+    "grpc", # not used; "for remote procedure calls"
+    "gtest", # not used; "Googletest, for testing"
+    "jemalloc",
+    "lz4",
+    "mimalloc",
+    "orc", # not used; "for Apache ORC format support"
+    "protobuf", # not used; "Google Protocol Buffers, for data serialization"
+    "rapidjson",
+    "re2",
+    "snappy",
+    "thrift",
+    "utf8proc",
+    "xsimd",
+    "zlib",
+    "zstd"
+  )
+  dep_regex <- paste0("^(", paste(dep_names, collapse = "|"), ").*")
+  # If there were extra files in the folder (not matching our regex) drop them.
+  files <- files[grepl(dep_regex, files, perl = TRUE)]
+  # Convert e.g. "thrift-0.13.0.tar.gz" to ARROW_THRIFT_URL
+  # Note that if there's no file called thrift*, we won't add
+  # ARROW_THRIFT_URL to env_vars.
+  url_env_varname <- sub(dep_regex, "ARROW_\\1_URL", files, perl = TRUE)
+  url_env_varname <- toupper(gsub("-", "_", url_env_varname, fixed = TRUE))
+  # Special case: ARROW_AWSSDK_URL for aws-sdk-cpp-<version>.tar.gz
+  url_env_varname <- sub("ARROW_AWS_SDK_CPP_URL", "ARROW_AWSSDK_URL", url_env_varname, fixed = TRUE)
+  if (anyDuplicated(url_env_varname)) {
+    warning("Unexpected files in ", deps_dir,
+      "\nDo you have multiple copies of a dependency?",
+      call. = FALSE
+    )
+    return(env_vars)
+  }

Review comment:
       This is a gnarly system of regexes but it works and doesn't require hard-coding the list of dependencies, which I think would lead to issues in the future when versions.txt changes.
   
   ```suggestion
     url_env_varname <- toupper(sub("(.*?)-.*", "ARROW_\\1_URL", files))
     # Special handling for the aws dependencies
     aws <- grepl("^aws", files)
     url_env_varname[aws] <- sub("AWS_SDK_CPP", "AWSSDK", 
       gsub("-", "_", 
         sub("(AWS.*)-.*", "ARROW_\\1_URL", 
           toupper(files[aws])
         )
       )
     )
   ```
   
   I tested this by doing 
   
   ```
   source versions.txt && echo $DEPENDENCIES > tmp.txt
   ```
   
   (newlines and spaces were mangled if I tried to use `system()` from R)
   
   and then in R:
   
   ```
   versions <- matrix(unlist(strsplit(readLines("tmp.txt"), " ")), ncol=3, byrow=TRUE)
   files <- versions[,2]
   urls <- versions[,1]
   ...
   identical(url_env_varname, urls)
   ```
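   
   To make the mapping concrete, here is the same transformation run on a few sample file names. The version strings are placeholders chosen for the example, not necessarily the ones in versions.txt.
   
   ```r
   files <- c(
     "thrift-0.13.0.tar.gz",
     "zstd-v1.5.0.tar.gz",
     "aws-sdk-cpp-1.8.133.tar.gz",
     "aws-c-common-0.5.10.tar.gz"
   )
   # Everything before the first "-" becomes the dependency name
   url_env_varname <- toupper(sub("(.*?)-.*", "ARROW_\\1_URL", files))
   # aws-* names contain extra "-", so split at the last "-" instead
   aws <- grepl("^aws", files)
   url_env_varname[aws] <- sub(
     "AWS_SDK_CPP", "AWSSDK",
     gsub("-", "_", sub("(AWS.*)-.*", "ARROW_\\1_URL", toupper(files[aws])))
   )
   print(url_env_varname)
   ```
   
   The lazy `(.*?)` stops at the first hyphen, while the greedy `(AWS.*)` runs to the last one, which is what keeps the multi-part aws names intact. Note that this relies on the version string itself containing no hyphen.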

##########
File path: r/tools/nixlibs.R
##########
@@ -329,24 +290,34 @@ build_libarrow <- function(src_dir, dst_dir) {
   env_vars <- paste0(names(env_var_list), '="', env_var_list, '"', collapse = " ")
   env_vars <- with_s3_support(env_vars)
   env_vars <- with_mimalloc(env_vars)
-  if (tolower(Sys.info()[["sysname"]]) %in% "sunos") {
-    # jemalloc doesn't seem to build on Solaris
-    # nor does thrift, so turn off parquet,
-    # and arrowExports.cpp requires parquet for dataset (ARROW-11994), so turn that off
-    # xsimd doesn't compile, so set SIMD level to NONE to skip it
-    # re2 and utf8proc do compile,
-    # but `ar` fails to build libarrow_bundled_dependencies, so turn them off
-    # so that there are no bundled deps
-    env_vars <- paste(env_vars, "ARROW_JEMALLOC=OFF ARROW_PARQUET=OFF ARROW_DATASET=OFF ARROW_WITH_RE2=OFF ARROW_WITH_UTF8PROC=OFF EXTRA_CMAKE_FLAGS=-DARROW_SIMD_LEVEL=NONE")
+  # turn_off_thirdparty_features() needs to happen after with_mimalloc() and
+  # with_s3_support(), since those might turn features ON.
+  thirdparty_deps_unavailable <- !download_ok &&
+    !dir.exists(Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")) &&
+    !env_is("ARROW_DEPENDENCY_SOURCE", "system")
+  if (thirdparty_deps_unavailable || is_solaris()) {
+    # Note that JSON support does work on Solaris, but will be turned off with
+    # the rest of the thirdparty dependencies (when ARROW-13768 is resolved and
+    # JSON can be turned off at all). All other dependencies don't compile
+    # (e.g. thrift, jemalloc, and xsimd) or do compile but `ar` fails to build
+    # libarrow_bundled_dependencies (e.g. re2 and utf8proc).
+    env_vars <- turn_off_thirdparty_features(env_vars)
   }
+  # If $ARROW_THIRDPARTY_DEPENDENCY_DIR has files, add their *_SOURCE_URL env vars
+  env_vars <- set_thirdparty_urls(env_vars)

Review comment:
       Maybe put this inside an `else` block, just for readability
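   
   Together with the offline message proposed earlier in this review, the control flow could end up looking something like the sketch below. The two helpers are stubbed out as hypothetical stand-ins, and `configure_thirdparty()` is a made-up wrapper, so that the example runs on its own; only the branching is the point.
   
   ```r
   # Stubs standing in for the real helpers in nixlibs.R (assumptions)
   turn_off_thirdparty_features <- function(env_vars) {
     paste(env_vars, "ARROW_PARQUET=OFF")
   }
   set_thirdparty_urls <- function(env_vars) {
     paste(env_vars, "ARROW_THRIFT_URL=/deps/thrift.tar.gz")
   }
   
   configure_thirdparty <- function(env_vars, deps_unavailable, solaris) {
     if (solaris) {
       # Most bundled dependencies do not build on Solaris
       env_vars <- turn_off_thirdparty_features(env_vars)
     } else if (deps_unavailable) {
       cat("*** Building offline without optional dependencies; see the install vignette\n")
       env_vars <- turn_off_thirdparty_features(env_vars)
     } else {
       # Only point the build at local files when the features stay on
       env_vars <- set_thirdparty_urls(env_vars)
     }
     env_vars
   }
   
   configure_thirdparty("ARROW_S3=ON", deps_unavailable = TRUE, solaris = FALSE)
   ```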

##########
File path: r/R/util.R
##########
@@ -183,3 +183,63 @@ repeat_value_as_array <- function(object, n) {
   }
   return(Scalar$create(object)$as_array(n))
 }
+
+
+#' Download all optional Arrow dependencies
+#'
+#' @param deps_dir Directory to save files into. Will be created if necessary.
+#'
+#' @return TRUE/FALSE for whether the downloads were successful
+#'
+#' This function is used for setting up an offline build. If it's possible to
+#' download at build time, don't use this function. Instead, let `cmake`
+#' download them for you.
+#' If the files already exist in `deps_dir`, they will be re-downloaded and
+#' overwritten. Other files are not changed.
+#' These saved files are only used in the build if `ARROW_DEPENDENCY_SOURCE`
+#' is `BUNDLED` or `AUTO`.
+#' https://arrow.apache.org/docs/developers/cpp/building.html#offline-builds
+#'
+#' Steps for an offline install with optional dependencies:
+#' - Install the `arrow` package on a computer with internet access
+#' - Run this function
+#' - Copy the saved dependency files to a computer without internet access
+#' - Create an environment variable called `ARROW_THIRDPARTY_DEPENDENCY_DIR` that
+#'   points to the folder.
+#' - Install the `arrow` package on the computer without internet access
+#' - Run [arrow_info()] to check installed capabilities
+#'
+#' @examples
+#' \dontrun{
+#' download_optional_dependencies("arrow-thirdparty")
+#' list.files("arrow-thirdparty", "thrift-*") # "thrift-0.13.0.tar.gz" or similar
+#' }
+#' @export
+download_optional_dependencies <- function(deps_dir) {
+  # This script is copied over from arrow/cpp/... to arrow/r/tools/cpp/...
+  download_dependencies_sh <- system.file(
+    "thirdparty/download_dependencies.sh",
+    package = "arrow",
+    mustWork = TRUE
+  )
+  # Make sure the directory is sort of reasonable before creating it
+  deps_dir <- trimws(deps_dir)
+  stopifnot(nchar(deps_dir) >= 1)
+  dir.create(deps_dir, showWarnings = FALSE, recursive = TRUE)
+
+  # Run download_dependencies.sh
+  cat(paste0("*** Downloading optional dependencies to ", deps_dir, "\n"))
+  return_status <- system2(download_dependencies_sh,
+    args = deps_dir, stdout = FALSE, stderr = FALSE
+  )
+  download_successful <- isTRUE(return_status == 0)
+  if (download_successful) {
+    cat(paste0(
+      "**** Set environment variable on offline machine and re-build arrow:\n",
+      "export ARROW_THIRDPARTY_DEPENDENCY_DIR=<downloaded directory>\n"
+    ))
+  } else {
+    warning("Failed to download optional dependencies")

Review comment:
       ```suggestion
       stop("Failed to download optional dependencies", call. = FALSE)
       ```

##########
File path: r/tests/testthat/test-install-arrow.R
##########
@@ -37,3 +37,20 @@ r_only({
     })
   })
 })
+
+
+r_only({
+  test_that("download_optional_dependencies", {
+    skip_if_offline()
+    deps_dir <- tempfile()
+    download_successful <- expect_output(
+      download_optional_dependencies(deps_dir),
+      "export ARROW_THRIFT_URL"

Review comment:
       I think this test is outdated. And actually I think we should delete it: we'll test it in the offline build CI job. I don't want to download all of these files every time I run the test suite.

##########
File path: r/vignettes/install.Rmd
##########
@@ -303,10 +307,12 @@ By default, these are all unset. All boolean variables are case-insensitive.
   won't look for Arrow libraries on your system and instead will look to download/build them.
   Use this if you have a version mismatch between installed system libraries
   and the version of the R package you're installing.
-* `LIBARROW_DOWNLOAD`: Unless set to `false`, the build script
-  will attempt to download C++ binary or source bundles.
+* `TEST_OFFLINE_BUILD`: Unless set to `true`, the build script

Review comment:
       🤷 perhaps so; I don't expect anyone to use it other than us in testing

##########
File path: r/tools/nixlibs.R
##########
@@ -329,24 +290,34 @@ build_libarrow <- function(src_dir, dst_dir) {
   env_vars <- paste0(names(env_var_list), '="', env_var_list, '"', collapse = " ")
   env_vars <- with_s3_support(env_vars)
   env_vars <- with_mimalloc(env_vars)
-  if (tolower(Sys.info()[["sysname"]]) %in% "sunos") {
-    # jemalloc doesn't seem to build on Solaris
-    # nor does thrift, so turn off parquet,
-    # and arrowExports.cpp requires parquet for dataset (ARROW-11994), so turn that off
-    # xsimd doesn't compile, so set SIMD level to NONE to skip it
-    # re2 and utf8proc do compile,
-    # but `ar` fails to build libarrow_bundled_dependencies, so turn them off
-    # so that there are no bundled deps
-    env_vars <- paste(env_vars, "ARROW_JEMALLOC=OFF ARROW_PARQUET=OFF ARROW_DATASET=OFF ARROW_WITH_RE2=OFF ARROW_WITH_UTF8PROC=OFF EXTRA_CMAKE_FLAGS=-DARROW_SIMD_LEVEL=NONE")
+  # turn_off_thirdparty_features() needs to happen after with_mimalloc() and
+  # with_s3_support(), since those might turn features ON.

Review comment:
       We could have those check `download_ok` too. It's also worth considering whether it would be easier to work with `env_var_list` throughout here and only paste it into `env_vars` when calling `system()`. That's scope creep, but I'm pointing it out since you're fighting against it here, and it might be more natural to carry around a list that you can update rather than opaque strings.
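
A minimal sketch of the list-based approach described in this comment. The helper names `set_env_vars()` and `format_env_vars()` are hypothetical and not part of the PR; the `paste0()` formatting mirrors the existing line in `build_libarrow()`.

```r
# Hypothetical helper: update settings while they are still a named list.
# modifyList() overwrites entries whose names already exist and appends new ones.
set_env_vars <- function(env_var_list, ...) {
  utils::modifyList(env_var_list, list(...))
}

# Hypothetical helper: serialize to a single string only when it is needed,
# using the same NAME="value" format as the existing paste0() call.
format_env_vars <- function(env_var_list) {
  paste0(names(env_var_list), '="', env_var_list, '"', collapse = " ")
}

# Illustrative values only
env_var_list <- list(ARROW_S3 = "ON", ARROW_PARQUET = "ON")
env_var_list <- set_env_vars(env_var_list, ARROW_S3 = "OFF", ARROW_JEMALLOC = "OFF")
env_vars <- format_env_vars(env_var_list)
```

With a list, turning a feature off is a plain assignment by name, so the order in which the `with_*()` helpers and the "turn off" step run matters less than it does when appending to a string.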

##########
File path: r/tools/nixlibs.R
##########
@@ -501,12 +572,10 @@ if (!file.exists(paste0(dst_dir, "/include/arrow/api.h"))) {
     unlink(bin_file)
   } else if (build_ok) {
     # (2) Find source and build it
-    if (download_ok) {
+    src_dir <- find_local_source()
+    if (is.null(src_dir) && download_ok) {
       src_dir <- download_source()
     }

Review comment:
       ```suggestion
       ```

##########
File path: r/tools/nixlibs.R
##########
@@ -413,10 +392,114 @@ cmake_version <- function(cmd = "cmake") {
   )
 }
 
+turn_off_thirdparty_features <- function(env_vars) {
+  # Because these are done as environment variables (as opposed to build flags),
+  # setting these to "OFF" overrides any previous setting. We don't need to
+  # check the existing value.
+  turn_off <- c(
+    "ARROW_MIMALLOC=OFF",
+    "ARROW_JEMALLOC=OFF",
+    "ARROW_PARQUET=OFF", # depends on thrift
+    "ARROW_DATASET=OFF", # depends on parquet
+    "ARROW_S3=OFF",
+    "ARROW_WITH_BROTLI=OFF",
+    "ARROW_WITH_BZ2=OFF",
+    "ARROW_WITH_LZ4=OFF",
+    "ARROW_WITH_SNAPPY=OFF",
+    "ARROW_WITH_ZLIB=OFF",
+    "ARROW_WITH_ZSTD=OFF",
+    "ARROW_WITH_RE2=OFF",
+    "ARROW_WITH_UTF8PROC=OFF",
+    # NOTE: this code sets the environment variable ARROW_JSON to "OFF", but
+    # that setting will *not* be honored by build_arrow_static.sh until
+    # ARROW-13768 is resolved.
+    "ARROW_JSON=OFF",
+    # The syntax to turn off XSIMD is different.
+    'EXTRA_CMAKE_FLAGS="-DARROW_SIMD_LEVEL=NONE"'
+  )
+  if (Sys.getenv("EXTRA_CMAKE_FLAGS") != "") {
+    # Error rather than overwriting EXTRA_CMAKE_FLAGS
+    # (Correctly inserting the flag into an existing quoted string is tricky)
+    stop("Sorry, setting EXTRA_CMAKE_FLAGS is not supported at this time.")
+  }
+  paste(env_vars, paste(turn_off, collapse = " "))
+}
+
+set_thirdparty_urls <- function(env_vars) {
+  deps_dir <- Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")
+  files <- list.files(deps_dir, full.names = FALSE)
+  if (length(files) == 0) {
+    # This will be true if the variable is unset, if it's set but the directory

Review comment:
       Should we error explicitly if the variable is set but the dir doesn't exist or is empty?
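
One way the explicit check asked about here might look, as a sketch only. `check_thirdparty_dir()` is a hypothetical name and is not in the PR; it assumes an unset variable should be silently ignored, while a variable that is set but unusable should stop the build.

```r
# Hypothetical helper: validate ARROW_THIRDPARTY_DEPENDENCY_DIR up front.
# Returns FALSE if the variable is unset, TRUE if the directory has files,
# and errors if the variable is set but the directory is missing or empty.
check_thirdparty_dir <- function(
    deps_dir = Sys.getenv("ARROW_THIRDPARTY_DEPENDENCY_DIR")) {
  if (!nzchar(deps_dir)) {
    return(FALSE) # unset: nothing to validate
  }
  if (!dir.exists(deps_dir)) {
    stop(
      "ARROW_THIRDPARTY_DEPENDENCY_DIR is set to '", deps_dir,
      "', but that directory does not exist",
      call. = FALSE
    )
  }
  if (length(list.files(deps_dir)) == 0) {
    stop(
      "ARROW_THIRDPARTY_DEPENDENCY_DIR is set to '", deps_dir,
      "', but that directory is empty",
      call. = FALSE
    )
  }
  TRUE
}
```

Failing early like this would separate a mistyped path from the ordinary case where no offline dependencies were requested.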







[GitHub] [arrow] karldw commented on a change in pull request #11001: ARROW-12981: [R] Install source package from CRAN alone

Posted by GitBox <gi...@apache.org>.
karldw commented on a change in pull request #11001:
URL: https://github.com/apache/arrow/pull/11001#discussion_r698790189



##########
File path: r/vignettes/install.Rmd
##########
@@ -102,6 +102,14 @@ satisfy C++ dependencies.
 
 > Note that, unlike packages like `tensorflow`, `blogdown`, and others that require external dependencies, you do not need to run `install_arrow()` after a successful `arrow` installation.
 
+The `install-arrow.R` file also includes the `download_optional_dependencies()`
+function. Normally, when installing on a computer with internet access, the
+build process will download third-party dependencies as needed. This function
+provides a way to download them in advance. Relevant environment variables are
+`ARROW_THIRDPARTY_DEPENDENCY_DIR` for the directory of downloaded dependencies
+and `TEST_OFFLINE_BUILD` to force the build process not to download.

Review comment:
       Should I also remove it from the summary at the end of this vignette? It seems helpful to mention it somewhere, but I could also move the comment to the Developing vignette.



