You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/20 17:54:12 UTC

[GitHub] [arrow] wjones127 opened a new pull request, #13542: ARROW-16879: [R][CI] Test R GCS bindings with testbench

wjones127 opened a new pull request, #13542:
URL: https://github.com/apache/arrow/pull/13542

   This PR:
   
    * Moves minio integration tests into a generic suite that is now run on minio (S3 emulator) and GCS testbench (GCS emulator). This is run in CI.
    * Sets the default retry timeout to 15 seconds to mitigate issue described by ARROW-17020.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on a diff in pull request #13542: ARROW-16879: [R][CI] Test R GCS bindings with testbench

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on code in PR #13542:
URL: https://github.com/apache/arrow/pull/13542#discussion_r975147068


##########
.github/workflows/r.yml:
##########
@@ -275,6 +275,19 @@ jobs:
           working-directory: 'r'
           extra-packages: |
             any::rcmdcheck
+      - name: Install MinIO
+        shell: bash
+        run: |
+          mkdir -p "$HOME/.local/bin"
+          curl \
+            --output "$HOME/.local/bin/minio.exe" \
+            https://dl.min.io/server/minio/release/windows-amd64/archive/minio.RELEASE.2022-05-26T05-48-41Z
+          chmod +x "$HOME/.local/bin/minio.exe"
+          echo "$HOME/.local/bin" >> $GITHUB_PATH
+      # TODO: figure out why the GCS tests are hanging
+      # - name: Install Google Cloud Storage Testbench
+      #   shell: bash
+      #   run: ci/scripts/install_gcs_testbench.sh default

Review Comment:
   Is this something you're trying to solve in this PR?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on a diff in pull request #13542: ARROW-16879: [R][CI] Test R GCS bindings with testbench

Posted by GitBox <gi...@apache.org>.
paleolimbot commented on code in PR #13542:
URL: https://github.com/apache/arrow/pull/13542#discussion_r926934595


##########
r/tests/testthat/test-gcs.R:
##########
@@ -58,3 +58,53 @@ test_that("GcsFileSystem$create() input validation", {
     'Invalid options for GcsFileSystem: "role_arn"'
   )
 })
+
+if (system('python -c "import testbench"') == 0) {
+  testbench_port <- Sys.getenv("TESTBENCH_PORT", "9001")
+
+  pid_minio <- sys::exec_background("python", c("-m", "testbench", "--port", testbench_port),
+    std_out = FALSE,
+    std_err = FALSE # TODO: is there a good place to send output?
+  )
+  withr::defer(tools::pskill(pid_minio))

Review Comment:
   Understood! Use it!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on a diff in pull request #13542: ARROW-16879: [R][CI] Test R GCS bindings with testbench

Posted by GitBox <gi...@apache.org>.
wjones127 commented on code in PR #13542:
URL: https://github.com/apache/arrow/pull/13542#discussion_r918552313


##########
r/tests/testthat/helper-filesystems.R:
##########
@@ -0,0 +1,177 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+test_filesystem <- function(name, fs, path_formatter, uri_formatter) {
+  test_that(sprintf("read/write Feather on %s", name), {
+    write_feather(example_data, uri_formatter("test.feather"))
+    expect_identical(read_feather(uri_formatter("test.feather")), example_data)
+  })
+
+  test_that("read/write Feather by filesystem, not URI", {

Review Comment:
   TODO: we need to name the filesystem in each test so we can differentiate test runs in CI output.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on a diff in pull request #13542: ARROW-16879: [R][CI] Test R GCS bindings with testbench

Posted by GitBox <gi...@apache.org>.
wjones127 commented on code in PR #13542:
URL: https://github.com/apache/arrow/pull/13542#discussion_r926021096


##########
r/tests/testthat/test-gcs.R:
##########
@@ -17,6 +17,8 @@
 
 skip_if_not_available("gcs")
 
+source_file("helper-filesystems.R")

Review Comment:
   You are correct



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #13542: ARROW-16879: [R][CI] Test R GCS bindings with testbench

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13542:
URL: https://github.com/apache/arrow/pull/13542#issuecomment-1177972468

   :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 closed pull request #13542: ARROW-16879: [R][CI] Test R GCS bindings with testbench

Posted by GitBox <gi...@apache.org>.
wjones127 closed pull request #13542: ARROW-16879: [R][CI] Test R GCS bindings with testbench
URL: https://github.com/apache/arrow/pull/13542


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #13542: ARROW-16879: [R][CI] Test R GCS bindings with testbench

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13542:
URL: https://github.com/apache/arrow/pull/13542#issuecomment-1177972435

   https://issues.apache.org/jira/browse/ARROW-16879


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on pull request #13542: ARROW-16879: [R][CI] Test R GCS bindings with testbench

Posted by GitBox <gi...@apache.org>.
wjones127 commented on PR #13542:
URL: https://github.com/apache/arrow/pull/13542#issuecomment-1181285460

   I've included the change set from https://github.com/apache/arrow/pull/13577. The C++ changes will go away when I rebase after merging that PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on a diff in pull request #13542: ARROW-16879: [R][CI] Test R GCS bindings with testbench

Posted by GitBox <gi...@apache.org>.
wjones127 commented on code in PR #13542:
URL: https://github.com/apache/arrow/pull/13542#discussion_r975472686


##########
.github/workflows/r.yml:
##########
@@ -275,6 +275,19 @@ jobs:
           working-directory: 'r'
           extra-packages: |
             any::rcmdcheck
+      - name: Install MinIO
+        shell: bash
+        run: |
+          mkdir -p "$HOME/.local/bin"
+          curl \
+            --output "$HOME/.local/bin/minio.exe" \
+            https://dl.min.io/server/minio/release/windows-amd64/archive/minio.RELEASE.2022-05-26T05-48-41Z
+          chmod +x "$HOME/.local/bin/minio.exe"
+          echo "$HOME/.local/bin" >> $GITHUB_PATH
+      # TODO: figure out why the GCS tests are hanging
+      # - name: Install Google Cloud Storage Testbench
+      #   shell: bash
+      #   run: ci/scripts/install_gcs_testbench.sh default

Review Comment:
   Ah I had created this earlier https://issues.apache.org/jira/browse/ARROW-17149
   
   ```suggestion
         # TODO(ARROW-17149): figure out why the GCS tests are hanging on Windows
         # - name: Install Google Cloud Storage Testbench
         #   shell: bash
         #   run: ci/scripts/install_gcs_testbench.sh default
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on a diff in pull request #13542: ARROW-16879: [R][CI] Test R GCS bindings with testbench

Posted by GitBox <gi...@apache.org>.
wjones127 commented on code in PR #13542:
URL: https://github.com/apache/arrow/pull/13542#discussion_r926015904


##########
r/tests/testthat/test-gcs.R:
##########
@@ -17,6 +17,8 @@
 
 skip_if_not_available("gcs")
 
+source_file("helper-filesystems.R")

Review Comment:
   For some reason I found I had to add it. Let me try removing it again and see what happens.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on a diff in pull request #13542: ARROW-16879: [R][CI] Test R GCS bindings with testbench

Posted by GitBox <gi...@apache.org>.
wjones127 commented on code in PR #13542:
URL: https://github.com/apache/arrow/pull/13542#discussion_r926921250


##########
r/tests/testthat/test-gcs.R:
##########
@@ -58,3 +58,53 @@ test_that("GcsFileSystem$create() input validation", {
     'Invalid options for GcsFileSystem: "role_arn"'
   )
 })
+
+if (system('python -c "import testbench"') == 0) {
+  testbench_port <- Sys.getenv("TESTBENCH_PORT", "9001")
+
+  pid_minio <- sys::exec_background("python", c("-m", "testbench", "--port", testbench_port),
+    std_out = FALSE,
+    std_err = FALSE # TODO: is there a good place to send output?
+  )
+  withr::defer(tools::pskill(pid_minio))

Review Comment:
   I originally wrote with `on.exit()`, but I found that it was not successfully killing the minio process at the end. `withr::defer()` did, so I went with that. I wish I understood why.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] nealrichardson commented on a diff in pull request #13542: ARROW-16879: [R][CI] Test R GCS bindings with testbench

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on code in PR #13542:
URL: https://github.com/apache/arrow/pull/13542#discussion_r926014084


##########
r/tests/testthat/test-gcs.R:
##########
@@ -17,6 +17,8 @@
 
 skip_if_not_available("gcs")
 
+source_file("helper-filesystems.R")

Review Comment:
   Aren't all helper files loaded by testthat anyway? (i.e. you shouldn't need this)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on a diff in pull request #13542: ARROW-16879: [R][CI] Test R GCS bindings with testbench

Posted by GitBox <gi...@apache.org>.
paleolimbot commented on code in PR #13542:
URL: https://github.com/apache/arrow/pull/13542#discussion_r926753854


##########
r/tests/testthat/helper-filesystems.R:
##########
@@ -0,0 +1,188 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+#' Run standard suite of integration tests for a filesystem
+#'
+#' @param name Name of filesystem to be printed in test name
+#' @param fs A `FileSystem` instance to test with
+#' @param path_formatter A function that takes a sequence of path segments and
+#' returns a absolute path.
+#' @param uri_formatter A function that takes a sequence of path segments and
+#' returns a URI containing the filesystem scheme (e.g. 's3://', 'gs://'), the
+#' absolute path, and any necessary connection options as URL query parameters.
+test_filesystem <- function(name, fs, path_formatter, uri_formatter) {
+  # NOTE: it's important that we label these tests with name of filesystem so
+  # that we can differentiate the different calls to these test in the output.
+  test_that(sprintf("read/write Feather on %s using URIs", name), {
+    write_feather(example_data, uri_formatter("test.feather"))
+    expect_identical(read_feather(uri_formatter("test.feather")), example_data)
+  })
+
+  test_that(sprintf("read/write Feather on %s using Filesystem", name), {
+    write_feather(example_data, fs$path(path_formatter("test2.feather")))
+    expect_identical(
+      read_feather(fs$path(path_formatter("test2.feather"))),
+      example_data
+    )
+  })
+
+  library(dplyr)

Review Comment:
   I get why you do this, but it's very weird to `library()` anything within a function because it modifies the global state. I would just check that the namespace has already been attached  (`if (!("package:dplyr" %in% search()) abort("library(dplyr) required for test_filesystem()")` should do it).



##########
r/tests/testthat/test-gcs.R:
##########
@@ -58,3 +58,53 @@ test_that("GcsFileSystem$create() input validation", {
     'Invalid options for GcsFileSystem: "role_arn"'
   )
 })
+
+if (system('python -c "import testbench"') == 0) {
+  testbench_port <- Sys.getenv("TESTBENCH_PORT", "9001")
+
+  pid_minio <- sys::exec_background("python", c("-m", "testbench", "--port", testbench_port),
+    std_out = FALSE,
+    std_err = FALSE # TODO: is there a good place to send output?
+  )
+  withr::defer(tools::pskill(pid_minio))

Review Comment:
   Do we use `withr::defer()` anywhere else? We tend to use `on.exit(..., add = TRUE)` elsewhere (if we really need `withr::defer()` then go for it, I just don't know what it does).



##########
r/tests/testthat/test-gcs.R:
##########
@@ -58,3 +58,53 @@ test_that("GcsFileSystem$create() input validation", {
     'Invalid options for GcsFileSystem: "role_arn"'
   )
 })
+
+if (system('python -c "import testbench"') == 0) {

Review Comment:
   Can you do
   
   ```
   skip_on_cran()
   skip_if_not(system('python -c "import testbench"') == 0)
   
   # no if/else needed
   ```
   
   (A bit more idiomatic an will give you a decent skip message)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on a diff in pull request #13542: ARROW-16879: [R][CI] Test R GCS bindings with testbench

Posted by GitBox <gi...@apache.org>.
wjones127 commented on code in PR #13542:
URL: https://github.com/apache/arrow/pull/13542#discussion_r918547853


##########
ci/scripts/r_test.sh:
##########
@@ -100,6 +100,12 @@ SCRIPT="as_cran <- !identical(tolower(Sys.getenv('NOT_CRAN')), 'true')
       pid_minio <- sys::exec_background('minio', c('server', minio_dir))
       on.exit(tools::pskill(pid_minio), add = TRUE)
     }
+
+    if (requireNamespace('reticulate', quietly = TRUE) && reticulate::py_module_available('testbench')) {
+      message('Running testbench for GCS tests (if build supports them)')
+      pid_minio <- sys::exec_background('python', c('-m', 'testbench', '--port', '9001'))
+      on.exit(tools::pskill(pid_minio), add = TRUE)

Review Comment:
   ```suggestion
         pid_testbench <- sys::exec_background('python', c('-m', 'testbench', '--port', '9001'))
         on.exit(tools::pskill(pid_testbench), add = TRUE)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on pull request #13542: ARROW-16879: [R][CI] Test R GCS bindings with testbench

Posted by GitBox <gi...@apache.org>.
wjones127 commented on PR #13542:
URL: https://github.com/apache/arrow/pull/13542#issuecomment-1180801174

   Tests seem to be failing right now because of a difference in how GCS and S3 handle paths:
   
   ``` r
   library(arrow)
   #> 
   #> Attaching package: 'arrow'
   #> The following object is masked from 'package:utils':
   #> 
   #>     timestamp
   
   example_data <- arrow_table(x = Array$create(c(1, 2, 3)))
   
   testbench_port <- Sys.getenv("TESTBENCH_PORT", "9001")
   fs <- GcsFileSystem$create(
     endpoint_override = sprintf("localhost:%s", testbench_port),
     retry_limit_seconds = 1,
     scheme = "http",
     anonymous = TRUE # Will fail to resolve host name if anonymous isn't TRUE
   )
   fs$CreateDir("test")
   write_parquet(example_data, fs$path("test/test.parquet"))
   write_parquet(example_data, fs$path("test/test.parquet/"))
   # GCS seems to handle them as separate paths
   fs$ls("test")
   #> [1] "test/test.parquet" "test/test.parquet"
   
   
   minio_key <- Sys.getenv("MINIO_ACCESS_KEY", "minioadmin")
   minio_secret <- Sys.getenv("MINIO_SECRET_KEY", "minioadmin")
   minio_port <- Sys.getenv("MINIO_PORT", "9000")
   fs <- S3FileSystem$create(
     access_key = minio_key,
     secret_key = minio_secret,
     scheme = "http",
     endpoint_override = paste0("localhost:", minio_port),
     allow_bucket_creation = TRUE,
     allow_bucket_deletion = TRUE
   )
   fs$CreateDir("test")
   write_parquet(example_data, fs$path("test/test.parquet"))
   write_parquet(example_data, fs$path("test/test.parquet/"))
   # S3 implementation seems to remove the last slash
   fs$ls("test")
   #> [1] "test/test.parquet"
   ```
   
   <sup>Created on 2022-07-11 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org