You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "amoeba (via GitHub)" <gi...@apache.org> on 2023/03/09 21:21:39 UTC

[GitHub] [arrow] amoeba opened a new pull request, #34524: GH-34421: [R] Let GcsFileSystem take a path for json_credentials

amoeba opened a new pull request, #34524:
URL: https://github.com/apache/arrow/pull/34524

   ### Rationale for this change
   
   Existing documentation for this argument was misleading.
   
   ### What changes are included in this PR?
   
   A change in functionality, matching tests, and updated documentation are included. `json_credentials` can now either be a literal string containing credentials or a string containing a path to credentials. In the latter case, credentials will be automatically read in from the fileystem.
   
   ### Are these changes tested?
   
   Yes
   
   ### Are there any user-facing changes?
   
   Yes, though not breaking. This affects user-facing APIs and documentation and is both a bug fix and new functionality.
   
   Closes https://github.com/apache/arrow/issues/34421
   Closes https://github.com/apache/arrow/issues/33106


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] amoeba commented on pull request #34524: GH-34421: [R] Let GcsFileSystem take a path for json_credentials

Posted by "amoeba (via GitHub)" <gi...@apache.org>.
amoeba commented on PR #34524:
URL: https://github.com/apache/arrow/pull/34524#issuecomment-1476721218

   CI failure is real failure,
   
   ```
   -- Failure ('test-gcs.R:107:3'): GcsFileSystem$create() can read json_credentials --
   `object` (`actual`) not equal to `expected` (`expected`).
   
   `actual`:   "{\"key\" : \"valué\"}"
   `expected`: "{\"key\" : \"valué\"}" 
   Backtrace:
       x
    1. \-arrow:::expect_equal(fs$options$json_credentials, "{\"key\" : \"valué\"}") at test-gcs.R:107:2
    2.   \-testthat::expect_equal(...) at D:\a\arrow\arrow\r\check\arrow.Rcheck\tests\testthat\helper-expectation.R:42:4
   ```
   
   I'll look into this and ping you again when it's ready for review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] amoeba commented on a diff in pull request #34524: GH-34421: [R] Let GcsFileSystem take a path for json_credentials

Posted by "amoeba (via GitHub)" <gi...@apache.org>.
amoeba commented on code in PR #34524:
URL: https://github.com/apache/arrow/pull/34524#discussion_r1142453062


##########
r/tests/testthat/test-gcs.R:
##########
@@ -91,6 +91,26 @@ test_that("GcsFileSystem$create() input validation", {
   )
 })
 
+test_that("GcsFileSystem$create() can read json_credentials", {
+  # From string
+  fs <- GcsFileSystem$create(json_credentials = "fromstring")
+  expect_equal(fs$options$json_credentials, "fromstring")
+
+  # From disk
+  cred_path <- tempfile()
+  on.exit(unlink(cred_path))
+  con <- file(cred_path, open = "wb")
+  writeBin('{"key" : "valu\u00e9"}', con)
+  close(con)
+
+  # This calls readLines which complains about embedded nuls and missing a
+  # final newline (See ?readLines)
+  suppressWarnings({

Review Comment:
   Good point. I found I had to run the character vector through iconv to get exactly the content into the file (via writeBin). See https://github.com/apache/arrow/pull/34524/commits/617d8b341c13a1c5d5b9a073963ad1ee1edd0d73. Otherwise, you always get a \00 (which R calls an embedded nul) at the end.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34524: GH-34421: [R] Let GcsFileSystem take a path for json_credentials

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34524:
URL: https://github.com/apache/arrow/pull/34524#issuecomment-1462841906

   * Closes: #34421


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] amoeba commented on a diff in pull request #34524: GH-34421: [R] Let GcsFileSystem take a path for json_credentials

Posted by "amoeba (via GitHub)" <gi...@apache.org>.
amoeba commented on code in PR #34524:
URL: https://github.com/apache/arrow/pull/34524#discussion_r1142802326


##########
r/tests/testthat/test-gcs.R:
##########
@@ -91,6 +91,26 @@ test_that("GcsFileSystem$create() input validation", {
   )
 })
 
+test_that("GcsFileSystem$create() can read json_credentials", {
+  # From string
+  fs <- GcsFileSystem$create(json_credentials = "fromstring")
+  expect_equal(fs$options$json_credentials, "fromstring")
+
+  # From disk
+  cred_path <- tempfile()
+  on.exit(unlink(cred_path))
+  con <- file(cred_path, open = "wb")
+  writeBin('{"key" : "valu\u00e9"}', con)
+  close(con)
+
+  # This calls readLines which complains about embedded nuls and missing a
+  # final newline (See ?readLines)
+  suppressWarnings({

Review Comment:
   Thanks for this. I tried a few combinations that still used readLines and couldn't get something I liked that also passes CI. Even after ensuring the the test file has the exact UTF-8 bytes I want, `readLines(path, encoding = "UTF-8")` seems to still decode as Latin1. I stopped short of using a connection or mutating options. I can't reproduce this on my own Windows VM so testing solutions in CI is slow.
   
   I ended up swapping out readLines for your above `read_utf8` helper and previously updated the docs to indicate that the file must be UTF-8 encoded. I'll check CI later tonight/tomorrow.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] amoeba commented on pull request #34524: GH-34421: [R] Let GcsFileSystem take a path for json_credentials

Posted by "amoeba (via GitHub)" <gi...@apache.org>.
amoeba commented on PR #34524:
URL: https://github.com/apache/arrow/pull/34524#issuecomment-1474392156

   Thanks for the review @paleolimbot. Let me know what you think about the readLines and encoding thing. Other than that I think this is good to go.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #34524: GH-34421: [R] Let GcsFileSystem take a path for json_credentials

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #34524:
URL: https://github.com/apache/arrow/pull/34524#issuecomment-1480372646

   Benchmark runs are scheduled for baseline = ce0d20c06aeb5c484888422cb4ce42e4d24c588c and contender = d526fd9d06598707e99346d67adf9a1153bb4601. d526fd9d06598707e99346d67adf9a1153bb4601 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/49cb10bb626d4f5092bfeadf8843ad7b...480c3c7b81654d9d9033146a1295beb8/)
   [Finished :arrow_down:0.42% :arrow_up:0.03%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/89571499569947c4a98cfeca5340c4c2...73a90752493a49038c3c822067e7f09e/)
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/5fa60d3d559e4aa8a1d964874520b3d4...37f38a6d9cfc4395be73d020cb782070/)
   [Finished :arrow_down:0.03% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/292de9cc282c43b6abe1ed4b0411c2eb...c4988742022e4726a1606f6d452be17f/)
   Buildkite builds:
   [Finished] [`d526fd9d` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2556)
   [Finished] [`d526fd9d` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2586)
   [Finished] [`d526fd9d` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2554)
   [Finished] [`d526fd9d` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2577)
   [Finished] [`ce0d20c0` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2555)
   [Finished] [`ce0d20c0` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2585)
   [Finished] [`ce0d20c0` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2553)
   [Finished] [`ce0d20c0` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2576)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] amoeba commented on a diff in pull request #34524: GH-34421: [R] Let GcsFileSystem take a path for json_credentials

Posted by "amoeba (via GitHub)" <gi...@apache.org>.
amoeba commented on code in PR #34524:
URL: https://github.com/apache/arrow/pull/34524#discussion_r1140694772


##########
r/tests/testthat/test-gcs.R:
##########
@@ -91,6 +91,20 @@ test_that("GcsFileSystem$create() input validation", {
   )
 })
 
+test_that("GcsFileSystem$create() can read json_credentials", {
+  # From string
+  fs <- GcsFileSystem$create(json_credentials = "fromstring")
+  expect_equal(fs$options$json_credentials, "fromstring")
+
+  # From disk
+  cred_path <- tempfile()
+  on.exit(unlink(cred_path))
+
+  writeLines("fromdisk", cred_path)

Review Comment:
   Done in https://github.com/apache/arrow/pull/34524/commits/f3018b0351082f26f5bb0c6e69ee87749041a3f4. I didn't go too wild with it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] amoeba commented on a diff in pull request #34524: GH-34421: [R] Let GcsFileSystem take a path for json_credentials

Posted by "amoeba (via GitHub)" <gi...@apache.org>.
amoeba commented on code in PR #34524:
URL: https://github.com/apache/arrow/pull/34524#discussion_r1140646873


##########
r/R/filesystem.R:
##########
@@ -572,6 +572,11 @@ GcsFileSystem$create <- function(anonymous = FALSE, retry_limit_seconds = 15, ..
 
   options$retry_limit_seconds <- retry_limit_seconds
 
+  # Handle reading json_credentials from the filesystem
+  if ("json_credentials" %in% names(options) && file.exists(options[["json_credentials"]])) {
+    options[["json_credentials"]] <- paste(readLines(options[["json_credentials"]]), collapse = "")

Review Comment:
   > I forget the exact details but I think readLines() assumes system encoding (rather than UTF-8, which matters on Windows). Any thought as to which one of those is a better assumption if the value points to a file? (The workaround would be for the user to read in the file themselves first with an explicit encoding).
   
   Thanks for catching this point. You're right about readLines' behavior. To your question, I'm not entirely sure. If I test on my Windows 11 VM, the system-wide encoding is Windows 1252, but JSON files downloaded via Edge produces either ASCII or UTF-8 files. Maybe this just works or maybe just works on newer Win10/11 builds?
   
   > Also, did you want to collapse = "\n"?
   
   I suppose that's slightly better, though I think either works just the same. I'll change it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on a diff in pull request #34524: GH-34421: [R] Let GcsFileSystem take a path for json_credentials

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on code in PR #34524:
URL: https://github.com/apache/arrow/pull/34524#discussion_r1131865760


##########
r/R/filesystem.R:
##########
@@ -572,6 +572,11 @@ GcsFileSystem$create <- function(anonymous = FALSE, retry_limit_seconds = 15, ..
 
   options$retry_limit_seconds <- retry_limit_seconds
 
+  # Handle reading json_credentials from the filesystem
+  if ("json_credentials" %in% names(options) && file.exists(options[["json_credentials"]])) {
+    options[["json_credentials"]] <- paste(readLines(options[["json_credentials"]]), collapse = "")

Review Comment:
   I forget the exact details but I think `readLines()` assumes system encoding (rather than UTF-8, which matters on Windows). Any thought as to which one of those is a better assumption if the value points to a file? (The workaround would be for the user to read in the file themselves first with an explicit encoding).



##########
r/R/filesystem.R:
##########
@@ -572,6 +572,11 @@ GcsFileSystem$create <- function(anonymous = FALSE, retry_limit_seconds = 15, ..
 
   options$retry_limit_seconds <- retry_limit_seconds
 
+  # Handle reading json_credentials from the filesystem
+  if ("json_credentials" %in% names(options) && file.exists(options[["json_credentials"]])) {
+    options[["json_credentials"]] <- paste(readLines(options[["json_credentials"]]), collapse = "")

Review Comment:
   Also, did you want to `collapse = "\n"`?



##########
r/tests/testthat/test-gcs.R:
##########
@@ -91,6 +91,20 @@ test_that("GcsFileSystem$create() input validation", {
   )
 })
 
+test_that("GcsFileSystem$create() can read json_credentials", {
+  # From string
+  fs <- GcsFileSystem$create(json_credentials = "fromstring")
+  expect_equal(fs$options$json_credentials, "fromstring")
+
+  # From disk
+  cred_path <- tempfile()
+  on.exit(unlink(cred_path))
+
+  writeLines("fromdisk", cred_path)

Review Comment:
   Maybe put some non-ASCII text and here? In test-csv there should be an example of how to do this (there are some `\u` and `charToRaw()` and `writeBin` things and I always forget the details).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on a diff in pull request #34524: GH-34421: [R] Let GcsFileSystem take a path for json_credentials

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on code in PR #34524:
URL: https://github.com/apache/arrow/pull/34524#discussion_r1142590165


##########
r/tests/testthat/test-gcs.R:
##########
@@ -91,6 +91,26 @@ test_that("GcsFileSystem$create() input validation", {
   )
 })
 
+test_that("GcsFileSystem$create() can read json_credentials", {
+  # From string
+  fs <- GcsFileSystem$create(json_credentials = "fromstring")
+  expect_equal(fs$options$json_credentials, "fromstring")
+
+  # From disk
+  cred_path <- tempfile()
+  on.exit(unlink(cred_path))
+  con <- file(cred_path, open = "wb")
+  writeBin('{"key" : "valu\u00e9"}', con)
+  close(con)
+
+  # This calls readLines which complains about embedded nuls and missing a
+  # final newline (See ?readLines)
+  suppressWarnings({

Review Comment:
   ...in case it's helpful: https://github.com/apache/arrow/blob/main/r/tests/testthat/test-csv.R#L318-L328
   
   ...and
   
   ```
   read_utf8 <- function(file) {
     res <- readBin(file, "raw", n = file.info(file)$size)
     res <- rawToChar(res)
     Encoding(res) <- "UTF-8"
     res
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] amoeba commented on a diff in pull request #34524: GH-34421: [R] Let GcsFileSystem take a path for json_credentials

Posted by "amoeba (via GitHub)" <gi...@apache.org>.
amoeba commented on code in PR #34524:
URL: https://github.com/apache/arrow/pull/34524#discussion_r1145259720


##########
r/tests/testthat/test-gcs.R:
##########
@@ -91,6 +91,26 @@ test_that("GcsFileSystem$create() input validation", {
   )
 })
 
+test_that("GcsFileSystem$create() can read json_credentials", {
+  # From string
+  fs <- GcsFileSystem$create(json_credentials = "fromstring")
+  expect_equal(fs$options$json_credentials, "fromstring")
+
+  # From disk
+  cred_path <- tempfile()
+  on.exit(unlink(cred_path))
+  con <- file(cred_path, open = "wb")
+  writeBin('{"key" : "valu\u00e9"}', con)
+  close(con)
+
+  # This calls readLines which complains about embedded nuls and missing a
+  # final newline (See ?readLines)
+  suppressWarnings({

Review Comment:
   Hey @paleolimbot this is ready for a review now that all checks pass.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] amoeba commented on pull request #34524: GH-34421: [R] Let GcsFileSystem take a path for json_credentials

Posted by "amoeba (via GitHub)" <gi...@apache.org>.
amoeba commented on PR #34524:
URL: https://github.com/apache/arrow/pull/34524#issuecomment-1480087957

   Glad you caught it, thanks again!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot merged pull request #34524: GH-34421: [R] Let GcsFileSystem take a path for json_credentials

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot merged PR #34524:
URL: https://github.com/apache/arrow/pull/34524


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on a diff in pull request #34524: GH-34421: [R] Let GcsFileSystem take a path for json_credentials

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on code in PR #34524:
URL: https://github.com/apache/arrow/pull/34524#discussion_r1142216825


##########
r/tests/testthat/test-gcs.R:
##########
@@ -91,6 +91,26 @@ test_that("GcsFileSystem$create() input validation", {
   )
 })
 
+test_that("GcsFileSystem$create() can read json_credentials", {
+  # From string
+  fs <- GcsFileSystem$create(json_credentials = "fromstring")
+  expect_equal(fs$options$json_credentials, "fromstring")
+
+  # From disk
+  cred_path <- tempfile()
+  on.exit(unlink(cred_path))
+  con <- file(cred_path, open = "wb")
+  writeBin('{"key" : "valu\u00e9"}', con)
+  close(con)
+
+  # This calls readLines which complains about embedded nuls and missing a
+  # final newline (See ?readLines)
+  suppressWarnings({

Review Comment:
   Rather than suppress this warning, you could (1) use something other than `readLines()` or (2) include a final newline in your test data. I'm guessing these are files downloaded from the google cloud console 99% of the time, so including a final newline would probably be sufficient.
   
   (otherwise we might suppress warnings that are actual problems)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org