You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "cboettig (via GitHub)" <gi...@apache.org> on 2023/02/02 17:23:25 UTC

[GitHub] [arrow] cboettig opened a new pull request, #34009: GH-33904: [R] improve behavior of s3_bucket - work-around

cboettig opened a new pull request, #34009:
URL: https://github.com/apache/arrow/pull/34009

   I think this is another option to address #33918.
   
   This retains the existing behavior when no additional arguments (region, endpoint, etc) are supplied.  
   
   If any optional arguments are supplied, this approach will by-pass the `$from_uri` call.  I think this is reasonable, given that the `from_uri()` call is implicitly ignoring optional arguments already, which is giving rise to the current bugs.  
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34009: GH-33904: [R] improve behavior of s3_bucket - work-around

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34009:
URL: https://github.com/apache/arrow/pull/34009#issuecomment-1414103549

   :warning: GitHub issue #33904 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on pull request #34009: GH-33904: [R] improve behavior of s3_bucket - work-around

Posted by "thisisnic (via GitHub)" <gi...@apache.org>.
thisisnic commented on PR #34009:
URL: https://github.com/apache/arrow/pull/34009#issuecomment-1418658868

   CI timeout will be unrelated to this PR as we've seen it do that on another one; have manually re-triggered it for you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot merged pull request #34009: GH-33904: [R] improve behavior of s3_bucket - work-around

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot merged PR #34009:
URL: https://github.com/apache/arrow/pull/34009


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] cboettig commented on pull request #34009: GH-33904: [R] improve behavior of s3_bucket - work-around

Posted by "cboettig (via GitHub)" <gi...@apache.org>.
cboettig commented on PR #34009:
URL: https://github.com/apache/arrow/pull/34009#issuecomment-1426103718

   ah, good catch!  
   
   Good to know that the URI format requires all optional arguments to be included in the URI and aren't read from config or manual options even when using `$create()`, that's a bit surprising to me but guess it makes sense.  (My play.minio example must have passed because again it used a bucket name available on both AWS and play.minio endpoints! whoops!)
   
   Yup, all looks good now to me as well and works as expected.  Thanks much!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on a diff in pull request #34009: GH-33904: [R] improve behavior of s3_bucket - work-around

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on code in PR #34009:
URL: https://github.com/apache/arrow/pull/34009#discussion_r1096567202


##########
r/R/filesystem.R:
##########
@@ -462,12 +462,13 @@ s3_bucket <- function(bucket, ...) {
   if (!is_url(bucket)) {
     bucket <- paste0("s3://", bucket)
   }
-  fs_and_path <- FileSystem$from_uri(bucket)
-  fs <- fs_and_path$fs
+
+  if (!length(args)) {
+    fs_and_path <- FileSystem$from_uri(bucket)
+    fs <- fs_and_path$fs
+  } else {
   # If there are no additional S3Options, we can use that filesystem
-  # Otherwise, take the region that was detected and make a new fs with the args
-  if (length(args)) {
-    args$region <- fs$region
+  # If user specifies args, they must specify region as arg, env var, or config
     fs <- exec(S3FileSystem$create, !!!args)

Review Comment:
   Looks like you need to define `fs_and_path` here somehow?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] cboettig commented on pull request #34009: GH-33904: [R] improve behavior of s3_bucket - work-around

Posted by "cboettig (via GitHub)" <gi...@apache.org>.
cboettig commented on PR #34009:
URL: https://github.com/apache/arrow/pull/34009#issuecomment-1426254287

   :tada: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] cboettig commented on pull request #34009: GH-33904: [R] improve behavior of s3_bucket - work-around

Posted by "cboettig (via GitHub)" <gi...@apache.org>.
cboettig commented on PR #34009:
URL: https://github.com/apache/arrow/pull/34009#issuecomment-1416928068

   @paleolimbot ah simple, `fs_and_path$path` is just the `bucket`, should be fixed now.  
   
   Also checked locally this time (apologies for not doing so before, had to fiddle a bit to get `devtools::load_all()` to compile with S3 support).  Seems to be working on my end.
   
   One of the CI tasks seems to have timed out, not sure precisely but doesn't look like a related issue?
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #34009: GH-33904: [R] improve behavior of s3_bucket - work-around

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34009:
URL: https://github.com/apache/arrow/pull/34009#issuecomment-1414103498

   * Closes: #33904


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] cboettig commented on pull request #34009: GH-33904: [R] improve behavior of s3_bucket - work-around

Posted by "cboettig (via GitHub)" <gi...@apache.org>.
cboettig commented on PR #34009:
URL: https://github.com/apache/arrow/pull/34009#issuecomment-1424659287

   It looks like things are passing now, except for again a stalled checks that don't seem related to the pull request(?)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] cboettig commented on pull request #34009: GH-33904: [R] improve behavior of s3_bucket - work-around

Posted by "cboettig (via GitHub)" <gi...@apache.org>.
cboettig commented on PR #34009:
URL: https://github.com/apache/arrow/pull/34009#issuecomment-1414116176

   I think this work-around is much preferable to a warning for now.  If you introduce a warning that says `endpoint_override` is not supported by s3_bucket, I have a good number of users across a few different packages who will be confused by the warning, because it works just fine as long as the bucket name is not unique.  (and yes, users on custom endpoints will often pick common bucket names already in use somewhere on AWS, since they have no reason to need unique names).  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] cboettig commented on a diff in pull request #34009: GH-33904: [R] improve behavior of s3_bucket - work-around

Posted by "cboettig (via GitHub)" <gi...@apache.org>.
cboettig commented on code in PR #34009:
URL: https://github.com/apache/arrow/pull/34009#discussion_r1101889582


##########
r/tests/testthat/test-s3-minio.R:
##########
@@ -61,6 +61,13 @@ fs$CreateDir(now)
 # Clean up when we're all done
 withr::defer(fs$DeleteDir(now))
 
+test_that("Confirm s3_bucket works with endpoint_override", {
+  
+  
+  bucket <- s3_bucket("test", endpoint_override = "https://play.min.io")

Review Comment:
   @paleolimbot good question, that's something I'm trying to figure out too.  It failed to find the bucket in the previous test, https://github.com/apache/arrow/actions/runs/4093029707/jobs/7073398005, but I don't think that's because of any change I made -- in my own local testing I could never reproduce that error.  I see the desire to use localhost, but it's not quite clear how to ensure the localhost deploy works before running the test.
   
   It's not so clear to me that any of the other tests rely on the localhost bucket existing?  
   
   https://play.min.io is the official test service for the MINIO platform, not sure if that's ideal choice here for the tests, but then the test suite here isn't really built out and my objective is just submit a patch for this confirmed bug in the codebase.   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #34009: GH-33904: [R] improve behavior of s3_bucket - work-around

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #34009:
URL: https://github.com/apache/arrow/pull/34009#issuecomment-1426633364

   Benchmark runs are scheduled for baseline = 1140ad39b26e02ae645e3462ff8d76b7c00dff27 and contender = 8fed97fa951032a7e686407d29107cf48f6ab5f5. 8fed97fa951032a7e686407d29107cf48f6ab5f5 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/44955fc6919b4677a36e5068390954f6...dcbeee00eec043829fdaa0fff47a0ad4/)
   [Finished :arrow_down:0.46% :arrow_up:0.06%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/45652c2abbef4dd8b19f9eb3c4d8bd72...c42dd6da0fa14e368df088137a0aaec4/)
   [Finished :arrow_down:0.51% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/eb8fb96c92c549b9aea3e13f2ff7b634...9cf7f9a54d6d4f2889413ba47992eb4b/)
   [Finished :arrow_down:0.44% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/4b768be75d084c69ad11f6b286d65e38...98b700999a0f46fb95d5cc62cbdc9643/)
   Buildkite builds:
   [Finished] [`8fed97fa` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2360)
   [Finished] [`8fed97fa` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2389)
   [Finished] [`8fed97fa` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2358)
   [Finished] [`8fed97fa` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2380)
   [Finished] [`1140ad39` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2359)
   [Finished] [`1140ad39` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2388)
   [Finished] [`1140ad39` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2357)
   [Finished] [`1140ad39` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2379)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on pull request #34009: GH-33904: [R] improve behavior of s3_bucket - work-around

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on PR #34009:
URL: https://github.com/apache/arrow/pull/34009#issuecomment-1425906185

   It seemed there were a few issue...the notable one being that `bucket` was prefixed with `s3://` even if using `endpoint_override`. We'll see if CI agrees that the change was actually a fix...could you also take a look to make sure I didn't clobber your intent?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #34009: GH-33904: [R] improve behavior of s3_bucket - work-around

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #34009:
URL: https://github.com/apache/arrow/pull/34009#issuecomment-1426633730

   ['Python', 'R'] benchmarks have high level of regressions.
   [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/eb8fb96c92c549b9aea3e13f2ff7b634...9cf7f9a54d6d4f2889413ba47992eb4b/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on a diff in pull request #34009: GH-33904: [R] improve behavior of s3_bucket - work-around

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on code in PR #34009:
URL: https://github.com/apache/arrow/pull/34009#discussion_r1102190923


##########
r/tests/testthat/test-s3-minio.R:
##########
@@ -61,6 +61,13 @@ fs$CreateDir(now)
 # Clean up when we're all done
 withr::defer(fs$DeleteDir(now))
 
+test_that("Confirm s3_bucket works with endpoint_override", {
+  
+  
+  bucket <- s3_bucket("test", endpoint_override = "https://play.min.io")

Review Comment:
   Hmm...the localhost version should definitely work. I will do a local pull tomorrow morning and see what I can come up with 🙂 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on a diff in pull request #34009: GH-33904: [R] improve behavior of s3_bucket - work-around

Posted by "paleolimbot (via GitHub)" <gi...@apache.org>.
paleolimbot commented on code in PR #34009:
URL: https://github.com/apache/arrow/pull/34009#discussion_r1101537227


##########
r/tests/testthat/test-s3-minio.R:
##########
@@ -61,6 +61,13 @@ fs$CreateDir(now)
 # Clean up when we're all done
 withr::defer(fs$DeleteDir(now))
 
+test_that("Confirm s3_bucket works with endpoint_override", {
+  
+  
+  bucket <- s3_bucket("test", endpoint_override = "https://play.min.io")

Review Comment:
   Was something wrong with the localhost version?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org