You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/07 11:08:35 UTC

[GitHub] [arrow] milesgranger opened a new pull request, #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

milesgranger opened a new pull request, #14601:
URL: https://github.com/apache/arrow/pull/14601

   Will fix [ARROW-17985](https://issues.apache.org/jira/browse/ARROW-17985)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] milesgranger commented on pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
milesgranger commented on PR #14601:
URL: https://github.com/apache/arrow/pull/14601#issuecomment-1323260085

   I can try to setup a Windows VM environment locally and see if I can debug it any further.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #14601:
URL: https://github.com/apache/arrow/pull/14601#issuecomment-1326123986

   Benchmark runs are scheduled for baseline = 16ef5a83945b3d9917054afe439bd2230ed368ff and contender = 7ae4705c629655eb37b4200229d30f3a2af9e154. 7ae4705c629655eb37b4200229d30f3a2af9e154 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/7fe1bffa993942bdb3a4e5c722a439a8...cba30a29dd924aeaabf0e42bd158b3ba/)
   [Finished :arrow_down:0.3% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/eb63ddbb059d4fe1901efad87613e0a2...49238c46827e49c7a647d232f37e3279/)
   [Finished :arrow_down:0.54% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/897ae912910f47a3811122777d626df1...929727fe4ade452ca76dc3ce27421e33/)
   [Finished :arrow_down:0.45% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/4d61b8c6ecce498f97138698488daaa8...f9c6dff6036f4c50aff49fa574b5d319/)
   Buildkite builds:
   [Finished] [`7ae4705c` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1909)
   [Finished] [`7ae4705c` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1931)
   [Finished] [`7ae4705c` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1900)
   [Finished] [`7ae4705c` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1923)
   [Finished] [`16ef5a83` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1908)
   [Finished] [`16ef5a83` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1930)
   [Finished] [`16ef5a83` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1899)
   [Finished] [`16ef5a83` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1922)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #14601:
URL: https://github.com/apache/arrow/pull/14601#discussion_r1019208468


##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -2303,10 +2303,14 @@ Result<FileInfo> S3FileSystem::GetFileInfo(const std::string& s) {
     auto outcome = impl_->client_->HeadBucket(req);
     if (!outcome.IsSuccess()) {
       if (!IsNotFound(outcome.GetError())) {
-        return ErrorToStatus(
-            std::forward_as_tuple("When getting information for bucket '", path.bucket,
-                                  "': "),
-            "HeadBucket", outcome.GetError());
+        auto msg = "When getting information for bucket '" + path.bucket + "': ";
+        // Bucket exists, but failed to call HeadBucket, perhaps wrong region?
+        ARROW_ASSIGN_OR_RAISE(auto region, impl_->client_->GetBucketRegion(path.bucket));

Review Comment:
   Another possibility would be to condition this on the HTTP status code, which seems to be 301 in case the bucket exists in another region.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
AlenkaF commented on PR #14601:
URL: https://github.com/apache/arrow/pull/14601#issuecomment-1306851400

   The code looks good to me!
   
   AppVeyor is having some trouble though, `AWS Error NETWORK_CONNECTION` in `test_s3fs_wrong_region()`:
   
   ```python
   ___________________________ test_s3fs_wrong_region ____________________________
       def test_s3fs_wrong_region():
           from pyarrow.fs import S3FileSystem
       
           # wrong region for bucket
           fs = S3FileSystem(region='eu-north-1')
       
           msg = ("When getting information for bucket 'voltrondata-labs-datasets': "
                  "Looks like the configured region is 'eu-north-1' while the "
                  "bucket is located in 'us-east-2': *")
           with pytest.raises(OSError, match=msg):
   >           fs.get_file_info("voltrondata-labs-datasets")
   pyarrow\tests\test_fs.py:1329: 
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
   >   info = GetResultValue(self.fs.GetFileInfo(path))
   pyarrow\_fs.pyx:571: 
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
   >   return check_status(status)
   pyarrow\error.pxi:144: 
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
   >   raise IOError(message)
   E   OSError: When resolving region for bucket 'voltrondata-labs-datasets': AWS Error NETWORK_CONNECTION during HeadBucket operation: Encountered network error when sending http request
   pyarrow\error.pxi:115: OSError
   During handling of the above exception, another exception occurred:
       def test_s3fs_wrong_region():
           from pyarrow.fs import S3FileSystem
       
           # wrong region for bucket
           fs = S3FileSystem(region='eu-north-1')
       
           msg = ("When getting information for bucket 'voltrondata-labs-datasets': "
                  "Looks like the configured region is 'eu-north-1' while the "
                  "bucket is located in 'us-east-2': *")
           with pytest.raises(OSError, match=msg):
   >           fs.get_file_info("voltrondata-labs-datasets")
   E           AssertionError: Regex pattern did not match.
   E            Regex: "When getting information for bucket 'voltrondata-labs-datasets': Looks like the configured region is 'eu-north-1' while the bucket is located in 'us-east-2': *"
   E            Input: "When resolving region for bucket 'voltrondata-labs-datasets': AWS Error NETWORK_CONNECTION during HeadBucket operation: Encountered network error when sending http request"
   pyarrow\tests\test_fs.py:1329: AssertionError
   ```
   
   I will re-run the build to see if it helps.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] milesgranger commented on pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
milesgranger commented on PR #14601:
URL: https://github.com/apache/arrow/pull/14601#issuecomment-1306857822

   Thanks @AlenkaF, is there a way to mark tests as flaky or otherwise try a couple times?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on code in PR #14601:
URL: https://github.com/apache/arrow/pull/14601#discussion_r1022526260


##########
cpp/src/arrow/filesystem/s3_internal.h:
##########
@@ -158,6 +170,13 @@ Status ErrorToStatus(const std::string& prefix, const std::string& operation,
   if (error_type == Aws::S3::S3Errors::UNKNOWN) {
     ss << " (HTTP status " << static_cast<int>(error.GetResponseCode()) << ")";
   }
+  const auto maybe_region = BucketRegionFromError(error);
+  if (maybe_region.has_value() && region.has_value()) {

Review Comment:
   Only do this when a region was passed to start with? (so moving `region.has_value()` before `BucketRegionFromError`)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou merged pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
pitrou merged PR #14601:
URL: https://github.com/apache/arrow/pull/14601


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #14601:
URL: https://github.com/apache/arrow/pull/14601#discussion_r1025381713


##########
cpp/src/arrow/filesystem/s3_internal.h:
##########
@@ -76,6 +76,17 @@ inline bool IsConnectError(const Aws::Client::AWSError<Error>& error) {
   return false;
 }
 
+inline std::optional<std::string> BucketRegionFromError(
+    const Aws::Client::AWSError<Aws::S3::S3Errors>& error) {
+  const auto headers = error.GetResponseHeaders();

Review Comment:
   Nit
   ```suggestion
     const auto& headers = error.GetResponseHeaders();
   ```



##########
cpp/src/arrow/filesystem/s3_internal.h:
##########
@@ -158,8 +170,20 @@ Status ErrorToStatus(const std::string& prefix, const std::string& operation,
   if (error_type == Aws::S3::S3Errors::UNKNOWN) {
     ss << " (HTTP status " << static_cast<int>(error.GetResponseCode()) << ")";
   }
+
+  // Possibly an error due to wrong region configuration from client and bucket.
+  std::optional<std::string> wrong_region_msg = std::nullopt;
+  if (region.has_value()) {
+    const auto maybe_region = BucketRegionFromError(error);

Review Comment:
   Technically, this will only compile if `ErrorType` is `Aws::S3::S3Errors`. However, you could add a fallback for non-S3 errors:
   ```c++
   template <typename ErrorType>
   std::optional<std::string> BucketRegionFromError(const Aws::Client::AWSError<ErrorType>&) {
     return std::nullopt;
   }
   ```
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #14601:
URL: https://github.com/apache/arrow/pull/14601#discussion_r1028357593


##########
cpp/src/arrow/filesystem/s3_internal.h:
##########
@@ -158,8 +170,20 @@ Status ErrorToStatus(const std::string& prefix, const std::string& operation,
   if (error_type == Aws::S3::S3Errors::UNKNOWN) {
     ss << " (HTTP status " << static_cast<int>(error.GetResponseCode()) << ")";
   }
+
+  // Possibly an error due to wrong region configuration from client and bucket.
+  std::optional<std::string> wrong_region_msg = std::nullopt;
+  if (region.has_value()) {
+    const auto maybe_region = BucketRegionFromError(error);

Review Comment:
   `constexpr` marks values computable at compile-time, which is not the case here :-)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on code in PR #14601:
URL: https://github.com/apache/arrow/pull/14601#discussion_r1019242095


##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -2303,10 +2303,14 @@ Result<FileInfo> S3FileSystem::GetFileInfo(const std::string& s) {
     auto outcome = impl_->client_->HeadBucket(req);
     if (!outcome.IsSuccess()) {
       if (!IsNotFound(outcome.GetError())) {
-        return ErrorToStatus(
-            std::forward_as_tuple("When getting information for bucket '", path.bucket,
-                                  "': "),
-            "HeadBucket", outcome.GetError());
+        auto msg = "When getting information for bucket '" + path.bucket + "': ";
+        // Bucket exists, but failed to call HeadBucket, perhaps wrong region?
+        ARROW_ASSIGN_OR_RAISE(auto region, impl_->client_->GetBucketRegion(path.bucket));

Review Comment:
   Note that we only do this extra API call if there is an error (so it doesn't impact code that doesn't error), and also only if the bucket exists (because if it doesn't exist (eg a typo), we return FileType::NotFound). So that might already limit the cases where this extra call would actually be a problem.
   
   Another option could be to only check for this is the AWS Error is UNKNOWN (since it doesn't seem to have a specific error code for "buckets exists but not on this region"). That would avoid doing the check if it is another know error like a network connection or unknown host.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #14601:
URL: https://github.com/apache/arrow/pull/14601#discussion_r1019247304


##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -2303,10 +2303,14 @@ Result<FileInfo> S3FileSystem::GetFileInfo(const std::string& s) {
     auto outcome = impl_->client_->HeadBucket(req);
     if (!outcome.IsSuccess()) {
       if (!IsNotFound(outcome.GetError())) {
-        return ErrorToStatus(
-            std::forward_as_tuple("When getting information for bucket '", path.bucket,
-                                  "': "),
-            "HeadBucket", outcome.GetError());
+        auto msg = "When getting information for bucket '" + path.bucket + "': ";
+        // Bucket exists, but failed to call HeadBucket, perhaps wrong region?
+        ARROW_ASSIGN_OR_RAISE(auto region, impl_->client_->GetBucketRegion(path.bucket));

Review Comment:
   Also, perhaps you can try inspecting the headers of the HeadBucket error response? The answer might be there already.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] milesgranger commented on a diff in pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
milesgranger commented on code in PR #14601:
URL: https://github.com/apache/arrow/pull/14601#discussion_r1020060039


##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -2304,11 +2304,16 @@ Result<FileInfo> S3FileSystem::GetFileInfo(const std::string& s) {
     if (!outcome.IsSuccess()) {
       if (!IsNotFound(outcome.GetError())) {
         auto msg = "When getting information for bucket '" + path.bucket + "': ";
+
         // Bucket exists, but failed to call HeadBucket, perhaps wrong region?
-        ARROW_ASSIGN_OR_RAISE(auto region, impl_->client_->GetBucketRegion(path.bucket));
-        if (region != impl_->options().region) {
-          msg += "Looks like the configured region is '" + impl_->options().region +
-                 "' while the bucket is located in '" + region + "': ";
+        const auto headers = outcome.GetError().GetResponseHeaders();
+        const auto it = headers.find("x-amz-bucket-region");
+        if (it != headers.end()) {
+          const std::string region(it->second.begin(), it->second.end());
+          if (region != impl_->options().region) {
+            msg += "Looks like the configured region is '" + impl_->options().region +
+                   "' while the bucket is located in '" + region + "': ";
+          }

Review Comment:
   Added my initial impressions on factoring it out into `ErrorToStatus` but required callers to send in the current region as `impl_` wasn't available. https://github.com/apache/arrow/pull/14601/commits/be3af79e1efe2d9d4c14d3ad5506eeddaef7edda
   
   If this is okay, then I will press on finding other places which might also want to pass in region when calling `ErrorToStatus`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #14601:
URL: https://github.com/apache/arrow/pull/14601#issuecomment-1323680778

   The MinGW failures are unrelated and probably https://github.com/aws/aws-sdk-cpp/issues/2204


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14601:
URL: https://github.com/apache/arrow/pull/14601#issuecomment-1305448625

   https://issues.apache.org/jira/browse/ARROW-17985


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #14601:
URL: https://github.com/apache/arrow/pull/14601#discussion_r1019246048


##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -2303,10 +2303,14 @@ Result<FileInfo> S3FileSystem::GetFileInfo(const std::string& s) {
     auto outcome = impl_->client_->HeadBucket(req);
     if (!outcome.IsSuccess()) {
       if (!IsNotFound(outcome.GetError())) {
-        return ErrorToStatus(
-            std::forward_as_tuple("When getting information for bucket '", path.bucket,
-                                  "': "),
-            "HeadBucket", outcome.GetError());
+        auto msg = "When getting information for bucket '" + path.bucket + "': ";
+        // Bucket exists, but failed to call HeadBucket, perhaps wrong region?
+        ARROW_ASSIGN_OR_RAISE(auto region, impl_->client_->GetBucketRegion(path.bucket));

Review Comment:
   "UNKNOWN" is not really specific enough. However, the HTTP status code may be a better clue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #14601:
URL: https://github.com/apache/arrow/pull/14601#issuecomment-1322625214

   Actually, perhaps you need to rebase/merge from master and see if the tests need fixing?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] milesgranger commented on a diff in pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
milesgranger commented on code in PR #14601:
URL: https://github.com/apache/arrow/pull/14601#discussion_r1022552474


##########
cpp/src/arrow/filesystem/s3_internal.h:
##########
@@ -158,6 +170,13 @@ Status ErrorToStatus(const std::string& prefix, const std::string& operation,
   if (error_type == Aws::S3::S3Errors::UNKNOWN) {
     ss << " (HTTP status " << static_cast<int>(error.GetResponseCode()) << ")";
   }
+  const auto maybe_region = BucketRegionFromError(error);
+  if (maybe_region.has_value() && region.has_value()) {

Review Comment:
   https://github.com/apache/arrow/pull/14601/commits/d1b6471570d7fc1f82ca1a885c8d96ad664f528f



##########
cpp/src/arrow/filesystem/s3_internal.h:
##########
@@ -158,6 +170,13 @@ Status ErrorToStatus(const std::string& prefix, const std::string& operation,
   if (error_type == Aws::S3::S3Errors::UNKNOWN) {
     ss << " (HTTP status " << static_cast<int>(error.GetResponseCode()) << ")";
   }
+  const auto maybe_region = BucketRegionFromError(error);
+  if (maybe_region.has_value() && region.has_value()) {
+    if (maybe_region.value() != region.value()) {
+      ss << " Looks like the configured region is '" + region.value() +

Review Comment:
   https://github.com/apache/arrow/pull/14601/commits/6c476ca1c386d215d68375b9ef33c0475aea9292



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on PR #14601:
URL: https://github.com/apache/arrow/pull/14601#issuecomment-1308551898

   Can you rebase or merge master to see if we get a green CI?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #14601:
URL: https://github.com/apache/arrow/pull/14601#discussion_r1019203778


##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -2303,10 +2303,14 @@ Result<FileInfo> S3FileSystem::GetFileInfo(const std::string& s) {
     auto outcome = impl_->client_->HeadBucket(req);
     if (!outcome.IsSuccess()) {
       if (!IsNotFound(outcome.GetError())) {
-        return ErrorToStatus(
-            std::forward_as_tuple("When getting information for bucket '", path.bucket,
-                                  "': "),
-            "HeadBucket", outcome.GetError());
+        auto msg = "When getting information for bucket '" + path.bucket + "': ";
+        // Bucket exists, but failed to call HeadBucket, perhaps wrong region?
+        ARROW_ASSIGN_OR_RAISE(auto region, impl_->client_->GetBucketRegion(path.bucket));

Review Comment:
   This will issue an additional API call. I'm not sure it's desirable to do this ourselves, rather than simply enhance the original error message with a clue as to the possible cause?
   
   For example:
   > When getting information for bucket 'XXX' (perhaps you configured the wrong region?): ...
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #14601:
URL: https://github.com/apache/arrow/pull/14601#issuecomment-1322624276

   Well, looks like the test fails on AppVeyor.
   https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/45449303#L2831
   
   This may be due to different AWS SDK versions, or some other factor...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #14601:
URL: https://github.com/apache/arrow/pull/14601#issuecomment-1323304789

   I started AppVeyor again, perhaps it was a fluke? I agree the network connection error is a bit unexpected...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #14601:
URL: https://github.com/apache/arrow/pull/14601#discussion_r1019209543


##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -2303,10 +2303,14 @@ Result<FileInfo> S3FileSystem::GetFileInfo(const std::string& s) {
     auto outcome = impl_->client_->HeadBucket(req);
     if (!outcome.IsSuccess()) {
       if (!IsNotFound(outcome.GetError())) {
-        return ErrorToStatus(
-            std::forward_as_tuple("When getting information for bucket '", path.bucket,
-                                  "': "),
-            "HeadBucket", outcome.GetError());
+        auto msg = "When getting information for bucket '" + path.bucket + "': ";
+        // Bucket exists, but failed to call HeadBucket, perhaps wrong region?
+        ARROW_ASSIGN_OR_RAISE(auto region, impl_->client_->GetBucketRegion(path.bucket));

Review Comment:
   But in general I think it's a bad idea to issue this additional API call, especially as HeadBucket fails fast, while region resolution can be slow (~400 ms here).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on PR #14601:
URL: https://github.com/apache/arrow/pull/14601#issuecomment-1323240675

   Unfortunately still failing ..
   
   It's giving a network connection error (`OSError: When getting information for bucket 'voltrondata-labs-datasets': AWS Error NETWORK_CONNECTION during HeadBucket operation: Encountered network error when sending http request`). I find it strange that it would give such an error if it is actually for a wrong bucket/region, since that is not really related to the network connection (but if it's actually a connection issue, it also wouldn't be just this test that is failing)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] AlenkaF commented on pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
AlenkaF commented on PR #14601:
URL: https://github.com/apache/arrow/pull/14601#issuecomment-1306865753

   > Thanks @AlenkaF, is there a way to mark tests as flaky or otherwise try a couple times?
   
   Hm, not sure about that. 
   Looking at the code base for PyArrow, flaky tests seem to be commented out, example:
   https://github.com/apache/arrow/blob/a02a3369da4c9f9df4d1dbb3c22af046a72d907a/python/pyarrow/tests/test_plasma.py#L1032
   
   > Edit: Also noticed I missed the s3 pytest marker.
   👍 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on code in PR #14601:
URL: https://github.com/apache/arrow/pull/14601#discussion_r1022536661


##########
cpp/src/arrow/filesystem/s3_internal.h:
##########
@@ -158,6 +170,13 @@ Status ErrorToStatus(const std::string& prefix, const std::string& operation,
   if (error_type == Aws::S3::S3Errors::UNKNOWN) {
     ss << " (HTTP status " << static_cast<int>(error.GetResponseCode()) << ")";
   }
+  const auto maybe_region = BucketRegionFromError(error);
+  if (maybe_region.has_value() && region.has_value()) {
+    if (maybe_region.value() != region.value()) {
+      ss << " Looks like the configured region is '" + region.value() +

Review Comment:
   Could we add this additional context fully at the end of the error message? 
   
   Right now the AWS part gives something like "AWS Error UNKNOWN (HTTP status 301) during HeadObject operation: No response body.", which sounds like a logical sentence, and this additional sentence is adding there somewhere in the middle.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on code in PR #14601:
URL: https://github.com/apache/arrow/pull/14601#discussion_r1019242095


##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -2303,10 +2303,14 @@ Result<FileInfo> S3FileSystem::GetFileInfo(const std::string& s) {
     auto outcome = impl_->client_->HeadBucket(req);
     if (!outcome.IsSuccess()) {
       if (!IsNotFound(outcome.GetError())) {
-        return ErrorToStatus(
-            std::forward_as_tuple("When getting information for bucket '", path.bucket,
-                                  "': "),
-            "HeadBucket", outcome.GetError());
+        auto msg = "When getting information for bucket '" + path.bucket + "': ";
+        // Bucket exists, but failed to call HeadBucket, perhaps wrong region?
+        ARROW_ASSIGN_OR_RAISE(auto region, impl_->client_->GetBucketRegion(path.bucket));

Review Comment:
   Note that we only do this extra API call if there is an error (so it doesn't impact code that doesn't error), and also only if the bucket exists (because if it doesn't exist (eg a typo), we return FileType::NotFound). So that might already limit the cases where this extra call would actually be a problem.
   
   Another option could be to only check for this is the AWS Error is UNKNOWN (since it doesn't seem to have a specific error code for "buckets exists but not on this region")



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #14601:
URL: https://github.com/apache/arrow/pull/14601#discussion_r1019247898


##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -2303,10 +2303,14 @@ Result<FileInfo> S3FileSystem::GetFileInfo(const std::string& s) {
     auto outcome = impl_->client_->HeadBucket(req);
     if (!outcome.IsSuccess()) {
       if (!IsNotFound(outcome.GetError())) {
-        return ErrorToStatus(
-            std::forward_as_tuple("When getting information for bucket '", path.bucket,
-                                  "': "),
-            "HeadBucket", outcome.GetError());
+        auto msg = "When getting information for bucket '" + path.bucket + "': ";
+        // Bucket exists, but failed to call HeadBucket, perhaps wrong region?
+        ARROW_ASSIGN_OR_RAISE(auto region, impl_->client_->GetBucketRegion(path.bucket));

Review Comment:
   Last thing: ideally we do not do this only for HeadBucket, but for all similar operations (such as HeadObject).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #14601:
URL: https://github.com/apache/arrow/pull/14601#discussion_r1019444516


##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -2304,11 +2304,16 @@ Result<FileInfo> S3FileSystem::GetFileInfo(const std::string& s) {
     if (!outcome.IsSuccess()) {
       if (!IsNotFound(outcome.GetError())) {
         auto msg = "When getting information for bucket '" + path.bucket + "': ";
+
         // Bucket exists, but failed to call HeadBucket, perhaps wrong region?
-        ARROW_ASSIGN_OR_RAISE(auto region, impl_->client_->GetBucketRegion(path.bucket));
-        if (region != impl_->options().region) {
-          msg += "Looks like the configured region is '" + impl_->options().region +
-                 "' while the bucket is located in '" + region + "': ";
+        const auto headers = outcome.GetError().GetResponseHeaders();
+        const auto it = headers.find("x-amz-bucket-region");
+        if (it != headers.end()) {
+          const std::string region(it->second.begin(), it->second.end());
+          if (region != impl_->options().region) {
+            msg += "Looks like the configured region is '" + impl_->options().region +
+                   "' while the bucket is located in '" + region + "': ";
+          }

Review Comment:
   Can we factor this out somehow and reuse it in other places where there could be a similar error?
   (`HeadObject` below is an obvious candidate IMHO)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] milesgranger commented on a diff in pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
milesgranger commented on code in PR #14601:
URL: https://github.com/apache/arrow/pull/14601#discussion_r1019430511


##########
cpp/src/arrow/filesystem/s3fs.cc:
##########
@@ -2303,10 +2303,14 @@ Result<FileInfo> S3FileSystem::GetFileInfo(const std::string& s) {
     auto outcome = impl_->client_->HeadBucket(req);
     if (!outcome.IsSuccess()) {
       if (!IsNotFound(outcome.GetError())) {
-        return ErrorToStatus(
-            std::forward_as_tuple("When getting information for bucket '", path.bucket,
-                                  "': "),
-            "HeadBucket", outcome.GetError());
+        auto msg = "When getting information for bucket '" + path.bucket + "': ";
+        // Bucket exists, but failed to call HeadBucket, perhaps wrong region?
+        ARROW_ASSIGN_OR_RAISE(auto region, impl_->client_->GetBucketRegion(path.bucket));

Review Comment:
   Ah, it was in the headers after all. :+1: https://github.com/apache/arrow/pull/14601/commits/eccb0b521b7281c6ec09d9793ab2cee3137130bd



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] milesgranger commented on a diff in pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
milesgranger commented on code in PR #14601:
URL: https://github.com/apache/arrow/pull/14601#discussion_r1026209688


##########
cpp/src/arrow/filesystem/s3_internal.h:
##########
@@ -158,8 +170,20 @@ Status ErrorToStatus(const std::string& prefix, const std::string& operation,
   if (error_type == Aws::S3::S3Errors::UNKNOWN) {
     ss << " (HTTP status " << static_cast<int>(error.GetResponseCode()) << ")";
   }
+
+  // Possibly an error due to wrong region configuration from client and bucket.
+  std::optional<std::string> wrong_region_msg = std::nullopt;
+  if (region.has_value()) {
+    const auto maybe_region = BucketRegionFromError(error);

Review Comment:
   Thanks! Added in https://github.com/apache/arrow/pull/14601/commits/34a49760010c5a6c6e5c3c31bcb0a05d90b26648, but then as someone new to C++, is it just as good to combine them with a `constexpr` as in https://github.com/apache/arrow/pull/14601/commits/23f8fe72d8003a17ef59500aa0ea574a6c59fb01? I have no real opinion, but curious if performance or style/standard has one.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
pitrou commented on code in PR #14601:
URL: https://github.com/apache/arrow/pull/14601#discussion_r1028358854


##########
cpp/src/arrow/filesystem/s3_internal.h:
##########
@@ -158,8 +170,20 @@ Status ErrorToStatus(const std::string& prefix, const std::string& operation,
   if (error_type == Aws::S3::S3Errors::UNKNOWN) {
     ss << " (HTTP status " << static_cast<int>(error.GetResponseCode()) << ")";
   }
+
+  // Possibly an error due to wrong region configuration from client and bucket.
+  std::optional<std::string> wrong_region_msg = std::nullopt;
+  if (region.has_value()) {
+    const auto maybe_region = BucketRegionFromError(error);

Review Comment:
   Oops, sorry, I misunderstood your comment (should have clicked the links before answering...).
   Yes, the `if constexpr` change is fine!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
pitrou commented on PR #14601:
URL: https://github.com/apache/arrow/pull/14601#issuecomment-1323332013

   Ok, the failure occurred again.
   I don't think it's worth gold-plating this either. The test could probably be skipped if "NETWORK_CONNECTION" is part of the error message.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] milesgranger commented on pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
milesgranger commented on PR #14601:
URL: https://github.com/apache/arrow/pull/14601#issuecomment-1323333261

   Okay, I'll update the test. :+1: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] milesgranger commented on pull request #14601: ARROW-17985: [C++][Python] Improve s3fs error message when wrong region

Posted by GitBox <gi...@apache.org>.
milesgranger commented on PR #14601:
URL: https://github.com/apache/arrow/pull/14601#issuecomment-1322629248

   Thanks for the help! 
   
   > Encountered network error when sending http request
   
   I would have expect an incompatible SDK to complain differently.. but it's a strange world. Fingers crossed the update from master helps.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org