You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/20 06:02:38 UTC

[GitHub] [arrow] emkornfield opened a new pull request #12475: ARROW-14893: [C++] Allow creating GCS filesystem from URI

emkornfield opened a new pull request #12475:
URL: https://github.com/apache/arrow/pull/12475


   This is mostly adapted from s3. One place that it differs
   is it doesn't allow for username and password. The only thing
   that I could see supporting in the future is taking username as
   user to impersonate, but we can see if there is demand for that feature.
   
   I also added in two useful features in s3:
   - Option for setting a default location to create a bucket in.
   - Default key-value metadata to use for object creation if none
     is specified.
   
   Also fixed a typo in s3fs error message.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #12475: ARROW-14893: [C++] Allow creating GCS filesystem from URI

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12475:
URL: https://github.com/apache/arrow/pull/12475#discussion_r811949614



##########
File path: cpp/src/arrow/filesystem/filesystem.cc
##########
@@ -686,6 +689,16 @@ Result<std::shared_ptr<FileSystem>> FileSystemFromUriReal(const Uri& uri,
     }
     return std::make_shared<LocalFileSystem>(options, io_context);
   }
+  if (scheme == "gs") {

Review comment:
       Can/should we also accept "gcs" for compatibility with https://gcsfs.readthedocs.io/en/latest/#integration ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #12475: ARROW-14893: [C++] Allow creating GCS filesystem from URI

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #12475:
URL: https://github.com/apache/arrow/pull/12475#discussion_r812627764



##########
File path: cpp/src/arrow/filesystem/gcsfs.cc
##########
@@ -610,21 +619,26 @@ class GcsFileSystem::Impl {
 
 bool GcsOptions::Equals(const GcsOptions& other) const {
   return credentials == other.credentials &&
-         endpoint_override == other.endpoint_override && scheme == other.scheme;
+         endpoint_override == other.endpoint_override && scheme == other.scheme &&
+         default_bucket_location == other.default_bucket_location;
 }
 
 GcsOptions GcsOptions::Defaults() {
   return GcsOptions{
       std::make_shared<GcsCredentials>(google::cloud::MakeGoogleDefaultCredentials()),
       {},
-      "https"};
+      "https",
+      /*default_bucket_location=*/{},
+      /*default_metadata=*/nullptr};

Review comment:
       done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #12475: ARROW-14893: [C++] Allow creating GCS filesystem from URI

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #12475:
URL: https://github.com/apache/arrow/pull/12475#discussion_r812627031



##########
File path: cpp/src/arrow/filesystem/filesystem.cc
##########
@@ -686,6 +689,16 @@ Result<std::shared_ptr<FileSystem>> FileSystemFromUriReal(const Uri& uri,
     }
     return std::make_shared<LocalFileSystem>(options, io_context);
   }
+  if (scheme == "gs") {

Review comment:
       done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12475: ARROW-14893: [C++] Allow creating GCS filesystem from URI

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12475:
URL: https://github.com/apache/arrow/pull/12475#issuecomment-1048785099


   Benchmark runs are scheduled for baseline = 53e12965981a419021ce1c19784e4e6766e5b137 and contender = 824456872a4d3a1d847351cd8bea6325bafdf448. 824456872a4d3a1d847351cd8bea6325bafdf448 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/4b83b26c05484fe6bb62a62b8f1e2875...b4cf5c3229a84be597aea2bdc2a2e5eb/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/0d9e378b30384b74bfff8382717545ba...84c2fc202dbe4816b8c3902f7a79110d/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/63f423bf4fa44f9e85000967f6ae7cd7...c7b6c4f766184bf086808c6140ef13be/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/30301a93e7ce4ab79e9fa7a5e93c46e7...833b4512cf2e434b8b55a88eb31405ce/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12475: ARROW-14893: [C++] Allow creating GCS filesystem from URI

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12475:
URL: https://github.com/apache/arrow/pull/12475#issuecomment-1048785099


   Benchmark runs are scheduled for baseline = 53e12965981a419021ce1c19784e4e6766e5b137 and contender = 824456872a4d3a1d847351cd8bea6325bafdf448. 824456872a4d3a1d847351cd8bea6325bafdf448 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/4b83b26c05484fe6bb62a62b8f1e2875...b4cf5c3229a84be597aea2bdc2a2e5eb/)
   [Finished :arrow_down:0.42% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/0d9e378b30384b74bfff8382717545ba...84c2fc202dbe4816b8c3902f7a79110d/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/63f423bf4fa44f9e85000967f6ae7cd7...c7b6c4f766184bf086808c6140ef13be/)
   [Finished :arrow_down:0.68% :arrow_up:0.34%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/30301a93e7ce4ab79e9fa7a5e93c46e7...833b4512cf2e434b8b55a88eb31405ce/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] coryan commented on a change in pull request #12475: ARROW-14893: [C++] Allow creating GCS filesystem from URI

Posted by GitBox <gi...@apache.org>.
coryan commented on a change in pull request #12475:
URL: https://github.com/apache/arrow/pull/12475#discussion_r812218202



##########
File path: cpp/src/arrow/filesystem/gcsfs.cc
##########
@@ -642,14 +658,73 @@ GcsOptions GcsOptions::FromImpersonatedServiceAccount(
                         google::cloud::MakeImpersonateServiceAccountCredentials(
                             base_credentials.credentials, target_service_account)),
                     {},
-                    "https"};
+                    "https",
+                    /*default_bucket_location=*/{},
+                    /*default_metadata=*/nullptr};
 }
 
 GcsOptions GcsOptions::FromServiceAccountCredentials(const std::string& json_object) {
   return GcsOptions{std::make_shared<GcsCredentials>(
                         google::cloud::MakeServiceAccountCredentials(json_object)),
                     {},
-                    "https"};
+                    "https",
+                    /*default_bucket_location=*/{},
+                    /*default_metadata=*/nullptr};
+}
+
+Result<GcsOptions> GcsOptions::FromUri(const arrow::internal::Uri& uri,
+                                       std::string* out_path) {
+  const auto bucket = uri.host();
+  auto path = uri.path();
+  if (bucket.empty()) {
+    if (!path.empty()) {
+      return Status::Invalid("Missing bucket name in GCS URI");
+    }
+  } else {
+    if (path.empty()) {
+      path = bucket;
+    } else {
+      if (path[0] != '/') {
+        return Status::Invalid("GCS URI should be absolute, not relative");
+      }
+      path = bucket + path;
+    }
+  }
+  if (out_path != nullptr) {
+    *out_path = std::string(internal::RemoveTrailingSlash(path));
+  }
+
+  std::unordered_map<std::string, std::string> options_map;
+  ARROW_ASSIGN_OR_RAISE(const auto options_items, uri.query_items());
+  for (const auto& kv : options_items) {
+    options_map.emplace(kv.first, kv.second);
+  }
+
+  if (!uri.password().empty() || !uri.username().empty()) {
+    return Status::Invalid("GCS does not accept username or password.");

Review comment:
       The JSON API, which is the API used under the hood by the client libraries (C++ or any other language) does not have passwords.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12475: ARROW-14893: [C++] Allow creating GCS filesystem from URI

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12475:
URL: https://github.com/apache/arrow/pull/12475#issuecomment-1048785099


   Benchmark runs are scheduled for baseline = 53e12965981a419021ce1c19784e4e6766e5b137 and contender = 824456872a4d3a1d847351cd8bea6325bafdf448. 824456872a4d3a1d847351cd8bea6325bafdf448 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/4b83b26c05484fe6bb62a62b8f1e2875...b4cf5c3229a84be597aea2bdc2a2e5eb/)
   [Finished :arrow_down:0.42% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/0d9e378b30384b74bfff8382717545ba...84c2fc202dbe4816b8c3902f7a79110d/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/63f423bf4fa44f9e85000967f6ae7cd7...c7b6c4f766184bf086808c6140ef13be/)
   [Finished :arrow_down:0.68% :arrow_up:0.34%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/30301a93e7ce4ab79e9fa7a5e93c46e7...833b4512cf2e434b8b55a88eb31405ce/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #12475: ARROW-14893: [C++] Allow creating GCS filesystem from URI

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #12475:
URL: https://github.com/apache/arrow/pull/12475#discussion_r812290371



##########
File path: cpp/src/arrow/filesystem/gcsfs.h
##########
@@ -34,6 +35,13 @@ struct ARROW_EXPORT GcsOptions {
 
   std::string endpoint_override;
   std::string scheme;
+  // \brief Location to use for creating buckets.

Review comment:
       I've always thought of multi-regions as magic regions instead of a different concept.  Unfortunately, people do use multi-regions for analytics workloads and experience the associated pain points.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #12475: ARROW-14893: [C++] Allow creating GCS filesystem from URI

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #12475:
URL: https://github.com/apache/arrow/pull/12475#discussion_r812213352



##########
File path: cpp/src/arrow/filesystem/gcsfs.cc
##########
@@ -642,14 +658,73 @@ GcsOptions GcsOptions::FromImpersonatedServiceAccount(
                         google::cloud::MakeImpersonateServiceAccountCredentials(
                             base_credentials.credentials, target_service_account)),
                     {},
-                    "https"};
+                    "https",
+                    /*default_bucket_location=*/{},
+                    /*default_metadata=*/nullptr};
 }
 
 GcsOptions GcsOptions::FromServiceAccountCredentials(const std::string& json_object) {
   return GcsOptions{std::make_shared<GcsCredentials>(
                         google::cloud::MakeServiceAccountCredentials(json_object)),
                     {},
-                    "https"};
+                    "https",
+                    /*default_bucket_location=*/{},
+                    /*default_metadata=*/nullptr};
+}
+
+Result<GcsOptions> GcsOptions::FromUri(const arrow::internal::Uri& uri,
+                                       std::string* out_path) {
+  const auto bucket = uri.host();
+  auto path = uri.path();
+  if (bucket.empty()) {
+    if (!path.empty()) {
+      return Status::Invalid("Missing bucket name in GCS URI");
+    }
+  } else {
+    if (path.empty()) {
+      path = bucket;
+    } else {
+      if (path[0] != '/') {
+        return Status::Invalid("GCS URI should be absolute, not relative");
+      }
+      path = bucket + path;
+    }
+  }
+  if (out_path != nullptr) {
+    *out_path = std::string(internal::RemoveTrailingSlash(path));
+  }
+
+  std::unordered_map<std::string, std::string> options_map;
+  ARROW_ASSIGN_OR_RAISE(const auto options_items, uri.query_items());
+  for (const auto& kv : options_items) {
+    options_map.emplace(kv.first, kv.second);
+  }
+
+  if (!uri.password().empty() || !uri.username().empty()) {
+    return Status::Invalid("GCS does not accept username or password.");

Review comment:
       There isn't a concept of a standalone password as far as I can tell.  The closest thing to it is a JSON object that contains Key information.  @coryan to verify my understanding.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12475: ARROW-14893: [C++] Allow creating GCS filesystem from URI

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12475:
URL: https://github.com/apache/arrow/pull/12475#issuecomment-1048785099


   Benchmark runs are scheduled for baseline = 53e12965981a419021ce1c19784e4e6766e5b137 and contender = 824456872a4d3a1d847351cd8bea6325bafdf448. 824456872a4d3a1d847351cd8bea6325bafdf448 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/4b83b26c05484fe6bb62a62b8f1e2875...b4cf5c3229a84be597aea2bdc2a2e5eb/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/0d9e378b30384b74bfff8382717545ba...84c2fc202dbe4816b8c3902f7a79110d/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/63f423bf4fa44f9e85000967f6ae7cd7...c7b6c4f766184bf086808c6140ef13be/)
   [Finished :arrow_down:0.68% :arrow_up:0.34%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/30301a93e7ce4ab79e9fa7a5e93c46e7...833b4512cf2e434b8b55a88eb31405ce/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on pull request #12475: ARROW-14893: [C++] Allow creating GCS filesystem from URI

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #12475:
URL: https://github.com/apache/arrow/pull/12475#issuecomment-1073357759


   This change hadn't been released.  I think with this change datsets should work.  I'll hopefully have a PR up this week for pyarrow wrappers


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] grisaitis commented on pull request #12475: ARROW-14893: [C++] Allow creating GCS filesystem from URI

Posted by GitBox <gi...@apache.org>.
grisaitis commented on pull request #12475:
URL: https://github.com/apache/arrow/pull/12475#issuecomment-1073356273


   Is any additional work required to make this accessible from pyarrow / `pyarrow.fs.GcsFileSystem`? Would love to start using this from pyarrow. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #12475: ARROW-14893: [C++] Allow creating GCS filesystem from URI

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #12475:
URL: https://github.com/apache/arrow/pull/12475#issuecomment-1048785099


   Benchmark runs are scheduled for baseline = 53e12965981a419021ce1c19784e4e6766e5b137 and contender = 824456872a4d3a1d847351cd8bea6325bafdf448. 824456872a4d3a1d847351cd8bea6325bafdf448 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/4b83b26c05484fe6bb62a62b8f1e2875...b4cf5c3229a84be597aea2bdc2a2e5eb/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/0d9e378b30384b74bfff8382717545ba...84c2fc202dbe4816b8c3902f7a79110d/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/63f423bf4fa44f9e85000967f6ae7cd7...c7b6c4f766184bf086808c6140ef13be/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/30301a93e7ce4ab79e9fa7a5e93c46e7...833b4512cf2e434b8b55a88eb31405ce/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou closed pull request #12475: ARROW-14893: [C++] Allow creating GCS filesystem from URI

Posted by GitBox <gi...@apache.org>.
pitrou closed pull request #12475:
URL: https://github.com/apache/arrow/pull/12475


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on pull request #12475: ARROW-14893: [C++] Allow creating GCS filesystem from URI

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #12475:
URL: https://github.com/apache/arrow/pull/12475#issuecomment-1046169333


   CC @coryan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] coryan commented on a change in pull request #12475: ARROW-14893: [C++] Allow creating GCS filesystem from URI

Posted by GitBox <gi...@apache.org>.
coryan commented on a change in pull request #12475:
URL: https://github.com/apache/arrow/pull/12475#discussion_r812216401



##########
File path: cpp/src/arrow/filesystem/gcsfs.h
##########
@@ -34,6 +35,13 @@ struct ARROW_EXPORT GcsOptions {
 
   std::string endpoint_override;
   std::string scheme;
+  // \brief Location to use for creating buckets.

Review comment:
       There are locations that are not regions:
   
   https://cloud.google.com/storage/docs/locations#location-mr
   
   Probably not of much interest for analytics workloads, but who knows.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #12475: ARROW-14893: [C++] Allow creating GCS filesystem from URI

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12475:
URL: https://github.com/apache/arrow/pull/12475#discussion_r811953577



##########
File path: cpp/src/arrow/filesystem/gcsfs.cc
##########
@@ -610,21 +619,26 @@ class GcsFileSystem::Impl {
 
 bool GcsOptions::Equals(const GcsOptions& other) const {
   return credentials == other.credentials &&
-         endpoint_override == other.endpoint_override && scheme == other.scheme;
+         endpoint_override == other.endpoint_override && scheme == other.scheme &&
+         default_bucket_location == other.default_bucket_location;
 }
 
 GcsOptions GcsOptions::Defaults() {
   return GcsOptions{
       std::make_shared<GcsCredentials>(google::cloud::MakeGoogleDefaultCredentials()),
       {},
-      "https"};
+      "https",
+      /*default_bucket_location=*/{},
+      /*default_metadata=*/nullptr};

Review comment:
       Instead of spelling all default parameter values explicitly, it would probably more readable and maintainable to set the non-default attributes individually e.g.;
   ```c++
   GcsOptions options{};
   options.credentials = std::make_shared<GcsCredentials>(google::cloud::MakeInsecureCredentials());
   options.schema = "http";
   return options;
   ```
   

##########
File path: cpp/src/arrow/filesystem/gcsfs.h
##########
@@ -34,6 +35,13 @@ struct ARROW_EXPORT GcsOptions {
 
   std::string endpoint_override;
   std::string scheme;
+  // \brief Location to use for creating buckets.

Review comment:
       What kind of value can a location have? Is it a "region" like in S3?

##########
File path: cpp/src/arrow/filesystem/gcsfs.h
##########
@@ -89,6 +97,11 @@ struct ARROW_EXPORT GcsOptions {
   ///
   /// [aip/4112]: https://google.aip.dev/auth/4112
   static GcsOptions FromServiceAccountCredentials(const std::string& json_object);
+
+  // Initialize from URIs such as (gs://bucket/object).

Review comment:
       ```suggestion
     /// Initialize from URIs such as "gs://bucket/object".
   ```

##########
File path: cpp/src/arrow/filesystem/gcsfs.cc
##########
@@ -642,14 +658,73 @@ GcsOptions GcsOptions::FromImpersonatedServiceAccount(
                         google::cloud::MakeImpersonateServiceAccountCredentials(
                             base_credentials.credentials, target_service_account)),
                     {},
-                    "https"};
+                    "https",
+                    /*default_bucket_location=*/{},
+                    /*default_metadata=*/nullptr};
 }
 
 GcsOptions GcsOptions::FromServiceAccountCredentials(const std::string& json_object) {
   return GcsOptions{std::make_shared<GcsCredentials>(
                         google::cloud::MakeServiceAccountCredentials(json_object)),
                     {},
-                    "https"};
+                    "https",
+                    /*default_bucket_location=*/{},
+                    /*default_metadata=*/nullptr};
+}
+
+Result<GcsOptions> GcsOptions::FromUri(const arrow::internal::Uri& uri,
+                                       std::string* out_path) {
+  const auto bucket = uri.host();
+  auto path = uri.path();
+  if (bucket.empty()) {
+    if (!path.empty()) {
+      return Status::Invalid("Missing bucket name in GCS URI");
+    }
+  } else {
+    if (path.empty()) {
+      path = bucket;
+    } else {
+      if (path[0] != '/') {
+        return Status::Invalid("GCS URI should be absolute, not relative");
+      }
+      path = bucket + path;
+    }
+  }
+  if (out_path != nullptr) {
+    *out_path = std::string(internal::RemoveTrailingSlash(path));
+  }
+
+  std::unordered_map<std::string, std::string> options_map;
+  ARROW_ASSIGN_OR_RAISE(const auto options_items, uri.query_items());
+  for (const auto& kv : options_items) {
+    options_map.emplace(kv.first, kv.second);
+  }
+
+  if (!uri.password().empty() || !uri.username().empty()) {
+    return Status::Invalid("GCS does not accept username or password.");

Review comment:
       Is it because the GCS C++ APIs don't allow doing this, or we don't want to wire them (for security perhaps)?

##########
File path: cpp/src/arrow/filesystem/gcsfs.h
##########
@@ -34,6 +35,13 @@ struct ARROW_EXPORT GcsOptions {
 
   std::string endpoint_override;
   std::string scheme;
+  // \brief Location to use for creating buckets.

Review comment:
       ```suggestion
     /// \brief Location to use for creating buckets.
   ```

##########
File path: cpp/src/arrow/filesystem/filesystem.cc
##########
@@ -686,6 +689,16 @@ Result<std::shared_ptr<FileSystem>> FileSystemFromUriReal(const Uri& uri,
     }
     return std::make_shared<LocalFileSystem>(options, io_context);
   }
+  if (scheme == "gs") {

Review comment:
       Can we also accept "gcs" for compatibility with https://gcsfs.readthedocs.io/en/latest/#integration ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #12475: ARROW-14893: [C++] Allow creating GCS filesystem from URI

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12475:
URL: https://github.com/apache/arrow/pull/12475#issuecomment-1046169361






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #12475: ARROW-14893: [C++] Allow creating GCS filesystem from URI

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #12475:
URL: https://github.com/apache/arrow/pull/12475#discussion_r812214641



##########
File path: cpp/src/arrow/filesystem/gcsfs.h
##########
@@ -34,6 +35,13 @@ struct ARROW_EXPORT GcsOptions {
 
   std::string endpoint_override;
   std::string scheme;
+  // \brief Location to use for creating buckets.

Review comment:
       Yes, they are [regions](https://cloud.google.com/storage/docs/locations).  I went back and forth on whether to call this location or region, but chose location because it corresponded with the API.  I don't have a strong preference.  @coryan any thoughts on this?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #12475: ARROW-14893: [C++] Allow creating GCS filesystem from URI

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #12475:
URL: https://github.com/apache/arrow/pull/12475#discussion_r812215191



##########
File path: cpp/src/arrow/filesystem/filesystem.cc
##########
@@ -686,6 +689,16 @@ Result<std::shared_ptr<FileSystem>> FileSystemFromUriReal(const Uri& uri,
     }
     return std::make_shared<LocalFileSystem>(options, io_context);
   }
+  if (scheme == "gs") {

Review comment:
       Seems reasonable to me.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org