You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "coryan (via GitHub)" <gi...@apache.org> on 2023/06/22 00:45:36 UTC

[GitHub] [arrow] coryan opened a new pull request, #36228: GH-36227: [C++] New GcsOption to set the project id

coryan opened a new pull request, #36228:
URL: https://github.com/apache/arrow/pull/36228

   ### Rationale for this change
   
   This fixes #36227, originally motivated by the problems in #36119, but seems like a valuable feature in any case.
   
   ### What changes are included in this PR?
   
   - Refactor some code to make it testable.
   - Add a new `std::optional<std::string>` field to the `GcsOptions` class.
   
   ### Are these changes tested?
   
   Yes, I expanded the unit tests.
   
   ### Are there any user-facing changes?
   
   Yes. I updated the field documentation.  If I missed some documentation please let me know.
   
   I am also not familiar with the steps required to update the Python wrappers, if there is some documentation to follow I would appreciate it.  I can expand this PR or send a separate one, your call.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #36228: GH-36227: [C++] New GcsOption to set the project id

Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #36228:
URL: https://github.com/apache/arrow/pull/36228#issuecomment-1605875426

   Conbench analyzed the 5 benchmark runs on commit `ec413b77`.
   
   There were no benchmark performance regressions. 🎉
   
   The [full Conbench report](https://github.com/apache/arrow/runs/14530339074) has more details.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] coryan commented on a diff in pull request #36228: GH-36227: [C++] New GcsOption to set the project id

Posted by "coryan (via GitHub)" <gi...@apache.org>.
coryan commented on code in PR #36228:
URL: https://github.com/apache/arrow/pull/36228#discussion_r1237949929


##########
cpp/src/arrow/filesystem/gcsfs_internal.h:
##########
@@ -30,6 +32,15 @@
 
 namespace arrow {
 namespace fs {
+struct GcsCredentialsHolder {

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] coryan commented on a diff in pull request #36228: GH-36227: [C++] New GcsOption to set the project id

Posted by "coryan (via GitHub)" <gi...@apache.org>.
coryan commented on code in PR #36228:
URL: https://github.com/apache/arrow/pull/36228#discussion_r1246695797


##########
cpp/src/arrow/filesystem/gcsfs.h:
##########
@@ -77,6 +83,13 @@ struct ARROW_EXPORT GcsOptions {
   /// This will be ignored if non-empty metadata is passed to OpenOutputStream.
   std::shared_ptr<const KeyValueMetadata> default_metadata;
 
+  /// \brief The project to use for creating buckets.
+  ///
+  /// If not set, the library uses the GOOGLE_CLOUD_PROJECT environment
+  /// variable. Most I/O operations do not need a project id, only applications
+  /// that create new buckets need a project id.
+  std::optional<std::string> project_id;

Review Comment:
   No worries at all, I am sure your fixes will be better.  And I am getting very efficient at "did nothing" 😄 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] raulcd commented on a diff in pull request #36228: GH-36227: [C++] New GcsOption to set the project id

Posted by "raulcd (via GitHub)" <gi...@apache.org>.
raulcd commented on code in PR #36228:
URL: https://github.com/apache/arrow/pull/36228#discussion_r1246669295


##########
cpp/src/arrow/filesystem/gcsfs.h:
##########
@@ -77,6 +83,13 @@ struct ARROW_EXPORT GcsOptions {
   /// This will be ignored if non-empty metadata is passed to OpenOutputStream.
   std::shared_ptr<const KeyValueMetadata> default_metadata;
 
+  /// \brief The project to use for creating buckets.
+  ///
+  /// If not set, the library uses the GOOGLE_CLOUD_PROJECT environment
+  /// variable. Most I/O operations do not need a project id, only applications
+  /// that create new buckets need a project id.
+  std::optional<std::string> project_id;

Review Comment:
   hey @coryan I gave the Python PR a try to try and fix some of the CI failures. I am trying to fix some of our nightly builds to try and get to a better state before the code freeze. Sorry for stepping in :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #36228: GH-36227: [C++] New GcsOption to set the project id

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #36228:
URL: https://github.com/apache/arrow/pull/36228#issuecomment-1601864500

   :warning: GitHub issue #36227 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] kou commented on a diff in pull request #36228: GH-36227: [C++] New GcsOption to set the project id

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on code in PR #36228:
URL: https://github.com/apache/arrow/pull/36228#discussion_r1237975638


##########
cpp/src/arrow/filesystem/gcsfs_internal.h:
##########
@@ -30,6 +32,15 @@
 
 namespace arrow {
 namespace fs {
+struct GcsCredentialsHolder {

Review Comment:
   Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] kou merged pull request #36228: GH-36227: [C++] New GcsOption to set the project id

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou merged PR #36228:
URL: https://github.com/apache/arrow/pull/36228


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] coryan commented on a diff in pull request #36228: GH-36227: [C++] New GcsOption to set the project id

Posted by "coryan (via GitHub)" <gi...@apache.org>.
coryan commented on code in PR #36228:
URL: https://github.com/apache/arrow/pull/36228#discussion_r1246660262


##########
cpp/src/arrow/filesystem/gcsfs.h:
##########
@@ -77,6 +83,13 @@ struct ARROW_EXPORT GcsOptions {
   /// This will be ignored if non-empty metadata is passed to OpenOutputStream.
   std::shared_ptr<const KeyValueMetadata> default_metadata;
 
+  /// \brief The project to use for creating buckets.
+  ///
+  /// If not set, the library uses the GOOGLE_CLOUD_PROJECT environment
+  /// variable. Most I/O operations do not need a project id, only applications
+  /// that create new buckets need a project id.
+  std::optional<std::string> project_id;

Review Comment:
   You are probably right.  There is a difference between "not set" (which implies "use the `GOOGLE_CLOUD_PROJECT` environment variable") and "set to empty" (which implies "ignore the environment variable and use the empty string"). Is this distinction useful?  Arguably not.
   
   I am working on the Python fixes today, I can change this to `std::string` if you think it is best to do so.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] coryan commented on a diff in pull request #36228: GH-36227: [C++] New GcsOption to set the project id

Posted by "coryan (via GitHub)" <gi...@apache.org>.
coryan commented on code in PR #36228:
URL: https://github.com/apache/arrow/pull/36228#discussion_r1237938323


##########
cpp/src/arrow/filesystem/gcsfs_internal.h:
##########
@@ -30,6 +32,15 @@
 
 namespace arrow {
 namespace fs {
+struct GcsCredentialsHolder {

Review Comment:
   I am not sure we want to do this.  That symbol was not exported before, and was not part of the public API:
   
   https://github.com/apache/arrow/blob/cd6b2b5dca468cee6704ade4ce539c0a09874607/cpp/src/arrow/filesystem/gcsfs.cc#L38-L43
   
   arguably it should live in `arrow::fs::internal`.  Thoughts?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] kou commented on a diff in pull request #36228: GH-36227: [C++] New GcsOption to set the project id

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on code in PR #36228:
URL: https://github.com/apache/arrow/pull/36228#discussion_r1237889042


##########
cpp/src/arrow/filesystem/gcsfs_internal.h:
##########
@@ -30,6 +32,15 @@
 
 namespace arrow {
 namespace fs {
+struct GcsCredentialsHolder {

Review Comment:
   ```suggestion
   struct ARROW_EXPORT GcsCredentialsHolder {
   ```



##########
cpp/src/arrow/filesystem/gcsfs_internal.h:
##########
@@ -53,6 +64,8 @@ ARROW_EXPORT Result<std::shared_ptr<const KeyValueMetadata>> FromObjectMetadata(
 
 ARROW_EXPORT std::int64_t Depth(std::string_view path);
 
+google::cloud::Options AsGoogleCloudOptions(const GcsOptions& options);

Review Comment:
   ```suggestion
   ARROW_EXPORT google::cloud::Options AsGoogleCloudOptions(const GcsOptions& options);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #36228: GH-36227: [C++] New GcsOption to set the project id

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #36228:
URL: https://github.com/apache/arrow/pull/36228#discussion_r1246637351


##########
cpp/src/arrow/filesystem/gcsfs.h:
##########
@@ -77,6 +83,13 @@ struct ARROW_EXPORT GcsOptions {
   /// This will be ignored if non-empty metadata is passed to OpenOutputStream.
   std::shared_ptr<const KeyValueMetadata> default_metadata;
 
+  /// \brief The project to use for creating buckets.
+  ///
+  /// If not set, the library uses the GOOGLE_CLOUD_PROJECT environment
+  /// variable. Most I/O operations do not need a project id, only applications
+  /// that create new buckets need a project id.
+  std::optional<std::string> project_id;

Review Comment:
   FTR, an optional was probably not necessary as I suppose an empty string cannot possibly be a valid project id?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] pitrou commented on a diff in pull request #36228: GH-36227: [C++] New GcsOption to set the project id

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #36228:
URL: https://github.com/apache/arrow/pull/36228#discussion_r1246663304


##########
cpp/src/arrow/filesystem/gcsfs.h:
##########
@@ -77,6 +83,13 @@ struct ARROW_EXPORT GcsOptions {
   /// This will be ignored if non-empty metadata is passed to OpenOutputStream.
   std::shared_ptr<const KeyValueMetadata> default_metadata;
 
+  /// \brief The project to use for creating buckets.
+  ///
+  /// If not set, the library uses the GOOGLE_CLOUD_PROJECT environment
+  /// variable. Most I/O operations do not need a project id, only applications
+  /// that create new buckets need a project id.
+  std::optional<std::string> project_id;

Review Comment:
   It's probably ok to keep the optional.
   There's already a PR for Python here: https://github.com/apache/arrow/pull/36376



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] coryan commented on a diff in pull request #36228: GH-36227: [C++] New GcsOption to set the project id

Posted by "coryan (via GitHub)" <gi...@apache.org>.
coryan commented on code in PR #36228:
URL: https://github.com/apache/arrow/pull/36228#discussion_r1237937479


##########
cpp/src/arrow/filesystem/gcsfs_internal.h:
##########
@@ -53,6 +64,8 @@ ARROW_EXPORT Result<std::shared_ptr<const KeyValueMetadata>> FromObjectMetadata(
 
 ARROW_EXPORT std::int64_t Depth(std::string_view path);
 
+google::cloud::Options AsGoogleCloudOptions(const GcsOptions& options);

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] kou commented on a diff in pull request #36228: GH-36227: [C++] New GcsOption to set the project id

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on code in PR #36228:
URL: https://github.com/apache/arrow/pull/36228#discussion_r1237941820


##########
cpp/src/arrow/filesystem/gcsfs_internal.h:
##########
@@ -30,6 +32,15 @@
 
 namespace arrow {
 namespace fs {
+struct GcsCredentialsHolder {

Review Comment:
   > it should live in `arrow::fs::internal`. 
   
   Ah, yes. I agree with it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org