Posted to github@arrow.apache.org by "westonpace (via GitHub)" <gi...@apache.org> on 2023/03/01 19:47:54 UTC

[GitHub] [arrow] westonpace opened a new pull request, #34406: GH-34405: [C++] Add support for custom names in QueryOptions. Wire this up to Substrait

westonpace opened a new pull request, #34406:
URL: https://github.com/apache/arrow/pull/34406

   ### Rationale for this change
   
   Users want to be able to specify custom column names / aliases instead of using the ones generated by Acero.
   
   ### What changes are included in this PR?
   
   It is now possible to specify custom column names in QueryOptions.  In addition, the Python Substrait bindings now use this feature so that the Substrait plan's names will be respected.
   
   ### Are these changes tested?
   
   Yes.  These are tested directly.  In addition, I added a Python test for the Substrait bindings, since this is actually a regression there.  This should close https://github.com/apache/arrow/issues/33434.
   
   ### Are there any user-facing changes?
   
   There is new API surface but nothing breaking.
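
As a rough illustration of the naming rule described above (no custom names means the generated names are kept; otherwise one name is needed per output column), here is a minimal self-contained sketch. It uses plain standard-library types rather than Arrow's classes, and `ApplyCustomNames` is a hypothetical helper invented for this example, not part of the PR or of Acero.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper (not Arrow API): picks the final column names for a flat schema.
// Returns std::nullopt when custom names are given but their count does not match
// the number of generated (output) columns.
std::optional<std::vector<std::string>> ApplyCustomNames(
    const std::vector<std::string>& generated,
    const std::vector<std::string>& custom) {
  if (custom.empty()) {
    return generated;  // no custom names supplied: keep the generated ones
  }
  if (custom.size() != generated.size()) {
    return std::nullopt;  // names must be provided for all fields
  }
  return custom;  // one custom name per output column
}
```

In the PR itself the equivalent check reports `Status::Invalid` rather than returning an empty optional; the optional is only used here to keep the sketch free of Arrow dependencies.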


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on a diff in pull request #34406: GH-34405: [C++] Add support for custom names in QueryOptions. Wire this up to Substrait

Posted by "lidavidm (via GitHub)" <gi...@apache.org>.
lidavidm commented on code in PR #34406:
URL: https://github.com/apache/arrow/pull/34406#discussion_r1126828055


##########
cpp/src/arrow/compute/exec/options.h:
##########
@@ -614,6 +616,13 @@ class ARROW_EXPORT TableSinkNodeOptions : public ExecNodeOptions {
   ///
   /// \see QueryOptions for more details
   std::optional<bool> sequence_output;
+  /// \brief Custom names to use for the columns.
+  ///
+  /// If specified then names must be provided for all fields. Currently, only a flat
+  /// schema is supported (see ARROW-15901).

Review Comment:
   nit: maybe we should refer to the GitHub issue instead? (I assume this was copied from above)



##########
cpp/src/arrow/compute/exec/exec_plan.cc:
##########
@@ -925,14 +943,32 @@ struct BatchConverter {
         });
   }
 
+  Result<std::shared_ptr<Schema>> InitializeSchema(
+      const std::vector<std::string>& names) {
+    // By this point this->schema will have been set by the SinkNode.  We potentially
+    // rename it with the names provided by the user and then return this in case the user
+    // wants to know the output schema.
+    if (!names.empty()) {
+      if (static_cast<int>(names.size()) != schema->num_fields()) {
+        return Status::Invalid(
+            "A plan was created with custom field names but the number of names did not "
+            "match the number of output columns");

Review Comment:
   nit: include expected/actual count in the message for convenience?
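
A minimal sketch of what a message carrying both counts could look like, assuming nothing about the wording that was eventually merged. `NameCountMismatchMessage` is a hypothetical helper for illustration only; in the PR the text would be passed to `Status::Invalid`.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not Arrow API): builds the mismatch message and includes
// both the number of names supplied and the number of output columns.
std::string NameCountMismatchMessage(std::size_t num_names, std::size_t num_columns) {
  return "A plan was created with custom field names but the number of names (" +
         std::to_string(num_names) +
         ") did not match the number of output columns (" +
         std::to_string(num_columns) + ")";
}
```

Putting both numbers in the text lets a user see at a glance whether too few or too many names were supplied.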





[GitHub] [arrow] westonpace merged pull request #34406: GH-34405: [C++] Add support for custom names in QueryOptions. Wire this up to Substrait

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace merged PR #34406:
URL: https://github.com/apache/arrow/pull/34406




[GitHub] [arrow] ursabot commented on pull request #34406: GH-34405: [C++] Add support for custom names in QueryOptions. Wire this up to Substrait

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #34406:
URL: https://github.com/apache/arrow/pull/34406#issuecomment-1468869917

   Benchmark runs are scheduled for baseline = 28e6d3649a4f5e6adcd5ca912886bd210acc1c4d and contender = bd8005151cac0470474c0c65b6a9299f2c0bde83. bd8005151cac0470474c0c65b6a9299f2c0bde83 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/90ccc3bbe8d94689a9301c5b39b349ff...ac23976006444f01bba7a96dbb0ccc55/)
   [Failed :arrow_down:1.69% :arrow_up:0.06%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/2e54d35a40dd41729714fe190a274ef3...a932b1867caf4ef49777b77efb8820cd/)
   [Finished :arrow_down:0.26% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/ef092dd5b3754017b58c01061ba7fa78...ae9c0cf1b0de47d6a9ceda0cf1a03ac4/)
   [Finished :arrow_down:0.38% :arrow_up:0.25%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/f5e9457d93044dbf96e7b26326e9923c...c2972e7bb8934a3d9c09363c664c44d4/)
   Buildkite builds:
   [Finished] [`bd800515` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2492)
   [Failed] [`bd800515` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2522)
   [Finished] [`bd800515` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2490)
   [Finished] [`bd800515` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2513)
   [Finished] [`28e6d364` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2491)
   [Finished] [`28e6d364` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2521)
   [Finished] [`28e6d364` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2489)
   [Finished] [`28e6d364` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2512)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   




[GitHub] [arrow] westonpace commented on a diff in pull request #34406: GH-34405: [C++] Add support for custom names in QueryOptions. Wire this up to Substrait

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on code in PR #34406:
URL: https://github.com/apache/arrow/pull/34406#discussion_r1127121989


##########
cpp/src/arrow/compute/exec/options.h:
##########
@@ -614,6 +616,13 @@ class ARROW_EXPORT TableSinkNodeOptions : public ExecNodeOptions {
   ///
   /// \see QueryOptions for more details
   std::optional<bool> sequence_output;
+  /// \brief Custom names to use for the columns.
+  ///
+  /// If specified then names must be provided for all fields. Currently, only a flat
+  /// schema is supported (see ARROW-15901).

Review Comment:
   Ah, ARROW-15901 was the wrong issue anyways.  I've updated these to point to the correct GH- issue.



##########
cpp/src/arrow/compute/exec/exec_plan.cc:
##########
@@ -925,14 +943,32 @@ struct BatchConverter {
         });
   }
 
+  Result<std::shared_ptr<Schema>> InitializeSchema(
+      const std::vector<std::string>& names) {
+    // By this point this->schema will have been set by the SinkNode.  We potentially
+    // rename it with the names provided by the user and then return this in case the user
+    // wants to know the output schema.
+    if (!names.empty()) {
+      if (static_cast<int>(names.size()) != schema->num_fields()) {
+        return Status::Invalid(
+            "A plan was created with custom field names but the number of names did not "
+            "match the number of output columns");

Review Comment:
   Done.





[GitHub] [arrow] github-actions[bot] commented on pull request #34406: GH-34405: [C++] Add support for custom names in QueryOptions. Wire this up to Substrait

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34406:
URL: https://github.com/apache/arrow/pull/34406#issuecomment-1450756017

   :warning: GitHub issue #34405 **has been automatically assigned in GitHub** to PR creator.




[GitHub] [arrow] github-actions[bot] commented on pull request #34406: GH-34405: [C++] Add support for custom names in QueryOptions. Wire this up to Substrait

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34406:
URL: https://github.com/apache/arrow/pull/34406#issuecomment-1450755961

   * Closes: #34405

