You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/22 22:06:06 UTC

[GitHub] [arrow] ArianaVillegas opened a new pull request #12031: ARROW-15138: [C++] Make ExecPlan::ToString give some additional information

ArianaVillegas opened a new pull request #12031:
URL: https://github.com/apache/arrow/pull/12031


   Add indentation to ToString() function in ExecPlan. 
   
   For example if we have the following graph of execution plan:
   ![image](https://user-images.githubusercontent.com/40250321/147159958-79ad4ad4-db3d-435e-bf01-bb5ea1162730.png)
   
   The execution plan string will be:
   ```
   6 node
       5 node
            4 node
                2 node
            3 node
                2 node
                1 node
   ```
    
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ArianaVillegas commented on a change in pull request #12031: ARROW-15138: [C++] Make ExecPlan::ToString give some additional information

Posted by GitBox <gi...@apache.org>.
ArianaVillegas commented on a change in pull request #12031:
URL: https://github.com/apache/arrow/pull/12031#discussion_r774807237



##########
File path: cpp/src/arrow/compute/exec/exec_plan.cc
##########
@@ -157,11 +157,46 @@ struct ExecPlanImpl : public ExecPlan {
     return std::move(Impl{nodes_}.sorted);
   }
 
+  std::pair<NodeVector, std::vector<int>> OrderedNodes() const {
+    struct Impl {
+      const std::vector<std::unique_ptr<ExecNode>>& nodes;
+      std::unordered_set<ExecNode*> visited;
+      NodeVector sorted;
+      std::vector<int> indents;
+
+      explicit Impl(const std::vector<std::unique_ptr<ExecNode>>& nodes) : nodes(nodes) {
+        visited.reserve(nodes.size());
+
+        for (auto it = nodes.rbegin(); it != nodes.rend(); ++it) {
+          if (visited.count(it->get()) != 0) return;

Review comment:
       Yes, I think we should take care of odd cases too. I'll change it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12031: ARROW-15138: [C++] Make ExecPlan::ToString give some additional information

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12031:
URL: https://github.com/apache/arrow/pull/12031#issuecomment-1007001783


   Benchmark runs are scheduled for baseline = 1e7bfa24c579887f324982a27c0e06f6f9f5a803 and contender = e64480db51fc9622d02613f3ec60bac34d765092. e64480db51fc9622d02613f3ec60bac34d765092 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/7ae812ab84b7425eaf0ba278acd0b69d...37f7dcb0c33f4b01895fe2e3068890c7/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/c77114d10e154f93a6ab7569dff54a07...c606099ce8a342fb869028700c6f0019/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/c6308aa0bcf244b899eded3b4235559f...d9415ec5d54e4a78aed84f59f692a6bb/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ArianaVillegas commented on pull request #12031: ARROW-15138: [C++] Make ExecPlan::ToString give some additional information

Posted by GitBox <gi...@apache.org>.
ArianaVillegas commented on pull request #12031:
URL: https://github.com/apache/arrow/pull/12031#issuecomment-1004376003


   @westonpace, I think we can add join type in hash join nodes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] westonpace closed pull request #12031: ARROW-15138: [C++] Make ExecPlan::ToString give some additional information

Posted by GitBox <gi...@apache.org>.
westonpace closed pull request #12031:
URL: https://github.com/apache/arrow/pull/12031


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12031: ARROW-15138: [C++] Make ExecPlan::ToString give some additional information

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12031:
URL: https://github.com/apache/arrow/pull/12031#issuecomment-1007001783


   Benchmark runs are scheduled for baseline = 1e7bfa24c579887f324982a27c0e06f6f9f5a803 and contender = e64480db51fc9622d02613f3ec60bac34d765092. e64480db51fc9622d02613f3ec60bac34d765092 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/7ae812ab84b7425eaf0ba278acd0b69d...37f7dcb0c33f4b01895fe2e3068890c7/)
   [Failed :arrow_down:0.45% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/c77114d10e154f93a6ab7569dff54a07...c606099ce8a342fb869028700c6f0019/)
   [Finished :arrow_down:0.31% :arrow_up:0.13%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/c6308aa0bcf244b899eded3b4235559f...d9415ec5d54e4a78aed84f59f692a6bb/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ArianaVillegas commented on a change in pull request #12031: ARROW-15138: [C++] Make ExecPlan::ToString give some additional information

Posted by GitBox <gi...@apache.org>.
ArianaVillegas commented on a change in pull request #12031:
URL: https://github.com/apache/arrow/pull/12031#discussion_r776899565



##########
File path: cpp/src/arrow/compute/exec/plan_test.cc
##########
@@ -301,11 +301,11 @@ TEST(ExecPlan, ToString) {
                     {"sink", SinkNodeOptions{&sink_gen}},
                 })
                 .AddToPlan(plan.get()));
-  EXPECT_EQ(plan->sources()[0]->ToString(), R"(:SourceNode{outputs=[:SinkNode]})");
-  EXPECT_EQ(plan->sinks()[0]->ToString(), R"(:SinkNode{inputs=[collected=:SourceNode]})");
+  EXPECT_EQ(plan->sources()[0]->ToString(), R"(:SourceNode{})");
+  EXPECT_EQ(plan->sinks()[0]->ToString(), R"(:SinkNode{})");
   EXPECT_EQ(plan->ToString(), R"(ExecPlan with 2 nodes:
-:SourceNode{outputs=[:SinkNode]}
-:SinkNode{inputs=[collected=:SourceNode]}
+:SinkNode{}
+  :SourceNode{}

Review comment:
       The format is label: kind_node, so when a node has no label, only: kind_node remains. We could prevent the colons from appearing when there is no label.

##########
File path: cpp/src/arrow/compute/exec/plan_test.cc
##########
@@ -301,11 +301,11 @@ TEST(ExecPlan, ToString) {
                     {"sink", SinkNodeOptions{&sink_gen}},
                 })
                 .AddToPlan(plan.get()));
-  EXPECT_EQ(plan->sources()[0]->ToString(), R"(:SourceNode{outputs=[:SinkNode]})");
-  EXPECT_EQ(plan->sinks()[0]->ToString(), R"(:SinkNode{inputs=[collected=:SourceNode]})");
+  EXPECT_EQ(plan->sources()[0]->ToString(), R"(:SourceNode{})");
+  EXPECT_EQ(plan->sinks()[0]->ToString(), R"(:SinkNode{})");
   EXPECT_EQ(plan->ToString(), R"(ExecPlan with 2 nodes:
-:SourceNode{outputs=[:SinkNode]}
-:SinkNode{inputs=[collected=:SourceNode]}
+:SinkNode{}
+  :SourceNode{}

Review comment:
       The format is label: kind_node, so when a node has no label, only: kind_node remains. We could prevent the colons from appearing when there is no label.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ArianaVillegas commented on a change in pull request #12031: ARROW-15138: [C++] Make ExecPlan::ToString give some additional information

Posted by GitBox <gi...@apache.org>.
ArianaVillegas commented on a change in pull request #12031:
URL: https://github.com/apache/arrow/pull/12031#discussion_r774809343



##########
File path: cpp/src/arrow/compute/exec/exec_plan.cc
##########
@@ -157,11 +157,46 @@ struct ExecPlanImpl : public ExecPlan {
     return std::move(Impl{nodes_}.sorted);
   }
 
+  std::pair<NodeVector, std::vector<int>> OrderedNodes() const {
+    struct Impl {
+      const std::vector<std::unique_ptr<ExecNode>>& nodes;
+      std::unordered_set<ExecNode*> visited;
+      NodeVector sorted;
+      std::vector<int> indents;
+
+      explicit Impl(const std::vector<std::unique_ptr<ExecNode>>& nodes) : nodes(nodes) {
+        visited.reserve(nodes.size());
+
+        for (auto it = nodes.rbegin(); it != nodes.rend(); ++it) {
+          if (visited.count(it->get()) != 0) return;
+          Visit(it->get());
+        }
+
+        DCHECK_EQ(visited.size(), nodes.size());
+      }
+
+      void Visit(ExecNode* node, int indent = 0) {
+        for (auto input : node->inputs()) {
+          Visit(input, indent + 1);
+        }
+
+        indents.push_back(indent);
+        sorted.push_back(node);
+        visited.insert(node);
+      }
+    };
+
+    auto result = Impl{nodes_};

Review comment:
       I think it is not necessary because the function OrderedNodes() is a variant of TopoSort (), so with that, it should reflect the direction of data flow.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ArianaVillegas commented on a change in pull request #12031: ARROW-15138: [C++] Make ExecPlan::ToString give some additional information

Posted by GitBox <gi...@apache.org>.
ArianaVillegas commented on a change in pull request #12031:
URL: https://github.com/apache/arrow/pull/12031#discussion_r774813812



##########
File path: cpp/src/arrow/compute/exec/plan_test.cc
##########
@@ -374,13 +374,13 @@ custom_sink_label:OrderBySinkNode{inputs=[collected=:FilterNode], by={sort_keys=
           })
           .AddToPlan(plan.get()));
   EXPECT_EQ(plan->ToString(), R"a(ExecPlan with 5 nodes:
-lhs:SourceNode{outputs=[:UnionNode]}
-rhs:SourceNode{outputs=[:UnionNode]}
-:UnionNode{inputs=[input_0_label=lhs:SourceNode, input_1_label=rhs:SourceNode], outputs=[:ScalarAggregateNode]}
-:ScalarAggregateNode{inputs=[target=:UnionNode], outputs=[:SinkNode], aggregates=[
+:SinkNode{inputs=[collected=:ScalarAggregateNode]}
+  :ScalarAggregateNode{inputs=[target=:UnionNode], outputs=[:SinkNode], aggregates=[
 	count(i32, {mode=NON_NULL}),
 ]}
-:SinkNode{inputs=[collected=:ScalarAggregateNode]}
+    :UnionNode{inputs=[input_0_label=lhs:SourceNode, input_1_label=rhs:SourceNode], outputs=[:ScalarAggregateNode]}

Review comment:
       I think we don't need it anymore because indentation gives us an idea about which nodes are inputs and ouputs.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] pitrou commented on a change in pull request #12031: ARROW-15138: [C++] Make ExecPlan::ToString give some additional information

Posted by GitBox <gi...@apache.org>.
pitrou commented on a change in pull request #12031:
URL: https://github.com/apache/arrow/pull/12031#discussion_r776816593



##########
File path: cpp/src/arrow/compute/exec/plan_test.cc
##########
@@ -301,11 +301,11 @@ TEST(ExecPlan, ToString) {
                     {"sink", SinkNodeOptions{&sink_gen}},
                 })
                 .AddToPlan(plan.get()));
-  EXPECT_EQ(plan->sources()[0]->ToString(), R"(:SourceNode{outputs=[:SinkNode]})");
-  EXPECT_EQ(plan->sinks()[0]->ToString(), R"(:SinkNode{inputs=[collected=:SourceNode]})");
+  EXPECT_EQ(plan->sources()[0]->ToString(), R"(:SourceNode{})");
+  EXPECT_EQ(plan->sinks()[0]->ToString(), R"(:SinkNode{})");
   EXPECT_EQ(plan->ToString(), R"(ExecPlan with 2 nodes:
-:SourceNode{outputs=[:SinkNode]}
-:SinkNode{inputs=[collected=:SourceNode]}
+:SinkNode{}
+  :SourceNode{}

Review comment:
       I'm curious, why are there colons at the start of the nodes?

##########
File path: cpp/src/arrow/compute/exec/exec_plan.cc
##########
@@ -157,11 +157,50 @@ struct ExecPlanImpl : public ExecPlan {
     return std::move(Impl{nodes_}.sorted);
   }
 
+  std::pair<NodeVector, std::vector<int>> OrderedNodes() const {

Review comment:
       What does this do, is this different from the topological sort above? In any case, please add a comment describing this (what does it return?).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] westonpace commented on a change in pull request #12031: ARROW-15138: [C++] Make ExecPlan::ToString give some additional information

Posted by GitBox <gi...@apache.org>.
westonpace commented on a change in pull request #12031:
URL: https://github.com/apache/arrow/pull/12031#discussion_r777654664



##########
File path: cpp/src/arrow/compute/exec/plan_test.cc
##########
@@ -301,11 +301,11 @@ TEST(ExecPlan, ToString) {
                     {"sink", SinkNodeOptions{&sink_gen}},
                 })
                 .AddToPlan(plan.get()));
-  EXPECT_EQ(plan->sources()[0]->ToString(), R"(:SourceNode{outputs=[:SinkNode]})");
-  EXPECT_EQ(plan->sinks()[0]->ToString(), R"(:SinkNode{inputs=[collected=:SourceNode]})");
+  EXPECT_EQ(plan->sources()[0]->ToString(), R"(:SourceNode{})");
+  EXPECT_EQ(plan->sinks()[0]->ToString(), R"(:SinkNode{})");
   EXPECT_EQ(plan->ToString(), R"(ExecPlan with 2 nodes:
-:SourceNode{outputs=[:SinkNode]}
-:SinkNode{inputs=[collected=:SourceNode]}
+:SinkNode{}
+  :SourceNode{}

Review comment:
       I would not worry about it.
   
   `ExecPlan::AddNode` will attach a label (auto-incrementing id counter) if none is set so there should not be any valid cases where there is no label.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12031: ARROW-15138: [C++] Make ExecPlan::ToString give some additional information

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12031:
URL: https://github.com/apache/arrow/pull/12031#issuecomment-1007001783


   Benchmark runs are scheduled for baseline = 1e7bfa24c579887f324982a27c0e06f6f9f5a803 and contender = e64480db51fc9622d02613f3ec60bac34d765092. e64480db51fc9622d02613f3ec60bac34d765092 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/7ae812ab84b7425eaf0ba278acd0b69d...37f7dcb0c33f4b01895fe2e3068890c7/)
   [Failed :arrow_down:0.45% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/c77114d10e154f93a6ab7569dff54a07...c606099ce8a342fb869028700c6f0019/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/c6308aa0bcf244b899eded3b4235559f...d9415ec5d54e4a78aed84f59f692a6bb/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12031: ARROW-15138: [C++] Make ExecPlan::ToString give some additional information

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12031:
URL: https://github.com/apache/arrow/pull/12031#discussion_r774824053



##########
File path: cpp/src/arrow/compute/exec/exec_plan.cc
##########
@@ -157,11 +157,46 @@ struct ExecPlanImpl : public ExecPlan {
     return std::move(Impl{nodes_}.sorted);
   }
 
+  std::pair<NodeVector, std::vector<int>> OrderedNodes() const {
+    struct Impl {
+      const std::vector<std::unique_ptr<ExecNode>>& nodes;
+      std::unordered_set<ExecNode*> visited;
+      NodeVector sorted;
+      std::vector<int> indents;
+
+      explicit Impl(const std::vector<std::unique_ptr<ExecNode>>& nodes) : nodes(nodes) {
+        visited.reserve(nodes.size());
+
+        for (auto it = nodes.rbegin(); it != nodes.rend(); ++it) {
+          if (visited.count(it->get()) != 0) return;
+          Visit(it->get());
+        }
+
+        DCHECK_EQ(visited.size(), nodes.size());
+      }
+
+      void Visit(ExecNode* node, int indent = 0) {
+        for (auto input : node->inputs()) {
+          Visit(input, indent + 1);

Review comment:
       It should not be possible, but I don't think anything really checks for this, so the suggestion is just to be cautious. It's not strictly required, though, just a suggestion.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ArianaVillegas commented on pull request #12031: ARROW-15138: [C++] Make ExecPlan::ToString give some additional information

Posted by GitBox <gi...@apache.org>.
ArianaVillegas commented on pull request #12031:
URL: https://github.com/apache/arrow/pull/12031#issuecomment-999915435


   @aucahuasi @lidavidm 
   cc @westonpace 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ArianaVillegas commented on pull request #12031: ARROW-15138: [C++] Make ExecPlan::ToString give some additional information

Posted by GitBox <gi...@apache.org>.
ArianaVillegas commented on pull request #12031:
URL: https://github.com/apache/arrow/pull/12031#issuecomment-1005180841


   I think this PR is ready to be merged. Thanks so much for your feedback.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #12031: ARROW-15138: [C++] Make ExecPlan::ToString give some additional information

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #12031:
URL: https://github.com/apache/arrow/pull/12031#issuecomment-1007001783


   Benchmark runs are scheduled for baseline = 1e7bfa24c579887f324982a27c0e06f6f9f5a803 and contender = e64480db51fc9622d02613f3ec60bac34d765092. e64480db51fc9622d02613f3ec60bac34d765092 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/7ae812ab84b7425eaf0ba278acd0b69d...37f7dcb0c33f4b01895fe2e3068890c7/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/c77114d10e154f93a6ab7569dff54a07...c606099ce8a342fb869028700c6f0019/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/c6308aa0bcf244b899eded3b4235559f...d9415ec5d54e4a78aed84f59f692a6bb/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] lidavidm commented on a change in pull request #12031: ARROW-15138: [C++] Make ExecPlan::ToString give some additional information

Posted by GitBox <gi...@apache.org>.
lidavidm commented on a change in pull request #12031:
URL: https://github.com/apache/arrow/pull/12031#discussion_r774585813



##########
File path: cpp/src/arrow/compute/exec/exec_plan.cc
##########
@@ -157,11 +157,46 @@ struct ExecPlanImpl : public ExecPlan {
     return std::move(Impl{nodes_}.sorted);
   }
 
+  std::pair<NodeVector, std::vector<int>> OrderedNodes() const {
+    struct Impl {
+      const std::vector<std::unique_ptr<ExecNode>>& nodes;
+      std::unordered_set<ExecNode*> visited;
+      NodeVector sorted;
+      std::vector<int> indents;
+
+      explicit Impl(const std::vector<std::unique_ptr<ExecNode>>& nodes) : nodes(nodes) {
+        visited.reserve(nodes.size());
+
+        for (auto it = nodes.rbegin(); it != nodes.rend(); ++it) {
+          if (visited.count(it->get()) != 0) return;

Review comment:
       Hmm, should this be `continue;` not `return;`?

##########
File path: cpp/src/arrow/compute/exec/exec_plan.cc
##########
@@ -157,11 +157,46 @@ struct ExecPlanImpl : public ExecPlan {
     return std::move(Impl{nodes_}.sorted);
   }
 
+  std::pair<NodeVector, std::vector<int>> OrderedNodes() const {
+    struct Impl {
+      const std::vector<std::unique_ptr<ExecNode>>& nodes;
+      std::unordered_set<ExecNode*> visited;
+      NodeVector sorted;
+      std::vector<int> indents;
+
+      explicit Impl(const std::vector<std::unique_ptr<ExecNode>>& nodes) : nodes(nodes) {
+        visited.reserve(nodes.size());
+
+        for (auto it = nodes.rbegin(); it != nodes.rend(); ++it) {
+          if (visited.count(it->get()) != 0) return;
+          Visit(it->get());
+        }
+
+        DCHECK_EQ(visited.size(), nodes.size());
+      }
+
+      void Visit(ExecNode* node, int indent = 0) {
+        for (auto input : node->inputs()) {
+          Visit(input, indent + 1);
+        }
+
+        indents.push_back(indent);
+        sorted.push_back(node);
+        visited.insert(node);
+      }
+    };
+
+    auto result = Impl{nodes_};

Review comment:
       Should this be `TopoSort()`? It seems `nodes_` reflects the order that nodes were added to the plan, which probably, but not necessarily, corresponds with direction of data flow.

##########
File path: cpp/src/arrow/compute/exec/exec_plan.cc
##########
@@ -157,11 +157,46 @@ struct ExecPlanImpl : public ExecPlan {
     return std::move(Impl{nodes_}.sorted);
   }
 
+  std::pair<NodeVector, std::vector<int>> OrderedNodes() const {
+    struct Impl {
+      const std::vector<std::unique_ptr<ExecNode>>& nodes;
+      std::unordered_set<ExecNode*> visited;
+      NodeVector sorted;
+      std::vector<int> indents;
+
+      explicit Impl(const std::vector<std::unique_ptr<ExecNode>>& nodes) : nodes(nodes) {
+        visited.reserve(nodes.size());
+
+        for (auto it = nodes.rbegin(); it != nodes.rend(); ++it) {
+          if (visited.count(it->get()) != 0) return;

Review comment:
       I suppose it'd only matter if we have something like two sink nodes (which may be an odd case).

##########
File path: cpp/src/arrow/compute/exec/plan_test.cc
##########
@@ -374,13 +374,13 @@ custom_sink_label:OrderBySinkNode{inputs=[collected=:FilterNode], by={sort_keys=
           })
           .AddToPlan(plan.get()));
   EXPECT_EQ(plan->ToString(), R"a(ExecPlan with 5 nodes:
-lhs:SourceNode{outputs=[:UnionNode]}
-rhs:SourceNode{outputs=[:UnionNode]}
-:UnionNode{inputs=[input_0_label=lhs:SourceNode, input_1_label=rhs:SourceNode], outputs=[:ScalarAggregateNode]}
-:ScalarAggregateNode{inputs=[target=:UnionNode], outputs=[:SinkNode], aggregates=[
+:SinkNode{inputs=[collected=:ScalarAggregateNode]}
+  :ScalarAggregateNode{inputs=[target=:UnionNode], outputs=[:SinkNode], aggregates=[
 	count(i32, {mode=NON_NULL}),
 ]}
-:SinkNode{inputs=[collected=:ScalarAggregateNode]}
+    :UnionNode{inputs=[input_0_label=lhs:SourceNode, input_1_label=rhs:SourceNode], outputs=[:ScalarAggregateNode]}

Review comment:
       Do we still want to print `inputs` and `outputs` for every node?

##########
File path: cpp/src/arrow/compute/exec/exec_plan.cc
##########
@@ -157,11 +157,46 @@ struct ExecPlanImpl : public ExecPlan {
     return std::move(Impl{nodes_}.sorted);
   }
 
+  std::pair<NodeVector, std::vector<int>> OrderedNodes() const {
+    struct Impl {
+      const std::vector<std::unique_ptr<ExecNode>>& nodes;
+      std::unordered_set<ExecNode*> visited;
+      NodeVector sorted;
+      std::vector<int> indents;
+
+      explicit Impl(const std::vector<std::unique_ptr<ExecNode>>& nodes) : nodes(nodes) {
+        visited.reserve(nodes.size());
+
+        for (auto it = nodes.rbegin(); it != nodes.rend(); ++it) {
+          if (visited.count(it->get()) != 0) return;
+          Visit(it->get());
+        }
+
+        DCHECK_EQ(visited.size(), nodes.size());
+      }
+
+      void Visit(ExecNode* node, int indent = 0) {
+        for (auto input : node->inputs()) {
+          Visit(input, indent + 1);

Review comment:
       Should we also check that `input` is not in `visited` just to ensure we don't get trapped if there is somehow a cycle?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #12031: ARROW-15138: [C++] Make ExecPlan::ToString give some additional information

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12031:
URL: https://github.com/apache/arrow/pull/12031#issuecomment-999908643


   https://issues.apache.org/jira/browse/ARROW-15138


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ArianaVillegas commented on a change in pull request #12031: ARROW-15138: [C++] Make ExecPlan::ToString give some additional information

Posted by GitBox <gi...@apache.org>.
ArianaVillegas commented on a change in pull request #12031:
URL: https://github.com/apache/arrow/pull/12031#discussion_r774819817



##########
File path: cpp/src/arrow/compute/exec/exec_plan.cc
##########
@@ -157,11 +157,46 @@ struct ExecPlanImpl : public ExecPlan {
     return std::move(Impl{nodes_}.sorted);
   }
 
+  std::pair<NodeVector, std::vector<int>> OrderedNodes() const {
+    struct Impl {
+      const std::vector<std::unique_ptr<ExecNode>>& nodes;
+      std::unordered_set<ExecNode*> visited;
+      NodeVector sorted;
+      std::vector<int> indents;
+
+      explicit Impl(const std::vector<std::unique_ptr<ExecNode>>& nodes) : nodes(nodes) {
+        visited.reserve(nodes.size());
+
+        for (auto it = nodes.rbegin(); it != nodes.rend(); ++it) {
+          if (visited.count(it->get()) != 0) return;
+          Visit(it->get());
+        }
+
+        DCHECK_EQ(visited.size(), nodes.size());
+      }
+
+      void Visit(ExecNode* node, int indent = 0) {
+        for (auto input : node->inputs()) {
+          Visit(input, indent + 1);

Review comment:
       Is it possible to have a cycle? It sounds interesting, can you give me an example, please?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ArianaVillegas commented on pull request #12031: ARROW-15138: [C++] Make ExecPlan::ToString give some additional information

Posted by GitBox <gi...@apache.org>.
ArianaVillegas commented on pull request #12031:
URL: https://github.com/apache/arrow/pull/12031#issuecomment-1003225341


   @pitrou, I add a comment to describe this function


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12031: ARROW-15138: [C++] Make ExecPlan::ToString give some additional information

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12031:
URL: https://github.com/apache/arrow/pull/12031#issuecomment-1007001783






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] westonpace commented on pull request #12031: ARROW-15138: [C++] Make ExecPlan::ToString give some additional information

Posted by GitBox <gi...@apache.org>.
westonpace commented on pull request #12031:
URL: https://github.com/apache/arrow/pull/12031#issuecomment-1004306629


   I have a test plan which is a hard-coded version of one of the TPC-H queries and, using this, I think indentation is missing in the GroupByNode's aggregates.  But, this is much better than I was getting before.  Also, it would be nice if we had some info on the hash join node's options, but that could be left for a follow-up PR as it isn't related to indentation:
   
   ```
   10:SinkNode{}
     9:GroupByNode{keys=["o_orderpriority"], aggregates=[
   	hash_sum(order_count, {skip_nulls=true, min_count=1}),
   ]} // <-- This should be indented
       8:ProjectNode{projection=["order_count": 1, o_orderpriority]}
         7:ProjectNode{projection=[l_orderkey, o_orderpriority]}
           6:GroupByNode{keys=["l_orderkey", "o_orderpriority"], aggregates=[
   ]} // <-- This should be indented
             5:ProjectNode{projection=[l_orderkey, o_orderpriority]}
               4:HashJoinNode{/* would be nice to have some details here */}
                 3:ProjectNode{projection=[o_orderkey, o_orderpriority]}
                   2:SourceNode{}
                 1:FilterNode{filter=(l_commitdate < l_receiptdate)}
                   0:SourceNode{}
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org