You are viewing a plain text version of this content. The canonical link for it is here.

Posted to github@arrow.apache.org by "westonpace (via GitHub)" <gi...@apache.org> on 2023/05/01 20:55:00 UTC

[GitHub] [arrow] westonpace opened a new pull request, #35384: GH-35383: [C++] Prefer max_concurrency over executor capacity to avoid segmentation fault

westonpace opened a new pull request, #35384:
URL: https://github.com/apache/arrow/pull/35384

### Rationale for this change

The recent change (#34912) calculates the max concurrency using `plan->query_context()->executor()->GetCapacity()`. This is later used to initialize the kernel states. However, this is different than what we used to use. The previous method used was `plan->query_context()->max_concurrency()` which is slightly different(if the aggregate node IS run in parallel then we initialize one state for each CPU thread, one for each I/O thread, and one for the calling user thread).

This is unfortunately a bit complicated as `max_concurrency` would not be a good indicator to use when determining if the plan is running in parallel or not. So we need to query both properties and use them in their respective spots.

### What changes are included in this PR?

Now, `max_concurrency` is used to figure out how many thread local states need to be initialized and `GetCapacity` is used to figure out if there are multiple CPU threads or not.

### Are these changes tested?

The bug was caught by the benchmarks which is a bit concerning. Most of the CI have a very small number of CPU threads and don't experience much concurrency and so I think we just didn't see this pattern. Or possibly, this pattern is only experienced in the legacy way that pyarrow launches exec plans.

### Are there any user-facing changes?

No.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ursabot commented on pull request #35384: GH-35383: [C++] Prefer max_concurrency over executor capacity to avoid segmentation fault

Posted by "ursabot (via GitHub)" <gi...@apache.org>.

ursabot commented on PR #35384:
URL: https://github.com/apache/arrow/pull/35384#issuecomment-1532194606

   ['Python', 'R'] benchmarks have high level of regressions.
   [test-mac-arm](https://conbench.ursa.dev/compare/runs/cbdabb928df0445aa4633ca5da01d747...f7e37e1f5ab048059aaa08c55ee83438/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] icexelloss commented on a diff in pull request #35384: GH-35383: [C++] Prefer max_concurrency over executor capacity to avoid segmentation fault

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.

icexelloss commented on code in PR #35384:
URL: https://github.com/apache/arrow/pull/35384#discussion_r1181994491


##########
cpp/src/arrow/acero/aggregate_node.cc:
##########
@@ -398,25 +398,25 @@ class ScalarAggregateNode : public ExecNode, public TracedNode {
     auto aggregates = aggregate_options.aggregates;
     const auto& keys = aggregate_options.keys;
     const auto& segment_keys = aggregate_options.segment_keys;
-    const auto concurreny =
-        plan->query_context()->exec_context()->executor()->GetCapacity();
+    const auto concurrency = plan->query_context()->max_concurrency();
+    // We can't use concurrency == 1 because that include I/O concurrency

Review Comment:
   Do you have an issue with this code?
   https://github.com/apache/arrow/blob/be12888997c81b1fb7947f6284be1256edd4d3e4/cpp/src/arrow/acero/hash_join_node.cc#L929



##########
cpp/src/arrow/acero/aggregate_node.cc:
##########
@@ -398,25 +398,25 @@ class ScalarAggregateNode : public ExecNode, public TracedNode {
     auto aggregates = aggregate_options.aggregates;
     const auto& keys = aggregate_options.keys;
     const auto& segment_keys = aggregate_options.segment_keys;
-    const auto concurreny =
-        plan->query_context()->exec_context()->executor()->GetCapacity();
+    const auto concurrency = plan->query_context()->max_concurrency();
+    // We can't use concurrency == 1 because that include I/O concurrency

Review Comment:
   Do we have an issue with this code?
   https://github.com/apache/arrow/blob/be12888997c81b1fb7947f6284be1256edd4d3e4/cpp/src/arrow/acero/hash_join_node.cc#L929



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] icexelloss commented on a diff in pull request #35384: GH-35383: [C++] Prefer max_concurrency over executor capacity to avoid segmentation fault

Posted by "icexelloss (via GitHub)" <gi...@apache.org>.

icexelloss commented on code in PR #35384:
URL: https://github.com/apache/arrow/pull/35384#discussion_r1182556828


##########
cpp/src/arrow/acero/aggregate_node.cc:
##########
@@ -398,25 +398,25 @@ class ScalarAggregateNode : public ExecNode, public TracedNode {
     auto aggregates = aggregate_options.aggregates;
     const auto& keys = aggregate_options.keys;
     const auto& segment_keys = aggregate_options.segment_keys;
-    const auto concurreny =
-        plan->query_context()->exec_context()->executor()->GetCapacity();
+    const auto concurrency = plan->query_context()->max_concurrency();
+    // We can't use concurrency == 1 because that include I/O concurrency

Review Comment:
   I see. Thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] westonpace commented on pull request #35384: GH-35383: [C++] Prefer max_concurrency over executor capacity to avoid segmentation fault

Posted by "westonpace (via GitHub)" <gi...@apache.org>.

westonpace commented on PR #35384:
URL: https://github.com/apache/arrow/pull/35384#issuecomment-1531859922

   > Does this mean that:
   > 
   >     1. `plan->query_context()->executor()->GetCapacity()`: config cpu threads
   > 
   >     2. `plan->query_context()->max_concurrency()` query maximum DOP
   > 
   > 
   > Am I right?
   
   @mapleFU 
   
   Yes, that is correct.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] github-actions[bot] commented on pull request #35384: GH-35383: [C++] Prefer max_concurrency over executor capacity to avoid segmentation fault

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.

github-actions[bot] commented on PR #35384:
URL: https://github.com/apache/arrow/pull/35384#issuecomment-1530244253

   * Closes: #35383


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ursabot commented on pull request #35384: GH-35383: [C++] Prefer max_concurrency over executor capacity to avoid segmentation fault

Posted by "ursabot (via GitHub)" <gi...@apache.org>.

ursabot commented on PR #35384:
URL: https://github.com/apache/arrow/pull/35384#issuecomment-1530862168

   ['Python', 'R'] benchmarks have high level of regressions.
   [test-mac-arm](https://conbench.ursa.dev/compare/runs/e33671654caa403daaf24724a74617bd...8fc11d0806e54829b3d00b3e675cbeaf/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] westonpace commented on a diff in pull request #35384: GH-35383: [C++] Prefer max_concurrency over executor capacity to avoid segmentation fault

Posted by "westonpace (via GitHub)" <gi...@apache.org>.

westonpace commented on code in PR #35384:
URL: https://github.com/apache/arrow/pull/35384#discussion_r1182068507


##########
cpp/src/arrow/acero/aggregate_node.cc:
##########
@@ -398,25 +398,25 @@ class ScalarAggregateNode : public ExecNode, public TracedNode {
     auto aggregates = aggregate_options.aggregates;
     const auto& keys = aggregate_options.keys;
     const auto& segment_keys = aggregate_options.segment_keys;
-    const auto concurreny =
-        plan->query_context()->exec_context()->executor()->GetCapacity();
+    const auto concurrency = plan->query_context()->max_concurrency();
+    // We can't use concurrency == 1 because that include I/O concurrency

Review Comment:
   I don't think so.  You'll notice, in the very next line, it is doing the same calculation that max_concurrency does.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] westonpace commented on pull request #35384: GH-35383: [C++] Prefer max_concurrency over executor capacity to avoid segmentation fault

Posted by "westonpace (via GitHub)" <gi...@apache.org>.

westonpace commented on PR #35384:
URL: https://github.com/apache/arrow/pull/35384#issuecomment-1530855772

   The mac arm benchmarks are now passing.  It looks like the 9960x is offline so I can't actually tell.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] github-actions[bot] commented on pull request #35384: GH-35383: [C++] Prefer max_concurrency over executor capacity to avoid segmentation fault

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.

github-actions[bot] commented on PR #35384:
URL: https://github.com/apache/arrow/pull/35384#issuecomment-1530244340

   :warning: GitHub issue #35383 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] westonpace commented on pull request #35384: GH-35383: [C++] Prefer max_concurrency over executor capacity to avoid segmentation fault

Posted by "westonpace (via GitHub)" <gi...@apache.org>.

westonpace commented on PR #35384:
URL: https://github.com/apache/arrow/pull/35384#issuecomment-1530418430

   @ursabot please benchmark lang=R


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ursabot commented on pull request #35384: GH-35383: [C++] Prefer max_concurrency over executor capacity to avoid segmentation fault

Posted by "ursabot (via GitHub)" <gi...@apache.org>.

ursabot commented on PR #35384:
URL: https://github.com/apache/arrow/pull/35384#issuecomment-1530418863

   Benchmark runs are scheduled for baseline = be12888997c81b1fb7947f6284be1256edd4d3e4 and contender = 1e9a2d59f5ac7bb2a3c52ca9eb45ba035864b034. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Only ['Python'] langs are supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/cbfbc9871a1846e3bb5326528317068b...97043805cbd7425faa6c3c80b571ef8a/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/e33671654caa403daaf24724a74617bd...8fc11d0806e54829b3d00b3e675cbeaf/)
   [Scheduled :warning: ursa-i9-9960x is offline.] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/6d138698130349e496384450e7a95c04...eb3c449eeae140a081075eef9fc9e306/)
   [Skipped :warning: Only ['C++', 'Java'] langs are supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/4bbf47eecfe94ee5a7bdd779d2fd1c9d...eb82b58b05264ba697fe38c84e484e4c/)
   Buildkite builds:
   [Scheduled] [`1e9a2d59` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2839)
   [Scheduled] [`1e9a2d59` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2803)
   [Finished] [`be128889` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2804)
   [Scheduled] [`be128889` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2838)
   [Scheduled] [`be128889` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2802)
   [Scheduled] [`be128889` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2829)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] mapleFU commented on pull request #35384: GH-35383: [C++] Prefer max_concurrency over executor capacity to avoid segmentation fault

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.

mapleFU commented on PR #35384:
URL: https://github.com/apache/arrow/pull/35384#issuecomment-1531748920

   Does this mean that:
   1. `plan->query_context()->executor()->GetCapacity()`: config cpu threads
   2. `plan->query_context()->max_concurrency()` query maximum DOP
   
   Am I right?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] westonpace merged pull request #35384: GH-35383: [C++] Prefer max_concurrency over executor capacity to avoid segmentation fault

Posted by "westonpace (via GitHub)" <gi...@apache.org>.

westonpace merged PR #35384:
URL: https://github.com/apache/arrow/pull/35384


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] ursabot commented on pull request #35384: GH-35383: [C++] Prefer max_concurrency over executor capacity to avoid segmentation fault

Posted by "ursabot (via GitHub)" <gi...@apache.org>.

ursabot commented on PR #35384:
URL: https://github.com/apache/arrow/pull/35384#issuecomment-1532193538

   Benchmark runs are scheduled for baseline = a3a01726b1ca195c3d26fec084c0ff19453e6866 and contender = f74d7473a79f3c7b1e6221272509037c7fe421cb. f74d7473a79f3c7b1e6221272509037c7fe421cb is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/3e93bbe3cd1a497188cffea93fcc0b85...114a81ea85aa40b4afb2b5fa41c7578e/)
   [Failed :arrow_down:1.39% :arrow_up:0.12%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/cbdabb928df0445aa4633ca5da01d747...f7e37e1f5ab048059aaa08c55ee83438/)
   [Failed :arrow_down:0.26% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/5a6f1fde24184abab4f63b016fea5cf1...e7f9d5d8b3db44d79bee6a7398751ac3/)
   [Finished :arrow_down:0.51% :arrow_up:0.09%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/c4698a597e074483aa2e0213b2d76431...5c5c1b27bbb8471a9ddcbdf714641544/)
   Buildkite builds:
   [Finished] [`f74d7473` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2809)
   [Finished] [`f74d7473` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2844)
   [Finished] [`f74d7473` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2809)
   [Finished] [`f74d7473` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2834)
   [Finished] [`a3a01726` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2808)
   [Failed] [`a3a01726` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2843)
   [Failed] [`a3a01726` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2808)
   [Finished] [`a3a01726` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2833)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org