You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/22 12:23:06 UTC

[GitHub] [arrow] zagto opened a new pull request, #12957: ARROW-16280: [C++] Avoid copying shared_ptr in Expression::type()

zagto opened a new pull request, #12957:
URL: https://github.com/apache/arrow/pull/12957

   I see big improvements in the `ExecuteScalarExpressionOverhead` benchmarks, especially `ref_only_expression` with this, for example:
   ```
   before:
   ExecuteScalarExpressionOverhead/ref_only_expression/rows_per_batch:1000000/real_time/threads:1         35.4 ns         35.4 ns     19577007 batches_per_second=28.2395M/s rows_per_second=28.2395T/s
   ExecuteScalarExpressionOverhead/ref_only_expression/rows_per_batch:1000000/real_time/threads:16        49.8 ns          788 ns     14280992 batches_per_second=20.0734M/s rows_per_second=20.0734T/s
   
   after:
   ExecuteScalarExpressionOverhead/ref_only_expression/rows_per_batch:1000000/real_time/threads:1         27.6 ns         27.5 ns     25090317 batches_per_second=36.2832M/s rows_per_second=36.2832T/s
   ExecuteScalarExpressionOverhead/ref_only_expression/rows_per_batch:1000000/real_time/threads:16        4.26 ns         67.2 ns    184745728 batches_per_second=235.006M/s rows_per_second=235.006T/s
   
   ```
   
   Also the overhead of small batch size/multithreaded benchmarks is reduced, for example in `complex_expression`:
   ```
   before:
   ExecuteScalarExpressionOverhead/complex_expression/rows_per_batch:1000/real_time/threads:1          3723682 ns      3721326 ns          191 batches_per_second=268.551k/s rows_per_second=268.551M/s
   ExecuteScalarExpressionOverhead/complex_expression/rows_per_batch:1000/real_time/threads:16         1153070 ns     18365265 ns          624 batches_per_second=867.25k/s rows_per_second=867.25M/s
   
   after:
   ExecuteScalarExpressionOverhead/complex_expression/rows_per_batch:1000/real_time/threads:1          3543745 ns      3541909 ns          197 batches_per_second=282.187k/s rows_per_second=282.187M/s
   ExecuteScalarExpressionOverhead/complex_expression/rows_per_batch:1000/real_time/threads:16          841776 ns     13395266 ns          944 batches_per_second=1.18796M/s rows_per_second=1.18796G/s
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zagto commented on pull request #12957: ARROW-16280: [C++] Avoid copying shared_ptr in Expression::type()

Posted by GitBox <gi...@apache.org>.
zagto commented on PR #12957:
URL: https://github.com/apache/arrow/pull/12957#issuecomment-1106595844

   > Note there's some lint failures (though, the git version means a lot of pipelines are broken)
   
   oops you're right. should be fixed now


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zagto commented on pull request #12957: ARROW-16280: [C++] Avoid copying shared_ptr in Expression::type()

Posted by GitBox <gi...@apache.org>.
zagto commented on PR #12957:
URL: https://github.com/apache/arrow/pull/12957#issuecomment-1120009112

   I now run ran one of the most-affected benchmarks - [tpch, arrow, parquet, memory_map=False, TPCH-15, scale_factor=1, R](https://conbench.ursa.dev/compare/benchmarks/3c980569f8be4d85a8d54e0ffb93e93c...ed4d1a2fb6a346478f3467cbe308c895/) - and saw no such difference on my machine. (provided R picked up the changed C++ library correctly, I'm not very familar with it)
   
   Going trough the benchmark runner results a bit more I noticed that all the extreme permanent changes are R TCPH benchmarks. For example [this](https://conbench.ursa.dev/compare/benchmarks/925f7fb18a244281880c996eefc42ba3...a2aa39dec1bf4bb284bbbbbcc04ecd2f/) and [this](https://conbench.ursa.dev/compare/benchmarks/e1ac2cb7e24a4d7d9193294a1d999fb2...ce9a5035803444b79fab77045c9bb786/) are the worst affected Python and JS benchmarks. Also some TCPH benchmarks see an extreme improvement. Overall this gives me the impression the performance change may be unrelated to this commit.
   
   I wonder if it is possible to run the baseline again on conbench to isolate this?
   
   [output-TPCH15.txt](https://github.com/apache/arrow/files/8643616/output-TPCH15.txt)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on pull request #12957: ARROW-16280: [C++] Avoid copying shared_ptr in Expression::type()

Posted by GitBox <gi...@apache.org>.
lidavidm commented on PR #12957:
URL: https://github.com/apache/arrow/pull/12957#issuecomment-1106496250

   Note there's some lint failures (though, the git version means a lot of pipelines are broken)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on pull request #12957: ARROW-16280: [C++] Avoid copying shared_ptr in Expression::type()

Posted by GitBox <gi...@apache.org>.
lidavidm commented on PR #12957:
URL: https://github.com/apache/arrow/pull/12957#issuecomment-1106821023

   Thanks for catching this!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #12957: ARROW-16280: [C++] Avoid copying shared_ptr in Expression::type()

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #12957:
URL: https://github.com/apache/arrow/pull/12957#issuecomment-1110394517

   ['Python', 'R'] benchmarks have high level of regressions.
   [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/8b8a13ad985c4adfbf68518fb9fca07d...d0967da307ff4694b471d4f131c5e2b8/)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zagto commented on pull request #12957: ARROW-16280: [C++] Avoid copying shared_ptr in Expression::type()

Posted by GitBox <gi...@apache.org>.
zagto commented on PR #12957:
URL: https://github.com/apache/arrow/pull/12957#issuecomment-1111301403

   looks like the "high level of regressions" are a temporary glich


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] zagto commented on pull request #12957: ARROW-16280: [C++] Avoid copying shared_ptr in Expression::type()

Posted by GitBox <gi...@apache.org>.
zagto commented on PR #12957:
URL: https://github.com/apache/arrow/pull/12957#issuecomment-1118602443

   Interesting. It seems like [this](https://github.com/apache/arrow/commit/3e56a949c73168d789612a9022cdceae6088e2b7) commit (which seems unrelated to me) made the graph temporarily go back to normal, making it look like a glitch. This is probably what I saw when writing the above comment.
   
   For example here:
   [https://conbench.ursa.dev/compare/benchmarks/3c980569f8be4d85a8d54e0ffb93e93c...ed4d1a2fb6a346478f3467cbe308c895/](https://conbench.ursa.dev/compare/benchmarks/3c980569f8be4d85a8d54e0ffb93e93c...ed4d1a2fb6a346478f3467cbe308c895/)
   
   I guess I'll try if I can reproduce any of that
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #12957: ARROW-16280: [C++] Avoid copying shared_ptr in Expression::type()

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #12957:
URL: https://github.com/apache/arrow/pull/12957#issuecomment-1106484191

   https://issues.apache.org/jira/browse/ARROW-16280


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm commented on pull request #12957: ARROW-16280: [C++] Avoid copying shared_ptr in Expression::type()

Posted by GitBox <gi...@apache.org>.
lidavidm commented on PR #12957:
URL: https://github.com/apache/arrow/pull/12957#issuecomment-1106675965

   CI should pass if you rebase here


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] lidavidm closed pull request #12957: ARROW-16280: [C++] Avoid copying shared_ptr in Expression::type()

Posted by GitBox <gi...@apache.org>.
lidavidm closed pull request #12957: ARROW-16280: [C++] Avoid copying shared_ptr in Expression::type()
URL: https://github.com/apache/arrow/pull/12957


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #12957: ARROW-16280: [C++] Avoid copying shared_ptr in Expression::type()

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #12957:
URL: https://github.com/apache/arrow/pull/12957#issuecomment-1110394212

   Benchmark runs are scheduled for baseline = e1e782a4542817e8a6139d6d5e022b56abdbc81d and contender = e3b84b08e549afe8477074883ed151083051bdff. e3b84b08e549afe8477074883ed151083051bdff is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/9af84758685c4d269ffd4cd0abea3fa9...3ab50a9766ed464a9eecec2e4860bd6c/)
   [Failed] [test-mac-arm](https://conbench.ursa.dev/compare/runs/20b2367ab2454eeba717dff7af1b2ff5...b7acb64c078749b8a5bc42a9497efd0d/)
   [Failed :arrow_down:12.78% :arrow_up:2.63%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/8b8a13ad985c4adfbf68518fb9fca07d...d0967da307ff4694b471d4f131c5e2b8/)
   [Finished :arrow_down:0.55% :arrow_up:0.42%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/a0be24cac1024327a4fa3340d17e9776...ce5fe4d6c8d74852a3b2c693b9a58c7b/)
   Buildkite builds:
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/596| `e3b84b08` ec2-t3-xlarge-us-east-2>
   [Failed] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/584| `e3b84b08` test-mac-arm>
   [Failed] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/583| `e3b84b08` ursa-i9-9960x>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/594| `e3b84b08` ursa-thinkcentre-m75q>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/595| `e1e782a4` ec2-t3-xlarge-us-east-2>
   [Failed] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/583| `e1e782a4` test-mac-arm>
   [Failed] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/582| `e1e782a4` ursa-i9-9960x>
   [Finished] <https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/593| `e1e782a4` ursa-thinkcentre-m75q>
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jorisvandenbossche commented on pull request #12957: ARROW-16280: [C++] Avoid copying shared_ptr in Expression::type()

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on PR #12957:
URL: https://github.com/apache/arrow/pull/12957#issuecomment-1118488972

   Now a few more days have passed and a longer timeline is visible, it doesn't seem to be a temporary change: https://conbench.ursa.dev/compare/runs/8b8a13ad985c4adfbf68518fb9fca07d...d0967da307ff4694b471d4f131c5e2b8/


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org