Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/06 19:15:36 UTC

[GitHub] [arrow-datafusion] alamb opened a new issue, #2174: Easier flamegraph / profiling support for datafusion benchmarks

alamb opened a new issue, #2174:
URL: https://github.com/apache/arrow-datafusion/issues/2174

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   I want to make flamegraphs to understand the performance of PRs like https://github.com/apache/arrow-datafusion/pull/2146 from @yjshen and other benchmarks in a cross-platform, easy-to-use way. I don't want to switch between Instruments on Mac and `perf` on Linux.
   
   It is not easy to get them (see [this mailing list topic](https://lists.apache.org/thread/hnpzq41zt7csfnds9m652brf18xr6sb6)), and I hit the same "flamegraph takes forever to make" issue when I tried to run this on my development machine.
   
   **Describe the solution you'd like**
   Use the [pprof crate](https://crates.io/crates/pprof). @mkmik did this in [`influxdb_iox`](https://github.com/influxdata/influxdb_iox/pulls), and we have used it to good effect in IOx.
   
   This would mean adding an optional `pprof` feature to the benchmark program that generates flamegraphs and `profile.proto` format output. You would run it like this:
   
   ```shell
   # --profile also writes flamegraph.svg and profile.proto files
   target/release/tpch benchmark datafusion --profile --iterations 3 --path /tpch-parquet --format parquet --query 1
   ```
   
   You could then open `profile.proto` with the (excellent) Go tooling:
   
   ```shell
   go tool pprof profile.proto
   ```
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on issue #2174: Easier flamegraph / profiling support for datafusion benchmarks

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #2174:
URL: https://github.com/apache/arrow-datafusion/issues/2174#issuecomment-1090779133

   Perhaps [inferno](https://github.com/jonhoo/inferno) might help? FWIW I also use the Hotspot UI, which has some nice functionality for digging into profiles, viewing thread activity, etc. It chugs a bit on larger profiles (multiple GB) but otherwise works well enough for my purposes.
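   
   For context on what inferno consumes: its `inferno-flamegraph` renderer reads "folded" stacks. Each line holds one distinct call stack, with frames joined by `;` and followed by a space and a sample count. That input is typically produced from `perf script` output by `inferno-collapse-perf`. The sketch below shows only that folding step, in plain Rust with no profiler attached. The sample data and function names are made up for illustration.
   
   ```rust
   use std::collections::BTreeMap;
   
   /// Collapse raw stack samples (root frame first) into folded-stack lines:
   /// frames joined by ';', then a space and how many times that stack was seen.
   fn fold_stacks(samples: &[Vec<&str>]) -> Vec<String> {
       // BTreeMap keeps the output deterministic (sorted by stack string).
       let mut counts: BTreeMap<String, u64> = BTreeMap::new();
       for stack in samples {
           *counts.entry(stack.join(";")).or_insert(0) += 1;
       }
       counts
           .into_iter()
           .map(|(stack, n)| format!("{stack} {n}"))
           .collect()
   }
   
   fn main() {
       // Hypothetical samples from a query run; the frame names are invented.
       let samples = vec![
           vec!["main", "execute", "parquet_read"],
           vec!["main", "execute", "parquet_read"],
           vec!["main", "execute", "hash_join"],
           vec!["main", "plan"],
       ];
       for line in fold_stacks(&samples) {
           println!("{line}");
       }
   }
   ```
   
   This prints `main;execute;hash_join 1`, `main;execute;parquet_read 2` and `main;plan 1`. Saved to a file, lines like these can be handed to `inferno-flamegraph` to render an SVG, where each frame's width is proportional to its sample count.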




[GitHub] [arrow-datafusion] bobtins commented on issue #2174: Easier flamegraph / profiling support for datafusion benchmarks

Posted by GitBox <gi...@apache.org>.
bobtins commented on issue #2174:
URL: https://github.com/apache/arrow-datafusion/issues/2174#issuecomment-1090871915

   @tustvold I did try out Hotspot; it's really nice! It can record data (it just runs `perf` under the hood) or open previously recorded `perf` files.
   ![hotspot_3](https://user-images.githubusercontent.com/12849637/162082184-270172ea-a737-4d55-8476-b4ca2c21ce50.png)
   
   




[GitHub] [arrow-datafusion] alamb commented on issue #2174: Easier flamegraph / profiling support for datafusion benchmarks

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #2174:
URL: https://github.com/apache/arrow-datafusion/issues/2174#issuecomment-1645553567

   I don't think there is any action item for this; standard profiling tools work great. Closing.




[GitHub] [arrow-datafusion] tustvold commented on issue #2174: Easier flamegraph / profiling support for datafusion benchmarks

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #2174:
URL: https://github.com/apache/arrow-datafusion/issues/2174#issuecomment-1090726944

   Have you checked out [cargo-flamegraph](https://github.com/flamegraph-rs/flamegraph)? I think it might fit the bill?




[GitHub] [arrow-datafusion] houqp commented on issue #2174: Easier flamegraph / profiling support for datafusion benchmarks

Posted by GitBox <gi...@apache.org>.
houqp commented on issue #2174:
URL: https://github.com/apache/arrow-datafusion/issues/2174#issuecomment-1091280546

   @alamb have you tried cargo-flamegraph's `--no-inline` argument?




[GitHub] [arrow-datafusion] tustvold commented on issue #2174: Easier flamegraph / profiling support for datafusion benchmarks

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #2174:
URL: https://github.com/apache/arrow-datafusion/issues/2174#issuecomment-1090866993

   @bobtins I think you maybe meant to link to this https://eighty-twenty.org/2021/09/09/perf-addr2line-speed-improvement ?
   
   The good news is that a mitigation was merged at some point last year and has apparently been released, so perhaps the GCP machine was running an old Debian version? - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=be8ecc57f180415e8a7c1cc5620c5236be2a7e56
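   
   As we understand the blog post and commit linked above, `perf` was slow because it started a fresh `addr2line` process for every address it symbolized. The mitigation keeps one long-running `addr2line` per DSO (shared object) and reuses it. The Rust sketch below is not perf's code; it only models that design choice. A hypothetical `Resolver` type stands in for an `addr2line` child process, and the code counts how many resolvers each strategy starts.
   
   ```rust
   use std::collections::HashMap;
   
   /// Hypothetical stand-in for an external symbolizer such as an addr2line
   /// child process; creating one is the expensive step we want to minimize.
   struct Resolver {
       dso: String,
   }
   
   impl Resolver {
       fn spawn(dso: &str, spawns: &mut usize) -> Self {
           *spawns += 1; // count how many "processes" were started
           Resolver { dso: dso.to_string() }
       }
   
       /// Pretend lookup; a real symbolizer would return function/file/line.
       fn lookup(&self, addr: u64) -> String {
           format!("{}+{:#x}", self.dso, addr)
       }
   }
   
   /// Naive strategy: start a new resolver for every single address.
   fn resolve_per_lookup(addrs: &[(&str, u64)], spawns: &mut usize) -> Vec<String> {
       addrs
           .iter()
           .map(|&(dso, addr)| Resolver::spawn(dso, spawns).lookup(addr))
           .collect()
   }
   
   /// Mitigation-style strategy: one long-lived resolver per DSO, reused.
   fn resolve_per_dso(addrs: &[(&str, u64)], spawns: &mut usize) -> Vec<String> {
       let mut resolvers: HashMap<String, Resolver> = HashMap::new();
       addrs
           .iter()
           .map(|&(dso, addr)| {
               resolvers
                   .entry(dso.to_string())
                   .or_insert_with(|| Resolver::spawn(dso, spawns))
                   .lookup(addr)
           })
           .collect()
   }
   
   fn main() {
       // Hypothetical workload: 1,000 addresses spread over two shared objects.
       let addrs: Vec<(&str, u64)> = (0..1000u64)
           .map(|i| (if i % 2 == 0 { "tpch" } else { "libc.so.6" }, 0x1000 + i))
           .collect();
   
       let (mut naive, mut cached) = (0, 0);
       let a = resolve_per_lookup(&addrs, &mut naive);
       let b = resolve_per_dso(&addrs, &mut cached);
       assert_eq!(a, b); // both strategies give the same answers
       println!("spawns: per-lookup = {naive}, per-DSO = {cached}");
   }
   ```
   
   Running it prints `spawns: per-lookup = 1000, per-DSO = 2`. The answers are identical, but the per-DSO version pays the startup cost once per shared object instead of once per address.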




[GitHub] [arrow-datafusion] bobtins commented on issue #2174: Easier flamegraph / profiling support for datafusion benchmarks

Posted by GitBox <gi...@apache.org>.
bobtins commented on issue #2174:
URL: https://github.com/apache/arrow-datafusion/issues/2174#issuecomment-1090874766

   > @bobtins I think you maybe meant to link to this https://eighty-twenty.org/2021/09/09/perf-addr2line-speed-improvement ?
   > 
   Hm, why didn't I find that? Either you have good Google-fu, or I thought my link was the end of the story and didn't keep plugging away. That's great news!




[GitHub] [arrow-datafusion] mkmik commented on issue #2174: Easier flamegraph / profiling support for datafusion benchmarks

Posted by GitBox <gi...@apache.org>.
mkmik commented on issue #2174:
URL: https://github.com/apache/arrow-datafusion/issues/2174#issuecomment-1091195608

   FWIW I found the pprof approach very useful for online performance analysis (e.g. on demand on production services), and less useful for ad-hoc local performance analysis runs (e.g. benchmarks).




[GitHub] [arrow-datafusion] alamb closed issue #2174: Easier flamegraph / profiling support for datafusion benchmarks

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed issue #2174: Easier flamegraph / profiling support for datafusion benchmarks
URL: https://github.com/apache/arrow-datafusion/issues/2174




[GitHub] [arrow-datafusion] bobtins commented on issue #2174: Easier flamegraph / profiling support for datafusion benchmarks

Posted by GitBox <gi...@apache.org>.
bobtins commented on issue #2174:
URL: https://github.com/apache/arrow-datafusion/issues/2174#issuecomment-1090888841

   > @bobtins I think you maybe meant to link to this https://eighty-twenty.org/2021/09/09/perf-addr2line-speed-improvement ?
   > 
   Hm, good news, thanks for the updated info.




[GitHub] [arrow-datafusion] alamb commented on issue #2174: Easier flamegraph / profiling support for datafusion benchmarks

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #2174:
URL: https://github.com/apache/arrow-datafusion/issues/2174#issuecomment-1090935069

   Thanks for the suggestions -- I'll try the various tools on this thread and will report back.




[GitHub] [arrow-datafusion] realno commented on issue #2174: Easier flamegraph / profiling support for datafusion benchmarks

Posted by GitBox <gi...@apache.org>.
realno commented on issue #2174:
URL: https://github.com/apache/arrow-datafusion/issues/2174#issuecomment-1090916516

   This is great, looking forward to seeing some results! 




[GitHub] [arrow-datafusion] mingmwang commented on issue #2174: Easier flamegraph / profiling support for datafusion benchmarks

Posted by GitBox <gi...@apache.org>.
mingmwang commented on issue #2174:
URL: https://github.com/apache/arrow-datafusion/issues/2174#issuecomment-1092536106

   I have used [pprof-rs](https://github.com/tikv/pprof-rs) to benchmark DataFusion/Arrow Parquet reader performance. It can generate flamegraph pictures easily.
   




[GitHub] [arrow-datafusion] bobtins commented on issue #2174: Easier flamegraph / profiling support for datafusion benchmarks

Posted by GitBox <gi...@apache.org>.
bobtins commented on issue #2174:
URL: https://github.com/apache/arrow-datafusion/issues/2174#issuecomment-1090847533

   @alamb [someone drilled deep into why perf is slow](https://stackoverflow.com/questions/4048151/what-are-the-options-for-storing-hierarchical-data-in-a-relational-database). 
   
   TL;DR: the license-compatible build of perf is really inefficient.




[GitHub] [arrow-datafusion] alamb commented on issue #2174: Easier flamegraph / profiling support for datafusion benchmarks

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #2174:
URL: https://github.com/apache/arrow-datafusion/issues/2174#issuecomment-1090732529

   I did try to run `cargo flamegraph`, and it was spending an absurd amount of time calling `addr2line` (as reported in the mailing list topic). Perhaps that was due to an old version of `perf` or something similar on the GCP Debian machine I was using.




[GitHub] [arrow-datafusion] Dandandan commented on issue #2174: Easier flamegraph / profiling support for datafusion benchmarks

Posted by GitBox <gi...@apache.org>.
Dandandan commented on issue #2174:
URL: https://github.com/apache/arrow-datafusion/issues/2174#issuecomment-1090772976

   FWIW, I have never experienced the flamegraph slowness mentioned in the mailing list (nor with the Hotspot UI I mentioned there), using a recent Pop!_OS version.

