Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/23 15:53:22 UTC

[GitHub] [arrow-datafusion] andygrove opened a new pull request #398: Add metrics to RepartitionExec

andygrove opened a new pull request #398:
URL: https://github.com/apache/arrow-datafusion/pull/398


   # Which issue does this PR close?
   
   
   Adds metrics to `RepartitionExec`. Example output:
   
   ```
   RepartitionExec: partitioning=Hash([Column { name: "s_suppkey" }, Column { name: "s_nationkey" }], 24) metrics=[fetchTime=25910087,repartitionTime=537097]
     RepartitionExec: partitioning=RoundRobinBatch(24) metrics=[repartitionTime=0,fetchTime=897114]
   ```
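   
   A minimal sketch of how two separate timers like these could be accumulated, for readers unfamiliar with the pattern. Everything here is an assumption for illustration: `RepartitionMetrics`, `timed`, and `run` are hypothetical names, nanoseconds are an assumed unit, and a plain modulo stands in for real hash partitioning. It is not the code in this PR.
   
   ```rust
   use std::time::Instant;
   
   /// Hypothetical metrics holder; field names mirror the example output,
   /// but this is not DataFusion's metrics type.
   #[derive(Debug, Default)]
   struct RepartitionMetrics {
       fetch_time_nanos: u128,
       repartition_time_nanos: u128,
   }
   
   /// Runs `f`, adds the elapsed wall-clock time to `total_nanos`,
   /// and returns whatever `f` returned.
   fn timed<T>(total_nanos: &mut u128, f: impl FnOnce() -> T) -> T {
       let start = Instant::now();
       let out = f();
       *total_nanos += start.elapsed().as_nanos();
       out
   }
   
   /// Pulls every batch from `input`, timing the fetch and the
   /// repartition step separately.
   fn run(
       mut input: impl Iterator<Item = Vec<u64>>,
       num_partitions: usize,
   ) -> (Vec<Vec<u64>>, RepartitionMetrics) {
       let mut metrics = RepartitionMetrics::default();
       let mut outputs: Vec<Vec<u64>> = vec![Vec::new(); num_partitions];
       loop {
           // Time spent waiting for the input to produce the next batch.
           let next = timed(&mut metrics.fetch_time_nanos, || input.next());
           let batch = match next {
               Some(batch) => batch,
               None => break,
           };
           // Time spent deciding where each value goes (modulo as a stand-in for hashing).
           timed(&mut metrics.repartition_time_nanos, || {
               for v in batch {
                   outputs[(v % num_partitions as u64) as usize].push(v);
               }
           });
       }
       (outputs, metrics)
   }
   ```
   
   Keeping the two timers separate is what lets a reader tell a slow input apart from slow partitioning logic.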
   
   Closes #397.
   
   # Rationale for this change
   
   These metrics help debug performance issues in queries.
   
   # What changes are included in this PR?
   
   Adds `fetchTime` and `repartitionTime` metrics to `RepartitionExec`, as in the example output above.
   
   # Are there any user-facing changes?
   
   No. The metrics are not shown by default.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] andygrove commented on pull request #398: Add metrics to RepartitionExec

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #398:
URL: https://github.com/apache/arrow-datafusion/pull/398#issuecomment-846587539


   > Looks good. I think the time calculation for round robin repartition is missing.
   
   The new metrics don't include the time for sending the resulting batches to the channels, so the only thing to measure for round-robin would be the time to execute `let output_partition = counter % num_output_partitions`, and I figured that was not worth measuring.
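   
   For illustration, a minimal sketch of round-robin assignment built around that one expression. The helper `round_robin_assignments` is hypothetical and not part of the PR; it only shows that the per-batch work is a single modulo and an increment.
   
   ```rust
   /// Returns the output partition chosen for each of `num_batches` batches,
   /// in arrival order, cycling through the partitions.
   fn round_robin_assignments(num_batches: usize, num_output_partitions: usize) -> Vec<usize> {
       assert!(num_output_partitions > 0, "need at least one output partition");
       let mut counter: usize = 0;
       let mut assignments = Vec::with_capacity(num_batches);
       for _ in 0..num_batches {
           // The only computation round-robin needs per batch.
           let output_partition = counter % num_output_partitions;
           assignments.push(output_partition);
           counter += 1;
       }
       assignments
   }
   ```
   
   With five batches and three partitions, `round_robin_assignments(5, 3)` yields `[0, 1, 2, 0, 1]`.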
   
   I am now wondering if we should also measure the time to send the results to the channel, because a high value could indicate that upstream operators are not fetching data as fast as they could be. I will take a look at that next.
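   
   A rough sketch of why send time can reveal a slow consumer. It assumes a bounded standard-library channel (`std::sync::mpsc::sync_channel`) as a stand-in; the channel type `RepartitionExec` actually uses is not shown in this thread, and `measure_send_time` is a hypothetical helper.
   
   ```rust
   use std::sync::mpsc::sync_channel;
   use std::thread;
   use std::time::{Duration, Instant};
   
   /// Sends `num_batches` dummy batches through a bounded channel (capacity 1)
   /// to a consumer that sleeps `consumer_delay` per batch. Returns the total
   /// time the producer spent inside `send`.
   fn measure_send_time(num_batches: usize, consumer_delay: Duration) -> Duration {
       let (tx, rx) = sync_channel::<Vec<u8>>(1);
       let consumer = thread::spawn(move || {
           for _batch in rx {
               // Simulate an operator that is slow to pull its next batch.
               thread::sleep(consumer_delay);
           }
       });
   
       let mut send_time = Duration::ZERO;
       for _ in 0..num_batches {
           let batch = vec![0u8; 1024];
           let start = Instant::now();
           // Blocks while the channel is full, so a slow consumer inflates this timer.
           tx.send(batch).expect("consumer hung up");
           send_time += start.elapsed();
       }
       drop(tx); // Close the channel so the consumer loop ends.
       consumer.join().expect("consumer panicked");
       send_time
   }
   ```
   
   With a bounded channel, a larger `consumer_delay` makes the producer wait longer in `send`, which is the signal described above. An unbounded channel would not block, so the timer would then mostly reflect the cost of the send call itself.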
   
   
   





[GitHub] [arrow-datafusion] codecov-commenter edited a comment on pull request #398: Add metrics to RepartitionExec

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #398:
URL: https://github.com/apache/arrow-datafusion/pull/398#issuecomment-846587578


   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/398?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#398](https://codecov.io/gh/apache/arrow-datafusion/pull/398?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (affa192) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/174226c086a4838eab2a238853b4871c295c0189?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (174226c) will **decrease** coverage by `0.00%`.
   > The diff coverage is `70.96%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-datafusion/pull/398/graphs/tree.svg?width=650&height=150&src=pr&token=JXwWBKD3D9&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-datafusion/pull/398?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #398      +/-   ##
   ==========================================
   - Coverage   74.94%   74.94%   -0.01%     
   ==========================================
     Files         146      146              
     Lines       24314    24344      +30     
   ==========================================
   + Hits        18223    18244      +21     
   - Misses       6091     6100       +9     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/398?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [datafusion/src/physical\_plan/repartition.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/398/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9yZXBhcnRpdGlvbi5ycw==) | `82.45% <70.96%> (-1.89%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/398?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/398?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [174226c...affa192](https://codecov.io/gh/apache/arrow-datafusion/pull/398?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   





[GitHub] [arrow-datafusion] Dandandan commented on pull request #398: Add metrics to RepartitionExec

Posted by GitBox <gi...@apache.org>.
Dandandan commented on pull request #398:
URL: https://github.com/apache/arrow-datafusion/pull/398#issuecomment-846588544


   > > Looks good. I think the time calculation for round robin repartition is missing.
   > 
   > The new metrics don't include the time for sending the resulting batches to the channels, so the only thing to measure for round-robin would be the time to execute `let output_partition = counter % num_output_partitions`, and I figured that was not worth measuring.
   > 
   > I am now wondering if we should also measure the time to send the results to the channel, because a high value could indicate that upstream operators are not fetching data as fast as they could be. I will take a look at that next.
   
   Thanks, makes sense 👍 





[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #398: Add metrics to RepartitionExec

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #398:
URL: https://github.com/apache/arrow-datafusion/pull/398#issuecomment-846587578


   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/398?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#398](https://codecov.io/gh/apache/arrow-datafusion/pull/398?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (81f4103) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/174226c086a4838eab2a238853b4871c295c0189?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (174226c) will **decrease** coverage by `0.00%`.
   > The diff coverage is `72.72%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-datafusion/pull/398/graphs/tree.svg?width=650&height=150&src=pr&token=JXwWBKD3D9&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-datafusion/pull/398?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #398      +/-   ##
   ==========================================
   - Coverage   74.94%   74.94%   -0.01%     
   ==========================================
     Files         146      146              
     Lines       24314    24335      +21     
   ==========================================
   + Hits        18223    18238      +15     
   - Misses       6091     6097       +6     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/398?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [datafusion/src/physical\_plan/repartition.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/398/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9yZXBhcnRpdGlvbi5ycw==) | `83.10% <72.72%> (-1.24%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/398?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/398?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [174226c...81f4103](https://codecov.io/gh/apache/arrow-datafusion/pull/398?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   





[GitHub] [arrow-datafusion] andygrove commented on pull request #398: Add metrics to RepartitionExec

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #398:
URL: https://github.com/apache/arrow-datafusion/pull/398#issuecomment-846588443


   I added the `sendTime` metric, which includes the cost of the round-robin calculation.





[GitHub] [arrow-datafusion] codecov-commenter edited a comment on pull request #398: Add metrics to RepartitionExec

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #398:
URL: https://github.com/apache/arrow-datafusion/pull/398#issuecomment-846587578


   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/398?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#398](https://codecov.io/gh/apache/arrow-datafusion/pull/398?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (e607d71) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/174226c086a4838eab2a238853b4871c295c0189?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (174226c) will **increase** coverage by `0.00%`.
   > The diff coverage is `75.86%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-datafusion/pull/398/graphs/tree.svg?width=650&height=150&src=pr&token=JXwWBKD3D9&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-datafusion/pull/398?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master     #398   +/-   ##
   =======================================
     Coverage   74.94%   74.94%           
   =======================================
     Files         146      146           
     Lines       24314    24342   +28     
   =======================================
   + Hits        18223    18244   +21     
   - Misses       6091     6098    +7     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/398?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [datafusion/src/physical\_plan/repartition.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/398/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcmMvcGh5c2ljYWxfcGxhbi9yZXBhcnRpdGlvbi5ycw==) | `83.18% <75.86%> (-1.16%)` | :arrow_down: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/398?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/398?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [174226c...e607d71](https://codecov.io/gh/apache/arrow-datafusion/pull/398?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   





[GitHub] [arrow-datafusion] Dandandan merged pull request #398: Add metrics to RepartitionExec

Posted by GitBox <gi...@apache.org>.
Dandandan merged pull request #398:
URL: https://github.com/apache/arrow-datafusion/pull/398


   

