You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by "ryantam626 (via GitHub)" <gi...@apache.org> on 2023/08/09 11:30:12 UTC

[GitHub] [beam] ryantam626 opened a new pull request, #27924: Speed up `IntervalWindow` argument construction

ryantam626 opened a new pull request, #27924:
URL: https://github.com/apache/beam/pull/27924

   In the original snippet, we create two instances of `Timestamp(micros=s)`, the later is rather short
   lived as we immediately use the `__add__` dunder
   method to create a brand new instance of `Timestamp`.
   
   The change in this commit is to remove the construction of this short lived `Timestamp` and reuse the one already constructed for the `start` argument of `IntervalWindow`.
   
   The speed gain of this change depends on the specification of the `SlidingWindows`. I have a pipeline that is relatively simple and with GCP's profiler in Dataflow, it revealed about ~2.5% of wall time is wasted in constructing this short lived `Timestamp`.
   
   Given that `__add__` doesn't alter the `Timestamp` object inplace (and unlikely to be changed to that behaviour), I think this is a safe change.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
   **This is just a oneliner change so I didn't bother, let me know if I should do any of the below.**
   
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI or the [workflows README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] Abacn merged pull request #27924: Speed up `IntervalWindow` argument construction

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn merged PR #27924:
URL: https://github.com/apache/beam/pull/27924


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #27924: Speed up `IntervalWindow` argument construction

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #27924:
URL: https://github.com/apache/beam/pull/27924#issuecomment-1671295838

   Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`:
   
   R: @AnandInguva for label python.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review comments).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #27924: Speed up `IntervalWindow` argument construction

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #27924:
URL: https://github.com/apache/beam/pull/27924#issuecomment-1682178606

   Reminder, please take a look at this pr: @AnandInguva 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] codecov[bot] commented on pull request #27924: Speed up `IntervalWindow` argument construction

Posted by "codecov[bot] (via GitHub)" <gi...@apache.org>.
codecov[bot] commented on PR #27924:
URL: https://github.com/apache/beam/pull/27924#issuecomment-1671174400

   ## [Codecov](https://app.codecov.io/gh/apache/beam/pull/27924?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#27924](https://app.codecov.io/gh/apache/beam/pull/27924?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (69cc7ad) into [master](https://app.codecov.io/gh/apache/beam/commit/b502b3750cc7d49bace3ec87e9e277ee4e4e02f8?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (b502b37) will **decrease** coverage by `0.38%`.
   > Report is 109 commits behind head on master.
   > The diff coverage is `n/a`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #27924      +/-   ##
   ==========================================
   - Coverage   70.95%   70.58%   -0.38%     
   ==========================================
     Files         860      857       -3     
     Lines      104864   103917     -947     
   ==========================================
   - Hits        74403    73345    -1058     
   - Misses      28902    29013     +111     
     Partials     1559     1559              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | python | `79.55% <ø> (-0.45%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Files Changed](https://app.codecov.io/gh/apache/beam/pull/27924?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/transforms/window.py](https://app.codecov.io/gh/apache/beam/pull/27924?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdHJhbnNmb3Jtcy93aW5kb3cucHk=) | `87.50% <ø> (ø)` | |
   
   ... and [35 files with indirect coverage changes](https://app.codecov.io/gh/apache/beam/pull/27924/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] Abacn commented on pull request #27924: Speed up `IntervalWindow` argument construction

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on PR #27924:
URL: https://github.com/apache/beam/pull/27924#issuecomment-1672361534

   Run Python_Runners PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] Abacn commented on pull request #27924: Speed up `IntervalWindow` argument construction

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on PR #27924:
URL: https://github.com/apache/beam/pull/27924#issuecomment-1672036077

   That's interesting finding, generally object creation of one or two should not make noticable difference, and Timestamp is light weight. This suggest the involved code path is a hot path, not limited to Timestamp creation.
   
   Would you mind sharing more about the benchmark / codes that doing the test ?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ryantam626 commented on pull request #27924: Speed up `IntervalWindow` argument construction

Posted by "ryantam626 (via GitHub)" <gi...@apache.org>.
ryantam626 commented on PR #27924:
URL: https://github.com/apache/beam/pull/27924#issuecomment-1672082225

   > ... generally object creation of one or two should not make noticable difference, and Timestamp is light weight.
   
   Yes I am surprised it showed up that prominently in the flamegraph we obtained as well.
   
   > This suggest the involved code path is a hot path, not limited to Timestamp creation.
   
   Smells like it!
   
   > Would you mind sharing more about the benchmark / codes that doing the test ?
   
   I can try - the pipeline we have is not really a benchmark, but rather a production-ish pipeline that we use for analysing our data, the first few steps goes like this
   
   1. Read from a BigQuery query which returns a sparse minutely summary from data collection devices
   2. Assign timestamp using the summary's timestamp (already in the minute level resolution)
   3. Window into using a SlidingWindows, 30 days window size, 1 day increment.
   4. Unrelated (to this ticket) steps that analyse patterns follows.
   
   --------
   
   I am curious to know if you have spotted any inefficiency in these first few steps.
   
   We have since made some improvement to our pipeline which incidentally side-stepped this slowness [1], but regardless I thought this change would be a nice addition seeing as it's almost a risk-free change which supposedly improve performance.
   
   [1] We went for a sliding window specification of 90 days window size, 30 days increment, trading extra memory usage (and small amount of correctness in our pipeline) for speed - as each minutely summary will only need to go into 3 bucket instead of 30 buckets under this sliding window specification.
   
   
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org