You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/07/07 15:02:36 UTC

[GitHub] [beam] egalpin opened a new pull request, #22183: Moves timestamp skew override to correct place

egalpin opened a new pull request, #22183:
URL: https://github.com/apache/beam/pull/22183

   Relates to #21690 and #21895 (and originally #17112).
   
   The timestamp skew override required as part of the bug fix in #21895 was added to the wrong DoFn, which resulted in the reintroduction of #21885 in v2.40.0.  This PR corrects that error.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] egalpin commented on a diff in pull request #22183: Moves timestamp skew override to correct place

Posted by GitBox <gi...@apache.org>.
egalpin commented on code in PR #22183:
URL: https://github.com/apache/beam/pull/22183#discussion_r917888945


##########
sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java:
##########
@@ -2366,6 +2361,11 @@ protected BulkIOBaseFn(BulkIO bulkSpec) {
         this.spec = bulkSpec;
       }
 
+      @Override
+      public Duration getAllowedTimestampSkew() {
+        return Duration.millis(Long.MAX_VALUE);

Review Comment:
   Yes the way in which this error arises is that multiple elements from the same bundle and window are buffered, and then later output.  But the timestamp of each element differs, even though they're in the same window and bundle.  The timestamp check only considers the output timestamp provided as arg against the timestamp of the _current_ element so in a case like this of buffering values which are all in the same window, the timestamp of the element at the point in time where the buffer has reached the desired size may result in checkTimestamp failures if any buffered elements from the same window had an earlier timestamp than the current element.
   
   Agreed, it would be great to be able to relax the strictness of `checkTimestamp` if that can be done in a way that preserves the original intent of the method (like Jan said, ensuring "not to output elements that change from on_time to late (or droppable).")



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] egalpin merged pull request #22183: Moves timestamp skew override to correct place

Posted by GitBox <gi...@apache.org>.
egalpin merged PR #22183:
URL: https://github.com/apache/beam/pull/22183


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] egalpin commented on pull request #22183: Moves timestamp skew override to correct place

Posted by GitBox <gi...@apache.org>.
egalpin commented on PR #22183:
URL: https://github.com/apache/beam/pull/22183#issuecomment-1182066928

   Run Spotless PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] egalpin commented on pull request #22183: Moves timestamp skew override to correct place

Posted by GitBox <gi...@apache.org>.
egalpin commented on PR #22183:
URL: https://github.com/apache/beam/pull/22183#issuecomment-1185600147

   Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] Abacn commented on pull request #22183: Moves timestamp skew override to correct place

Posted by GitBox <gi...@apache.org>.
Abacn commented on PR #22183:
URL: https://github.com/apache/beam/pull/22183#issuecomment-1182203949

   FYI Spotless check now includes checkStyle. The checkStyle violation in this PR is shown here: https://ci-beam.apache.org/job/beam_PreCommit_Spotless_Phrase/141/checkstyle/new/source.c5bf3a74-5b86-482d-a7fd-9e77bf899d46/#50


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] egalpin commented on pull request #22183: Moves timestamp skew override to correct place

Posted by GitBox <gi...@apache.org>.
egalpin commented on PR #22183:
URL: https://github.com/apache/beam/pull/22183#issuecomment-1180723560

   Thoughts on adding a new issue to track adding test coverage, while merging this to ensure the breakage in 2.40.0 is patched?  Goal is still to have the test coverage added as part of 2.41.0 as well but that would decouple fix from test coverage if that's agreeable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] egalpin commented on pull request #22183: Moves timestamp skew override to correct place

Posted by GitBox <gi...@apache.org>.
egalpin commented on PR #22183:
URL: https://github.com/apache/beam/pull/22183#issuecomment-1180369452

   > LGTM. Regarding the test - did you use TestStream for that? It should be possible to create specific bundles so that you can trigger the exception, no?
   
   Thanks, this is great!  It's a little older, but I found this blog[1] very helpful for understanding TestStream. 
   
   [1] https://beam.apache.org/blog/test-stream/


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] egalpin commented on pull request #22183: Moves timestamp skew override to correct place

Posted by GitBox <gi...@apache.org>.
egalpin commented on PR #22183:
URL: https://github.com/apache/beam/pull/22183#issuecomment-1177848159

   R: @je-ik @kennknowles as you each have context on the prior 2 changes (https://github.com/apache/beam/pull/17112 and https://github.com/apache/beam/pull/21895) which this PR finally closes the loop on. 
   
   I wish I had better test coverage for this, but I've had great difficulty reproducing the `Output timestamps must be no earlier than the timestamp of the current input` error via unit tests.  I'm very open to ideas and input on the testing front.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] egalpin commented on pull request #22183: Moves timestamp skew override to correct place

Posted by GitBox <gi...@apache.org>.
egalpin commented on PR #22183:
URL: https://github.com/apache/beam/pull/22183#issuecomment-1180808795

   update: I've reproduced the exact error locally using TestStream and the prior code (not including this PR's change set).  I'll include the test in this PR


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] egalpin commented on pull request #22183: Moves timestamp skew override to correct place

Posted by GitBox <gi...@apache.org>.
egalpin commented on PR #22183:
URL: https://github.com/apache/beam/pull/22183#issuecomment-1182607586

   Run Java PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] egalpin commented on pull request #22183: Moves timestamp skew override to correct place

Posted by GitBox <gi...@apache.org>.
egalpin commented on PR #22183:
URL: https://github.com/apache/beam/pull/22183#issuecomment-1181770282

   Run Java PreCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] egalpin commented on pull request #22183: Moves timestamp skew override to correct place

Posted by GitBox <gi...@apache.org>.
egalpin commented on PR #22183:
URL: https://github.com/apache/beam/pull/22183#issuecomment-1182255828

   thanks @Abacn !  I'll have to update my local pre-commit settings to include checkStyle as well.  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] egalpin commented on pull request #22183: Moves timestamp skew override to correct place

Posted by GitBox <gi...@apache.org>.
egalpin commented on PR #22183:
URL: https://github.com/apache/beam/pull/22183#issuecomment-1185570574

   @je-ik anything outstanding to garner PR approval?  Please let me know if there are any actions or changes required :) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] je-ik commented on a diff in pull request #22183: Moves timestamp skew override to correct place

Posted by GitBox <gi...@apache.org>.
je-ik commented on code in PR #22183:
URL: https://github.com/apache/beam/pull/22183#discussion_r917680579


##########
sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java:
##########
@@ -2366,6 +2361,11 @@ protected BulkIOBaseFn(BulkIO bulkSpec) {
         this.spec = bulkSpec;
       }
 
+      @Override
+      public Duration getAllowedTimestampSkew() {
+        return Duration.millis(Long.MAX_VALUE);

Review Comment:
   Elements should not be assigned to expired windows, because the elements are assigned to windows present in the active bundle. This is just a workaround the check for element output timestamp which is too restrictive. There is no need for elements not to be output with lower timestamp than timestamp of the current element, it is only needed not to output elements that change from on_time to late (or droppable). It would be best to relax the restriction, though it seems it would require a runner-specific code to be provided, because the check does not know about bundles.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] je-ik commented on pull request #22183: Moves timestamp skew override to correct place

Posted by GitBox <gi...@apache.org>.
je-ik commented on PR #22183:
URL: https://github.com/apache/beam/pull/22183#issuecomment-1179512375

   LGTM. Regarding the test - did you use TestStream for that? It should be possible to create specific bundles so that you can trigger the exception, no?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kennknowles commented on a diff in pull request #22183: Moves timestamp skew override to correct place

Posted by GitBox <gi...@apache.org>.
kennknowles commented on code in PR #22183:
URL: https://github.com/apache/beam/pull/22183#discussion_r917095292


##########
sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java:
##########
@@ -2366,6 +2361,11 @@ protected BulkIOBaseFn(BulkIO bulkSpec) {
         this.spec = bulkSpec;
       }
 
+      @Override
+      public Duration getAllowedTimestampSkew() {
+        return Duration.millis(Long.MAX_VALUE);

Review Comment:
   This won't affect the watermark, so it will just allow output to be dropped if it is assigned to an expired window.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org