Posted to issues@beam.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/10/06 17:12:00 UTC

[jira] [Work logged] (BEAM-12931) Timer.setOutputTimestamp doesn't take into account for DoFn#getAllowedTimestampSkew()

     [ https://issues.apache.org/jira/browse/BEAM-12931?focusedWorklogId=661082&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-661082 ]

ASF GitHub Bot logged work on BEAM-12931:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 06/Oct/21 17:11
            Start Date: 06/Oct/21 17:11
    Worklog Time Spent: 10m 
      Work Description: lukecwik commented on pull request #15540:
URL: https://github.com/apache/beam/pull/15540#issuecomment-936715458


   > > Won't this allow for infinite skew? If we have a timer at `X` and a skew of `-1`, then the first time the timer is processed you can output at time `X-1`, and when it gets scheduled again you can output at `X-2`, since the new timer's timestamp is `X-1`.
   > 
   > So my understanding of the reason for these checks is to stop people from doing the wrong thing without realizing it. We don't even take any different action based on this variable. It seems okay to apply this to each specific output timestamp and let you skew more if you chain timers in this fashion.
   > 
   > On a more practical note, there are reasons why you might want a timer to output an earlier element if you've properly set up watermark holds. There's currently no way to do that, so we need some allowance. It would probably be better if we could constrain skew from the first output timestamp, but I don't think that's available in the later timers, right?
   > 
   > If you disagree with the approach, I can bring this up on the email thread for others to chime in in case they are not checking here.
   
   I think users will be surprised when their data is dropped as late because they output past the watermark skew bound. The existing logic guarded against this explicitly because it would surprise users, so I do believe it is important enough to discuss whether there is another approach to solving this or whether we are OK with it happening.
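   The chaining concern in this thread can be illustrated with a small model. This is a hypothetical sketch, not Beam's implementation: timestamps are plain epoch millis, the skew check is assumed to be "output >= reference - skew", and the class and method names are made up for illustration. A skew of 1 ms here corresponds to the `-1` in the quoted comment. A per-hop check validates each re-set timer against the previous output timestamp, so the bound drifts one step further back on every hop; an anchored check measures against the original timestamp `X`, which is the "constrain skew from the first output timestamp" idea raised above.

```java
// Hypothetical model of chained timers and a per-output skew check.
// Not Beam code: names and the check itself are illustrative assumptions.
public class ChainedTimerSkew {

  // Assumed check: an output may be at most `skew` ms before `reference`.
  static boolean allowed(long output, long reference, long skew) {
    return output >= reference - skew;
  }

  public static void main(String[] args) {
    long x = 1_000L; // original timer timestamp X
    long skew = 1L;  // allowed skew of 1 ms
    long reference = x;
    for (int hop = 1; hop <= 3; hop++) {
      // Earliest output the per-hop check still accepts on this hop.
      long output = reference - skew;
      System.out.println("hop " + hop + ": output=" + output
          + " perHop=" + allowed(output, reference, skew)
          + " anchored=" + allowed(output, x, skew));
      // The next timer is set at the previous output timestamp,
      // so the per-hop reference moves back each time.
      reference = output;
    }
  }
}
```

   Running it prints `perHop=true` on every hop while the output falls to 999, 998, 997, whereas the anchored check accepts only the first hop. That is the unbounded drift lukecwik describes.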


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.



Issue Time Tracking
-------------------

    Worklog Id:     (was: 661082)
    Time Spent: 2h 50m  (was: 2h 40m)

> Timer.setOutputTimestamp doesn't take into account for DoFn#getAllowedTimestampSkew()
> -------------------------------------------------------------------------------------
>
>                 Key: BEAM-12931
>                 URL: https://issues.apache.org/jira/browse/BEAM-12931
>             Project: Beam
>          Issue Type: Bug
>          Components: beam-model
>            Reporter: Lara Schmidt
>            Assignee: Lara Schmidt
>            Priority: P2
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> A DoFn may emit elements with a timestamp up to DoFn#getAllowedTimestampSkew() before the current element's timestamp. However, getAllowedTimestampSkew() is not properly accounted for when validating the output timestamp of a timer.
> Context: [https://lists.apache.org/thread.html/r7554658114ddde86c5d82e1c39fe7e1ef587fe926b8e406d1130d501%40%3Cdev.beam.apache.org%3E]
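The rule in the issue description can be sketched as a standalone check. This is a minimal sketch under stated assumptions, not Beam's API: `TimerSkewCheck` and `checkTimerOutputTimestamp` are hypothetical names, `java.time` types stand in for the Joda-Time types Beam uses, and the reference timestamp is assumed to be the timestamp of the input being processed. It applies the same "no more than the allowed skew before the input timestamp" bound that governs element outputs to a timer's output timestamp.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch: names and exception type are illustrative only.
public class TimerSkewCheck {

  /**
   * Rejects a timer output timestamp that lies more than allowedSkew
   * before the input timestamp, mirroring the rule for emitted elements.
   */
  static void checkTimerOutputTimestamp(
      Instant outputTimestamp, Instant inputTimestamp, Duration allowedSkew) {
    // How far the output lies before the input (negative if after it).
    Duration behind = Duration.between(outputTimestamp, inputTimestamp);
    if (behind.compareTo(allowedSkew) > 0) {
      throw new IllegalArgumentException(String.format(
          "Timer output timestamp %s is more than %s before input timestamp %s",
          outputTimestamp, allowedSkew, inputTimestamp));
    }
  }

  public static void main(String[] args) {
    Instant input = Instant.ofEpochMilli(10_000);
    Duration skew = Duration.ofSeconds(5);
    // 4 s before the input: within the allowed skew, accepted.
    checkTimerOutputTimestamp(Instant.ofEpochMilli(6_000), input, skew);
    try {
      // 6 s before the input: exceeds the allowed skew, rejected.
      checkTimerOutputTimestamp(Instant.ofEpochMilli(4_000), input, skew);
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Comparing the gap with `Duration.between` rather than computing `input - skew` avoids overflowing the `Instant` range when a very large skew is configured.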



--
This message was sent by Atlassian Jira
(v8.3.4#803005)