You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Evan Galpin <ev...@gmail.com> on 2022/02/02 16:00:23 UTC

Timestamp Verification when Outputting from FinishBundleContext Vs. ProcessContext

Hey folks,

I noticed through tracing code that when calling
ProcessContext#outputWithTimestamp, the method checkTimestamp is
invoked[1]. However, no similar check appears to be invoked when calling
FinishBundleContext#output, which explicitly requires passing a timestamp
as one of the arguments[2]. Instead, all that's checked is that the pane
and timestamp are not null.  Is this difference intentional?  Could someone
help me improve my understanding?

Thanks,
Evan

[1]
https://github.com/apache/beam/blob/15048929495ad66963b528d5bd71eb7b4a844c96/runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java#L422
[2]
https://beam.apache.org/releases/javadoc/2.35.0/org/apache/beam/sdk/transforms/DoFn.FinishBundleContext.html#output-org.apache.beam.sdk.values.TupleTag-T-org.joda.time.Instant-org.apache.beam.sdk.transforms.windowing.BoundedWindow-

Re: Timestamp Verification when Outputting from FinishBundleContext Vs. ProcessContext

Posted by Lara Schmidt <la...@google.com>.
+Reuven Lax <re...@google.com> as well in case he has any thoughts on
removing this check

On Fri, Feb 4, 2022 at 1:58 PM Lara Schmidt <la...@google.com> wrote:

> Yeah there's nothing to check there since there's no element to check
> against.
>
> I'd be fine removing the check, it's caused quite a bit of headache. But
> the point of it is to keep users from accidentally having late data when
> they don't mean to. So if we do remove it we need to acknowledge this is
> okay.
>
> On Thu, Feb 3, 2022 at 8:43 PM Kenneth Knowles <ke...@apache.org> wrote:
>
>> One reason is the lack of `elem` used here:
>> https://github.com/apache/beam/blob/15048929495ad66963b528d5bd71eb7b4a844c96/runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java#L440
>>
>> That might be the whole reason.
>>
>> Still, your point is a good one that the sort of validation is different.
>> The check for timestamp skew dates back to even before Beam and has been a
>> source of various troubles. It is also probably obsolete. The purpose was
>> really to make sure elements did not fall into time intervals that would be
>> dropped immediately. Now that element dropping is associated with expired
>> windows, it may be entirely obsolete or just ready for some updating. +Lara
>> Schmidt <la...@google.com> is the person who has looked in detail
>> most recently, I think.
>>
>> Kenn
>>
>> On Wed, Feb 2, 2022 at 8:00 AM Evan Galpin <ev...@gmail.com> wrote:
>>
>>> Hey folks,
>>>
>>> I noticed through tracing code that when calling
>>> ProcessContext#outputWithTimestamp, the method checkTimestamp is
>>> invoked[1]. However, no similar check appears to be invoked when calling
>>> FinishBundleContext#output, which explicitly requires passing a timestamp
>>> as one of the arguments[2]. Instead, all that's checked is that the pane
>>> and timestamp are not null.  Is this difference intentional?  Could someone
>>> help me improve my understanding?
>>>
>>> Thanks,
>>> Evan
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/15048929495ad66963b528d5bd71eb7b4a844c96/runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java#L422
>>> [2]
>>> https://beam.apache.org/releases/javadoc/2.35.0/org/apache/beam/sdk/transforms/DoFn.FinishBundleContext.html#output-org.apache.beam.sdk.values.TupleTag-T-org.joda.time.Instant-org.apache.beam.sdk.transforms.windowing.BoundedWindow-
>>>
>>

Re: Timestamp Verification when Outputting from FinishBundleContext Vs. ProcessContext

Posted by Lara Schmidt <la...@google.com>.
Yeah there's nothing to check there since there's no element to check
against.

I'd be fine removing the check, it's caused quite a bit of headache. But
the point of it is to keep users from accidentally having late data when
they don't mean to. So if we do remove it we need to acknowledge this is
okay.

On Thu, Feb 3, 2022 at 8:43 PM Kenneth Knowles <ke...@apache.org> wrote:

> One reason is the lack of `elem` used here:
> https://github.com/apache/beam/blob/15048929495ad66963b528d5bd71eb7b4a844c96/runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java#L440
>
> That might be the whole reason.
>
> Still, your point is a good one that the sort of validation is different.
> The check for timestamp skew dates back to even before Beam and has been a
> source of various troubles. It is also probably obsolete. The purpose was
> really to make sure elements did not fall into time intervals that would be
> dropped immediately. Now that element dropping is associated with expired
> windows, it may be entirely obsolete or just ready for some updating. +Lara
> Schmidt <la...@google.com> is the person who has looked in detail
> most recently, I think.
>
> Kenn
>
> On Wed, Feb 2, 2022 at 8:00 AM Evan Galpin <ev...@gmail.com> wrote:
>
>> Hey folks,
>>
>> I noticed through tracing code that when calling
>> ProcessContext#outputWithTimestamp, the method checkTimestamp is
>> invoked[1]. However, no similar check appears to be invoked when calling
>> FinishBundleContext#output, which explicitly requires passing a timestamp
>> as one of the arguments[2]. Instead, all that's checked is that the pane
>> and timestamp are not null.  Is this difference intentional?  Could someone
>> help me improve my understanding?
>>
>> Thanks,
>> Evan
>>
>> [1]
>> https://github.com/apache/beam/blob/15048929495ad66963b528d5bd71eb7b4a844c96/runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java#L422
>> [2]
>> https://beam.apache.org/releases/javadoc/2.35.0/org/apache/beam/sdk/transforms/DoFn.FinishBundleContext.html#output-org.apache.beam.sdk.values.TupleTag-T-org.joda.time.Instant-org.apache.beam.sdk.transforms.windowing.BoundedWindow-
>>
>

Re: Timestamp Verification when Outputting from FinishBundleContext Vs. ProcessContext

Posted by Kenneth Knowles <ke...@apache.org>.
One reason is the lack of `elem` used here:
https://github.com/apache/beam/blob/15048929495ad66963b528d5bd71eb7b4a844c96/runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java#L440

That might be the whole reason.

Still, your point is a good one that the sort of validation is different.
The check for timestamp skew dates back to even before Beam and has been a
source of various troubles. It is also probably obsolete. The purpose was
really to make sure elements did not fall into time intervals that would be
dropped immediately. Now that element dropping is associated with expired
windows, it may be entirely obsolete or just ready for some updating. +Lara
Schmidt <la...@google.com> is the person who has looked in detail
most recently, I think.

Kenn

On Wed, Feb 2, 2022 at 8:00 AM Evan Galpin <ev...@gmail.com> wrote:

> Hey folks,
>
> I noticed through tracing code that when calling
> ProcessContext#outputWithTimestamp, the method checkTimestamp is
> invoked[1]. However, no similar check appears to be invoked when calling
> FinishBundleContext#output, which explicitly requires passing a timestamp
> as one of the arguments[2]. Instead, all that's checked is that the pane
> and timestamp are not null.  Is this difference intentional?  Could someone
> help me improve my understanding?
>
> Thanks,
> Evan
>
> [1]
> https://github.com/apache/beam/blob/15048929495ad66963b528d5bd71eb7b4a844c96/runners/core-java/src/main/java/org/apache/beam/runners/core/SimpleDoFnRunner.java#L422
> [2]
> https://beam.apache.org/releases/javadoc/2.35.0/org/apache/beam/sdk/transforms/DoFn.FinishBundleContext.html#output-org.apache.beam.sdk.values.TupleTag-T-org.joda.time.Instant-org.apache.beam.sdk.transforms.windowing.BoundedWindow-
>