You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Shen Li <cs...@gmail.com> on 2017/06/14 20:34:22 UTC

Rewind back tuple timestamp in DoFn

Hi,

I saw the DoFn#getAllowedTimestampSkew has been marked as deprecated. What
if a user does want to rewind back the timestamp without violating the
watermark?

Consider the case where there is a GroupByKey followed by a ParDo. The
GroupByKey transform groups tuples into one-hour windows. Say, each value
of the output iterable of the GroupByKey remembers the timestamp of when it
is created. The ParDo finds the max value in the iterable and wants to use
its timestamp as the output timestamp. For example, the timestamp of the
GroupByKey output might be 11 AM, but the timestamp of the max value might
be 10:30 AM. Is it possible for the user-defined ParDo to rewind back the
timestamp to 10:30 AM?

As the runner knows the current watermark, should there be any API for the
runner to notify the app of the allowedTimestampSkew?

Thanks,

Shen

Re: Rewind back tuple timestamp in DoFn

Posted by Shen Li <cs...@gmail.com>.
Hi Kenn,

Thanks a lot for the info. I will follow the discussion.

Shen

On Fri, Jun 23, 2017 at 10:38 AM, Kenneth Knowles <kl...@google.com.invalid>
wrote:

> Hi Shen,
>
> In order for this to work well with watermark tracking, we have some
> initial ideas on https://issues.apache.org/jira/browse/BEAM-644
>
> Kenn
>
> On Wed, Jun 14, 2017 at 1:34 PM, Shen Li <cs...@gmail.com> wrote:
>
> > Hi,
> >
> > I saw the DoFn#getAllowedTimestampSkew has been marked as deprecated.
> What
> > if a user does want to rewind back the timestamp without violating the
> > watermark?
> >
> > Consider the case where there is a GroupByKey followed by a ParDo. The
> > GroupByKey transform groups tuples into one-hour windows. Say, each value
> > of the output iterable of the GroupByKey remembers the timestamp of when
> it
> > is created. The ParDo finds the max value in the iterable and wants to
> use
> > its timestamp as the output timestamp. For example, the timestamp of the
> > GroupByKey output might be 11 AM, but the timestamp of the max value
> might
> > be 10:30 AM. Is it possible for the user-defined ParDo to rewind back the
> > timestamp to 10:30 AM?
> >
> > As the runner knows the current watermark, should there be any API for
> the
> > runner to notify the app of the allowedTimestampSkew?
> >
> > Thanks,
> >
> > Shen
> >
>

Re: Rewind back tuple timestamp in DoFn

Posted by Kenneth Knowles <kl...@google.com.INVALID>.
Hi Shen,

In order for this to work well with watermark tracking, we have some
initial ideas on https://issues.apache.org/jira/browse/BEAM-644

Kenn

On Wed, Jun 14, 2017 at 1:34 PM, Shen Li <cs...@gmail.com> wrote:

> Hi,
>
> I saw the DoFn#getAllowedTimestampSkew has been marked as deprecated. What
> if a user does want to rewind back the timestamp without violating the
> watermark?
>
> Consider the case where there is a GroupByKey followed by a ParDo. The
> GroupByKey transform groups tuples into one-hour windows. Say, each value
> of the output iterable of the GroupByKey remembers the timestamp of when it
> is created. The ParDo finds the max value in the iterable and wants to use
> its timestamp as the output timestamp. For example, the timestamp of the
> GroupByKey output might be 11 AM, but the timestamp of the max value might
> be 10:30 AM. Is it possible for the user-defined ParDo to rewind back the
> timestamp to 10:30 AM?
>
> As the runner knows the current watermark, should there be any API for the
> runner to notify the app of the allowedTimestampSkew?
>
> Thanks,
>
> Shen
>