You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@apex.apache.org by Chinmay Kolhatkar <ch...@apache.org> on 2016/11/24 10:52:29 UTC

APEXMALHAR-2354 Heuristic Watermark in windowed operator

Dear Community,

I'm working on adding support for heuristic watermark in Windowed Operator.
Heuristic watermark give users of WindowedOperator a way to logically
determine whether watermark condition is met or not by inspecting the
tuples received.
This can act as a replacement for or way to work along with Control Tuple
received on control port.

Here is the approach I'm considering:

1. A new interface lets say "HeuristicWatermark" will be added which
extends Component<Context.OperatorContext>
The reason why its extended with Component is then it can follow a
lifecycle.

2. This method contains a single method something like this:

ControlTuple.Watermark processTupleForWatermark(Tuple.WindowedTuple<InputT>
input);

3. Object of this type can optionally be set to AbstractWindowedOperator as
a plugin which identified whether watermark condition has reached.

4. If heuristicWatermark is set, processTupleForWatermark will be called
for every received tuple and the method can return the Watermark object if
watermark condition is met OR return null if not so.

5. If return value of this method is non-null, then processWatermark method
will be called which sets the nextWatermark value. And then rest of the
watermark processing can continue to happen in endWindow.


Please share your opinion on above approach.

Thanks,
Chinmay.

Re: APEXMALHAR-2354 Heuristic Watermark in windowed operator

Posted by Chinmay Kolhatkar <ch...@apache.org>.
Thanks Guys.

If there are no further comments, I can go about writing code and creating
a PR.

-Chinmay.

On Thu, Nov 24, 2016 at 5:10 PM, Bhupesh Chawda <bh...@datatorrent.com>
wrote:

> Good idea, +1
>
> ~ Bhupesh
>
> On Thu, Nov 24, 2016 at 5:08 PM, Tushar Gosavi <tu...@datatorrent.com>
> wrote:
>
> > +1, but can you change name of HeuristicWatermark to
> > WatermarkGenerator as the purpose of
> > this interface is to generate watermarks.
> >
> > - Tushar.
> >
> >
> > On Thu, Nov 24, 2016 at 4:22 PM, Chinmay Kolhatkar <ch...@apache.org>
> > wrote:
> > > Dear Community,
> > >
> > > I'm working on adding support for heuristic watermark in Windowed
> > Operator.
> > > Heuristic watermark give users of WindowedOperator a way to logically
> > > determine whether watermark condition is met or not by inspecting the
> > > tuples received.
> > > This can act as a replacement for or way to work along with Control
> Tuple
> > > received on control port.
> > >
> > > Here is the approach I'm considering:
> > >
> > > 1. A new interface lets say "HeuristicWatermark" will be added which
> > > extends Component<Context.OperatorContext>
> > > The reason why its extended with Component is then it can follow a
> > > lifecycle.
> > >
> > > 2. This method contains a single method something like this:
> > >
> > > ControlTuple.Watermark processTupleForWatermark(
> > Tuple.WindowedTuple<InputT>
> > > input);
> > >
> > > 3. Object of this type can optionally be set to
> AbstractWindowedOperator
> > as
> > > a plugin which identified whether watermark condition has reached.
> > >
> > > 4. If heuristicWatermark is set, processTupleForWatermark will be
> called
> > > for every received tuple and the method can return the Watermark object
> > if
> > > watermark condition is met OR return null if not so.
> > >
> > > 5. If return value of this method is non-null, then processWatermark
> > method
> > > will be called which sets the nextWatermark value. And then rest of the
> > > watermark processing can continue to happen in endWindow.
> > >
> > >
> > > Please share your opinion on above approach.
> > >
> > > Thanks,
> > > Chinmay.
> >
>

Re: APEXMALHAR-2354 Heuristic Watermark in windowed operator

Posted by Bhupesh Chawda <bh...@datatorrent.com>.
Good idea, +1

~ Bhupesh

On Thu, Nov 24, 2016 at 5:08 PM, Tushar Gosavi <tu...@datatorrent.com>
wrote:

> +1, but can you change name of HeuristicWatermark to
> WatermarkGenerator as the purpose of
> this interface is to generate watermarks.
>
> - Tushar.
>
>
> On Thu, Nov 24, 2016 at 4:22 PM, Chinmay Kolhatkar <ch...@apache.org>
> wrote:
> > Dear Community,
> >
> > I'm working on adding support for heuristic watermark in Windowed
> Operator.
> > Heuristic watermark give users of WindowedOperator a way to logically
> > determine whether watermark condition is met or not by inspecting the
> > tuples received.
> > This can act as a replacement for or way to work along with Control Tuple
> > received on control port.
> >
> > Here is the approach I'm considering:
> >
> > 1. A new interface lets say "HeuristicWatermark" will be added which
> > extends Component<Context.OperatorContext>
> > The reason why its extended with Component is then it can follow a
> > lifecycle.
> >
> > 2. This method contains a single method something like this:
> >
> > ControlTuple.Watermark processTupleForWatermark(
> Tuple.WindowedTuple<InputT>
> > input);
> >
> > 3. Object of this type can optionally be set to AbstractWindowedOperator
> as
> > a plugin which identified whether watermark condition has reached.
> >
> > 4. If heuristicWatermark is set, processTupleForWatermark will be called
> > for every received tuple and the method can return the Watermark object
> if
> > watermark condition is met OR return null if not so.
> >
> > 5. If return value of this method is non-null, then processWatermark
> method
> > will be called which sets the nextWatermark value. And then rest of the
> > watermark processing can continue to happen in endWindow.
> >
> >
> > Please share your opinion on above approach.
> >
> > Thanks,
> > Chinmay.
>

Re: APEXMALHAR-2354 Heuristic Watermark in windowed operator

Posted by Tushar Gosavi <tu...@datatorrent.com>.
+1, but can you change name of HeuristicWatermark to
WatermarkGenerator as the purpose of
this interface is to generate watermarks.

- Tushar.


On Thu, Nov 24, 2016 at 4:22 PM, Chinmay Kolhatkar <ch...@apache.org> wrote:
> Dear Community,
>
> I'm working on adding support for heuristic watermark in Windowed Operator.
> Heuristic watermark give users of WindowedOperator a way to logically
> determine whether watermark condition is met or not by inspecting the
> tuples received.
> This can act as a replacement for or way to work along with Control Tuple
> received on control port.
>
> Here is the approach I'm considering:
>
> 1. A new interface lets say "HeuristicWatermark" will be added which
> extends Component<Context.OperatorContext>
> The reason why its extended with Component is then it can follow a
> lifecycle.
>
> 2. This method contains a single method something like this:
>
> ControlTuple.Watermark processTupleForWatermark(Tuple.WindowedTuple<InputT>
> input);
>
> 3. Object of this type can optionally be set to AbstractWindowedOperator as
> a plugin which identified whether watermark condition has reached.
>
> 4. If heuristicWatermark is set, processTupleForWatermark will be called
> for every received tuple and the method can return the Watermark object if
> watermark condition is met OR return null if not so.
>
> 5. If return value of this method is non-null, then processWatermark method
> will be called which sets the nextWatermark value. And then rest of the
> watermark processing can continue to happen in endWindow.
>
>
> Please share your opinion on above approach.
>
> Thanks,
> Chinmay.

Re: APEXMALHAR-2354 Heuristic Watermark in windowed operator

Posted by Mohit Jotwani <mo...@datatorrent.com>.
+1  for David's approach

Regards,
Mohit

On Tue, Nov 29, 2016 at 11:17 AM, Chinmay Kolhatkar <ch...@apache.org>
wrote:

> David,
>
> Your suggestion make sense. This will also help in the idempotency.
> I'll also change the name of interface to WatermarkGenerator.
>
> I had 2 more questions here:
> 1. AbstractWindowedOperator currently has fixedWatermark right now. Should
> that be moved out of AbstractWindowedOperator and provided that as an
> implementation of WatermarkGenerator?
>
> 2. Should we have a setter method to this interface to provide object of
> DataStorage, retractionStorage to the implementation? This can be useful
> for any advanced implementation of watermark generation.
>
> -Chinmay.
>
>
>
> On Tue, Nov 29, 2016 at 2:19 AM, David Yan <da...@datatorrent.com> wrote:
>
> > +1 for the feature.
> >
> > But since having multiple watermark tuples within one streaming window is
> > not useful because WindowedOperator currently only processes watermark
> only
> > at endWindow, how about:
> >
> > void processTupleForWatermark(Tuple.WindowedTuple<InputT> tuple);
> > // to be called for each input tuple, updates the state of the impl of
> > WatermarkGenerator interface.
> >
> > ControlTuple.Watermark getWatermarkTuple();
> > // to be called at endWindow. return null if watermark is not available
> >
> > This is actually somewhat related to the separate debate on this list
> about
> > control tuples being delivered only at streaming window boundary.
> >
> > David
> >
> > On Thu, Nov 24, 2016 at 2:52 AM, Chinmay Kolhatkar <ch...@apache.org>
> > wrote:
> >
> > > Dear Community,
> > >
> > > I'm working on adding support for heuristic watermark in Windowed
> > Operator.
> > > Heuristic watermark give users of WindowedOperator a way to logically
> > > determine whether watermark condition is met or not by inspecting the
> > > tuples received.
> > > This can act as a replacement for or way to work along with Control
> Tuple
> > > received on control port.
> > >
> > > Here is the approach I'm considering:
> > >
> > > 1. A new interface lets say "HeuristicWatermark" will be added which
> > > extends Component<Context.OperatorContext>
> > > The reason why its extended with Component is then it can follow a
> > > lifecycle.
> > >
> > > 2. This method contains a single method something like this:
> > >
> > > ControlTuple.Watermark processTupleForWatermark(
> > > Tuple.WindowedTuple<InputT>
> > > input);
> > >
> > > 3. Object of this type can optionally be set to
> AbstractWindowedOperator
> > as
> > > a plugin which identified whether watermark condition has reached.
> > >
> > > 4. If heuristicWatermark is set, processTupleForWatermark will be
> called
> > > for every received tuple and the method can return the Watermark object
> > if
> > > watermark condition is met OR return null if not so.
> > >
> > > 5. If return value of this method is non-null, then processWatermark
> > method
> > > will be called which sets the nextWatermark value. And then rest of the
> > > watermark processing can continue to happen in endWindow.
> > >
> > >
> > > Please share your opinion on above approach.
> > >
> > > Thanks,
> > > Chinmay.
> > >
> >
>

Re: APEXMALHAR-2354 Heuristic Watermark in windowed operator

Posted by David Yan <da...@datatorrent.com>.
Chinmay:

My response is inline:

On Mon, Nov 28, 2016 at 9:47 PM, Chinmay Kolhatkar <ch...@apache.org>
wrote:

> David,
>
> Your suggestion make sense. This will also help in the idempotency.
> I'll also change the name of interface to WatermarkGenerator.
>
> I had 2 more questions here:
> 1. AbstractWindowedOperator currently has fixedWatermark right now. Should
> that be moved out of AbstractWindowedOperator and provided that as an
> implementation of WatermarkGenerator?
>


Yes, this makes sense.


>
> 2. Should we have a setter method to this interface to provide object of
> DataStorage, retractionStorage to the implementation? This can be useful
> for any advanced implementation of watermark generation.
>
>
The storage objects can be passed as part of a context object when
constructing the WatermarkGenerator object. In the future, we can add other
stuff to the context.





> -Chinmay.
>
>
>
> On Tue, Nov 29, 2016 at 2:19 AM, David Yan <da...@datatorrent.com> wrote:
>
> > +1 for the feature.
> >
> > But since having multiple watermark tuples within one streaming window is
> > not useful because WindowedOperator currently only processes watermark
> only
> > at endWindow, how about:
> >
> > void processTupleForWatermark(Tuple.WindowedTuple<InputT> tuple);
> > // to be called for each input tuple, updates the state of the impl of
> > WatermarkGenerator interface.
> >
> > ControlTuple.Watermark getWatermarkTuple();
> > // to be called at endWindow. return null if watermark is not available
> >
> > This is actually somewhat related to the separate debate on this list
> about
> > control tuples being delivered only at streaming window boundary.
> >
> > David
> >
> > On Thu, Nov 24, 2016 at 2:52 AM, Chinmay Kolhatkar <ch...@apache.org>
> > wrote:
> >
> > > Dear Community,
> > >
> > > I'm working on adding support for heuristic watermark in Windowed
> > Operator.
> > > Heuristic watermark give users of WindowedOperator a way to logically
> > > determine whether watermark condition is met or not by inspecting the
> > > tuples received.
> > > This can act as a replacement for or way to work along with Control
> Tuple
> > > received on control port.
> > >
> > > Here is the approach I'm considering:
> > >
> > > 1. A new interface lets say "HeuristicWatermark" will be added which
> > > extends Component<Context.OperatorContext>
> > > The reason why its extended with Component is then it can follow a
> > > lifecycle.
> > >
> > > 2. This method contains a single method something like this:
> > >
> > > ControlTuple.Watermark processTupleForWatermark(
> > > Tuple.WindowedTuple<InputT>
> > > input);
> > >
> > > 3. Object of this type can optionally be set to
> AbstractWindowedOperator
> > as
> > > a plugin which identified whether watermark condition has reached.
> > >
> > > 4. If heuristicWatermark is set, processTupleForWatermark will be
> called
> > > for every received tuple and the method can return the Watermark object
> > if
> > > watermark condition is met OR return null if not so.
> > >
> > > 5. If return value of this method is non-null, then processWatermark
> > method
> > > will be called which sets the nextWatermark value. And then rest of the
> > > watermark processing can continue to happen in endWindow.
> > >
> > >
> > > Please share your opinion on above approach.
> > >
> > > Thanks,
> > > Chinmay.
> > >
> >
>

Re: APEXMALHAR-2354 Heuristic Watermark in windowed operator

Posted by Chinmay Kolhatkar <ch...@apache.org>.
David,

Your suggestion make sense. This will also help in the idempotency.
I'll also change the name of interface to WatermarkGenerator.

I had 2 more questions here:
1. AbstractWindowedOperator currently has fixedWatermark right now. Should
that be moved out of AbstractWindowedOperator and provided that as an
implementation of WatermarkGenerator?

2. Should we have a setter method to this interface to provide object of
DataStorage, retractionStorage to the implementation? This can be useful
for any advanced implementation of watermark generation.

-Chinmay.



On Tue, Nov 29, 2016 at 2:19 AM, David Yan <da...@datatorrent.com> wrote:

> +1 for the feature.
>
> But since having multiple watermark tuples within one streaming window is
> not useful because WindowedOperator currently only processes watermark only
> at endWindow, how about:
>
> void processTupleForWatermark(Tuple.WindowedTuple<InputT> tuple);
> // to be called for each input tuple, updates the state of the impl of
> WatermarkGenerator interface.
>
> ControlTuple.Watermark getWatermarkTuple();
> // to be called at endWindow. return null if watermark is not available
>
> This is actually somewhat related to the separate debate on this list about
> control tuples being delivered only at streaming window boundary.
>
> David
>
> On Thu, Nov 24, 2016 at 2:52 AM, Chinmay Kolhatkar <ch...@apache.org>
> wrote:
>
> > Dear Community,
> >
> > I'm working on adding support for heuristic watermark in Windowed
> Operator.
> > Heuristic watermark give users of WindowedOperator a way to logically
> > determine whether watermark condition is met or not by inspecting the
> > tuples received.
> > This can act as a replacement for or way to work along with Control Tuple
> > received on control port.
> >
> > Here is the approach I'm considering:
> >
> > 1. A new interface lets say "HeuristicWatermark" will be added which
> > extends Component<Context.OperatorContext>
> > The reason why its extended with Component is then it can follow a
> > lifecycle.
> >
> > 2. This method contains a single method something like this:
> >
> > ControlTuple.Watermark processTupleForWatermark(
> > Tuple.WindowedTuple<InputT>
> > input);
> >
> > 3. Object of this type can optionally be set to AbstractWindowedOperator
> as
> > a plugin which identified whether watermark condition has reached.
> >
> > 4. If heuristicWatermark is set, processTupleForWatermark will be called
> > for every received tuple and the method can return the Watermark object
> if
> > watermark condition is met OR return null if not so.
> >
> > 5. If return value of this method is non-null, then processWatermark
> method
> > will be called which sets the nextWatermark value. And then rest of the
> > watermark processing can continue to happen in endWindow.
> >
> >
> > Please share your opinion on above approach.
> >
> > Thanks,
> > Chinmay.
> >
>

Re: APEXMALHAR-2354 Heuristic Watermark in windowed operator

Posted by David Yan <da...@datatorrent.com>.
+1 for the feature.

But since having multiple watermark tuples within one streaming window is
not useful because WindowedOperator currently only processes watermark only
at endWindow, how about:

void processTupleForWatermark(Tuple.WindowedTuple<InputT> tuple);
// to be called for each input tuple, updates the state of the impl of
WatermarkGenerator interface.

ControlTuple.Watermark getWatermarkTuple();
// to be called at endWindow. return null if watermark is not available

This is actually somewhat related to the separate debate on this list about
control tuples being delivered only at streaming window boundary.

David

On Thu, Nov 24, 2016 at 2:52 AM, Chinmay Kolhatkar <ch...@apache.org>
wrote:

> Dear Community,
>
> I'm working on adding support for heuristic watermark in Windowed Operator.
> Heuristic watermark give users of WindowedOperator a way to logically
> determine whether watermark condition is met or not by inspecting the
> tuples received.
> This can act as a replacement for or way to work along with Control Tuple
> received on control port.
>
> Here is the approach I'm considering:
>
> 1. A new interface lets say "HeuristicWatermark" will be added which
> extends Component<Context.OperatorContext>
> The reason why its extended with Component is then it can follow a
> lifecycle.
>
> 2. This method contains a single method something like this:
>
> ControlTuple.Watermark processTupleForWatermark(
> Tuple.WindowedTuple<InputT>
> input);
>
> 3. Object of this type can optionally be set to AbstractWindowedOperator as
> a plugin which identified whether watermark condition has reached.
>
> 4. If heuristicWatermark is set, processTupleForWatermark will be called
> for every received tuple and the method can return the Watermark object if
> watermark condition is met OR return null if not so.
>
> 5. If return value of this method is non-null, then processWatermark method
> will be called which sets the nextWatermark value. And then rest of the
> watermark processing can continue to happen in endWindow.
>
>
> Please share your opinion on above approach.
>
> Thanks,
> Chinmay.
>