Posted to dev@samza.apache.org by Yan Fang <ya...@gmail.com> on 2014/09/10 01:54:34 UTC

Do we want to have the sliding window implementation?

Hi guys,

I realize that both Storm and Spark Streaming have sliding window
implementations, while Samza only has the fixed window (not sure if that's
the correct name). I think you guys must have considered this idea when you
started designing Samza. What was the thinking? Thank you.

Cheers,

Fang, Yan
yanfang724@gmail.com
+1 (206) 849-4108

Re: Do we want to have the sliding window implementation?

Posted by Yan Fang <ya...@gmail.com>.
@Mayur, thanks for the excellent explanation. :)

@Chris, yes, on second thought, I think we can implement a sliding window
with the existing code base. Maybe adding a simple example to hello-samza
would be helpful, since the implementation is not very obvious but sliding
windows are a common use case.

Fang, Yan
yanfang724@gmail.com
+1 (206) 849-4108

On Wed, Sep 10, 2014 at 9:39 AM, Chris Riccomini <
criccomini@linkedin.com.invalid> wrote:

> Hey Guys,
>
> Our thought was that a sliding window could be implemented with a buffer
> inside of a process() call. For example, you might have a list of 10
> elements, and every time process() is invoked, you could add the message
> to the head of the list, and dequeue the last element from the buffer (if
> it already has 10 elements in it).
>
> So, Samza currently doesn't support any explicit sliding window, but it
> seems to me that you could implement it in StreamTask.process() if you
> need to.
>
> Cheers,
> Chris

Re: Do we want to have the sliding window implementation?

Posted by Dan Di Spaltro <da...@gmail.com>.
That's very similar to how Esper implements a sliding window.  They have
the concept of oldEvents and newEvents that get passed in to every
"process" call. oldEvents holds the old events leaving the window and
newEvents holds the new ones coming in.  The difference here is that
Esper provides the sliding window capability for you, but it can also be
helpful for translating domain problems into your code[1].

[1]
http://esper.codehaus.org/esper-5.0.0/doc/reference/en-US/html_single/index.html#view-win-time
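
A minimal sketch of the oldEvents/newEvents idea described above, in plain
Java. This is not Esper's API: LengthWindow, Listener, and insert() are
hypothetical names made up for illustration, and the window here is bounded
by element count rather than by time. The listener keeps a running sum by
adding the events that enter the window and subtracting the ones that leave.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical length-based window; names are illustrative, not Esper's API.
class LengthWindow<T> {
    // Receives the events entering and leaving the window on each insert.
    interface Listener<T> {
        void update(List<T> newEvents, List<T> oldEvents);
    }

    private final int size;
    private final Listener<T> listener;
    private final Deque<T> events = new ArrayDeque<>();

    LengthWindow(int size, Listener<T> listener) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        this.size = size;
        this.listener = listener;
    }

    void insert(T event) {
        List<T> oldEvents = new ArrayList<>();
        if (events.size() == size) {
            oldEvents.add(events.removeFirst()); // oldest event leaves the window
        }
        events.addLast(event);                   // newest event enters the window
        List<T> newEvents = new ArrayList<>();
        newEvents.add(event);
        listener.update(newEvents, oldEvents);
    }

    public static void main(String[] args) {
        long[] sum = {0}; // running sum over the last 3 values
        LengthWindow<Integer> window = new LengthWindow<Integer>(3, (added, removed) -> {
            for (int v : added) sum[0] += v;
            for (int v : removed) sum[0] -= v;
        });
        for (int i = 1; i <= 5; i++) {
            window.insert(i);
            System.out.println(sum[0]); // prints 1, 3, 6, 9, 12 on successive lines
        }
    }
}
```

Passing both lists lets the consumer update an aggregate incrementally
instead of rescanning the whole window on every event.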


On Wed, Sep 10, 2014 at 9:39 AM, Chris Riccomini <
criccomini@linkedin.com.invalid> wrote:

> Hey Guys,
>
> Our thought was that a sliding window could be implemented with a buffer
> inside of a process() call. For example, you might have a list of 10
> elements, and every time process() is invoked, you could add the message
> to the head of the list, and dequeue the last element from the buffer (if
> it already has 10 elements in it).
>
> So, Samza currently doesn't support any explicit sliding window, but it
> seems to me that you could implement it in StreamTask.process() if you
> need to.
>
> Cheers,
> Chris


-- 
Dan Di Spaltro

Re: Do we want to have the sliding window implementation?

Posted by Chris Riccomini <cr...@linkedin.com.INVALID>.
Hey Guys,

Our thought was that a sliding window could be implemented with a buffer
inside of a process() call. For example, you might have a list of 10
elements, and every time process() is invoked, you could add the message
to the head of the list, and dequeue the last element from the buffer (if
it already has 10 elements in it).

So, Samza currently doesn't support any explicit sliding window, but it
seems to me that you could implement it in StreamTask.process() if you
need to.

Cheers,
Chris
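
A minimal sketch of the buffer approach described above, in plain Java with
no Samza dependency. SlidingBuffer and its methods are hypothetical names
for illustration; in a real job, add() would presumably be called from
StreamTask.process() with the incoming message. The buffer in this sketch
lives in memory only, so its contents would not survive a restart.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical helper, not part of Samza: keeps the most recent `capacity` messages.
class SlidingBuffer<T> {
    private final int capacity;
    private final Deque<T> buffer = new ArrayDeque<>();

    SlidingBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Call once per incoming message.
    // Returns the evicted element, or null if the buffer was not yet full.
    T add(T message) {
        T evicted = null;
        if (buffer.size() == capacity) {
            evicted = buffer.removeLast(); // drop the oldest element from the tail
        }
        buffer.addFirst(message);          // newest element goes to the head
        return evicted;
    }

    // Snapshot of the current window, newest first.
    List<T> contents() {
        return new ArrayList<>(buffer);
    }

    public static void main(String[] args) {
        SlidingBuffer<String> window = new SlidingBuffer<>(3);
        for (String msg : new String[] {"x1", "x2", "x3", "x4", "x5"}) {
            window.add(msg);
        }
        System.out.println(window.contents()); // prints [x5, x4, x3]
    }
}
```

A deque is used rather than a list so that both the insert at the head and
the removal at the tail are constant-time.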

On 9/9/14 10:31 PM, "Mayur Rustagi" <ma...@gmail.com> wrote:

>A sliding window is another dimension of processing.
>
>Say my batch is 3 sec and my window is 9 sec; then this is what I get:
>
>[ x1 x2 x3][x4 x5 x6][x7 x8 x9]
>
>This does use a slide, but the slide is equal to the window size. If
>instead I want the last 3 elements at any point in time, that would be:
>
>[ x1 x2 x3]  after 3 sec  [ x2 x3 x4]  after 3 sec  [ x3 x4 x5]  after 3 sec
>[ x4 x5 x6]
>
>To implement this, you use a batch of 3 sec, a window of 9 sec, and a slide
>duration of 3 sec. So we are sliding every 3 sec and also getting a batch
>every 3 sec.


Re: Do we want to have the sliding window implementation?

Posted by Mayur Rustagi <ma...@gmail.com>.
A sliding window is another dimension of processing.

Say my batch is 3 sec and my window is 9 sec; then this is what I get:

[ x1 x2 x3][x4 x5 x6][x7 x8 x9]

This does use a slide, but the slide is equal to the window size. If
instead I want the last 3 elements at any point in time, that would be:

[ x1 x2 x3]  after 3 sec  [ x2 x3 x4]  after 3 sec  [ x3 x4 x5]  after 3 sec
[ x4 x5 x6]

To implement this, you use a batch of 3 sec, a window of 9 sec, and a slide
duration of 3 sec. So we are sliding every 3 sec and also getting a batch
every 3 sec.





-- 
Regards,
Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi
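
A rough sketch of the batch/window/slide arithmetic above, in plain Java. It
assumes each x in the diagrams is one 3 sec batch, so a 9 sec window spans
3 batches and a 3 sec slide advances by 1 batch. BatchWindows and windows()
are hypothetical names for illustration, not Spark Streaming's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of window size vs. slide, measured in batches.
class BatchWindows {
    // Returns every full window of `windowSize` consecutive batches,
    // advancing the window start by `slide` batches each time.
    static <T> List<List<T>> windows(List<T> batches, int windowSize, int slide) {
        if (windowSize <= 0 || slide <= 0) {
            throw new IllegalArgumentException("windowSize and slide must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start + windowSize <= batches.size(); start += slide) {
            result.add(new ArrayList<>(batches.subList(start, start + windowSize)));
        }
        return result;
    }

    public static void main(String[] args) {
        int batchSec = 3;
        int windowSec = 9;
        int slideSec = 3;
        int windowSize = windowSec / batchSec; // 3 batches per window
        int slide = slideSec / batchSec;       // advance 1 batch per slide

        List<String> nine = List.of("x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9");
        // Slide equal to the window size: windows do not overlap.
        System.out.println(windows(nine, windowSize, windowSize));
        // prints [[x1, x2, x3], [x4, x5, x6], [x7, x8, x9]]

        // Slide of one batch: consecutive windows overlap by two batches.
        System.out.println(windows(nine.subList(0, 6), windowSize, slide));
        // prints [[x1, x2, x3], [x2, x3, x4], [x3, x4, x5], [x4, x5, x6]]
    }
}
```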

On Wed, Sep 10, 2014 at 5:25 AM, Yan Fang <ya...@gmail.com> wrote:

> Hi guys,
> I realize that both Storm and Spark Streaming have sliding window
> implementations, while Samza only has the fixed window (not sure if that's
> the correct name). I think you guys must have considered this idea when you
> started designing Samza. What was the thinking? Thank you.
> Cheers,
> Fang, Yan
> yanfang724@gmail.com
> +1 (206) 849-4108

Re: Do we want to have the sliding window implementation?

Posted by Shekar Tippur <ct...@gmail.com>.
Yan,

Can you please give a use case for this?

- Shekar

On Tue, Sep 9, 2014 at 4:54 PM, Yan Fang <ya...@gmail.com> wrote:

> Hi guys,
>
> I realize that both Storm and Spark Streaming have sliding window
> implementations, while Samza only has the fixed window (not sure if that's
> the correct name). I think you guys must have considered this idea when you
> started designing Samza. What was the thinking? Thank you.
>
> Cheers,
>
> Fang, Yan
> yanfang724@gmail.com
> +1 (206) 849-4108
>