Posted to dev@samza.apache.org by "Mhaskar, Tushar" <tm...@paypal.com.INVALID> on 2015/08/19 19:07:31 UTC

[DISCUSS] Sliding window implementation in Samza

Hi,

  1.  I came across the windowing functionality in Samza. It looks like it implements a static window. Is sliding-window functionality available in Samza?

  2.  How do I aggregate across multiple YARN nodes? Any pointers?
      E.g.: Let's say I have 2 slave machines where my StreamTask implementation counts all the incoming messages. How and where can I aggregate the data from the multiple nodes and produce one single count?

Regards,
Tushar Mhaskar


Re: [DISCUSS] Sliding window implementation in Samza

Posted by "Mhaskar, Tushar" <tm...@paypal.com.INVALID>.
Thanks, Yi, for the response.

I have one more question.

Let's say I am consuming from a Kafka topic which has 5 partitions and I am
implementing the WindowableTask interface.
There will be 5 task instances created to consume data from the 5 partitions.
The window function will also be called 5 times.

Does Samza provide a common data store for all the tasks? If I use the
LevelDB KV store, will all 5 tasks use the same instance of it?
 


Regards,
Tushar Mhaskar
Cell : 213-572-7867
Skype : tmhaskarpp




On 8/19/15, 12:04 PM, "Yi Pan" <ni...@gmail.com> wrote:

>Hi, Mhaskar,
>
>I assume that you were referring to the WindowableTask interface, which
>enables a basic periodic timer function. As you noticed, it is only a very
>limited version of windowing. For more advanced windowing work, please refer
>to SAMZA-552, which is a WIP now. It will contain the sliding window concept
>with aggregate and join use cases.
>
>For your second question, I can imagine multiple ways of implementing it:
>
>1) If you have one aggregator job consuming all counting events from all
>upstream counter containers, you can aggregate the counters from all
>containers of the upstream counter job.
>2) If you output your single counter result to a distributed KV-store that
>supports "compare-and-set", you can always read the value from the store,
>increment the counter, and "compare-and-set" to make sure that you only
>update based on the version from which your result was calculated.
>
>There are probably more ways of doing it. I am just putting the above two
>out here as a thought to start with.
>
>Cheers!
>
>-Yi


Re: [DISCUSS] Sliding window implementation in Samza

Posted by Yi Pan <ni...@gmail.com>.
Hi, Tushar,

Unfortunately, no. Each task will have its own RocksDB instance, although
the KV-store name in the configuration is the same.
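
To illustrate the point, here is a minimal sketch in plain Python (not the Samza API). The `Task` class, its `stores` dictionary, and the store name `message-counts` are hypothetical stand-ins: every task looks up the same store name but gets its own private instance, so each one only sees the count for its own partition.

```python
class Task:
    """Stand-in for one task instance; `stores` mimics its private store registry."""

    def __init__(self, partition):
        self.partition = partition
        # Same configured store name in every task, but a separate instance.
        self.stores = {"message-counts": {}}

    def process(self, message):
        store = self.stores["message-counts"]
        store["count"] = store.get("count", 0) + 1


# Five tasks for five partitions; 12 messages spread round-robin over them.
tasks = [Task(partition=p) for p in range(5)]
for i in range(12):
    tasks[i % 5].process(f"msg-{i}")

local_counts = [t.stores["message-counts"]["count"] for t in tasks]
print(local_counts)       # [3, 3, 2, 2, 2]
print(sum(local_counts))  # 12
```

No single task holds the total of 12. Getting a global count means combining the per-task values somewhere outside the tasks' local stores.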

-Yi

On Wed, Aug 26, 2015 at 9:39 AM, Mhaskar, Tushar <
tmhaskar@paypal.com.invalid> wrote:

> Thanks for the response, but I have one more question.
>
> Let's say I am consuming from a Kafka topic which has 5 partitions and I am
> implementing the WindowableTask interface.
> There will be 5 task instances created to consume data from the 5 partitions.
> The window function will also be called 5 times.
>
> Does Samza provide a common data store for all the tasks? If I use the
> KV store which Samza provides, will all 5 tasks use the same instance
> of it?

Re: [DISCUSS] Sliding window implementation in Samza

Posted by "Mhaskar, Tushar" <tm...@paypal.com.INVALID>.
Thanks for the response, but I have one more question.

Let's say I am consuming from a Kafka topic which has 5 partitions and I am
implementing the WindowableTask interface.
There will be 5 task instances created to consume data from the 5 partitions.
The window function will also be called 5 times.

Does Samza provide a common data store for all the tasks? If I use the
KV store which Samza provides, will all 5 tasks use the same instance
of it?


Regards,
Tushar Mhaskar






Re: [DISCUSS] Sliding window implementation in Samza

Posted by Yi Pan <ni...@gmail.com>.
Hi, Mhaskar,

I assume that you were referring to the WindowableTask interface, which
enables a basic periodic timer function. As you noticed, it is only a very
limited version of windowing. For more advanced windowing work, please refer
to SAMZA-552, which is a WIP now. It will contain the sliding window concept
with aggregate and join use cases.
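
Until that work lands, a sliding count can be approximated on top of a periodic timer. The following is a rough sketch in plain Python, not the Samza API and not the SAMZA-552 design: `SlidingCounter` is a hypothetical class whose `process` and `window` methods only stand in for the per-message and per-tick callbacks. It keeps one bucket per tick and reports the sum over the last few buckets.

```python
from collections import deque


class SlidingCounter:
    """Counts messages over the last `num_buckets` timer ticks."""

    def __init__(self, num_buckets):
        # Closed buckets, oldest first; the deque drops the oldest automatically.
        self.buckets = deque(maxlen=num_buckets)
        self.current = 0  # count for the bucket still being filled

    def process(self, message):
        # Called once per incoming message.
        self.current += 1

    def window(self):
        # Called on every timer tick: close the current bucket and
        # return the sum over the whole sliding window.
        self.buckets.append(self.current)
        self.current = 0
        return sum(self.buckets)


counter = SlidingCounter(num_buckets=3)
totals = []
for batch in (["a", "b"], ["c"], [], ["d", "e", "f"]):  # messages between ticks
    for msg in batch:
        counter.process(msg)
    totals.append(counter.window())
print(totals)  # [2, 3, 3, 4]
```

The window slides by one tick at a time, so its resolution is limited by the timer interval. Each task would still hold only the count for its own partition.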

For your second question, I can imagine multiple ways of implementing it:

1) If you have one aggregator job consuming all counting events from all
upstream counter containers, you can aggregate the counters from all
containers of the upstream counter job.
2) If you output your single counter result to a distributed KV-store that
supports "compare-and-set", you can always read the value from the store,
increment the counter, and "compare-and-set" to make sure that you only
update based on the version from which your result was calculated.
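
Option 1 could look roughly like the sketch below, written in plain Python rather than against the Samza API. `CounterTask` and `AggregatorTask` are hypothetical names. The sketch assumes the upstream tasks emit the count since their last tick (a delta, not a running total) and that all partial counts reach a single aggregator instance, for example through an intermediate stream with one partition.

```python
from collections import defaultdict


class CounterTask:
    """Upstream task: counts its own partition and emits a delta on each tick."""

    def __init__(self, task_name):
        self.task_name = task_name
        self.count = 0

    def process(self, message):
        self.count += 1

    def window(self):
        # Emit the count since the last tick, then reset it.
        delta, self.count = self.count, 0
        return (self.task_name, delta)


class AggregatorTask:
    """Downstream task: sums the partial counts from every upstream task."""

    def __init__(self):
        self.total = 0
        self.per_task = defaultdict(int)

    def process(self, partial):
        task_name, delta = partial
        self.per_task[task_name] += delta
        self.total += delta


counters = [CounterTask("task-0"), CounterTask("task-1")]
aggregator = AggregatorTask()

for msg in ("a", "b", "c"):
    counters[0].process(msg)
for msg in ("d", "e"):
    counters[1].process(msg)

# On the timer tick, each counter emits its partial count downstream.
for task in counters:
    aggregator.process(task.window())

print(aggregator.total)  # 5
```

Emitting deltas keeps the aggregator a simple sum. If the upstream tasks emitted running totals instead, the aggregator would need to keep the latest value per task and sum those.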

There are probably more ways of doing it. I am just putting the above two
out here as a thought to start with.
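
For option 2, the read, increment, and compare-and-set loop can be sketched as follows. `VersionedStore` is a hypothetical in-memory stand-in for a distributed KV store, and its `get` and `compare_and_set` methods are assumptions for illustration, not the API of any particular store.

```python
import threading


class VersionedStore:
    """In-memory stand-in for a distributed KV store with compare-and-set."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # key -> (version, value)

    def get(self, key):
        with self._lock:
            return self._data.get(key, (0, 0))

    def compare_and_set(self, key, expected_version, new_value):
        # Write only if nobody else has updated the key since we read it.
        with self._lock:
            version, _ = self._data.get(key, (0, 0))
            if version != expected_version:
                return False
            self._data[key] = (version + 1, new_value)
            return True


def add_to_counter(store, key, delta):
    # Read, compute, compare-and-set; retry if another writer got in first.
    while True:
        version, value = store.get(key)
        if store.compare_and_set(key, version, value + delta):
            return


def worker(store):
    for _ in range(1000):
        add_to_counter(store, "count", 1)


store = VersionedStore()
threads = [threading.Thread(target=worker, args=(store,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_version, final_count = store.get("count")
print(final_count)  # 2000
```

The retry loop is what keeps concurrent writers from overwriting each other's increments. Under heavy contention it costs extra round trips, which is one reason to prefer the aggregator-job approach for high-volume counters.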

Cheers!

-Yi
