You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@apex.apache.org by "McCullough, Alex" <Al...@capitalone.com> on 2016/08/12 16:36:23 UTC

Round Robin Partitioning with Dynamic Partitioning

I can’t find any examples of how to set round robin partitioning or how to set dynamic partitioning, are there any example applications I can look at?

The DataTorrent/Examples github page has a dynamic partitioning example but it just looks to be a shell, I don’t see any actual logic implemented…

Thanks,
Alex
________________________________________________________

The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates and may only be used solely in performance of work or services for Capital One. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.

Re: Round Robin Partitioning with Dynamic Partitioning

Posted by Thomas Weise <th...@gmail.com>.
If idempotency is needed (replay on recovery) and the order of tuples in
the stream can change within a window (multiple upstream partitions), then
it may be better to use a stateless hash function that ensures even
distribution.


On Fri, Aug 12, 2016 at 10:10 AM, Munagala Ramanath <ra...@datatorrent.com>
wrote:

> I assume the situation is that you have operators A -> {B1, B2, ...} where
> Bi are partitions of B and
> you want to distribute incoming tuples from A to the Bi in round-robin
> fashion.
>
> For this, you'll need to create a StreamCodec; please see:
> https://github.com/DataTorrent/examples/blob/master/tutorials/partition/
> src/main/java/com/example/myapexapp/Codec3.java
>
> For round robin, you'll probably need to save the current partition id
> (i.e. an integer in the range 0...N
> where N is the number of partitions) and increment it mod N for each input
> tuple.
>
> Ram
>
> On Fri, Aug 12, 2016 at 9:53 AM, McCullough, Alex <
> Alex.McCullough@capitalone.com> wrote:
>
>> Thanks Ram.
>>
>>
>>
>> If I didn’t want dynamic partitioning and just round robin on a fixed #
>> of partitions, can it just be set through a property? If so, what is the
>> property?
>>
>>
>>
>> *From: *Munagala Ramanath <ra...@datatorrent.com>
>> *Reply-To: *"users@apex.apache.org" <us...@apex.apache.org>
>> *Date: *Friday, August 12, 2016 at 12:50 PM
>> *To: *"users@apex.apache.org" <us...@apex.apache.org>
>> *Subject: *Re: Round Robin Partitioning with Dynamic Partitioning
>>
>>
>>
>> Alex,
>>
>>
>>
>> Please take a look at: https://github.com/DataTor
>> rent/examples/blob/master/tutorials/dynamic-partition/
>> src/main/java/com/example/dynamic/Gen.java
>>
>>
>>
>> It shows an operator that implements both the *Partitioner* and the
>> *StatsListener* interface.
>>
>> The *processStats()* method of the latter interface checks the number of
>> emitted tuples from the argument
>>
>> and, if the count exceeds 500, it sets the number of partitions to
>> *MAX_PARTITIONS* (which is 4).
>>
>> The number of partitions initially starts out at 2.
>>
>>
>>
>> The *definePartitions()* method of the former interface checks the
>> desired number of partitions and
>>
>> if it differs from the current partition count, it performs a dynamic
>> repartition. If you turn on *DEBUG*,
>>
>> you should see log messages at some of these steps.
>>
>>
>>
>> Naturally, this is a contrived example to illustrate how to exercise the
>> functionality.
>>
>>
>>
>> Let me know if any of this is not clear or if you have further questions.
>>
>>
>>
>> Ram
>>
>>
>>
>>
>>
>> On Fri, Aug 12, 2016 at 9:36 AM, McCullough, Alex <
>> Alex.McCullough@capitalone.com> wrote:
>>
>> I can’t find any examples of how to set round robin partitioning or how
>> to set dynamic partitioning, are there any example applications I can look
>> at?
>>
>>
>>
>> The DataTorrent/Examples github page has a dynamic partitioning example
>> but it just looks to be a shell, I don’t see any actual logic implemented…
>>
>>
>>
>> Thanks,
>>
>> Alex
>>
>>
>> ------------------------------
>>
>> The information contained in this e-mail is confidential and/or
>> proprietary to Capital One and/or its affiliates and may only be used
>> solely in performance of work or services for Capital One. The information
>> transmitted herewith is intended only for use by the individual or entity
>> to which it is addressed. If the reader of this message is not the intended
>> recipient, you are hereby notified that any review, retransmission,
>> dissemination, distribution, copying or other use of, or taking of any
>> action in reliance upon this information is strictly prohibited. If you
>> have received this communication in error, please contact the sender and
>> delete the material from your computer.
>>
>>
>>
>> ------------------------------
>>
>> The information contained in this e-mail is confidential and/or
>> proprietary to Capital One and/or its affiliates and may only be used
>> solely in performance of work or services for Capital One. The information
>> transmitted herewith is intended only for use by the individual or entity
>> to which it is addressed. If the reader of this message is not the intended
>> recipient, you are hereby notified that any review, retransmission,
>> dissemination, distribution, copying or other use of, or taking of any
>> action in reliance upon this information is strictly prohibited. If you
>> have received this communication in error, please contact the sender and
>> delete the material from your computer.
>>
>
>

Re: Round Robin Partitioning with Dynamic Partitioning

Posted by Munagala Ramanath <ra...@datatorrent.com>.
I assume the situation is that you have operators A -> {B1, B2, ...} where
Bi are partitions of B and
you want to distribute incoming tuples from A to the Bi in round-robin
fashion.

For this, you'll need to create a StreamCodec; please see:
https://github.com/DataTorrent/examples/blob/master/tutorials/partition/src/main/java/com/example/myapexapp/Codec3.java

For round robin, you'll probably need to save the current partition id
(i.e. an integer in the range 0...N
where N is the number of partitions) and increment it mod N for each input
tuple.

Ram

On Fri, Aug 12, 2016 at 9:53 AM, McCullough, Alex <
Alex.McCullough@capitalone.com> wrote:

> Thanks Ram.
>
>
>
> If I didn’t want dynamic partitioning and just round robin on a fixed # of
> partitions, can it just be set through a property? If so, what is the
> property?
>
>
>
> *From: *Munagala Ramanath <ra...@datatorrent.com>
> *Reply-To: *"users@apex.apache.org" <us...@apex.apache.org>
> *Date: *Friday, August 12, 2016 at 12:50 PM
> *To: *"users@apex.apache.org" <us...@apex.apache.org>
> *Subject: *Re: Round Robin Partitioning with Dynamic Partitioning
>
>
>
> Alex,
>
>
>
> Please take a look at: https://github.com/DataTorrent/examples/blob/
> master/tutorials/dynamic-partition/src/main/java/com/
> example/dynamic/Gen.java
>
>
>
> It shows an operator that implements both the *Partitioner* and the
> *StatsListener* interface.
>
> The *processStats()* method of the latter interface checks the number of
> emitted tuples from the argument
>
> and, if the count exceeds 500, it sets the number of partitions to
> *MAX_PARTITIONS* (which is 4).
>
> The number of partitions initially starts out at 2.
>
>
>
> The *definePartitions()* method of the former interface checks the
> desired number of partitions and
>
> if it differs from the current partition count, it performs a dynamic
> repartition. If you turn on *DEBUG*,
>
> you should see log messages at some of these steps.
>
>
>
> Naturally, this is a contrived example to illustrate how to exercise the
> functionality.
>
>
>
> Let me know if any of this is not clear or if you have further questions.
>
>
>
> Ram
>
>
>
>
>
> On Fri, Aug 12, 2016 at 9:36 AM, McCullough, Alex <
> Alex.McCullough@capitalone.com> wrote:
>
> I can’t find any examples of how to set round robin partitioning or how to
> set dynamic partitioning, are there any example applications I can look at?
>
>
>
> The DataTorrent/Examples github page has a dynamic partitioning example
> but it just looks to be a shell, I don’t see any actual logic implemented…
>
>
>
> Thanks,
>
> Alex
>
>
> ------------------------------
>
> The information contained in this e-mail is confidential and/or
> proprietary to Capital One and/or its affiliates and may only be used
> solely in performance of work or services for Capital One. The information
> transmitted herewith is intended only for use by the individual or entity
> to which it is addressed. If the reader of this message is not the intended
> recipient, you are hereby notified that any review, retransmission,
> dissemination, distribution, copying or other use of, or taking of any
> action in reliance upon this information is strictly prohibited. If you
> have received this communication in error, please contact the sender and
> delete the material from your computer.
>
>
>
> ------------------------------
>
> The information contained in this e-mail is confidential and/or
> proprietary to Capital One and/or its affiliates and may only be used
> solely in performance of work or services for Capital One. The information
> transmitted herewith is intended only for use by the individual or entity
> to which it is addressed. If the reader of this message is not the intended
> recipient, you are hereby notified that any review, retransmission,
> dissemination, distribution, copying or other use of, or taking of any
> action in reliance upon this information is strictly prohibited. If you
> have received this communication in error, please contact the sender and
> delete the material from your computer.
>

Re: Round Robin Partitioning with Dynamic Partitioning

Posted by "McCullough, Alex" <Al...@capitalone.com>.
Thanks Ram.

If I didn’t want dynamic partitioning and just round robin on a fixed # of partitions, can it just be set through a property? If so, what is the property?

From: Munagala Ramanath <ra...@datatorrent.com>
Reply-To: "users@apex.apache.org" <us...@apex.apache.org>
Date: Friday, August 12, 2016 at 12:50 PM
To: "users@apex.apache.org" <us...@apex.apache.org>
Subject: Re: Round Robin Partitioning with Dynamic Partitioning

Alex,

Please take a look at: https://github.com/DataTorrent/examples/blob/master/tutorials/dynamic-partition/src/main/java/com/example/dynamic/Gen.java

It shows an operator that implements both the Partitioner and the StatsListener interface.
The processStats() method of the latter interface checks the number of emitted tuples from the argument
and, if the count exceeds 500, it sets the number of partitions to MAX_PARTITIONS (which is 4).
The number of partitions initially starts out at 2.

The definePartitions() method of the former interface checks the desired number of partitions and
if it differs from the current partition count, it performs a dynamic repartition. If you turn on DEBUG,
you should see log messages at some of these steps.

Naturally, this is a contrived example to illustrate how to exercise the functionality.

Let me know if any of this is not clear or if you have further questions.

Ram


On Fri, Aug 12, 2016 at 9:36 AM, McCullough, Alex <Al...@capitalone.com>> wrote:
I can’t find any examples of how to set round robin partitioning or how to set dynamic partitioning, are there any example applications I can look at?

The DataTorrent/Examples github page has a dynamic partitioning example but it just looks to be a shell, I don’t see any actual logic implemented…

Thanks,
Alex

________________________________

The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates and may only be used solely in performance of work or services for Capital One. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.

________________________________________________________

The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates and may only be used solely in performance of work or services for Capital One. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.

Re: Round Robin Partitioning with Dynamic Partitioning

Posted by Munagala Ramanath <ra...@datatorrent.com>.
Alex,

Please take a look at:
https://github.com/DataTorrent/examples/blob/master/tutorials/dynamic-partition/src/main/java/com/example/dynamic/Gen.java

It shows an operator that implements both the *Partitioner* and the
*StatsListener* interface.
The *processStats()* method of the latter interface checks the number of
emitted tuples from the argument
and, if the count exceeds 500, it sets the number of partitions to
*MAX_PARTITIONS* (which is 4).
The number of partitions initially starts out at 2.

The *definePartitions()* method of the former interface checks the desired
number of partitions and
if it differs from the current partition count, it performs a dynamic
repartition. If you turn on *DEBUG*,
you should see log messages at some of these steps.

Naturally, this is a contrived example to illustrate how to exercise the
functionality.

Let me know if any of this is not clear or if you have further questions.

Ram


On Fri, Aug 12, 2016 at 9:36 AM, McCullough, Alex <
Alex.McCullough@capitalone.com> wrote:

> I can’t find any examples of how to set round robin partitioning or how to
> set dynamic partitioning, are there any example applications I can look at?
>
>
>
> The DataTorrent/Examples github page has a dynamic partitioning example
> but it just looks to be a shell, I don’t see any actual logic implemented…
>
>
>
> Thanks,
>
> Alex
>
> ------------------------------
>
> The information contained in this e-mail is confidential and/or
> proprietary to Capital One and/or its affiliates and may only be used
> solely in performance of work or services for Capital One. The information
> transmitted herewith is intended only for use by the individual or entity
> to which it is addressed. If the reader of this message is not the intended
> recipient, you are hereby notified that any review, retransmission,
> dissemination, distribution, copying or other use of, or taking of any
> action in reliance upon this information is strictly prohibited. If you
> have received this communication in error, please contact the sender and
> delete the material from your computer.
>