Posted to user@storm.apache.org by Seungtack Baek <se...@precocityllc.com> on 2015/06/08 06:12:12 UTC

Re: Storm Message Flow Question

Hi,

I have read in the documentation that if you have more spout tasks than
Kafka partitions, the excess tasks will remain idle for the entire lifecycle
of the topology.

Now, let's consider 4 spout tasks and 32 bolt tasks (of one class) in 4
workers (on 4 nodes), with 2 partitions in Kafka. Then 2 spout tasks will each
be assigned one Kafka partition and the other 2 will remain idle.
However, does that mean that only the bolts within the same worker will get
the messages (assuming shuffle grouping)? Or do the messages get emitted
to whatever bolt tasks are available, regardless of worker?
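
The spout-side arithmetic in this question can be sketched in a few lines of
Python. This is only a simplified model: it assumes partitions are dealt out
to spout tasks round-robin by index, and the function names are made up for
illustration rather than taken from Storm or its Kafka spout.

```python
def partitions_for_task(task_index: int, total_tasks: int, num_partitions: int) -> list[int]:
    """Partitions read by one spout task under an assumed round-robin split."""
    # Task i takes partitions i, i + total_tasks, i + 2 * total_tasks, ...
    return list(range(task_index, num_partitions, total_tasks))


def assignment(total_tasks: int, num_partitions: int) -> dict[int, list[int]]:
    """Map every spout task index to the partitions it would read."""
    return {
        task: partitions_for_task(task, total_tasks, num_partitions)
        for task in range(total_tasks)
    }


if __name__ == "__main__":
    # The scenario from the question: 4 spout tasks, 2 Kafka partitions.
    for task, parts in assignment(4, 2).items():
        status = f"reads partitions {parts}" if parts else "idle"
        print(f"spout task {task}: {status}")
```

With 4 tasks and 2 partitions, two tasks end up with an empty list, which is
the "idle" case described above. Which bolt tasks then receive the tuples is
a separate matter decided by the stream grouping.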

Thanks,
Baek


*Seungtack Baek | Precocity, LLC*

Tel/Direct: (972) 378-1030 | Mobile: (214) 477-5715

*SeungtackBaek@precocityllc.com <Se...@precocityllc.com>* |
www.precocityllc.com


This is the end of this message.


Re: Storm Message Flow Question

Posted by Dima Dragan <di...@belleron.net>.
In your case, messages that have the same field value will be sent to
only one executor in the whole topology.
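
A rough way to picture this, as a Python sketch: the target is chosen from a
hash of the grouping field over the full list of the bolt's tasks, so the
worker a tuple came from plays no part. The use of crc32 and all names here
are assumptions for illustration; Storm's own hash function is different.

```python
import zlib

# All tasks of the bolt, regardless of which worker hosts them.
BOLT_TASKS = ["b1", "b2", "b3", "b4"]


def fields_grouping_target(field_value: str, tasks: list[str]) -> str:
    """Pick a task from a stable hash of the grouping field (crc32 is a stand-in)."""
    return tasks[zlib.crc32(field_value.encode("utf-8")) % len(tasks)]


# m1 arrives through spout s1 (worker w1), m2 through spout s2 (worker w2),
# but both carry the same value in the grouping field.
m1 = {"spout": "s1", "key": "user-42"}
m2 = {"spout": "s2", "key": "user-42"}

target1 = fields_grouping_target(m1["key"], BOLT_TASKS)
target2 = fields_grouping_target(m2["key"], BOLT_TASKS)
print(target1 == target2)  # True: the same key always maps to the same task
```

Because the function sees only the field value and the global task list, two
spouts in different workers cannot disagree about where a given key goes.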

Best regards,
Dmytro Dragan

Re: Storm Message Flow Question

Posted by Vineet Mishra <cl...@gmail.com>.
To have each tuple processed by exactly one bolt, use shuffle grouping
(for other, more specific use cases, refer to the links in my last mail).
It distributes the data uniformly across all the bolts without heavily
loading any one of them. Fields grouping, by contrast, works on the hashing
principle: it assigns each tuple to a bolt based on a hash of the grouping
field.

Cheers!



Re: Storm Message Flow Question

Posted by Seungtack Baek <se...@precocityllc.com>.
Thanks a lot for such a timely response.

So, even if each bolt task resides in a different worker (a different server
in our use case), the messages go to all 32 tasks, right?

Also, this leads me to another question. (I think the answer is yes.)
Given that fields grouping guarantees that messages with the same "field
value" go to the same task, does "the same task" mean across all workers, or
within the same worker?

For example, let's say we have two Kafka partitions, 0 and 1, spout tasks s1
and s2, and bolt tasks b1, b2, b3 and b4 distributed across two workers, w1
and w2.
So it looks like,
w1
 - partition_0 -> s1 -> b1 & b2
w2
 - partition_1 -> s2 -> b3 & b4

When two messages with the same field value, m1 and m2, are produced to Kafka
partitions 0 and 1, respectively, do both m1 and m2 go to the same bolt, say
b3? Or does each get sent to the same bolt within its own worker (say b1 in
w1 and b3 in w2)?

Simply put, does fields grouping group messages across the whole topology, or
only within a single worker?

Thanks,
Baek






Re: Storm Message Flow Question

Posted by Seungtack Baek <se...@precocityllc.com>.
It surely did! Thanks for such a precise answer!

Thanks,
Baek


Re: Storm Message Flow Question

Posted by Vineet Mishra <cl...@gmail.com>.
Any Storm topology runs in its own space and doesn't interact with other
topologies. Your tuples are distributed across the whole topology, over all
the bolts defined and regardless of how many workers they are spread across.
So, for instance, suppose you have shuffle grouping enabled and this layout:

0   1 - Kafka Partition
s1 s2 - Subscribed Spouts
b1 b2 b3 b4 - bolts available

Then every tuple passing through s1 and s2 (which are subscribed to Kafka
partitions 0 and 1) is emitted to one of the bolts b[1-4]. With shuffle
grouping the target is picked at random; picking it from a hash of the tuple
key is what fields grouping does. Either way it will look something like
this for the data:

tuple(somefancydata1) - b1
tuple(somefancydata43) - b3
tuple(somefancydata855) - b1

and so on... each tuple goes to exactly one bolt, so the data will be
distinct across the bolts!
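
As a rough illustration of the layout above, here is a small Python model
with four bolt tasks split over two workers. It assumes a shuffle-style
choice made uniformly at random over all tasks; the worker placement table
and the function name are hypothetical and not taken from Storm.

```python
import random
from collections import Counter

# Assumed placement: b1 and b2 live in worker w1, b3 and b4 in worker w2.
TASK_TO_WORKER = {"b1": "w1", "b2": "w1", "b3": "w2", "b4": "w2"}


def shuffle_targets(num_tuples: int, tasks: list[str], rng: random.Random) -> list[str]:
    """Pick exactly one target task per tuple, uniformly over all tasks."""
    return [rng.choice(tasks) for _ in range(num_tuples)]


rng = random.Random(7)  # fixed seed so repeated runs give the same picks
targets = shuffle_targets(1000, sorted(TASK_TO_WORKER), rng)

per_task = Counter(targets)
per_worker = Counter(TASK_TO_WORKER[task] for task in targets)
print(dict(per_task))
print(dict(per_worker))
```

The candidate list contains every task of the bolt, so tuples emitted by a
spout in w1 land on tasks in both w1 and w2, and each tuple is delivered to
one task only.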

Let me know if that solves your concern!

Cheers!


Re: Storm Message Flow Question

Posted by Seungtack Baek <se...@precocityllc.com>.
@Vineet,

Thanks a lot for "another" timely response!

Actually, I had read that section, but it still wasn't clear (to me, and I
guess to me only) whether fields grouping applies to the whole cluster
(or topology) or only within the same worker. Maybe I am just not too
familiar with the "zoo".


Thanks,
Baek



Re: Storm Message Flow Question

Posted by Vineet Mishra <cl...@gmail.com>.
Hi Seung,

You would do better to refer to the Stream Groupings section at the link
below:

https://storm.apache.org/documentation/Concepts.html

It will give you a better understanding of tuple distribution in Storm.
For more clarity, here is a pictorial representation of the same:

https://blog.safaribooksonline.com/wp-content/uploads/2013/06/Grouping.png

Cheers!


Re: Storm Message Flow Question

Posted by Dima Dragan <di...@belleron.net>.
Hi, Seungtack!

The distribution of messages depends only on the grouping. In the case of
shuffle grouping, tuples are randomly distributed across all of the bolt's
tasks, in such a way that each task is guaranteed to get an equal number of
tuples.
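
To see how "random" and "equal number" can both hold, here is a minimal
Python sketch. It assumes one possible scheme: shuffle the task list once,
then hand tuples out round-robin in that order. This is an illustrative
model, not Storm's code, and the function and task names are invented.

```python
import random
from collections import Counter
from itertools import cycle, islice


def shuffled_round_robin(tasks: list[str], num_tuples: int, seed: int = 0) -> list[str]:
    """Return the target task for each tuple: random order, then round-robin."""
    order = list(tasks)
    random.Random(seed).shuffle(order)  # the random part
    return list(islice(cycle(order), num_tuples))  # the equal-share part


# 32 bolt tasks as in the question; which worker hosts them is irrelevant here.
tasks = [f"bolt-task-{i}" for i in range(32)]
counts = Counter(shuffled_round_robin(tasks, 3200))
print(min(counts.values()), max(counts.values()))
```

Every one of the 32 tasks appears in the rotation, so all of them receive
tuples no matter how they are spread across the 4 workers.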

Best regards,
Dmytro Dragan