Posted to user@storm.apache.org by Evgeniy Khyst <ev...@gmail.com> on 2016/06/22 15:09:44 UTC

Question about Fields grouping

Hi,

I can't find information on how fields grouping works in the case of a worker failure.

With fields grouping, tuples are partitioned by some key and sent to
different tasks.
Tuples with the same key go to the same task.
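
To make the setup concrete, here is roughly how I wire the grouping
(WordSpout and CountBolt are just placeholder names, not anything
specific):

    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.tuple.Fields;

    public class FieldsGroupingExample {
        public static void main(String[] args) {
            TopologyBuilder builder = new TopologyBuilder();
            // WordSpout and CountBolt are placeholders for my own classes.
            builder.setSpout("words", new WordSpout(), 1);
            // 4 tasks; every tuple with the same "key" value goes to
            // the same one of them.
            builder.setBolt("counter", new CountBolt(), 4)
                   .fieldsGrouping("words", new Fields("key"));
            // ... submit the topology as usual
        }
    }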

When a worker dies, the supervisor will restart it. If it continuously
fails on startup and is unable to heartbeat to Nimbus, Nimbus will reassign
the worker to another machine.
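
If I understand the docs correctly, the timeouts that govern this are
configurable, something like the following (the constants are from
org.apache.storm.Config; the values here are only illustrative):

    import org.apache.storm.Config;

    public class FailoverTimeouts {
        static Config conf() {
            Config conf = new Config();
            // A supervisor restarts a worker that has not heartbeated
            // locally for this long.
            conf.put(Config.SUPERVISOR_WORKER_TIMEOUT_SECS, 30);
            // Nimbus reassigns tasks whose heartbeats are older than this.
            conf.put(Config.NIMBUS_TASK_TIMEOUT_SECS, 30);
            return conf;
        }
    }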

Does it mean that while the supervisor restarts the worker, or Nimbus
reassigns it, tuples that would have been processed by tasks on the failed
worker will be routed to other tasks?

If yes, is it possible that while the worker is being restarted, tuples are
directed by fields grouping to some other task, and after the worker is
successfully restarted or reassigned, tuples will again be routed to the
tasks on the just-restarted worker?

In this case there is a chance that a tuple with key "1", for example,
will be processed by task 1 while the worker for task 2 is being restarted.
After the worker restarts successfully, a new tuple emitted by the spout
with the same key "1" will be routed to task 2 on the just-restarted
worker, while the earlier tuple with key "1" is still being processed on
task 1.

Does Storm guarantee that the described situation will never happen, i.e.
that with fields grouping all tuples with the same key will be processed by
the same task even in the case of a worker failure?


Best regards,
Evgeniy

Re: Question about Fields grouping

Posted by Navin Ipe <na...@searchlighthealth.com>.
As long as the bolt ID remains the same, the tuple will continue going to
the same task. See this:
http://nrecursions.blogspot.in/2016/09/concepts-about-storm-you-need-to-know.html#fieldsgroupingdoesnotgetoverwhelmedwithdata
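
Roughly speaking (this is just a sketch of the idea, not Storm's actual
code), the target task is a pure function of the grouping values and the
consumer's task count, so the choice doesn't depend on which worker
currently hosts the task:

    import java.util.Arrays;
    import java.util.List;

    public class FieldsGroupingSketch {
        // Sketch only: hash the grouping values, then take the result
        // modulo the number of consumer tasks. Same key + same task
        // count => same task index, wherever that task is running.
        static int chooseTask(List<?> groupingValues, int numTasks) {
            return Math.floorMod(
                Arrays.deepHashCode(groupingValues.toArray()), numTasks);
        }

        public static void main(String[] args) {
            System.out.println(chooseTask(Arrays.asList("1"), 4));
            System.out.println(chooseTask(Arrays.asList("1"), 4)); // same index
        }
    }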

-- 
Regards,
Navin

Re: Question about Fields grouping

Posted by Navin Ipe <na...@searchlighthealth.com>.
Good question. At least one thing I know is that when a worker dies, it
won't be able to ack the tuples it received, so those tuples will fail and
you need to write code in your Spout's fail() method to re-emit them.
My best guess is that your tuple with key 1 will continue going to task 1
once the worker is restarted.
It'd help if someone could confirm.
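
For the re-emit part, something like this pattern works (just a sketch;
the pending map and the emitTracked helper are my own naming, not a Storm
API):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    import org.apache.storm.spout.SpoutOutputCollector;
    import org.apache.storm.topology.base.BaseRichSpout;
    import org.apache.storm.tuple.Values;

    // Keep pending tuples keyed by message ID: forget them on ack(),
    // replay them on fail().
    public abstract class ReplayingSpout extends BaseRichSpout {
        private final Map<Object, Values> pending = new ConcurrentHashMap<>();
        protected SpoutOutputCollector collector; // set in open()

        protected void emitTracked(Object msgId, Values values) {
            pending.put(msgId, values);
            collector.emit(values, msgId); // anchored emit, so Storm tracks it
        }

        @Override
        public void ack(Object msgId) {
            pending.remove(msgId); // fully processed, no replay needed
        }

        @Override
        public void fail(Object msgId) {
            Values values = pending.get(msgId);
            if (values != null) {
                collector.emit(values, msgId); // re-emit the failed tuple
            }
        }
    }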

-- 
Regards,
Navin