Posted to user@storm.apache.org by Julián Bermejo Ferreiro | BEEVA <ju...@beeva.com> on 2016/05/26 08:53:44 UTC

Storm rebalancing problems

Hello,

We have a multi-node Storm cluster running in a production environment.
We have had some issues with a couple of machines, which were out of
service for a few hours.

Because some workers of the deployed topologies were running on the failed
machines, the cluster's behaviour was unusual (it kept running, but not
as it should).

Once we recovered the failed nodes and rebalanced the topologies, the
cluster returned to working properly.

We would like to know if there is any way to alert nimbus when a node goes
down, so that the affected topologies are rebalanced and new workers are
created on the healthy nodes of the cluster to replace those that were
running on the failed ones.

This would have helped us a lot, because we could have kept our service
consistent in spite of the failed nodes.

Any advice?

Thanks in advance!

*JULIÁN BERMEJO FERREIRO*
*Departamento de Tecnología *
*julian.bermejo@beeva.com <ju...@beeva.com>*
<http://www.beeva.com/>

Re: Storm rebalancing problems

Posted by Jungtaek Lim <ka...@gmail.com>.

You can rebalance your topology with a proper wait time, without killing
all the workers manually.
When 'kill' or 'rebalance' is issued, the topology is immediately
'deactivated', so spouts stop fetching and emitting tuples. During the wait
time, bolts process the tuples that the spouts have already emitted. If the
bolts can process all in-flight tuples, it's a graceful restart. The same
applies to kill, which is a 'graceful stop' in this case.
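
The wait time described above can also be passed from a script. What
follows is only a minimal Python sketch, assuming the 'storm' command-line
client is installed and on the PATH. The helper names and the topology name
'my-topology' are hypothetical and not part of Storm. It builds a
'storm rebalance' or 'storm kill' command with the -w (wait time in
seconds) option, and for rebalance optionally -n (new number of workers).

```python
import subprocess
from typing import List, Optional


def build_storm_command(action: str, topology: str, wait_secs: int,
                        num_workers: Optional[int] = None) -> List[str]:
    """Build a 'storm kill' or 'storm rebalance' command line.

    Hypothetical helper: it only assembles the arguments and runs nothing.
    """
    if action not in ("kill", "rebalance"):
        raise ValueError(f"unsupported action: {action}")
    if wait_secs < 0:
        raise ValueError("wait_secs must be non-negative")
    # -w gives already-emitted tuples time to drain after deactivation.
    cmd = ["storm", action, topology, "-w", str(wait_secs)]
    if num_workers is not None:
        if action != "rebalance":
            raise ValueError("-n only applies to rebalance")
        cmd += ["-n", str(num_workers)]
    return cmd


def run_storm_command(cmd: List[str]) -> int:
    """Run the command and return its exit code (needs the storm client)."""
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # 'my-topology' and the 30-second wait are placeholder values.
    print(" ".join(build_storm_command("rebalance", "my-topology", 30)))
```

How long the wait needs to be depends on how many tuples are in flight and
how fast the bolts drain them, so the 30 seconds here is only a placeholder.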

- Jungtaek Lim (HeartSaVioR)

Re: Storm rebalancing problems

Posted by Julián Bermejo Ferreiro | BEEVA <ju...@beeva.com>.

Hi Jungtaek,

We are running Storm 0.9.4, but we are planning to migrate to version
1.0.1.

We deploy our topologies to move messages inside RabbitMQ brokers.

Certainly, we have tested forcing a worker to die: once the nimbus timeout
had elapsed, a new worker appeared on another node, but the system did not
behave as well as it should. We had to kill some other workers and
rebalance a couple of times to get everything working again (a constant
message flow inside our brokers).

Is it possible to kill all the workers in a topology and rebalance
(as a kind of graceful shutdown)? Or, once you kill all of them, must you
redeploy the whole topology?

Would version 1.0.1 be a possible solution?

Thanks again.

Re: Storm rebalancing problems

Posted by Jungtaek Lim <ka...@gmail.com>.

Hi Julián,

Which version of Storm do you use?
I remember that some Storm 0.9.x versions have issues when workers
fail, so I'd like to know about it.

Thanks,
Jungtaek Lim (HeartSaVioR)