Posted to user@storm.apache.org by Joaquin Menchaca <jm...@gobalto.com> on 2017/01/14 04:56:56 UTC

Re: How could one debug the root cause of this error?

I bounced everything across the cluster and that fixed the problem.  ZooKeeper
occasionally has data in a broken state.  There is no data integrity check
yet.

I also found that I ran out of disk space on ZooKeeper, as it is chatty and
keeps gigabytes of archives. I turned that off.

One time when I upgraded from 0.9 to 1.0, the zk data was so messed up that
there were lots of crashes.  I manually wiped (rm -rf) all the zk data, and
that fixed things up.

On Dec 22, 2016 4:37 PM, "Hugo Da Cruz Louro" <hl...@hortonworks.com>
wrote:

> Is it feasible for you to restart your ZooKeeper cluster? If so, can you
> do that, and then restart Storm and deploy your Storm topology again?
>
> On Dec 22, 2016, at 3:22 PM, Joaquin Menchaca <jm...@gobalto.com>
> wrote:
>
> Found nimbuses [] none of which is elected as leader, please try again after some time
>
>
>

Re: How could one debug the root cause of this error?

Posted by Joaquin Menchaca <jm...@gobalto.com>.
I am used to working with ActiveRecord in Rails.  I wish there were a
'zk:migrate' script, or other tooling to avoid this situation, as I never
touched ZooKeeper before Storm, so I am not an expert on it.  If I had known
how to wipe out only the current Storm state, I would have done that instead
of the nuclear option, i.e. rm -rf.
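
For reference, a narrower alternative to removing ZooKeeper's whole data
directory would be to delete only Storm's subtree. The sketch below is an
illustration, not a Storm tool: it assumes Storm's default root znode /storm
(the storm.zookeeper.root setting), that every Storm daemon has been stopped
first, and a client object with kazoo-style exists() and delete() methods.
The name wipe_storm_state is hypothetical.

```python
def wipe_storm_state(client, root="/storm"):
    """Delete the Storm subtree under `root` and nothing else.

    `client` is assumed to behave like a started kazoo KazooClient:
    exists(path) returns None for a missing znode, and
    delete(path, recursive=True) removes a znode and all its children.
    Returns True if the subtree was deleted, False if `root` was absent.
    """
    # Refuse the ZooKeeper root itself: that would be rm -rf by another name.
    if root.rstrip("/") == "":
        raise ValueError("refusing to delete the ZooKeeper root znode")
    if client.exists(root) is None:
        return False
    client.delete(root, recursive=True)
    return True

# Possible usage, assuming the kazoo package is installed and that
# "zk1:2181" is replaced with one of your own ZooKeeper servers:
#
#   from kazoo.client import KazooClient
#   zk = KazooClient(hosts="zk1:2181")
#   zk.start()
#   wipe_storm_state(zk)
#   zk.stop()
```

Other applications that share the same ZooKeeper ensemble keep their znodes;
topologies still have to be resubmitted afterwards.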

I wish that the root key were unique or had a schema version in it, so
that Storm 1.x would use a different root. Then this could be avoided.
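
Storm does not do this by itself, but the root znode is configurable through
storm.zookeeper.root (default /storm), so something similar can be done by
hand. A minimal sketch, assuming one root per major version is wanted; the
function name versioned_root and the "-v<major>" suffix are made up for
illustration:

```python
def versioned_root(storm_version, base="/storm"):
    """Build a per-major-version root znode path, e.g. for storm.yaml.

    `storm_version` is a dotted version string such as "1.0.2".
    The naming scheme is an assumption, not a Storm convention.
    """
    major = storm_version.split(".")[0]
    if not major.isdigit():
        raise ValueError("unrecognised version string: %r" % storm_version)
    return "%s-v%d" % (base, int(major))


print(versioned_root("0.9.6"))  # /storm-v0
print(versioned_root("1.0.2"))  # /storm-v1
```

The result would go into storm.yaml as the value of storm.zookeeper.root on
every node. A cluster pointed at a new root starts with empty state, so the
old topologies have to be resubmitted.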

On Jan 14, 2017 12:20 PM, "Erik Weathers" <ew...@groupon.com> wrote:

> On Fri, Jan 13, 2017 at 8:56 PM, Joaquin Menchaca <jm...@gobalto.com>
> wrote:
>
>> I bounced everything across the cluster and that fixed the problem.
>> ZooKeeper occasionally has data in a broken state.  There is no data
>> integrity check yet.
>>
>> I also found that I ran out of disk space on ZooKeeper, as it is chatty
>> and keeps gigabytes of archives. I turned that off.
>>
>> One time when I upgraded from 0.9 to 1.0, the zk data was so messed up
>> that there were lots of crashes.
>>
>
> For completeness, that's an unfortunate but expected behavior, because
> Storm stores lots of serialized objects into ZooKeeper, and the 0.9 to 1.0
> change included backwards-incompatible changes that broke the
> deserialization.  The most pervasive of those changes was the package path
> change from "backtype.*" to "org.apache.*", but there might have been
> others.   I agree that it would be nice if there was some validation to
> decide whether state should be rejected.
>
> - Erik
>
>
>> I manually wiped (rm -rf) all the zk data, and that fixed things up.
>>
>> On Dec 22, 2016 4:37 PM, "Hugo Da Cruz Louro" <hl...@hortonworks.com>
>> wrote:
>>
>>> Is it feasible for you to restart your ZooKeeper cluster? If so, can you
>>> do that, and then restart Storm and deploy your Storm topology again?
>>>
>>> On Dec 22, 2016, at 3:22 PM, Joaquin Menchaca <jm...@gobalto.com>
>>> wrote:
>>>
>>> Found nimbuses [] none of which is elected as leader, please try again after some time
>>>
>>>
>>>
>

Re: How could one debug the root cause of this error?

Posted by Erik Weathers <ew...@groupon.com>.
On Fri, Jan 13, 2017 at 8:56 PM, Joaquin Menchaca <jm...@gobalto.com>
wrote:

> I bounced everything across the cluster and that fixed the problem.
> ZooKeeper occasionally has data in a broken state.  There is no data
> integrity check yet.
>
> I also found that I ran out of disk space on ZooKeeper, as it is chatty and
> keeps gigabytes of archives. I turned that off.
>
> One time when I upgraded from 0.9 to 1.0, the zk data was so messed up that
> there were lots of crashes.
>

For completeness, that's an unfortunate but expected behavior, because
Storm stores lots of serialized objects into ZooKeeper, and the 0.9 to 1.0
change included backwards-incompatible changes that broke the
deserialization.  The most pervasive of those changes was the package path
change from "backtype.*" to "org.apache.*", but there might have been
others.   I agree that it would be nice if there was some validation to
decide whether state should be rejected.
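
To make the validation idea concrete, here is a small sketch of one way it
could work: each stored blob carries a format number, and the reader rejects
anything it does not recognise instead of crashing halfway through
deserialization. This is not how Storm stores its state; the JSON envelope,
STATE_FORMAT, IncompatibleState, encode_state and decode_state are all
hypothetical names chosen for the illustration.

```python
import json

STATE_FORMAT = 2  # hypothetical format number written by the current release


class IncompatibleState(Exception):
    """Raised when a stored blob was not written in the expected format."""


def encode_state(payload, fmt=STATE_FORMAT):
    """Wrap `payload` in an envelope that records its format number."""
    return json.dumps({"format": fmt, "payload": payload}).encode("utf-8")


def decode_state(blob, expected=STATE_FORMAT):
    """Return the payload, or raise IncompatibleState before using it."""
    try:
        envelope = json.loads(blob.decode("utf-8"))
    except ValueError as err:  # covers bad UTF-8 and malformed JSON
        raise IncompatibleState("not a recognised state envelope") from err
    if not isinstance(envelope, dict) or "payload" not in envelope:
        raise IncompatibleState("not a recognised state envelope")
    if envelope.get("format") != expected:
        raise IncompatibleState(
            "state has format %r, expected %r" % (envelope.get("format"), expected)
        )
    return envelope["payload"]
```

With a check like this, state written by an older release would be refused
with a clear error at startup, which is easier to diagnose than a
deserialization crash.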

- Erik


> I manually wiped (rm -rf) all the zk data, and that fixed things up.
>
> On Dec 22, 2016 4:37 PM, "Hugo Da Cruz Louro" <hl...@hortonworks.com>
> wrote:
>
>> Is it feasible for you to restart your ZooKeeper cluster? If so, can you
>> do that, and then restart Storm and deploy your Storm topology again?
>>
>> On Dec 22, 2016, at 3:22 PM, Joaquin Menchaca <jm...@gobalto.com>
>> wrote:
>>
>> Found nimbuses [] none of which is elected as leader, please try again after some time
>>
>>
>>