Posted to user@storm.apache.org by Andrew Neilson <ar...@gmail.com> on 2021/10/25 17:49:25 UTC

Old state crashing nimbus? (v2.2.0)

Hi,

We're running a v2.2.0 cluster with two nimbus hosts and recently noticed
storm-nimbus on the leader is effectively in a restart loop.

When I look at nimbus.log on that host, it is full of log entries related to
old versions of topologies we're running. I am seeing two types of
exceptions:

1. get blob meta exception:

For *topology-A*, for example, we're currently on topology-A-25:

2021-10-25 13:39:51.064 o.a.s.d.n.Nimbus pool-29-thread-62 [WARN] Exception when getting heartbeat timeout.
2021-10-25 13:39:51.075 o.a.s.d.n.Nimbus pool-29-thread-16 [WARN] get blob meta exception.
org.apache.storm.utils.WrappedKeyNotFoundException: topology-A-5-1633368551-stormjar.jar

For *topology-B*, we're on topology-B-24:

2021-10-25 13:38:51.106 o.a.s.d.n.Nimbus pool-29-thread-21 [WARN] get blob meta exception.
org.apache.storm.utils.WrappedKeyNotFoundException: topology-B-11-1632770137-stormcode.ser

2. Send HB exception:

2021-10-25 13:39:51.745 o.a.s.d.n.Nimbus pool-29-thread-36 [WARN] Exception when getting heartbeat timeout.
2021-10-25 13:39:51.760 o.a.s.d.n.Nimbus pool-29-thread-37 [WARN] Send HB exception. (topology id='topology-A-10-1632769783')
org.apache.storm.utils.WrappedNotAliveException: topology-A-10-1632769783

This seems isolated to two versions of "topology-A" and one version of
"topology-B".

I'm not seeing references to these topology versions in Zookeeper. Does
anyone know how to safely clear out this old state? If not, any suggestions
on how to debug this? Further, is this related to any known bug?

Thanks,
Andrew
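
For the debugging part of the question, one option is to tally which stale
IDs the warnings actually mention. This is only a minimal sketch in Python,
assuming the exception class name is followed by the blob key or topology ID
as in the excerpts above. The function name and the suffix list are
illustrative and not part of Storm.

```python
import re
from collections import Counter

# Matches the exception class and the key/ID that follows it, whether the
# ID sits on the same line or was wrapped onto the next one.
EXC = re.compile(r"org\.apache\.storm\.utils\.(Wrapped\w+Exception):\s+(\S+)")

# Blob-key suffixes seen in the warnings quoted in this thread.
BLOB_SUFFIXES = ("-stormjar.jar", "-stormcode.ser")


def stale_topology_ids(log_text):
    """Count how often each topology ID appears in Wrapped*Exception lines."""
    counts = Counter()
    for _exc_class, ref in EXC.findall(log_text):
        # Reduce a blob key such as "<topology-id>-stormjar.jar" to its ID.
        for suffix in BLOB_SUFFIXES:
            if ref.endswith(suffix):
                ref = ref[: -len(suffix)]
                break
        counts[ref] += 1
    return counts
```

Calling stale_topology_ids(open("nimbus.log").read()) returns a Counter
keyed by topology ID, which makes it easy to see whether the noise really is
limited to a handful of old versions.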

Re: Old state crashing nimbus? (v2.2.0)

Posted by Rui Abreu <ru...@gmail.com>.
This open issue seems very close to the one you reported:

https://issues.apache.org/jira/browse/STORM-3628

Have you tried checking your supervisors for old references?
$storm.local.dir/supervisor/stormdist
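
That check could be scripted roughly as follows. This is a sketch only: it
assumes each subdirectory of stormdist is named after a topology ID (an
assumption about the layout, not something confirmed in this thread), and
the function name and the active_ids argument are hypothetical.

```python
from pathlib import Path


def unexpected_stormdist_dirs(storm_local_dir, active_ids):
    """List stormdist subdirectories whose names are not active topology IDs.

    storm_local_dir is the value of storm.local.dir on the supervisor host;
    active_ids is a set of topology IDs that are expected to be running.
    """
    stormdist = Path(storm_local_dir) / "supervisor" / "stormdist"
    if not stormdist.is_dir():
        # Nothing to inspect on this host.
        return []
    return sorted(
        entry.name
        for entry in stormdist.iterdir()
        if entry.is_dir() and entry.name not in active_ids
    )
```

It only reports names and deletes nothing, so it is safe to run on every
supervisor before deciding what to clean up.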

On Tue, 26 Oct 2021 at 19:05, Andrew Neilson <ar...@gmail.com> wrote:

> I had not tried that. Not seeing references to the old versions there
> though.

Re: Old state crashing nimbus? (v2.2.0)

Posted by Andrew Neilson <ar...@gmail.com>.
I had not tried that. Not seeing references to the old versions there
though.

On Mon, Oct 25, 2021 at 1:39 PM Rui Abreu <ru...@gmail.com> wrote:

> Have you tried listing your blobs?
>
> https://storm.apache.org/releases/2.0.0/Command-line-client.html

Re: Old state crashing nimbus? (v2.2.0)

Posted by Rui Abreu <ru...@gmail.com>.
Have you tried listing your blobs?

https://storm.apache.org/releases/2.0.0/Command-line-client.html
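
Once you have the keys from "storm blobstore list", a small filter can show
which ones belong to topology versions that are no longer running. This is a
sketch under assumptions: the keys are passed in as plain strings (the
listing's output format is not parsed here), the "-stormconf.ser" suffix is
added by assumption alongside the two suffixes seen in the warnings, and the
function name is hypothetical.

```python
# Suffixes that mark a blob as belonging to a topology. The first two appear
# in the warnings in this thread; "-stormconf.ser" is an assumption.
TOPOLOGY_BLOB_SUFFIXES = ("-stormjar.jar", "-stormcode.ser", "-stormconf.ser")


def orphaned_blob_keys(blob_keys, active_ids):
    """Return topology blob keys whose topology ID is not in active_ids."""
    orphans = []
    for key in blob_keys:
        for suffix in TOPOLOGY_BLOB_SUFFIXES:
            if key.endswith(suffix):
                topology_id = key[: -len(suffix)]
                if topology_id not in active_ids:
                    orphans.append(key)
                break  # a key carries at most one of these suffixes
    return orphans
```

Keys without one of these suffixes are ignored, so blobs uploaded by users
for other purposes are never reported.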

On Mon, 25 Oct 2021 at 20:42, Andrew Neilson <ar...@gmail.com> wrote:

> As a follow-up, you can disregard what I was saying about nimbus crashing,
> but I'm still interested in fixing these noisy errors in the logs.
>
> @Rui thanks. I did check ZK and did not see references to the old versions
> there, at least.

Re: Old state crashing nimbus? (v2.2.0)

Posted by Andrew Neilson <ar...@gmail.com>.
As a follow-up, you can disregard what I was saying about nimbus crashing,
but I'm still interested in fixing these noisy errors in the logs.

@Rui thanks. I did check ZK and did not see references to the old versions
there, at least.

On Mon, Oct 25, 2021 at 11:31 AM Rui Abreu <ru...@gmail.com> wrote:

> Hi Andrew,
>
> Not sure how much this helps, but in version 1.x, state was stored under
> the following znodes:
>
> /$storm-znode/storms
> /$storm-znode/assignments
> /$storm-znode/blobstore
>
> Deleting all references (with rmr or deleteall, depending on ZooKeeper's
> version), followed by a rolling restart of Nimbus, should suffice.

Re: Old state crashing nimbus? (v2.2.0)

Posted by Rui Abreu <ru...@gmail.com>.
Hi Andrew,

Not sure how much this helps, but in version 1.x, state was stored under the
following znodes:

/$storm-znode/storms
/$storm-znode/assignments
/$storm-znode/blobstore

Deleting all references (with rmr or deleteall, depending on ZooKeeper's
version), followed by a rolling restart of Nimbus, should suffice.
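
Before deleting anything, it may be worth listing exactly which znodes still
reference the old IDs. A minimal sketch, assuming a ZooKeeper client object
that exposes get_children(path) (kazoo's KazooClient does). The function
name is hypothetical, storm_root stands in for /$storm-znode, and the three
child paths are the 1.x ones listed above, so they may not match 2.x.

```python
# Children of the Storm root znode that held state in 1.x, per the list above.
STATE_CHILDREN = ("storms", "assignments", "blobstore")


def find_stale_znodes(zk, storm_root, stale_ids):
    """Return full paths of znodes that refer to any of the stale topology IDs.

    zk is any client with a get_children(path) method returning child names.
    This only reads; it never deletes.
    """
    hits = []
    for child in STATE_CHILDREN:
        parent = f"{storm_root}/{child}"
        for name in zk.get_children(parent):
            # Match the ID itself or a key built from it, such as
            # "<topology-id>-stormjar.jar".
            if any(name == sid or name.startswith(sid + "-") for sid in stale_ids):
                hits.append(f"{parent}/{name}")
    return hits
```

An empty result for the IDs in the warnings would be consistent with what
Andrew saw in ZK, and would point toward state held somewhere other than
these znodes.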
