Posted to user@storm.apache.org by Andrey Yegorov <an...@gmail.com> on 2014/08/23 01:24:43 UTC

Re: Topology Restart due to Executor Not Alive

Have you figured out the root cause of, or a fix for, this issue?
I just hit it and would really appreciate any time-saving advice.

----------
Andrey Yegorov


On Wed, Mar 12, 2014 at 10:31 AM, Josh Walton <jw...@gmail.com> wrote:

> Overnight last night, it appears my Storm Trident topology restarted
> itself. When I checked the Storm UI, it said the topology had been running
> for 24 hours, and showed no errors or exceptions in any of the bolts.
>
> I checked the nimbus log and saw the following:
>
> 2014-03-12 10:55:06 b.s.d.nimbus [INFO] Executor MITAS3-74-1394565794:[34
> 34] not alive
> 2014-03-12 10:55:06 b.s.d.nimbus [INFO] Executor MITAS3-74-1394565794:[4
> 4] not alive
> 2014-03-12 10:55:06 b.s.d.nimbus [INFO] Executor MITAS3-74-1394565794:[40
> 40] not alive
> 2014-03-12 10:55:06 b.s.d.nimbus [INFO] Executor MITAS3-74-1394565794:[10
> 10] not alive
> 2014-03-12 10:55:06 b.s.d.nimbus [INFO] Executor MITAS3-74-1394565794:[16
> 16] not alive
> 2014-03-12 10:55:06 b.s.d.nimbus [INFO] Executor MITAS3-74-1394565794:[22
> 22] not alive
> 2014-03-12 10:55:06 b.s.d.nimbus [INFO] Executor MITAS3-74-1394565794:[28
> 28] not alive
> 2014-03-12 10:55:06 b.s.s.EvenScheduler [INFO] Available slots:
> (["5d105f66-1add-421b-8265-e7340a95928c" 6700]
> ["32ab1745-c260-4491-ae4d-92dcc5d14a62" 6700])
> 2014-03-12 10:55:06 b.s.d.nimbus [INFO] Reassigning MITAS3-74-1394565794
> to 6 slots
> 2014-03-12 10:55:06 b.s.d.nimbus [INFO] Reassign executors: [[34 34] [4 4]
> [40 40] [10 10] [16 16] [22 22] [28 28]]
>
> It appears the executors were actually alive and must have timed out
> somehow, since I didn't see any exceptions or stack traces in the logs.
>
> Is there a way to change the timeout? I see several timeout settings, but
> I'm not sure if any of those would help prevent this type of restart. I am
> using a custom TridentState that holds data in memory, so we lost data as a
> result of this restart, and I would like to prevent this from happening again.
>
> Thanks
>
> Josh
>
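
For reference, the "not alive" log lines come from Nimbus deciding that an
executor's heartbeat is too old; the relevant knobs live in storm.yaml. A
sketch, with key names and defaults as in Storm 0.9.x (verify against your
version's defaults.yaml before changing anything):

  nimbus.task.timeout.secs: 30          # heartbeat age after which Nimbus marks an executor "not alive"
  nimbus.task.launch.secs: 120          # more lenient timeout applied right after (re)assignment
  supervisor.worker.timeout.secs: 30    # supervisor restarts a worker whose heartbeat is older than this
  task.heartbeat.frequency.secs: 3      # how often executors write their heartbeats

Raising nimbus.task.timeout.secs usually only masks whatever is stalling the
heartbeats (long GC pauses, an overloaded ZooKeeper quorum, and so on), so it
is worth finding the underlying cause as well.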

Re: Topology Restart due to Executor Not Alive

Posted by Spico Florin <sp...@gmail.com>.
Hello!
  The nimbus.childopts property sets the heap for the Nimbus master daemon;
you can raise that as well if needed. For the workers, use worker.childopts instead.
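
A minimal storm.yaml sketch of the two properties (the values are illustrative
only, not recommendations; key names as in Storm 0.9.x):

  # storm.yaml -- illustrative values only
  nimbus.childopts: "-Xmx1024m"               # heap for the Nimbus daemon itself
  worker.childopts: "-Xmx2048m -XX:+UseG1GC"  # heap and GC options for each worker JVM
  # per-topology override, if your Storm version supports it:
  # topology.worker.childopts: "-Xmx4096m"
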
Please reply to this mail to let us know whether this helped you, or how you solved it.
 Regards,
 Florin


On Tue, Sep 2, 2014 at 11:54 AM, Spico Florin <sp...@gmail.com> wrote:

> Hello!
>   I have encountered the same issue in a case of out-of-memory errors in the
> worker process. Try increasing the memory of the workers by setting the
> nimbus.childopts property. Also, if you are creating short-lived objects at a
> high rate, use -XX:+UseG1GC. Since you say that you hold data in memory, I
> suspect (as I said) an OutOfMemoryError. Don't cover this up just by
> increasing the heap size; I also recommend profiling your worker to see
> whether you have a memory leak.
>   Hope this helps.
>   Regards,
>  Florin

Re: Topology Restart due to Executor Not Alive

Posted by Spico Florin <sp...@gmail.com>.
Hello!
  I have encountered the same issue in a case of out-of-memory errors in the
worker process. Try increasing the memory of the workers by setting the
nimbus.childopts property. Also, if you are creating short-lived objects at a
high rate, use -XX:+UseG1GC. Since you say that you hold data in memory, I
suspect (as I said) an OutOfMemoryError. Don't cover this up just by
increasing the heap size; I also recommend profiling your worker to see
whether you have a memory leak.
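
To confirm the out-of-memory suspicion before simply raising the heap, one
option is to let the worker JVMs dump the heap on OOM and log GC activity; a
sketch using standard HotSpot flags (heap size and dump path are placeholders,
and per the follow-up above, worker.childopts is the property that actually
reaches the worker JVMs):

  # storm.yaml -- illustrative; standard HotSpot flags, placeholder heap size and dump path
  worker.childopts: "-Xmx2048m -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp -verbose:gc -XX:+PrintGCDetails"

A dump produced this way can be opened in a memory profiler such as Eclipse
MAT or VisualVM to check whether the in-memory TridentState keeps growing.
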
  Hope this helps.
  Regards,
 Florin

