Posted to user@storm.apache.org by Chuanlei Ni <ni...@gmail.com> on 2015/07/10 06:16:20 UTC

The supervisor cannot launch the workers

Hi,

    I ran into a problem while using Storm.
    After I set my machine's clock back by one hour, the supervisor and the
workers all hung (they stopped writing logs, but the processes were still
alive). Maybe the ZooKeeper session expired. When I restarted the supervisor,
it killed the hung workers, but it could not launch new ones. It is weird.
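The thread never establishes why the processes hung. As a general illustration only, and not a diagnosis of Storm 0.8.1, the sketch below shows how a timer whose deadline is computed from the wall clock behaves when the system time is stepped back, compared with one based on a monotonic clock. `FakeClock`, `seconds_until`, and the 0.5 s interval are assumptions made up for this example.

```python
class FakeClock:
    """Hypothetical test clock whose wall time can be stepped independently."""

    def __init__(self, wall=1_000_000.0, mono=500.0):
        self.wall = wall  # stands in for a wall-clock reading
        self.mono = mono  # stands in for a monotonic reading

    def advance(self, seconds):
        # Real time passing moves both clocks forward.
        self.wall += seconds
        self.mono += seconds

    def step_wall(self, seconds):
        # Changing the system time moves only the wall clock.
        self.wall += seconds


def seconds_until(deadline, now):
    """Remaining wait for a timer that fires once `now` reaches `deadline`."""
    return max(0.0, deadline - now)


clock = FakeClock()
interval = 0.5  # assumed timer interval, in seconds

# Both timers are armed at the same moment.
wall_deadline = clock.wall + interval
mono_deadline = clock.mono + interval

# The system time is turned back one hour before either timer fires.
clock.step_wall(-3600)

wall_wait = seconds_until(wall_deadline, clock.wall)
mono_wait = seconds_until(mono_deadline, clock.mono)

print(f"wall-clock timer now waits {wall_wait:.1f} s")
print(f"monotonic timer now waits {mono_wait:.1f} s")
```

In real Python code the two readings correspond to `time.time()` and `time.monotonic()`; on the JVM they correspond to `System.currentTimeMillis()` and `System.nanoTime()`. Whether any component involved here actually schedules work from the wall clock is not confirmed by this thread.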

the log is as below
2015-07-10 12:06:09,856 util=[INFO] Touching file at /home/storm/storm/workers/165e1c3c-0ce4-42f4-8e64-4f949c413fb5/cglimitpids/8862
2015-07-10 12:06:09,856 Utils=[ERROR] [ChildProcess stderr reader: 165e1c3c-0ce4-42f4-8e64-4f949c413fb5 - 8862] starts
2015-07-10 12:06:09,856 Utils=[ERROR] [ChildProcess monitor: 165e1c3c-0ce4-42f4-8e64-4f949c413fb5 - 8862] starts
2015-07-10 12:06:09,856 supervisor=[INFO] the worker id 165e1c3c-0ce4-42f4-8e64-4f949c413fb5 pid 8862
2015-07-10 12:06:09,857 supervisor=[INFO] 81122780-afa1-4465-8d8c-65084dae5be7 still hasn't started
2015-07-10 12:06:10,357 supervisor=[INFO] 81122780-afa1-4465-8d8c-65084dae5be7 still hasn't started
2015-07-10 12:06:10,857 supervisor=[INFO] 81122780-afa1-4465-8d8c-65084dae5be7 still hasn't started
2015-07-10 12:06:11,358 supervisor=[INFO] 81122780-afa1-4465-8d8c-65084dae5be7 still hasn't started
2015-07-10 12:06:11,858 supervisor=[INFO] 81122780-afa1-4465-8d8c-65084dae5be7 still hasn't started
2015-07-10 12:06:12,358 supervisor=[INFO] 81122780-afa1-4465-8d8c-65084dae5be7 still hasn't started
2015-07-10 12:06:12,859 supervisor=[INFO] 81122780-afa1-4465-8d8c-65084dae5be7 still hasn't started
2015-07-10 12:06:13,359 supervisor=[INFO] 81122780-afa1-4465-8d8c-65084dae5be7 still hasn't started
2015-07-10 12:06:13,859 supervisor=[INFO] 81122780-afa1-4465-8d8c-65084dae5be7 still hasn't started
2015-07-10 12:06:14,360 supervisor=[INFO] 81122780-afa1-4465-8d8c-65084dae5be7 still hasn't started
2015-07-10 12:06:14,860 supervisor=[INFO] 81122780-afa1-4465-8d8c-65084dae5be7 still hasn't started
2015-07-10 12:06:15,360 supervisor=[INFO] 81122780-afa1-4465-8d8c-65084dae5be7 still hasn't started
2015-07-10 12:06:15,861 supervisor=[INFO] 81122780-afa1-4465-8d8c-65084dae5be7 still hasn't started
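When reading a log like this, it can help to group the supervisor lines by worker id. The sketch below is a rough aid under stated assumptions: the regular expressions are inferred only from the lines shown in this post and are not a documented Storm log format, and `summarize` is a hypothetical helper name. The sample is a shortened copy of the log, with each entry on one line.

```python
import re
from collections import defaultdict
from datetime import datetime

# Shortened copy of the supervisor entries from the log, one entry per line.
LOG = """\
2015-07-10 12:06:09,856 supervisor=[INFO] the worker id 165e1c3c-0ce4-42f4-8e64-4f949c413fb5 pid 8862
2015-07-10 12:06:09,857 supervisor=[INFO] 81122780-afa1-4465-8d8c-65084dae5be7 still hasn't started
2015-07-10 12:06:10,357 supervisor=[INFO] 81122780-afa1-4465-8d8c-65084dae5be7 still hasn't started
2015-07-10 12:06:15,861 supervisor=[INFO] 81122780-afa1-4465-8d8c-65084dae5be7 still hasn't started
"""

# Patterns inferred from the excerpt only (an assumption, not an official format).
UUID = r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"
LAUNCHED = re.compile(rf"^(\S+ \S+) supervisor=\[INFO\] the worker id ({UUID}) pid (\d+)$")
WAITING = re.compile(rf"^(\S+ \S+) supervisor=\[INFO\] ({UUID}) still hasn't started$")


def parse_time(stamp):
    # Timestamps look like "2015-07-10 12:06:09,857" (milliseconds after the comma).
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")


def summarize(log_text):
    """Return (launched, waiting): worker id -> pid, and worker id -> poll times."""
    launched = {}
    waiting = defaultdict(list)
    for line in log_text.splitlines():
        m = LAUNCHED.match(line)
        if m:
            launched[m.group(2)] = int(m.group(3))
            continue
        m = WAITING.match(line)
        if m:
            waiting[m.group(2)].append(parse_time(m.group(1)))
    return launched, waiting


launched, waiting = summarize(LOG)
for worker_id, stamps in waiting.items():
    span = (stamps[-1] - stamps[0]).total_seconds()
    status = "launch line seen" if worker_id in launched else "no launch line in this excerpt"
    print(f"{worker_id}: polled {len(stamps)} times over {span} s; {status}")
```

Grouping this way makes one detail of the excerpt easy to spot: the id in the "the worker id ... pid 8862" line differs from the id in the "still hasn't started" lines. The thread does not say what that implies.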

My questions are:
1. Why did the supervisor and the workers hang?
2. After I restarted the supervisor, why could it not launch the workers?

Thanks in advance.

Re: The supervisor cannot launch the workers

Posted by 임정택 <ka...@gmail.com>.
I'm sorry, but I personally don't have background knowledge of such an old
Storm version.
0.8.1 was released on 7 Sep 2012 (nearly 3 years ago), and a great deal has
changed since then.

I'd recommend moving to the latest release of Storm, though you may need to
modify your topology code, so it is up to you.

Thanks,
Jungtaek Lim (HeartSaVioR)

2015-07-10 15:49 GMT+09:00 Chuanlei Ni <ni...@gmail.com>:

> The Storm version is 0.8.1.
> The jstack output is attached.
> Thanks for responding.


-- 
Name : 임 정택
Blog : http://www.heartsavior.net / http://dev.heartsavior.net
Twitter : http://twitter.com/heartsavior
LinkedIn : http://www.linkedin.com/in/heartsavior

Re: The supervisor cannot launch the workers

Posted by Chuanlei Ni <ni...@gmail.com>.
The Storm version is 0.8.1.
The jstack output is attached.
Thanks for responding.

2015-07-10 14:07 GMT+08:00 임정택 <ka...@gmail.com>:

> Hi, Chuanlei.
>
> Could you share your Storm version?
> Is it reproducible? I'd like to see a thread dump taken while the
> supervisor and workers are frozen.
> I can't dig into your issue with only the symptoms to go on.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)

Re: The supervisor cannot launch the workers

Posted by 임정택 <ka...@gmail.com>.
Hi, Chuanlei.

Could you share your Storm version?
Is it reproducible? I'd like to see a thread dump taken while the
supervisor and workers are frozen.
I can't dig into your issue with only the symptoms to go on.
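A thread dump of the kind requested here can be captured with the JDK's `jstack` tool. The following is a minimal sketch, assuming `jstack` is on the PATH and is run as the same user that owns the target JVM; `jstack_command` and `save_thread_dump` are hypothetical helper names, and the output file name is arbitrary.

```python
import shutil
import subprocess
from pathlib import Path


def jstack_command(pid, long_listing=True):
    """Build the argument list for the JDK's jstack tool."""
    cmd = ["jstack"]
    if long_listing:
        cmd.append("-l")  # long listing: also prints information about locks
    cmd.append(str(pid))
    return cmd


def save_thread_dump(pid, out_dir="."):
    """Run jstack against `pid`, write its output to a file, and return the path."""
    if shutil.which("jstack") is None:
        raise RuntimeError("jstack not found on PATH; it ships with the JDK")
    result = subprocess.run(
        jstack_command(pid), capture_output=True, text=True, check=True
    )
    out_path = Path(out_dir) / f"jstack-{pid}.txt"
    out_path.write_text(result.stdout)
    return out_path


# Example: show the command for the worker pid that appears in the posted log.
print(" ".join(jstack_command(8862)))
```

Calling `save_thread_dump(8862)` while the process is hung would write the dump to `jstack-8862.txt`. Taking two or three dumps a few seconds apart makes it easier to tell a thread that is truly stuck from one that is merely busy.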

Thanks,
Jungtaek Lim (HeartSaVioR)

2015-07-10 13:36 GMT+09:00 Chuanlei Ni <ni...@gmail.com>:

> Maybe the supervisor can launch the workers, but they die after a few
> seconds.



Re: The supervisor cannot launch the workers

Posted by Chuanlei Ni <ni...@gmail.com>.
Maybe the supervisor can launch the workers, but they die after a few
seconds.
