Posted to user@hadoop.apache.org by George Liaw <ge...@gmail.com> on 2019/04/01 23:29:06 UTC

Resource Manager UI showing running jobs but no actual jobs running

Hi all,

Using Hadoop 2.7.2.
Has anyone seen an issue where, every once in a while, the resource manager
gets into a weird state in which the Applications dashboard shows jobs
running, but no jobs are actually running on the cluster? When this happens
we see RM CPU usage flat-lining at a very high level (around 85%), while the
datanodes/nodemanagers have no load because no jobs are running. If we
restart the RM and let it fail over to the standby, the cluster goes back to
normal behavior and starts running jobs again after 15-30 minutes.

Bit of a strange situation - not entirely sure why the RM would fail to
realize that the jobs have finished running and that the jobs sitting in
accepted state are free to run. Also strange that the RM gets stuck at high
cpu usage.

If anyone can point me in the right direction on how to debug or resolve
this, that would be much appreciated!

-- 
George A. Liaw

(408) 318-7920
george.a.liaw@gmail.com
LinkedIn <http://www.linkedin.com/in/georgeliaw/>

Re: Resource Manager UI showing running jobs but no actual jobs running

Posted by Prabhu Josephraj <pj...@cloudera.com.INVALID>.
Hi George,

     The symptoms of YARN-7163 are: the RM UI shows old completed jobs, and
heap and CPU usage are high.

High CPU usage usually happens during continuous full GC, which will in turn
cause an OOM if no more heap is available to allocate new objects. High CPU
usage could therefore be a symptom of high heap usage.
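
One way to test the full-GC theory without waiting for an OOM is to take two
samples of `jstat -gcutil <rm-pid>` a minute or so apart and compare them. The
Python sketch below only parses that output; the function names and the 90%
threshold are illustrative assumptions, not anything from YARN or the JIRA.
It relies on the FGC (full GC count) and O (old generation occupancy, in
percent) columns that jstat prints.

```python
def parse_jstat_gcutil(output: str) -> dict:
    """Parse `jstat -gcutil <pid>` text into {column: value}, using the last data row."""
    rows = [line.split() for line in output.strip().splitlines() if line.strip()]
    header, last = rows[0], rows[-1]
    parsed = {}
    for name, value in zip(header, last):
        try:
            parsed[name] = float(value)
        except ValueError:
            # Some JDKs print "-" for columns that do not apply; skip those.
            continue
    return parsed


def looks_like_gc_thrash(before: dict, after: dict, old_gen_pct: float = 90.0) -> bool:
    """True if full GCs ran between the samples while the old generation stayed nearly full.

    The 90% default is an arbitrary illustrative threshold (assumption).
    """
    full_gcs = after["FGC"] - before["FGC"]
    return full_gcs > 0 and after["O"] >= old_gen_pct
```

If the full GC count keeps climbing while the old generation stays close to
100%, the RM is spending its CPU on garbage collection, which would tie the
high CPU usage to heap pressure even when no OOM has been logged yet.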

1. Can you check whether the jobs shown as Running are ones that have
already completed?

2. A heap dump from the RM, taken while the UI shows old completed jobs as
Running, will help to prove this - it should match the image [1], where the
RMActiveServiceContext applications hold the list of completed applications.
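
The first check can be scripted against the ResourceManager REST API
(`/ws/v1/cluster/apps?states=RUNNING`). The Python sketch below is a rough
illustration: the function names and the RM URL in the usage note are
hypothetical, and the rule "RUNNING but no running containers, or a final
status already set" is an assumed heuristic for spotting stale entries, not
something taken from YARN-7163.

```python
import json
from urllib.request import urlopen


def apps_from_response(payload: dict) -> list:
    """Extract the app list; the RM returns {"apps": null} when nothing matches."""
    return ((payload or {}).get("apps") or {}).get("app") or []


def fetch_running_apps(rm_url: str) -> list:
    """Ask the RM web service for the applications it believes are RUNNING."""
    with urlopen(f"{rm_url}/ws/v1/cluster/apps?states=RUNNING", timeout=10) as resp:
        return apps_from_response(json.load(resp))


def suspicious_running_apps(apps: list) -> list:
    """IDs of apps reported RUNNING that hold no containers or already have a final status."""
    flagged = []
    for app in apps:
        if app.get("state") != "RUNNING":
            continue
        containers = app.get("runningContainers")
        # A live app holds at least its ApplicationMaster container (assumed heuristic).
        no_containers = containers is not None and containers <= 0
        finished = app.get("finalStatus", "UNDEFINED") != "UNDEFINED"
        if no_containers or finished:
            flagged.append(app["id"])
    return flagged
```

For example, `suspicious_running_apps(fetch_running_apps("http://rm-host:8088"))`
(with `rm-host` replaced by the active RM's web address) lists the applications
worth comparing against the nodemanagers or the job history server.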

Also check comment [2] and YARN-7065 (a duplicate of YARN-7163), which match
the issue reported.

[1] https://issues.apache.org/jira/secure/attachment/12885607/suspect-1.png
[2] https://issues.apache.org/jira/browse/YARN-7163?focusedCommentId=16158652&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16158652

Thanks,
Prabhu Joseph




Re: Resource Manager UI showing running jobs but no actual jobs running

Posted by George Liaw <ge...@gmail.com>.
Hi Prabhu,

Unfortunately I don't believe that is the same issue we are seeing. We are
experiencing high CPU usage, but we are not getting OOM errors.

Is there reason to believe they're the same issue?



Re: Resource Manager UI showing running jobs but no actual jobs running

Posted by Prabhu Josephraj <pj...@cloudera.com.INVALID>.
Hi George,

    I have seen this issue: the RM UI shows the old job list and the RM
process heap usage is high. This is due to a bug fixed by YARN-7163.
Can you test with the patch from YARN-7163?

Thanks,
Prabhu Joseph

