Posted to common-user@hadoop.apache.org by 麦树荣 <sh...@qunar.com> on 2013/11/27 10:28:22 UTC

Re: problems of FairScheduler in hadoop2.2.0

Hi,

Sorry, I would like to add some more information.

Hadoop 2.2.0 had been running normally for several days since I started up the Hadoop servers, and I could run jobs without any problems.
Today the jobs suddenly stopped running, and every job stayed in the “submitted” state after submission.
There are 3 slaves, and each slave has 32 GB of memory and 24 CPUs.

The contents of my fair-scheduler.xml are as follows:

<?xml version="1.0"?>
<allocations>
    <queue name="root">
    <minResources>10000mb,10vcores</minResources>
    <maxResources>90000mb,100vcores</maxResources>
    <maxRunningApps>50</maxRunningApps>
    <weight>2.0</weight>
    <schedulingMode>fair</schedulingMode>
    <aclSubmitApps> </aclSubmitApps>
    <aclAdministerApps> </aclAdministerApps>
        <queue name="queue1">
                <minResources>10000mb,10vcores</minResources>
                <maxResources>30000mb,30vcores</maxResources>
                <maxRunningApps>10</maxRunningApps>
                <weight>2.0</weight>
                <schedulingMode>fair</schedulingMode>
                <aclAdministerApps>xxx1,xxx2 admins</aclAdministerApps>
                <aclSubmitApps>xxx1,xxx2,xxx3 datadev</aclSubmitApps>
        </queue>
        <queue name="queue2">
                <minResources>10000mb,10vcores</minResources>
                <maxResources>30000mb,30vcores</maxResources>
                <maxRunningApps>10</maxRunningApps>
                <weight>2.0</weight>
                <schedulingMode>fair</schedulingMode>
                <aclAdministerApps>datadev admins</aclAdministerApps>
                <aclSubmitApps>xxx1 datadev</aclSubmitApps>
        </queue>
        <queue name="queue3">
                <minResources>5000mb,5vcores</minResources>
                <maxResources>10000mb,10vcores</maxResources>
                <maxRunningApps>10</maxRunningApps>
                <weight>2.0</weight>
                <schedulingMode>fair</schedulingMode>
                <aclAdministerApps>datadev admins</aclAdministerApps>
                <aclSubmitApps>xxx1,xxx2 datadev</aclSubmitApps>
        </queue>
        <queue name="default">
                <minResources>10000mb,10vcores</minResources>
                <maxResources>30000mb,30vcores</maxResources>
                <maxRunningApps>10</maxRunningApps>
                <weight>2.0</weight>
                <schedulingMode>fair</schedulingMode>
                <aclAdministerApps>xxx1 admins</aclAdministerApps>
                <aclSubmitApps>xxx1,xxx2,xxx3,root datadev</aclSubmitApps>
        </queue>
      </queue>
  <user name="xxx">
    <maxRunningApps>10</maxRunningApps>
  </user>
  <userMaxAppsDefault>10</userMaxAppsDefault>
</allocations>
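
As a quick offline sanity check of the queue hierarchy above, the allocation file can be parsed with a short script. Below is a minimal sketch, assuming the file is saved locally as fair-scheduler.xml (the path is illustrative); it just prints each queue's configured limits:

import xml.etree.ElementTree as ET

def walk(queue, parent=""):
    # Print the limits configured for this queue, then recurse into its child queues.
    path = (parent + "." if parent else "") + queue.get("name")
    def text(tag):
        node = queue.find(tag)
        return node.text.strip() if node is not None and node.text else "(not set)"
    print(path)
    print("  minResources:  ", text("minResources"))
    print("  maxResources:  ", text("maxResources"))
    print("  maxRunningApps:", text("maxRunningApps"))
    for child in queue.findall("queue"):
        walk(child, path)

# The path below is an assumption; point it at the allocation file the ResourceManager loads.
allocations = ET.parse("fair-scheduler.xml").getroot()
for top_level_queue in allocations.findall("queue"):
    walk(top_level_queue)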

From: Sandy Ryza [mailto:sandy.ryza@cloudera.com]
Sent: November 27, 2013 16:33
To: user@hadoop.apache.org
Subject: Re: problems of FairScheduler in hadoop2.2.0

Hi,

Can you share the contents of your fair-scheduler.xml?  If you submit just a single job, does it run?  What do you see if you go to <resourcemanagerwebui>/ws/v1/cluster/scheduler?
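
For example, the scheduler report can be fetched and pretty-printed with a few lines of Python; the host below is a placeholder, and 8088 is only the default ResourceManager web port:

import json
import urllib.request

# Replace the host with your ResourceManager web address; 8088 is only the default web port.
request = urllib.request.Request(
    "http://resourcemanager-host:8088/ws/v1/cluster/scheduler",
    headers={"Accept": "application/json"},
)
with urllib.request.urlopen(request) as response:
    scheduler_info = json.load(response)

# Dump the queue report returned by the REST API so it can be pasted into a reply.
print(json.dumps(scheduler_info, indent=2))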

-Sandy

On Wed, Nov 27, 2013 at 12:09 AM, 麦树荣 <sh...@qunar.com> wrote:
Hi, all

When I run jobs on Hadoop 2.2.0, I encounter a problem: suddenly the Hadoop ResourceManager stops working normally. When I submit jobs, they all stay in the “submitted” state and never run.
I cannot find any answers on the internet; can anyone give me some help? Thanks.

The resourcemanager log is as follows:

2013-11-27 14:39:10,749 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1129_000001
2013-11-27 14:39:11,050 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1128_000001
2013-11-27 14:39:11,050 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1127_000001
2013-11-27 14:39:11,051 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1128_000001
2013-11-27 14:39:11,051 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1127_000001
2013-11-27 14:39:11,753 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1129_000001
2013-11-27 14:39:11,754 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1129_000001
2013-11-27 14:39:12,055 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1128_000001
2013-11-27 14:39:12,055 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1127_000001
2013-11-27 14:39:12,056 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1128_000001
2013-11-27 14:39:12,056 ERROR org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Request for appInfo of unknown attemptappattempt_1384743376038_1127_000001


Re: Re: problems of FairScheduler in hadoop2.2.0

Posted by 麦树荣 <sh...@qunar.com>.
Hi,

Thanks for your attention.

When the jobs cannot run and all of them stay in the “submitted” state after submission, the scheduler section of the ResourceManager web UI (the red frame in the picture below) cannot be opened, and the following exception appears in the ResourceManager log:

2013-11-27 14:41:36,414 ERROR org.apache.hadoop.yarn.webapp.Dispatcher: error handling URI: /cluster/scheduler
java.lang.reflect.InvocationTargetException
        at sun.reflect.GeneratedMethodAccessor22.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.yarn.webapp.Dispatcher.service(Dispatcher.java:153)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
        at com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
        at com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
        at com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:900)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:834)
        at com.sun.jersey.spi.container.servlet.ServletContainer.doFilter(ServletContainer.java:795)
        at com.google.inject.servlet.FilterDefinition.doFilter(FilterDefinition.java:163)
        at com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:58)
        at com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.dao.FairSchedulerInfo.getAppFairShare(FairSchedulerInfo.java:49)
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.FairSchedulerAppsBlock.render(FairSchedulerAppsBlock.java:97)
        at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
        at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
        at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
        at org.apache.hadoop.yarn.webapp.view.HtmlBlock$Block.subView(HtmlBlock.java:40)
        at org.apache.hadoop.yarn.webapp.hamlet.Hamlet._(Hamlet.java:30347)
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.FairSchedulerPage$QueuesBlock.render(FairSchedulerPage.java:176)
        at org.apache.hadoop.yarn.webapp.view.HtmlBlock.render(HtmlBlock.java:66)
        at org.apache.hadoop.yarn.webapp.view.HtmlBlock.renderPartial(HtmlBlock.java:76)
        at org.apache.hadoop.yarn.webapp.View.render(View.java:235)
        at org.apache.hadoop.yarn.webapp.view.HtmlPage$Page.subView(HtmlPage.java:49)
        at org.apache.hadoop.yarn.webapp.hamlet.HamletImpl$EImp._v(HamletImpl.java:117)
        at org.apache.hadoop.yarn.webapp.hamlet.Hamlet$TD._(Hamlet.java:845)
        at org.apache.hadoop.yarn.webapp.view.TwoColumnLayout.render(TwoColumnLayout.java:56)
        at org.apache.hadoop.yarn.webapp.view.HtmlPage.render(HtmlPage.java:82)
        at org.apache.hadoop.yarn.webapp.Controller.render(Controller.java:212)
        at org.apache.hadoop.yarn.server.resourcemanager.webapp.RmController.scheduler(RmController.java:82)
        ... 40 more

[Inline image: screenshot of the ResourceManager web UI with the scheduler section outlined in red]

From: Sandy Ryza [mailto:sandy.ryza@cloudera.com]
Sent: November 28, 2013 1:20
To: user@hadoop.apache.org
Subject: Re: Re: problems of FairScheduler in hadoop2.2.0

Thanks for the additional info.  Still not sure what could be going on.  Do you notice any other suspicious LOG messages in the resourcemanager log?  Are you able to show the results of <resourcemanagerwebaddress>/ws/v1/cluster/scheduler?  On the resourcemanager web UI, how much memory does it say is used?


Re: Re: problems of FairScheduler in hadoop2.2.0

Posted by Sandy Ryza <sa...@cloudera.com>.
Thanks for the additional info.  Still not sure what could be going on.  Do you notice any other suspicious LOG messages in the resourcemanager log?  Are you able to show the results of <resourcemanagerwebaddress>/ws/v1/cluster/scheduler?  On the resourcemanager web UI, how much memory does it say is used?
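
If it helps, the memory numbers can also be pulled from the cluster metrics REST endpoint. A minimal sketch follows; the host is a placeholder, 8088 is only the default ResourceManager web port, and the field names are the ones the /ws/v1/cluster/metrics endpoint normally reports:

import json
import urllib.request

# Replace the host with your ResourceManager web address; 8088 is only the default web port.
request = urllib.request.Request(
    "http://resourcemanager-host:8088/ws/v1/cluster/metrics",
    headers={"Accept": "application/json"},
)
with urllib.request.urlopen(request) as response:
    metrics = json.load(response)["clusterMetrics"]

# Memory and application counts as the ResourceManager itself reports them.
for key in ("totalMB", "allocatedMB", "availableMB", "appsSubmitted", "appsPending", "appsRunning"):
    print(key, "=", metrics.get(key))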


On Wed, Nov 27, 2013 at 1:28 AM, 麦树荣 <sh...@qunar.com> wrote:

>  Hi,
>
>
>
> sorry, I complement some information.
>
>
>
> The hadoop 2.2.0 had been running normally for some days since I start up
> the hadoop server. I can run jobs  without any problems.
>
> Today suddenly the jobs cannot run and all the jobs’ status were keeping
> “submitted” after submitting.
>
>  There are 3 slavers and every slave has 32G memory and 24 cpus.
>
>
>
> The contents of my fair-scheduler.xml is as follows:
>
>
>
> <?xml version="1.0"?>
>
> <allocations>
>
>     <queue name="root">
>
>     <minResources>10000mb,10vcores</minResources>
>
>     <maxResources>90000mb,100vcores</maxResources>
>
>     <maxRunningApps>50</maxRunningApps>
>
>     <weight>2.0</weight>
>
>     <schedulingMode>fair</schedulingMode>
>
>     <aclSubmitApps> </aclSubmitApps>
>
>     <aclAdministerApps> </aclAdministerApps>
>
>         <queue name="queue1">
>
>                 <minResources>10000mb,10vcores</minResources>
>
>                 <maxResources>30000mb,30vcores</maxResources>
>
>                 <maxRunningApps>10</maxRunningApps>
>
>                 <weight>2.0</weight>
>
>                 <schedulingMode>fair</schedulingMode>
>
>                 <aclAdministerApps>xxx1,xxx2 admins</aclAdministerApps>
>
>                 <aclSubmitApps>xxx1,xxx2,xxx3 datadev</aclSubmitApps>
>
>         </queue>
>
>         <queue name="queue2">
>
>                 <minResources>10000mb,10vcores</minResources>
>
>                 <maxResources>30000mb,30vcores</maxResources>
>
>                 <maxRunningApps>10</maxRunningApps>
>
>                 <weight>2.0</weight>
>
>                 <schedulingMode>fair</schedulingMode>
>
>                 <aclAdministerApps>datadev admins</aclAdministerApps>
>
>                 <aclSubmitApps>xxx1 datadev</aclSubmitApps>
>
>         </queue>
>
>         <queue name="queue3">
>
>                 <minResources>5000mb,5vcores</minResources>
>
>                 <maxResources>10000mb,10vcores</maxResources>
>
>                 <maxRunningApps>10</maxRunningApps>
>
>                 <weight>2.0</weight>
>
>                 <schedulingMode>fair</schedulingMode>
>
>                 <aclAdministerApps>datadev admins</aclAdministerApps>
>
>                 <aclSubmitApps>xxx1,xxx2 datadev</aclSubmitApps>
>
>         </queue>
>
>         <queue name="default">
>
>                 <minResources>10000mb,10vcores</minResources>
>
>                 <maxResources>30000mb,30vcores</maxResources>
>
>                 <maxRunningApps>10</maxRunningApps>
>
>                 <weight>2.0</weight>
>
>                 <schedulingMode>fair</schedulingMode>
>
>                 <aclAdministerApps>xxx1 admins</aclAdministerApps>
>
>                 <aclSubmitApps>xxx1,xxx2,xxx3,root datadev</aclSubmitApps>
>
>         </queue>
>
>       </queue>
>
>   <user name="xxx">
>
>     <maxRunningApps>10</maxRunningApps>
>
>   </user>
>
>   <userMaxAppsDefault>10</userMaxAppsDefault>
>
> </allocations>
>
>
>
> *发件人:* Sandy Ryza [mailto:sandy.ryza@cloudera.com]
> *发送时间:* 2013年11月27日 16:33
> *收件人:* user@hadoop.apache.org
> *主题:* Re: problems of FairScheduler in hadoop2.2.0
>
>
>
> Hi,
>
>
>
> Can you share the contents of your fair-scheduler.xml?  If you submit just
> a single job, does it run?  What do you see if you go to
> <resourcemanagerwebui>/ws/v1/cluster/scheduler?
>
>
>
> -Sandy
>
>
>
> On Wed, Nov 27, 2013 at 12:09 AM, 麦树荣 <sh...@qunar.com> wrote:
>
> Hi, all
>
>
>
> When I run jobs in hadoop 2.2.0,  I encounter a problem. Suddenly, the
> hadoop resourcemanager cannot work normally: When I submit jobs and the
> jobs’ status all are “submitted” and cannot run.
>
> I cannot find any answers in the internet, who can give me some help?
> Thanks.
>
>
>
> The resourcemanager log is as follows:
>
>
>
> 2013-11-27 14:39:10,749 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Request for appInfo of unknown attemptappattempt_138474337603
>
> 8_1129_000001
>
> 2013-11-27 14:39:11,050 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Request for appInfo of unknown attemptappattempt_138474337603
>
> 8_1128_000001
>
> 2013-11-27 14:39:11,050 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Request for appInfo of unknown attemptappattempt_138474337603
>
> 8_1127_000001
>
> 2013-11-27 14:39:11,051 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Request for appInfo of unknown attemptappattempt_138474337603
>
> 8_1128_000001
>
> 2013-11-27 14:39:11,051 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Request for appInfo of unknown attemptappattempt_138474337603
>
> 8_1127_000001
>
> 2013-11-27 14:39:11,753 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Request for appInfo of unknown attemptappattempt_138474337603
>
> 8_1129_000001
>
> 2013-11-27 14:39:11,754 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Request for appInfo of unknown attemptappattempt_138474337603
>
> 8_1129_000001
>
> 2013-11-27 14:39:12,055 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Request for appInfo of unknown attemptappattempt_138474337603
>
> 8_1128_000001
>
> 2013-11-27 14:39:12,055 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Request for appInfo of unknown attemptappattempt_138474337603
>
> 8_1127_000001
>
> 2013-11-27 14:39:12,056 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Request for appInfo of unknown attemptappattempt_138474337603
>
> 8_1128_000001
>
> 2013-11-27 14:39:12,056 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Request for appInfo of unknown attemptappattempt_138474337603
>
> 8_1127_000001
>
>
>

Re: 答复: problems of FairScheduler in hadoop2.2.0

Posted by Sandy Ryza <sa...@cloudera.com>.
Thanks for the additional info.  Still not sure what could be going on.  Do
you notice any other suspicious LOG messages in the resourcemanager log?
 Are you able to show the results of <resourcemanagerwebaddress>/ws/v1/
cluster/scheduler?  On the resourcemanager web UI, how much memory does it
say is used?


On Wed, Nov 27, 2013 at 1:28 AM, 麦树荣 <sh...@qunar.com> wrote:

>  Hi,
>
>
>
> sorry, I complement some information.
>
>
>
> The hadoop 2.2.0 had been running normally for some days since I start up
> the hadoop server. I can run jobs  without any problems.
>
> Today suddenly the jobs cannot run and all the jobs’ status were keeping
> “submitted” after submitting.
>
>  There are 3 slavers and every slave has 32G memory and 24 cpus.
>
>
>
> The contents of my fair-scheduler.xml is as follows:
>
>
>
> <?xml version="1.0"?>
>
> <allocations>
>
>     <queue name="root">
>
>     <minResources>10000mb,10vcores</minResources>
>
>     <maxResources>90000mb,100vcores</maxResources>
>
>     <maxRunningApps>50</maxRunningApps>
>
>     <weight>2.0</weight>
>
>     <schedulingMode>fair</schedulingMode>
>
>     <aclSubmitApps> </aclSubmitApps>
>
>     <aclAdministerApps> </aclAdministerApps>
>
>         <queue name="queue1">
>
>                 <minResources>10000mb,10vcores</minResources>
>
>                 <maxResources>30000mb,30vcores</maxResources>
>
>                 <maxRunningApps>10</maxRunningApps>
>
>                 <weight>2.0</weight>
>
>                 <schedulingMode>fair</schedulingMode>
>
>                 <aclAdministerApps>xxx1,xxx2 admins</aclAdministerApps>
>
>                 <aclSubmitApps>xxx1,xxx2,xxx3 datadev</aclSubmitApps>
>
>         </queue>
>
>         <queue name="queue2">
>
>                 <minResources>10000mb,10vcores</minResources>
>
>                 <maxResources>30000mb,30vcores</maxResources>
>
>                 <maxRunningApps>10</maxRunningApps>
>
>                 <weight>2.0</weight>
>
>                 <schedulingMode>fair</schedulingMode>
>
>                 <aclAdministerApps>datadev admins</aclAdministerApps>
>
>                 <aclSubmitApps>xxx1 datadev</aclSubmitApps>
>
>         </queue>
>
>         <queue name="queue3">
>
>                 <minResources>5000mb,5vcores</minResources>
>
>                 <maxResources>10000mb,10vcores</maxResources>
>
>                 <maxRunningApps>10</maxRunningApps>
>
>                 <weight>2.0</weight>
>
>                 <schedulingMode>fair</schedulingMode>
>
>                 <aclAdministerApps>datadev admins</aclAdministerApps>
>
>                 <aclSubmitApps>xxx1,xxx2 datadev</aclSubmitApps>
>
>         </queue>
>
>         <queue name="default">
>
>                 <minResources>10000mb,10vcores</minResources>
>
>                 <maxResources>30000mb,30vcores</maxResources>
>
>                 <maxRunningApps>10</maxRunningApps>
>
>                 <weight>2.0</weight>
>
>                 <schedulingMode>fair</schedulingMode>
>
>                 <aclAdministerApps>xxx1 admins</aclAdministerApps>
>
>                 <aclSubmitApps>xxx1,xxx2,xxx3,root datadev</aclSubmitApps>
>
>         </queue>
>
>       </queue>
>
>   <user name="xxx">
>
>     <maxRunningApps>10</maxRunningApps>
>
>   </user>
>
>   <userMaxAppsDefault>10</userMaxAppsDefault>
>
> </allocations>
>
>
>
> *From:* Sandy Ryza [mailto:sandy.ryza@cloudera.com]
> *Sent:* November 27, 2013 16:33
> *To:* user@hadoop.apache.org
> *Subject:* Re: problems of FairScheduler in hadoop2.2.0
>
>
>
> Hi,
>
>
>
> Can you share the contents of your fair-scheduler.xml?  If you submit just
> a single job, does it run?  What do you see if you go to
> <resourcemanagerwebui>/ws/v1/cluster/scheduler?
>
>
>
> -Sandy
>
>
>
> On Wed, Nov 27, 2013 at 12:09 AM, 麦树荣 <sh...@qunar.com> wrote:
>
> Hi, all
>
>
>
> When I run jobs in hadoop 2.2.0,  I encounter a problem. Suddenly, the
> hadoop resourcemanager cannot work normally: When I submit jobs and the
> jobs’ status all are “submitted” and cannot run.
>
> I cannot find any answers in the internet, who can give me some help?
> Thanks.
>
>
>
> The resourcemanager log is as follows:
>
>
>
> 2013-11-27 14:39:10,749 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Request for appInfo of unknown attemptappattempt_1384743376038_1129_000001
>
> 2013-11-27 14:39:11,050 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Request for appInfo of unknown attemptappattempt_1384743376038_1128_000001
>
> 2013-11-27 14:39:11,050 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Request for appInfo of unknown attemptappattempt_1384743376038_1127_000001
>
> 2013-11-27 14:39:11,051 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Request for appInfo of unknown attemptappattempt_1384743376038_1128_000001
>
> 2013-11-27 14:39:11,051 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Request for appInfo of unknown attemptappattempt_1384743376038_1127_000001
>
> 2013-11-27 14:39:11,753 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Request for appInfo of unknown attemptappattempt_1384743376038_1129_000001
>
> 2013-11-27 14:39:11,754 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Request for appInfo of unknown attemptappattempt_1384743376038_1129_000001
>
> 2013-11-27 14:39:12,055 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Request for appInfo of unknown attemptappattempt_1384743376038_1128_000001
>
> 2013-11-27 14:39:12,055 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Request for appInfo of unknown attemptappattempt_1384743376038_1127_000001
>
> 2013-11-27 14:39:12,056 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Request for appInfo of unknown attemptappattempt_1384743376038_1128_000001
>
> 2013-11-27 14:39:12,056 ERROR
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler:
> Request for appInfo of unknown attemptappattempt_1384743376038_1127_000001
>
>
>
