Posted to common-dev@hadoop.apache.org by Bing Jiang <bi...@gmail.com> on 2011/12/30 10:16:22 UTC

Exception from launching container due to fail to create appattempt local dir in NM

I am using Hadoop YARN to deploy my real-time distributed computation
system. In a reply on mapreduce-user@hadoop.apache.org I was advised to
follow the guides below:

http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/YARN.html

http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/WritingYarnApplications.html

When I followed the steps to build my Client and ApplicationMaster, I ran
into a problem: the NM fails to launch a container because of a
java.io.FileNotFoundException.
The relevant part of the NM log is below:
 ....
2011-12-29 15:49:16,250 INFO org.apache.hadoop.yarn.server.
nodemanager.containermanager.application.Application: Adding
container_1325062142731_0006_01_000001 to application
application_1325062142731_0006
2011-12-29 15:49:16,250 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher:
Dispatching the event
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.ApplicationLocalizationEvent.EventType:
INIT_APPLICATION_RESOURCES
2011-12-29 15:49:16,250 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher:
Dispatching the event
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationInitedEvent.EventType:
APPLICATION_INITED
2011-12-29 15:49:16,250 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
Processing application_1325062142731_0006 of type APPLICATION_INITED
2011-12-29 15:49:16,250 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
Application application_1325062142731_0006 transitioned from INITING to
RUNNING
2011-12-29 15:49:16,250 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher:
Dispatching the event
org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.event.LogHandlerAppStartedEvent.EventType:
APPLICATION_STARTED
2011-12-29 15:49:16,250 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher:
Dispatching the event
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerInitEvent.EventType:
INIT_CONTAINER
2011-12-29 15:49:16,250 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Processing container_1325062142731_0006_01_000001 of type INIT_CONTAINER
2011-12-29 15:49:16,250 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1325062142731_0006_01_000001 transitioned from NEW to
LOCALIZED
2011-12-29 15:49:16,250 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher:
Dispatching the event
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEvent.EventType:
LAUNCH_CONTAINER
2011-12-29 15:49:16,287 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher:
Dispatching the event
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerEvent.EventType:
CONTAINER_LAUNCHED
2011-12-29 15:49:16,287 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Processing container_1325062142731_0006_01_000001 of type CONTAINER_LAUNCHED
2011-12-29 15:49:16,287 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1325062142731_0006_01_000001 transitioned from
LOCALIZED to RUNNING
2011-12-29 15:49:16,288 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher:
Dispatching the event
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainerStartMonitoringEvent.EventType:
START_MONITORING_CONTAINER
2011-12-29 15:49:16,289 WARN
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Failed to launch container
java.io.FileNotFoundException: File
/tmp/nm-local-dir/usercache/jiangbing/appcache/application_1325062142731_0006
does not exist
    at
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:431)
    at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:815)
    at
org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
    at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
    at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:700)
    at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:697)
   at
org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
    at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:697)
    at
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:123)
    at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:237)
    at
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:67)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
2011-12-29 15:49:16,290 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher:
Dispatching the event
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerExitEvent.EventType:
CONTAINER_EXITED_WITH_FAILURE
2011-12-29 15:49:16,290 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Processing container_1325062142731_0006_01_000001 of type
CONTAINER_EXITED_WITH_FAILURE
2011-12-29 15:49:16,290 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1325062142731_0006_01_000001 transitioned from RUNNING
to EXITED_WITH_FAILURE
2011-12-29 15:49:16,290 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher:
Dispatching the event
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEvent.EventType:
CLEANUP_CONTAINER
2011-12-29 15:49:16,290 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Cleaning up container container_1325062142731_0006_01_000001
2011-12-29 15:49:16,290 DEBUG
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Marking container container_1325062142731_0006_01_000001 as inactive
2011-12-29 15:49:16,290 DEBUG
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Getting pid for container container_1325062142731_0006_01_000001 to kill
from pid file
/tmp/nm-local-dir/nmPrivate/container_1325062142731_0006_01_000001.pid
2011-12-29 15:49:16,290 DEBUG
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch:
Accessing pid for container container_1325062142731_0006_01_000001 from pid
file /tmp/nm-local-dir/nmPrivate/container_1325062142731_0006_01_000001.pid
2011-12-29 15:49:16,307 DEBUG org.apache.hadoop.yarn.event.AsyncDispatcher:
Dispatching the event
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.ContainerLocalizationCleanupEvent.EventType:
CLEANUP_CONTAINER_RESOURCES

To find the cause, I traced back through the source code and found this in
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:

  @Override
  public int launchContainer(Container container,
      Path nmPrivateContainerScriptPath, Path nmPrivateTokensPath,
      String userName, String appId, Path containerWorkDir)
      throws IOException {
    ....
    String[] sLocalDirs = getConf().getStrings(
        YarnConfiguration.NM_LOCAL_DIRS,
        YarnConfiguration.DEFAULT_NM_LOCAL_DIRS);
    for (String sLocalDir : sLocalDirs) {
      Path usersdir = new Path(sLocalDir, ContainerLocalizer.USERCACHE);
      Path userdir = new Path(usersdir, userName);
      Path appCacheDir = new Path(userdir, ContainerLocalizer.APPCACHE);
      Path appDir = new Path(appCacheDir, appIdStr);
      Path containerDir = new Path(appDir, containerIdStr);
      lfs.mkdir(containerDir, null, false);
    }
    ....

According to the mkdir API, the final argument (false) in
lfs.mkdir(containerDir, null, false); means that missing parent directories
are not created. In my copy of Hadoop I changed
lfs.mkdir(containerDir, null, false); to
lfs.mkdir(containerDir, null, true); and my program then runs correctly.
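
To illustrate the difference between the two calls, here is a minimal
sketch that uses only java.nio.file instead of Hadoop's FileContext. It is
an analogy, not the NM code: the class MkdirParentDemo, its helper methods,
and the directory names are made up for this example. It assumes that
Files.createDirectory behaves like createParent=false (it fails when the
parent is missing) and that Files.createDirectories behaves like
createParent=true.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class MkdirParentDemo {

    // Hypothetical stand-in for mkdir(dir, permission, createParent).
    // createParent=false: create only the last path component.
    // createParent=true:  create any missing parent directories as well.
    static void mkdir(Path dir, boolean createParent) throws IOException {
        if (createParent) {
            Files.createDirectories(dir);
        } else {
            Files.createDirectory(dir);
        }
    }

    // Returns true if the directory was created, false if an IOException
    // (for example, a missing parent directory) prevented it.
    static boolean tryMkdir(Path dir, boolean createParent) {
        try {
            mkdir(dir, createParent);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // Fresh temporary directory standing in for one NM local dir.
        Path localDir = Files.createTempDirectory("nm-local-dir");

        // Same nesting as in launchContainer; the names are placeholders.
        Path containerDir = localDir.resolve("usercache")
                .resolve("someuser")
                .resolve("appcache")
                .resolve("application_0001")
                .resolve("container_0001");

        // The application directory does not exist yet, so this fails.
        System.out.println("createParent=false: " + tryMkdir(containerDir, false));

        // With parent creation enabled the whole chain is created.
        System.out.println("createParent=true:  " + tryMkdir(containerDir, true));
    }
}
```

In this sketch the first call fails only because nothing has created the
application directory beforehand, which matches the FileNotFoundException
in the log above.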

Why is false used here? Or have I missed something important?

Thanks!



-- 
Bing Jiang
Blog: http://blog.sina.com.cn/jiangbinglover
National Research Center for Intelligent Computing Systems
Institute of Computing technology
Graduate University of Chinese Academy of Science