Posted to dev@twill.apache.org by Peter Grman <pe...@gmail.com> on 2014/06/21 12:48:06 UTC

Application does not start on remote container

Hi,

When I run my application on the local NodeManager only, everything works
fine, but when I try to start it on multiple nodes, it fails. In the
NodeManager logs I found a possible cause of the error:

2014-06-21 10:24:05,118 WARN
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:ubuntu (auth:SIMPLE) cause:java.io.FileNotFoundException: File
file:/home/ubuntu/DrillbitRunnable/c134a771-c65f-4595-b51a-dc6d282cd4ad/logback-template.7c14a388-e69b-4133-bfb8-6102445c8098.xml
does not exist

This path is valid only on the local machine, where Twill generates the
folder and files, not on the remote machine. Why isn't that file
copied to HDFS for distribution? Is there anything I could change so that the
file gets delivered together with everything else? I'm using Apache Twill
0.3.0-SNAPSHOT with Hadoop 2.3.0, and my application also uses the Hadoop
2.3.0 libraries.
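
To make the diagnosis above concrete, here is a minimal Java sketch that classifies a resource URI by its scheme. It is not Twill or Hadoop code; the class name, the method name and the HDFS URI are made up for illustration, and only the file: URI is taken from the log. A resource with no scheme or with the "file" scheme can be read only on the node that holds it, which is why localization fails on a remote NodeManager.

```java
import java.net.URI;

public class ResourceSchemeCheck {

    /** Returns true when the URI has no scheme or the "file" scheme, i.e. it names a node-local path. */
    public static boolean isNodeLocal(String resourceUri) {
        String scheme = URI.create(resourceUri).getScheme();
        return scheme == null || scheme.equalsIgnoreCase("file");
    }

    public static void main(String[] args) {
        // Resource URI copied from the NodeManager log.
        String fromLog = "file:/home/ubuntu/DrillbitRunnable/c134a771-c65f-4595-b51a-dc6d282cd4ad/"
                + "logback-template.7c14a388-e69b-4133-bfb8-6102445c8098.xml";
        // Hypothetical HDFS location, shown for comparison only.
        String onHdfs = "hdfs://namenode.example.com:8020/twill/logback-template.xml";

        System.out.println(fromLog + " node-local? " + isNodeLocal(fromLog));
        System.out.println(onHdfs + " node-local? " + isNodeLocal(onHdfs));
    }
}
```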

Thank you

A formatted copy of the complete log file is available here:
https://gist.github.com/pgrm/68d07084b1e2cb9e2ce4
It is also included below:

2014-06-21 10:24:03,729 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Stopping resource-monitoring for container_1403345039835_0001_01_000009
2014-06-21 10:24:04,961 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
Start request for container_1403345039835_0001_01_000010 by user ubuntu
2014-06-21 10:24:04,961 INFO
org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=ubuntu
IP=10.216.60.23 OPERATION=Start Container Request
TARGET=ContainerManageImpl      RESULT=SUCCESS
 APPID=application_1403345039835_0001
 CONTAINERID=container_1403345039835_0001_01_000010
2014-06-21 10:24:04,961 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
Adding container_1403345039835_0001_01_000010 to application
application_1403345039835_0001
2014-06-21 10:24:04,966 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1403345039835_0001_01_000010 transitioned from NEW to
LOCALIZING
2014-06-21 10:24:04,966 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got
event CONTAINER_INIT for appId application_1403345039835_0001
2014-06-21 10:24:04,966 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
Resource
file:/home/ubuntu/DrillbitRunnable/c134a771-c65f-4595-b51a-dc6d282cd4ad/logback-template.7c14a388-e69b-4133-bfb8-6102445c8098.xml
transitioned from INIT to DOWNLOADING
2014-06-21 10:24:04,967 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
Created localizer for container_1403345039835_0001_01_000010
2014-06-21 10:24:04,977 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
Writing credentials to the nmPrivate file
/tmp/hadoop-ubuntu/nm-local-dir/nmPrivate/container_1403345039835_0001_01_000010.tokens.
Credentials list:
2014-06-21 10:24:04,980 INFO
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
Initializing user ubuntu
2014-06-21 10:24:05,050 INFO
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying
from
/tmp/hadoop-ubuntu/nm-local-dir/nmPrivate/container_1403345039835_0001_01_000010.tokens
to
/tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1403345039835_0001/container_1403345039835_0001_01_000010.tokens
2014-06-21 10:24:05,050 INFO
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: CWD set
to
/tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1403345039835_0001
=
file:/tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1403345039835_0001
2014-06-21 10:24:05,118 WARN
org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:ubuntu (auth:SIMPLE) cause:java.io.FileNotFoundException: File
file:/home/ubuntu/DrillbitRunnable/c134a771-c65f-4595-b51a-dc6d282cd4ad/logback-template.7c14a388-e69b-4133-bfb8-6102445c8098.xml
does not exist
2014-06-21 10:24:05,120 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
DEBUG: FAILED {
file:/home/ubuntu/DrillbitRunnable/c134a771-c65f-4595-b51a-dc6d282cd4ad/logback-template.7c14a388-e69b-4133-bfb8-6102445c8098.xml,
1403346214000, FILE, null }, File
file:/home/ubuntu/DrillbitRunnable/c134a771-c65f-4595-b51a-dc6d282cd4ad/logback-template.7c14a388-e69b-4133-bfb8-6102445c8098.xml
does not exist
2014-06-21 10:24:05,121 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
Resource
file:/home/ubuntu/DrillbitRunnable/c134a771-c65f-4595-b51a-dc6d282cd4ad/logback-template.7c14a388-e69b-4133-bfb8-6102445c8098.xml
transitioned from DOWNLOADING to FAILED
2014-06-21 10:24:05,121 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1403345039835_0001_01_000010 transitioned from
LOCALIZING to LOCALIZATION_FAILED
2014-06-21 10:24:05,121 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl:
Container container_1403345039835_0001_01_000010 sent RELEASE event on a
resource request {
file:/home/ubuntu/DrillbitRunnable/c134a771-c65f-4595-b51a-dc6d282cd4ad/logback-template.7c14a388-e69b-4133-bfb8-6102445c8098.xml,
1403346214000, FILE, null } not present in cache.
2014-06-21 10:24:05,122 WARN org.apache.hadoop.ipc.Client: interrupted
waiting to send rpc request to server
java.lang.InterruptedException
        at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:400)
        at java.util.concurrent.FutureTask.get(FutureTask.java:187)
        at
org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1025)
        at org.apache.hadoop.ipc.Client.call(Client.java:1379)
        at org.apache.hadoop.ipc.Client.call(Client.java:1359)
        at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at com.sun.proxy.$Proxy25.heartbeat(Unknown Source)
        at
org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)
        at
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:255)
        at
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
        at
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:107)
        at
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:981)
2014-06-21 10:24:05,122 WARN
org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=ubuntu
OPERATION=Container Finished - Failed   TARGET=ContainerImpl
 RESULT=FAILURE  DESCRIPTION=Container failed with state:
LOCALIZATION_FAILED    APPID=application_1403345039835_0001
 CONTAINERID=container_1403345039835_0001_01_000010
2014-06-21 10:24:05,122 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
Unknown localizer with localizerId container_1403345039835_0001_01_000010
is sending heartbeat. Ordering it to DIE
2014-06-21 10:24:05,122 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
Container container_1403345039835_0001_01_000010 transitioned from
LOCALIZATION_FAILED to DONE
2014-06-21 10:24:05,122 INFO
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
Deleting absolute path :
/tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1403345039835_0001/container_1403345039835_0001_01_000010
2014-06-21 10:24:05,123 WARN
org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: delete
returned false for path:
[/tmp/hadoop-ubuntu/nm-local-dir/usercache/ubuntu/appcache/application_1403345039835_0001/container_1403345039835_0001_01_000010]
2014-06-21 10:24:05,123 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
Removing container_1403345039835_0001_01_000010 from application
application_1403345039835_0001
2014-06-21 10:24:05,123 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got
event CONTAINER_STOP for appId application_1403345039835_0001
2014-06-21 10:24:05,968 INFO
org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Removed
completed container container_1403345039835_0001_01_000010
2014-06-21 10:24:06,730 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Stopping resource-monitoring for container_1403345039835_0001_01_000010

Re: Application does not start on remote container

Posted by Terence Yim <ch...@gmail.com>.
Hi,

When you construct the YarnTwillRunnerService instance, do you provide it with a YarnConfiguration that has fs.default.name pointing to the HDFS NameNode of your cluster, or did you provide a LocationFactory when creating the instance? Twill depends on that information in order to be able to use HDFS.
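
As a rough illustration of why the default filesystem setting matters: in Hadoop 2.x the key fs.default.name is deprecated in favour of fs.defaultFS, and its default value is file:///, so a configuration that never sets it describes the local filesystem. The sketch below uses only java.net.URI and is an analogy for how an unqualified path picks up the scheme of the default filesystem. It is not Twill's or Hadoop's actual implementation, and the class name, method name, host name and paths are all hypothetical.

```java
import java.net.URI;

public class DefaultFsSketch {

    /** Resolves an absolute path against a default-filesystem URI (an analogy for path qualification). */
    public static URI qualify(String defaultFs, String absolutePath) {
        // Make sure the base ends with "/" so that it always has a root path to resolve against.
        URI base = URI.create(defaultFs.endsWith("/") ? defaultFs : defaultFs + "/");
        return base.resolve(absolutePath);
    }

    public static void main(String[] args) {
        String staged = "/twill/app/logback-template.xml"; // hypothetical staging path

        // With the Hadoop default (file:///), the path stays on the local disk of the submitting node.
        System.out.println(qualify("file:///", staged).getScheme());

        // With the default filesystem pointed at a (hypothetical) NameNode, the path lands on HDFS.
        System.out.println(qualify("hdfs://namenode.example.com:8020", staged).getScheme());
    }
}
```

If the first case matches your setup, check that the configuration passed to YarnTwillRunnerService actually loads the cluster's core-site.xml.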

Terence 

