Posted to hdfs-user@hadoop.apache.org by Rod Paulk <rm...@gmail.com> on 2013/08/13 17:10:26 UTC

Re: YARN with local filesystem

I was able to execute the example by running the job as the yarn user.

For example, the following completes successfully:
sudo -u yarn yarn org.apache.hadoop.examples.RandomWriter /tmp/random-out

Whereas this fails when run as the local user rpaulk:
yarn org.apache.hadoop.examples.RandomWriter /tmp/random-out
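
One plausible reading of the difference, not confirmed anywhere in this thread: the log shows DefaultContainerExecutor in use, which runs localization as the NodeManager's own OS user (presumably yarn here). With fs.defaultFS set to file:///, that user can only reach staging files that ordinary filesystem permissions allow, and a file behind a directory it cannot traverse is reported the same way as a missing one. A minimal sketch for checking this by hand follows; check_staging_file is a hypothetical helper, not part of Hadoop.

```shell
#!/bin/sh
# Sketch: report whether the CURRENT OS user can read a given staging file.
# check_staging_file is a hypothetical helper name, not a Hadoop command.
check_staging_file() {
    path="$1"
    if [ -r "$path" ]; then
        echo "OK $path"
    else
        # A file inside a directory this user cannot traverse looks
        # exactly like a file that does not exist.
        echo "MISSING-OR-UNREADABLE $path"
    fi
}
```

Calling the function on the appTokens path from the log, once as rpaulk and once under sudo -u yarn, would show whether the two users see the file differently.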

On Wed, Jul 31, 2013 at 2:28 PM, Rod Paulk <rm...@gmail.com> wrote:

> I am having an issue running YARN MapReduce from 2.0.5-alpha
> (BigTop-0.6.0) on the local filesystem instead of HDFS. The appTokens file
> that the error reports as missing does exist after the job fails. I saw
> other similar issues noted in YARN-917, YARN-513, and YARN-993. When I
> switch to HDFS, the jobs run fine.
>
> In core-site.xml
> <property>
>   <name>fs.defaultFS</name>
>   <value>file:///</value>
> </property>
>
> In mapred-site.xml
> <property>
>   <name>mapreduce.framework.name</name>
>   <value>yarn</value>
> </property>
>
> 2013-07-29 16:13:06,549 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Start request for container_1375138534137_0003_01_000001 by user rpaulk
>
> 2013-07-29 16:13:06,549 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl:
> Creating a new application reference for app application_1375138534137_0003
>
> 2013-07-29 16:13:06,549 INFO
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=rpaulk
> IP=172.20.130.215       OPERATION=Start Container Request
> TARGET=ContainerManageImpl      RESULT=SUCCESS
> APPID=application_1375138534137_0003
> CONTAINERID=container_1375138534137_0003_01_000001
>
> 2013-07-29 16:13:06,551 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
> Application application_1375138534137_0003 transitioned from NEW to INITING
>
> 2013-07-29 16:13:06,551 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
> Adding container_1375138534137_0003_01_000001 to application
> application_1375138534137_0003
>
> 2013-07-29 16:13:06,554 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
> Application application_1375138534137_0003 transitioned from INITING to
> RUNNING
>
> 2013-07-29 16:13:06,555 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
> Container container_1375138534137_0003_01_000001 transitioned from NEW to
> LOCALIZING
>
> *2013-07-29 16:13:06,555 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> file:/nas/scratch/localfs-1/hadoop-yarn/staging/rpaulk/.staging/job_1375138534137_0003/appTokens
> transitioned from INIT to DOWNLOADING*
>
> 2013-07-29 16:13:06,556 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> file:/nas/scratch/localfs-1/hadoop-yarn/staging/rpaulk/.staging/job_1375138534137_0003/job.jar
> transitioned from INIT to DOWNLOADING
>
> 2013-07-29 16:13:06,556 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> file:/nas/scratch/localfs-1/hadoop-yarn/staging/rpaulk/.staging/job_1375138534137_0003/job.splitmetainfo
> transitioned from INIT to DOWNLOADING
>
> 2013-07-29 16:13:06,556 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> file:/nas/scratch/localfs-1/hadoop-yarn/staging/rpaulk/.staging/job_1375138534137_0003/job.split
> transitioned from INIT to DOWNLOADING
>
> 2013-07-29 16:13:06,556 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> file:/nas/scratch/localfs-1/hadoop-yarn/staging/rpaulk/.staging/job_1375138534137_0003/job.xml
> transitioned from INIT to DOWNLOADING
>
> 2013-07-29 16:13:06,556 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
> Created localizer for container_1375138534137_0003_01_000001
>
> 2013-07-29 16:13:06,559 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
> Writing credentials to the nmPrivate file
> /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/nmPrivate/container_1375138534137_0003_01_000001.tokens.
> Credentials list:
>
> 2013-07-29 16:13:06,560 INFO
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
> Initializing user rpaulk
>
> 2013-07-29 16:13:06,564 INFO
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: Copying
> from
> /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/nmPrivate/container_1375138534137_0003_01_000001.tokens
> to
> /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/rpaulk/appcache/application_1375138534137_0003/container_1375138534137_0003_01_000001.tokens
>
> 2013-07-29 16:13:06,564 INFO
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor: CWD set
> to
> /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/rpaulk/appcache/application_1375138534137_0003
> =
> file:/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/rpaulk/appcache/application_1375138534137_0003
>
> *2013-07-29 16:13:06,646 ERROR
> org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
> as:rpaulk (auth:SIMPLE) cause:java.io.FileNotFoundException: File
> file:/nas/scratch/localfs-1/hadoop-yarn/staging/rpaulk/.staging/job_1375138534137_0003/appTokens
> does not exist*
>
> 2013-07-29 16:13:06,648 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService:
> DEBUG: FAILED {
> file:/nas/scratch/localfs-1/hadoop-yarn/staging/rpaulk/.staging/job_1375138534137_0003/appTokens,
> 1375139459000, FILE, null }
>
> RemoteTrace:
>
> java.io.FileNotFoundException: File
> file:/nas/scratch/localfs-1/hadoop-yarn/staging/rpaulk/.staging/job_1375138534137_0003/appTokens
> does not exist
>
>         at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:492)
>         at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:395)
>         at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:176)
>         at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:51)
>         at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:284)
>         at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:282)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
>         at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:280)
>         at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:51)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
>
>  at LocalTrace:
>
>         org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl:
> File
> file:/nas/scratch/localfs-1/hadoop-yarn/staging/rpaulk/.staging/job_1375138534137_0003/appTokens
> does not exist
>
>         at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
>         at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:819)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:491)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:218)
>         at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
>         at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:396)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)
>
> 2013-07-29 16:13:06,650 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
> Container container_1375138534137_0003_01_000001 transitioned from
> LOCALIZING to LOCALIZATION_FAILED
>
> *2013-07-29 16:13:06,650 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> file:/nas/scratch/localfs-1/hadoop-yarn/staging/rpaulk/.staging/job_1375138534137_0003/appTokens
> transitioned from DOWNLOADING to INIT*
>
> 2013-07-29 16:13:06,650 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> file:/nas/scratch/localfs-1/hadoop-yarn/staging/rpaulk/.staging/job_1375138534137_0003/job.jar
> transitioned from DOWNLOADING to INIT
>
> 2013-07-29 16:13:06,650 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> file:/nas/scratch/localfs-1/hadoop-yarn/staging/rpaulk/.staging/job_1375138534137_0003/job.splitmetainfo
> transitioned from DOWNLOADING to INIT
>
> 2013-07-29 16:13:06,650 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> file:/nas/scratch/localfs-1/hadoop-yarn/staging/rpaulk/.staging/job_1375138534137_0003/job.split
> transitioned from DOWNLOADING to INIT
>
> 2013-07-29 16:13:06,650 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalizedResource:
> Resource
> file:/nas/scratch/localfs-1/hadoop-yarn/staging/rpaulk/.staging/job_1375138534137_0003/job.xml
> transitioned from DOWNLOADING to INIT
>
> 2013-07-29 16:13:06,652 WARN
> org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=rpaulk
> OPERATION=Container Finished - Failed   TARGET=ContainerImpl
> RESULT=FAILURE  DESCRIPTION=Container failed with state:
> LOCALIZATION_FAILED  APPID=application_1375138534137_0003
> CONTAINERID=container_1375138534137_0003_01_000001
>
> 2013-07-29 16:13:06,652 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.Container:
> Container container_1375138534137_0003_01_000001 transitioned from
> LOCALIZATION_FAILED to DONE
>
> 2013-07-29 16:13:06,652 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application:
> Removing container_1375138534137_0003_01_000001 from application
> application_1375138534137_0003
>
> 2013-07-29 16:13:06,652 INFO
> org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl:
> Considering container container_1375138534137_0003_01_000001 for
> log-aggregation
>
> 2013-07-29 16:13:06,652 INFO
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor:
> Deleting absolute path :
> /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/rpaulk/appcache/application_1375138534137_0003/container_1375138534137_0003_01_000001
>
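
For triaging a NodeManager log like the one quoted above, a small filter can reduce it to the entries that matter. This is only a sketch: it assumes the log is available as a file with one entry per line (unlike the wrapped copy in this mail), and failed_localizations is a hypothetical helper name, not a Hadoop tool.

```shell
#!/bin/sh
# Sketch: print only the log lines that mention a missing file or a
# container whose localization failed.
# failed_localizations is a hypothetical helper; $1 is the path to
# whatever NodeManager log file you want to scan.
failed_localizations() {
    grep -E 'FileNotFoundException|LOCALIZATION_FAILED' "$1"
}
```

The two patterns are taken from the messages in the quoted log; other failure modes would need other patterns.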