Posted to yarn-issues@hadoop.apache.org by "Allen Wittenauer (JIRA)" <ji...@apache.org> on 2015/07/13 23:58:05 UTC
[jira] [Commented] (YARN-3921) Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster
[ https://issues.apache.org/jira/browse/YARN-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625438#comment-14625438 ]
Allen Wittenauer commented on YARN-3921:
----------------------------------------
AFAIK, YARN won't change the permissions on the work dirs when you switch modes. The assumption is that the ops folks/tools will handle this as part of the transition.
The change to ambari-qa's dir seems to be related to something else (are these machines being managed via Ambari and, like a naughty child, Ambari is putting these where they don't belong?), given that you didn't say that a job belonging to the user ambari-qa was run...
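Following on from the comment above: the usual operator-side transition step is to remove the stale per-user usercache directories (with the NodeManager stopped) so that on the next container launch the secure-mode container executor recreates them with the correct ownership. Below is a minimal sketch of such a cleanup, assuming the /dataN/hadoop/yarn/local layout shown in this report; the clean_usercache helper name is illustrative, not an actual Hadoop tool:

{code}
#!/bin/sh
# Hypothetical cleanup helper for one yarn.nodemanager.local-dirs entry.
# Run only while the NodeManager on this host is stopped; on the next
# container launch the secure-mode container executor should recreate
# usercache/<user> with the expected ownership (e.g. tdatuser:hadoop
# rather than yarn:hadoop, as seen in the listings below).
clean_usercache() {
  local_dir="$1"
  for d in "$local_dir"/usercache/*/; do
    [ -d "$d" ] || continue    # skip when the glob matched nothing
    rm -rf "$d"
  done
}

# Example: apply to every local dir from this report.
# for n in $(seq 1 12); do clean_usercache "/data$n/hadoop/yarn/local"; done
{code}

This is only a sketch of the manual workaround implied by the comment, not an officially documented procedure; verify paths against your own yarn.nodemanager.local-dirs setting before deleting anything.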
> Permission denied errors for local usercache directories when attempting to run MapReduce job on Kerberos enabled cluster
> --------------------------------------------------------------------------------------------------------------------------
>
> Key: YARN-3921
> URL: https://issues.apache.org/jira/browse/YARN-3921
> Project: Hadoop YARN
> Issue Type: Bug
> Affects Versions: 2.7.1
> Environment: sles11sp3
> Reporter: Zack Marsh
>
> Prior to enabling Kerberos on the Hadoop cluster, I am able to run a simple MapReduce example as the Linux user 'tdatuser':
> {code}
> piripiri1:~ # su tdatuser
> tdatuser@piripiri1:/root> yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar pi 16 10000
> Number of Maps = 16
> Samples per Map = 10000
> Wrote input for Map #0
> Wrote input for Map #1
> Wrote input for Map #2
> Wrote input for Map #3
> Wrote input for Map #4
> Wrote input for Map #5
> Wrote input for Map #6
> Wrote input for Map #7
> Wrote input for Map #8
> Wrote input for Map #9
> Wrote input for Map #10
> Wrote input for Map #11
> Wrote input for Map #12
> Wrote input for Map #13
> Wrote input for Map #14
> Wrote input for Map #15
> Starting Job
> 15/07/13 17:02:31 INFO impl.TimelineClientImpl: Timeline service address: http:/ s/v1/timeline/
> 15/07/13 17:02:31 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to
> 15/07/13 17:02:31 INFO input.FileInputFormat: Total input paths to process : 16
> 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: number of splits:16
> 15/07/13 17:02:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_14
> 15/07/13 17:02:32 INFO impl.YarnClientImpl: Submitted application application_14
> 15/07/13 17:02:32 INFO mapreduce.Job: The url to track the job: http://piripiri3 cation_1436821014431_0003/
> 15/07/13 17:02:32 INFO mapreduce.Job: Running job: job_1436821014431_0003
> 15/07/13 17:05:50 INFO mapreduce.Job: Job job_1436821014431_0003 running in uber mode : false
> 15/07/13 17:05:50 INFO mapreduce.Job: map 0% reduce 0%
> 15/07/13 17:05:56 INFO mapreduce.Job: map 6% reduce 0%
> 15/07/13 17:06:00 INFO mapreduce.Job: map 13% reduce 0%
> 15/07/13 17:06:01 INFO mapreduce.Job: map 38% reduce 0%
> 15/07/13 17:06:05 INFO mapreduce.Job: map 44% reduce 0%
> 15/07/13 17:06:07 INFO mapreduce.Job: map 63% reduce 0%
> 15/07/13 17:06:09 INFO mapreduce.Job: map 69% reduce 0%
> 15/07/13 17:06:11 INFO mapreduce.Job: map 75% reduce 0%
> 15/07/13 17:06:12 INFO mapreduce.Job: map 81% reduce 0%
> 15/07/13 17:06:13 INFO mapreduce.Job: map 81% reduce 25%
> 15/07/13 17:06:14 INFO mapreduce.Job: map 94% reduce 25%
> 15/07/13 17:06:16 INFO mapreduce.Job: map 100% reduce 31%
> 15/07/13 17:06:17 INFO mapreduce.Job: map 100% reduce 100%
> 15/07/13 17:06:17 INFO mapreduce.Job: Job job_1436821014431_0003 completed successfully
> 15/07/13 17:06:17 INFO mapreduce.Job: Counters: 49
> File System Counters
> FILE: Number of bytes read=358
> FILE: Number of bytes written=2249017
> FILE: Number of read operations=0
> FILE: Number of large read operations=0
> FILE: Number of write operations=0
> HDFS: Number of bytes read=4198
> HDFS: Number of bytes written=215
> HDFS: Number of read operations=67
> HDFS: Number of large read operations=0
> HDFS: Number of write operations=3
> Job Counters
> Launched map tasks=16
> Launched reduce tasks=1
> Data-local map tasks=16
> Total time spent by all maps in occupied slots (ms)=160498
> Total time spent by all reduces in occupied slots (ms)=27302
> Total time spent by all map tasks (ms)=80249
> Total time spent by all reduce tasks (ms)=13651
> Total vcore-seconds taken by all map tasks=80249
> Total vcore-seconds taken by all reduce tasks=13651
> Total megabyte-seconds taken by all map tasks=246524928
> Total megabyte-seconds taken by all reduce tasks=41935872
> Map-Reduce Framework
> Map input records=16
> Map output records=32
> Map output bytes=288
> Map output materialized bytes=448
> Input split bytes=2310
> Combine input records=0
> Combine output records=0
> Reduce input groups=2
> Reduce shuffle bytes=448
> Reduce input records=32
> Reduce output records=0
> Spilled Records=64
> Shuffled Maps =16
> Failed Shuffles=0
> Merged Map outputs=16
> GC time elapsed (ms)=1501
> CPU time spent (ms)=13670
> Physical memory (bytes) snapshot=13480296448
> Virtual memory (bytes) snapshot=72598511616
> Total committed heap usage (bytes)=12508463104
> Shuffle Errors
> BAD_ID=0
> CONNECTION=0
> IO_ERROR=0
> WRONG_LENGTH=0
> WRONG_MAP=0
> WRONG_REDUCE=0
> File Input Format Counters
> Bytes Read=1888
> File Output Format Counters
> Bytes Written=97
> Job Finished in 226.813 seconds
> Estimated value of Pi is 3.14127500000000000000
> {code}
> However, after enabling Kerberos, the job fails:
> {code}
> tdatuser@piripiri1:/root> kinit -kt /etc/security/keytabs/tdatuser.headless.keytab tdatuser
> tdatuser@piripiri1:/root> yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples-2.*.jar pi 16 10000
> Number of Maps = 16
> Samples per Map = 10000
> Wrote input for Map #0
> Wrote input for Map #1
> Wrote input for Map #2
> Wrote input for Map #3
> Wrote input for Map #4
> Wrote input for Map #5
> Wrote input for Map #6
> Wrote input for Map #7
> Wrote input for Map #8
> Wrote input for Map #9
> Wrote input for Map #10
> Wrote input for Map #11
> Wrote input for Map #12
> Wrote input for Map #13
> Wrote input for Map #14
> Wrote input for Map #15
> Starting Job
> 15/07/13 17:27:05 INFO impl.TimelineClientImpl: Timeline service address: http://piripiri1.labs.teradata.com:8188/ws/v1/timeline/
> 15/07/13 17:27:05 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 140 for tdatuser on ha-hdfs:PIRIPIRI
> 15/07/13 17:27:05 INFO security.TokenCache: Got dt for hdfs://PIRIPIRI; Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:PIRIPIRI, Ident: (HDFS_DELEGATION_TOKEN token 140 for tdatuser)
> 15/07/13 17:27:06 INFO input.FileInputFormat: Total input paths to process : 16
> 15/07/13 17:27:06 INFO mapreduce.JobSubmitter: number of splits:16
> 15/07/13 17:27:06 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1436822321287_0007
> 15/07/13 17:27:06 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:PIRIPIRI, Ident: (HDFS_DELEGATION_TOKEN token 140 for tdatuser)
> 15/07/13 17:27:06 INFO impl.YarnClientImpl: Submitted application application_1436822321287_0007
> 15/07/13 17:27:06 INFO mapreduce.Job: The url to track the job: http://piripiri2.labs.teradata.com:8088/proxy/application_1436822321287_0007/
> 15/07/13 17:27:06 INFO mapreduce.Job: Running job: job_1436822321287_0007
> 15/07/13 17:27:09 INFO mapreduce.Job: Job job_1436822321287_0007 running in uber mode : false
> 15/07/13 17:27:09 INFO mapreduce.Job: map 0% reduce 0%
> 15/07/13 17:27:09 INFO mapreduce.Job: Job job_1436822321287_0007 failed with state FAILED due to: Application application_1436822321287_0007 failed 2 times due to AM Container for appattempt_1436822321287_0007_000002 exited with exitCode: -1000
> For more detailed output, check application tracking page:http://piripiri2.labs.teradata.com:8088/cluster/app/application_1436822321287_0007Then, click on links to logs of each attempt.
> Diagnostics: Application application_1436822321287_0007 initialization failed (exitCode=255) with output: main : command provided 0
> main : run as user is tdatuser
> main : requested yarn user is tdatuser
> Can't create directory /data1/hadoop/yarn/local/usercache/tdatuser/appcache/application_1436822321287_0007 - Permission denied
> Can't create directory /data2/hadoop/yarn/local/usercache/tdatuser/appcache/application_1436822321287_0007 - Permission denied
> Can't create directory /data3/hadoop/yarn/local/usercache/tdatuser/appcache/application_1436822321287_0007 - Permission denied
> Can't create directory /data4/hadoop/yarn/local/usercache/tdatuser/appcache/application_1436822321287_0007 - Permission denied
> Can't create directory /data5/hadoop/yarn/local/usercache/tdatuser/appcache/application_1436822321287_0007 - Permission denied
> Can't create directory /data6/hadoop/yarn/local/usercache/tdatuser/appcache/application_1436822321287_0007 - Permission denied
> Can't create directory /data7/hadoop/yarn/local/usercache/tdatuser/appcache/application_1436822321287_0007 - Permission denied
> Can't create directory /data8/hadoop/yarn/local/usercache/tdatuser/appcache/application_1436822321287_0007 - Permission denied
> Can't create directory /data9/hadoop/yarn/local/usercache/tdatuser/appcache/application_1436822321287_0007 - Permission denied
> Can't create directory /data10/hadoop/yarn/local/usercache/tdatuser/appcache/application_1436822321287_0007 - Permission denied
> Can't create directory /data11/hadoop/yarn/local/usercache/tdatuser/appcache/application_1436822321287_0007 - Permission denied
> Can't create directory /data12/hadoop/yarn/local/usercache/tdatuser/appcache/application_1436822321287_0007 - Permission denied
> Did not create any app directories
> Failing this attempt. Failing the application.
> 15/07/13 17:27:09 INFO mapreduce.Job: Counters: 0
> Job Finished in 4.748 seconds
> java.io.FileNotFoundException: File does not exist: hdfs://PIRIPIRI/user/tdatuser/QuasiMonteCarlo_1436822823095_2120947622/out/reduce-out
> at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
> at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
> at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1752)
> at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1776)
> at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
> at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:354)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:363)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
> at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
> at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {code}
> As seen above, there are many "Can't create directory ... Permission denied" errors related to the local usercache directory for the 'tdatuser' user.
> Prior to enabling Kerberos, the contents of a usercache directory were as follows:
> {code}
> piripiri4:~ # ls -l /data1/hadoop/yarn/local/usercache/
> total 0
> drwxr-xr-x 3 yarn hadoop 21 Jul 13 16:59 ambari-qa
> drwxr-x--- 4 yarn hadoop 37 Jul 13 17:00 tdatuser
> {code}
> After enabling Kerberos, the contents are:
> {code}
> piripiri4:~ # ls -l /data1/hadoop/yarn/local/usercache/
> total 0
> drwxr-s--- 4 ambari-qa hadoop 37 Jul 13 17:21 ambari-qa
> drwxr-x--- 4 yarn hadoop 37 Jul 13 17:00 tdatuser
> {code}
> It appears that the owner of the usercache directory for the 'ambari-qa' user was updated, but the 'tdatuser' directory was not.
> Is this expected behavior, and is there a recommended work-around for this issue?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)