Posted to common-user@hadoop.apache.org by Andy Li <an...@gmail.com> on 2008/02/28 20:39:46 UTC

MapReduce - JobClient creates new folders with superuser permission and owner:group instead of the account that runs the job

I have encountered the same problem when running MapReduce code under a
different user name. This issue was brought up on the core-dev mailing list,
but I didn't see any workaround or solution there, so I would like to raise
the topic again and gather some input.

Sorry for cross-posting, but I am not sure whether this also belongs on the
core-user mailing list.

For example, assume I installed Hadoop under the account 'hadoop' and I run
my program as user 'test'. I created the input folder /user/test/input/ as
user 'test' with the permission set to 0775:

/user/test/input      <dir>           2008-02-27 01:20        rwxr-xr-x       test  hadoop
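
For reference, here is a minimal sketch of creating that directory with an
explicit permission from Java, assuming the mkdirs(Path, FsPermission)
overload that came with the permission work (please correct me if that is
not the right call):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class MakeInputDir {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // 0775 = rwxrwxr-x; running this as user 'test' should leave
        // the directory owned by test.
        fs.mkdirs(new Path("/user/test/input"), new FsPermission((short) 0775));
    }
}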

When I run the MapReduce code, the output directory I specify is created as
user 'hadoop' instead of 'test':

${HADOOP_HOME}/bin/hadoop jar /tmp/test_perm.jar -m 57 -r 3 "/user/test/input/l" "/user/test/output/"

The directory "/user/test/output/" ends up with the following permission and
user:group:

/user/test/output    <dir>           2008-02-27 03:53        rwxr-xr-x       hadoop  hadoop

My question is: why is the output folder created as the superuser 'hadoop'?
Naturally, the MapReduce code cannot access this folder, because the
permission does not allow user 'test' to write to it. The output folder is
created, but the tasks running as 'test' cannot write anything into it, and
the job therefore throws an exception (see the copy/paste at the end of this
message).
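
In case it is useful for anyone reproducing this, here is a small sketch of
how I check the owner and permission from Java, assuming the FileStatus
accessors (getOwner/getGroup/getPermission) that came with the permission
work:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckOwner {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus st = fs.getFileStatus(new Path("/user/test/output"));
        // Prints something like: /user/test/output hadoop:hadoop rwxr-xr-x
        System.out.println("/user/test/output " + st.getOwner() + ":"
                + st.getGroup() + " " + st.getPermission());
    }
}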

I have been looking for a way to solve this but cannot find a definitive
answer. How do I make new directories default to mode 0775 (that is, a umask
of 0002)? I can add user 'test' to the group 'hadoop' so that 'test' has
write access to folders in the 'hadoop' group. In other words, as long as a
folder is set to 'rwxrwxr-x', user 'test' can read and write it while still
sharing it with 'hadoop:hadoop'. Any idea how I can set or modify the global
default umask for Hadoop, or do I have to override the default umask in my
configuration or FileSystem every time?
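
The closest thing I have found from reading the permission code is a
client-side umask, which appears to be the 'dfs.umask' property behind
FsPermission.setUMask. This is only a guess on my part; a sketch of what I
mean:

import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.mapred.JobConf;

public class UmaskExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf();
        // A umask of 0002 should make new directories come out as 0775
        // (rwxrwxr-x). 'dfs.umask' / FsPermission.setUMask is my guess
        // from reading the code; please correct me if I am wrong.
        FsPermission.setUMask(conf, new FsPermission((short) 0002));
        // ... then submit the job with this conf.
    }
}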

======= COPY/PASTE STARTS HERE =======
org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.fs.permission.AccessControlException: Permission denied: user=test, access=WRITE, inode="_task_200802262256_0007_r_000001_1":hadoop:hadoop:rwxr-xr-x
        at org.apache.hadoop.dfs.PermissionChecker.check(PermissionChecker.java:173)
        at org.apache.hadoop.dfs.PermissionChecker.check(PermissionChecker.java:154)
        at org.apache.hadoop.dfs.PermissionChecker.checkPermission(PermissionChecker.java:102)
        at org.apache.hadoop.dfs.FSNamesystem.checkPermission(FSNamesystem.java:4035)
        at org.apache.hadoop.dfs.FSNamesystem.checkAncestorAccess(FSNamesystem.java:4005)
        at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:963)
        at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:938)
        at org.apache.hadoop.dfs.NameNode.create(NameNode.java:281)
        at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:409)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:899)

        at org.apache.hadoop.ipc.Client.call(Client.java:512)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:198)
        at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at org.apache.hadoop.dfs.$Proxy1.create(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:1927)
        at org.apache.hadoop.dfs.DFSClient.create(DFSClient.java:382)
        at org.apache.hadoop.dfs.DistributedFileSystem.create(DistributedFileSystem.java:135)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:436)
        at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:336)
        at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:308)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2089)
======= COPY/PASTE ENDS HERE =======