Posted to common-user@hadoop.apache.org by Rick Hangartner <ha...@strands.com> on 2008/11/10 21:21:52 UTC
Can you specify the user on the map-reduce cluster in Hadoop streaming
Hi,
To make Hadoop/MapReduce available for developers to experiment
with, we are setting up a cluster with Hadoop/MapReduce and a dataset,
and providing instructions on how developers can use streaming to submit
jobs from their own machines.
For purposes of explanation here, we can assume each user has access
to their own login account on every node in the cluster and has a
"home" directory set up for them in the HDFS (e.g. Unix/Linux login:
<remote_username> on the nodes, and a directory /user/
<remote_username> on the HDFS with permissions 755).
What we've discovered is that if a developer is working as
<local_username> on his or her own machine, and <local_username>
matches the <remote_username> of their account on the cluster,
streaming works just fine. If the <local_username> and
<remote_username> don't match, though, then when they specify that
"-input" and "-output" should come from and go to files in the
<remote_username> directory on HDFS, they get an:
"org.apache.hadoop.fs.permission.AccessControlException:
org.apache.hadoop.fs.permission.AccessControlException: Permission
denied: user=<local_username>, access=WRITE,
inode="<remote_username>":<remote_username>:users:rwxr-xr-x"
exception.
Specifying "-jobconf user.name=<remote_username>" with the streaming
command doesn't seem to help.
Of course, this all makes sense from a security viewpoint and from how
I think I understand HDFS derives permissions from the OS. But is
there a proper way to allow developers to specify a <remote_username>
they legitimately have access to on the cluster if it is not the same
as the <local_username> of the account on their own machine they are
using to submit a streaming job without setting HDFS permissions to
777? I've searched the documentation and email list for this info and
may have overlooked the answer to this question; apologies in
advance if I have.
Thanks.
Best regards,
Rick Hangartner
Re: Can you specify the user on the map-reduce cluster in Hadoop streaming
Posted by Allen Wittenauer <aw...@yahoo-inc.com>.
On 11/10/08 12:21 PM, "Rick Hangartner" <ha...@strands.com> wrote:
> But is there a proper way to allow developers to specify a <remote_username>
> they legitimately have access to on the cluster if it is not the same
> as the <local_username> of the account on their own machine they are
> using to submit a streaming job without setting HDFS permissions to
> 777?
There are ways that the Hadoop "security" as currently implemented can
be bypassed. If you really want to know how, that's probably better not
asked on a public list. ;)
But I'm curious as to your actual use case.
From what I can gather from your description, there are two possible
solutions, depending upon what you're looking to accomplish:
A) Turn off permissions
B) Create a group and make the output directory group writable
We use B a lot. We don't use A at all.
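
To illustrate why option B resolves the exception quoted in the original message, here is a minimal Python sketch of a POSIX-style owner/group/other permission check of the kind HDFS applies. It is not Hadoop code: the names Inode and is_allowed, the user names, and the group name "devs" are all hypothetical, and superuser handling is deliberately left out.

```python
from dataclasses import dataclass

# Access bits, as in a Unix mode triplet (rwx).
READ, WRITE, EXECUTE = 4, 2, 1


@dataclass(frozen=True)
class Inode:
    """Hypothetical stand-in for a directory entry: owner, group, mode."""
    owner: str
    group: str
    mode: int  # octal mode, e.g. 0o755


def is_allowed(user, user_groups, inode, access):
    """Return True if `user` may perform `access` on `inode`.

    Exactly one class of bits is consulted: owner bits if the user is the
    owner, else group bits if the user belongs to the inode's group, else
    the "other" bits. Superuser bypass is not modeled (assumption).
    """
    if user == inode.owner:
        bits = (inode.mode >> 6) & 7
    elif inode.group in user_groups:
        bits = (inode.mode >> 3) & 7
    else:
        bits = inode.mode & 7
    return (bits & access) == access


# The situation from the exception: directory owned by the remote user,
# group "users", mode rwxr-xr-x (755). A different user cannot write,
# whether or not that user is in the group.
home_755 = Inode(owner="remote_user", group="users", mode=0o755)
denied_other = is_allowed("local_user", {"staff"}, home_755, WRITE)
denied_group = is_allowed("local_user", {"users"}, home_755, WRITE)

# Option B: a shared group owns the output directory and it is
# group writable (775). Members of that group can now write.
shared_775 = Inode(owner="remote_user", group="devs", mode=0o775)
allowed_group = is_allowed("local_user", {"devs"}, shared_775, WRITE)
```

On a real cluster the equivalent change would typically be made with the "hadoop fs -chgrp" and "hadoop fs -chmod" shell commands on the output directory. Which groups the submitting user is considered to belong to depends on the cluster's setup, so treat the group membership above as an assumption.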