Posted to common-user@hadoop.apache.org by Rick Hangartner <ha...@strands.com> on 2008/11/10 21:21:52 UTC

Can you specify the user on the map-reduce cluster in Hadoop streaming

Hi,

To make Hadoop/MapReduce available for developers to experiment
with, we are setting up a cluster with Hadoop/MapReduce and a dataset,
and providing instructions for how developers can use streaming to
submit jobs from their own machines.

For purposes of explanation here, we can assume each user has access
to their own login account on every node in the cluster and has a
"home" directory set up for them in HDFS (e.g. Unix/Linux login
<remote_username> on the nodes, and a directory
/user/<remote_username> in HDFS with permissions 755).
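The octal mode 755 on each home directory is the same "rwxr-xr-x" string that shows up in the exception below. A minimal Python sketch that decodes it with the standard-library `stat.filemode`; the helper name `describe` is hypothetical.

```python
import stat

# Octal mode given to each /user/<remote_username> directory in HDFS.
HOME_DIR_MODE = 0o755

def describe(mode: int) -> str:
    """Render a directory's octal mode as an ls-style permission string."""
    return stat.filemode(stat.S_IFDIR | mode)

print(describe(HOME_DIR_MODE))  # drwxr-xr-x
```

Owner gets rwx, while the group and everyone else get only r-x, so nobody but the owner can create files in the directory.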

What we've discovered is that if developers are working as
<local_username> on their own machines, and <local_username> is the
same as the <remote_username> of their account on the cluster,
streaming works just fine. If <local_username> and <remote_username>
don't match, though, then when they specify "-input" and "-output"
paths that should read from and write to files in the
<remote_username> directory on HDFS, they get an:

"org.apache.hadoop.fs.permission.AccessControlException:  
org.apache.hadoop.fs.permission.AccessControlException: Permission  
denied: user=<local_username>, access=WRITE,  
inode="<remote_username>":<remote_username>:users:rwxr-xr-x"

exception.
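The exception is consistent with the owner/group/other model described in the HDFS permissions guide: the NameNode tests the owner bits if the caller owns the inode, otherwise the group bits if the caller belongs to the inode's group, otherwise the "other" bits. The sketch below models only that rule (the superuser bypass is left out) and plugs in the values from the error message. `Inode`, `check_access`, and the account names `remote_user`/`local_user` are hypothetical stand-ins, not Hadoop APIs.

```python
from __future__ import annotations

from dataclasses import dataclass

READ, WRITE, EXECUTE = 4, 2, 1  # bit values within one rwx triplet

@dataclass
class Inode:
    owner: str
    group: str
    mode: int  # e.g. 0o755 == rwxr-xr-x

def check_access(user: str, user_groups: set[str], inode: Inode, access: int) -> bool:
    """Owner/group/other check in the style of the HDFS permissions guide.

    Exactly one class of bits is consulted: owner bits if the user owns
    the inode, else group bits if the user is in the inode's group, else
    the 'other' bits. The superuser bypass is deliberately omitted.
    """
    if user == inode.owner:
        bits = (inode.mode >> 6) & 0o7
    elif inode.group in user_groups:
        bits = (inode.mode >> 3) & 0o7
    else:
        bits = inode.mode & 0o7
    return (bits & access) == access

# Mirrors the exception: home directory owned by the cluster account,
# group "users", mode rwxr-xr-x.
home = Inode(owner="remote_user", group="users", mode=0o755)

# Submitting as the local account falls through to the "other" bits (r-x).
print(check_access("local_user", {"local_user"}, home, WRITE))  # False
# Submitting as the owning account uses the owner bits (rwx).
print(check_access("remote_user", {"users"}, home, WRITE))      # True
```

Note that even membership in "users" would not help here, because the group bits are also r-x.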

Specifying "-jobconf user.name=<remote_username>" on the streaming
command line doesn't seem to help.

Of course, this all makes sense from a security viewpoint and from how
I understand HDFS to derive permissions from the OS. But is there a
proper way to allow developers to specify a <remote_username> they
legitimately have access to on the cluster, if it is not the same as
the <local_username> of the account on the machine they are using to
submit a streaming job, without setting HDFS permissions to 777? I've
searched the documentation and the mailing list for this info and
perhaps have overlooked the answer; apologies in advance if I have.

Thanks.

Best regards,
Rick Hangartner



Re: Can you specify the user on the map-reduce cluster in Hadoop streaming

Posted by Allen Wittenauer <aw...@yahoo-inc.com>.


On 11/10/08 12:21 PM, "Rick Hangartner" <ha...@strands.com> wrote:
>  But is there a proper way to allow developers to specify a <remote_username>
> they legitimately have access to on the cluster if it is not the same
> as the <local_username> of the account on their own machine they are
> using to submit a streaming job without setting HDFS permissions to
> 777? 

    There are ways that the Hadoop "security" as currently implemented can
be bypassed. If you really want to know how, that's probably better not
asked on a public list. ;)

    But I'm curious as to your actual use case.

    From what I can gather from your description, there are two possible
solutions, depending upon what you're looking to accomplish:

A) Turn off permissions

B) Create a group and make the output directory group writable

    We use B a lot.  We don't use A at all.
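On the HDFS side, option B might look roughly like the sketch below. It assumes a shared group (hypothetically named `devs`) already exists and that the NameNode sees each developer's submitting account as a member of it; how group membership is determined depends on the Hadoop version and setup. The code only builds `hadoop fs -chgrp` and `hadoop fs -chmod` command lines and prints them unless `run=True` is passed. The helper names and the target path are placeholders.

```python
from __future__ import annotations

import stat
import subprocess

def group_writable_commands(path: str, group: str) -> list[list[str]]:
    """Command lines that hand `path` to `group` and add group write (option B)."""
    return [
        ["hadoop", "fs", "-chgrp", group, path],  # make the shared group own the directory
        ["hadoop", "fs", "-chmod", "g+w", path],  # let group members create entries in it
    ]

def run_commands(path: str, group: str, run: bool = False) -> list[list[str]]:
    cmds = group_writable_commands(path, group)
    for cmd in cmds:
        if run:
            subprocess.run(cmd, check=True)  # needs a configured hadoop client on PATH
        else:
            print(" ".join(cmd))  # dry run: just show what would be executed
    return cmds

# Effect on the mode from the exception: rwxr-xr-x plus g+w.
NEW_MODE = 0o755 | stat.S_IWGRP
print(stat.filemode(stat.S_IFDIR | NEW_MODE))  # drwxrwxr-x

run_commands("/user/<remote_username>", "devs")
```

Granting write to a group keeps the owner/group/other model intact, which is why it is a narrower change than opening the directory to 777.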