Posted to common-dev@hadoop.apache.org by Steve Gao <st...@yahoo.com> on 2008/10/16 01:25:15 UTC

How to change number of mappers in Hadoop streaming?

Is there a way to change number of mappers in Hadoop streaming command line?
I know I can change hadoop-default.xml:

<property>
  <name>mapred.map.tasks</name>
  <value>10</value>
  <description>The default number of map tasks per job.  Typically set
  to a prime several times greater than number of available hosts.
  Ignored when mapred.job.tracker is "local".
  </description>
</property>

But that's for all jobs. Is there a way to give each job its own number of mappers? Thanks



Re: Can you specify the user on the map-reduce cluster in Hadoop streaming

Posted by Allen Wittenauer <aw...@yahoo-inc.com>.


On 11/10/08 12:21 PM, "Rick Hangartner" <ha...@strands.com> wrote:
>  But is there a proper way to allow developers to specify a <remote_username>
> they legitimately have access to on the cluster if it is not the same
> as the <local_username> of the account on their own machine they are
> using to submit a streaming job without setting HDFS permissions to
> 777? 

    There are ways that the Hadoop "security" as currently implemented can
be bypassed. If you really want to know how, that's probably better not
asked on a public list. ;)

    But I'm curious as to your actual use case.

    From what I can gather from your description, there are two possible
solutions, depending upon what you're looking to accomplish:

A) Turn off permissions

B) Create a group and make the output directory group writable

    We use B a lot.  We don't use A at all.
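As a concrete sketch of option B, the HDFS-side setup might look like the following. The group name "devs" and the directory path are hypothetical placeholders, not from this thread, and since the commands need a live cluster they are printed here rather than executed:

```shell
# Sketch of option B: a shared output directory writable by one group.
# "devs" and /user/shared/output are made-up examples. The commands
# are emitted as text (via a heredoc), not run against a cluster.
cat <<'EOF'
hadoop fs -mkdir /user/shared/output
hadoop fs -chgrp devs /user/shared/output
hadoop fs -chmod 775 /user/shared/output
EOF
```

Each developer's Unix account on the cluster would then be added to the group, so writes succeed without opening the directory to everyone (777).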


Can you specify the user on the map-reduce cluster in Hadoop streaming

Posted by Rick Hangartner <ha...@strands.com>.
Hi,

To make Hadoop/MapReduce available for developers to experiment
with, we are setting up a cluster with Hadoop/MapReduce and a dataset,
and providing instructions on how developers can use streaming to
submit jobs from their own machines.

For purposes of explanation here, we can assume each user has access  
to their own login account on every node in the cluster and has a  
"home" directory set up for them in the HDFS (e.g. Unix/Linux login:  
<remote_username> on the nodes, and a directory /user/ 
<remote_username> on the HDFS with permissions 755).

What we've discovered is that if a developer is working as
<local_username> on his or her own machine, and
<local_username>=<remote_username> of their account on the cluster,
streaming works just fine.  If the <local_username> and
<remote_username> don't match, though, when they specify that "-input"
and "-output" should come from and go to files in the
<remote_username> directory on HDFS, they get an:

"org.apache.hadoop.fs.permission.AccessControlException:  
org.apache.hadoop.fs.permission.AccessControlException: Permission  
denied: user=<local_username>, access=WRITE,  
inode="<remote_username>":<remote_username>:users:rwxr-xr-x"

exception.

Specifying "-jobconf user.name=<remote_username>" with the streaming
command doesn't seem to help.
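One likely explanation, assuming the pre-security Hadoop behavior of deriving identity from the submitting client's Unix account: the username HDFS sees is effectively the local `whoami`, so a job-level user.name override is ignored. A minimal sketch of that assumption:

```shell
# Assumption: early (pre-Kerberos) Hadoop takes the HDFS username
# from the local Unix account of whoever submits the job, roughly:
LOCAL_USER=$(whoami)
echo "HDFS sees requests as user=$LOCAL_USER"
```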

Of course, this all makes sense from a security viewpoint and from
how I understand HDFS derives its permissions from the OS.  But is
there a proper way to allow developers to specify a <remote_username>
they legitimately have access to on the cluster, if it is not the same
as the <local_username> of the account on their own machine they are
using to submit a streaming job, without setting HDFS permissions to
777?  I've searched the documentation and email list for this info and
perhaps have overlooked the answer to this question; apologies in
advance if I have.

Thanks.

Best regards,
Rick Hangartner



Re: How to change number of mappers in Hadoop streaming?

Posted by Erik Holstad <er...@gmail.com>.
Hi Steve!
You can pass -jobconf mapred.map.tasks=$MAPPERS -jobconf
mapred.reduce.tasks=$REDUCERS
to the streaming job to set the number of mappers and reducers.
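For concreteness, a full invocation might look like the sketch below. The jar path, input/output paths, and mapper/reducer programs are placeholders, not taken from this thread; the command is composed as a string and printed rather than executed, since running it requires a cluster:

```shell
# Hypothetical streaming invocation with per-job task counts.
# Everything except the two -jobconf flags is a placeholder.
MAPPERS=20
REDUCERS=5
CMD="hadoop jar hadoop-streaming.jar \
  -input /user/example/input \
  -output /user/example/output \
  -mapper /bin/cat \
  -reducer /usr/bin/wc \
  -jobconf mapred.map.tasks=$MAPPERS \
  -jobconf mapred.reduce.tasks=$REDUCERS"
echo "$CMD"
```

Note that mapred.map.tasks is only a hint: the actual number of map tasks is ultimately driven by the number of input splits, so the framework may not honor it exactly, while mapred.reduce.tasks is respected as given.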

Regards Erik

On Wed, Oct 15, 2008 at 4:25 PM, Steve Gao <st...@yahoo.com> wrote:

> Is there a way to change number of mappers in Hadoop streaming command
> line?
> I know I can change hadoop-default.xml:
>
> <property>
>   <name>mapred.map.tasks</name>
>   <value>10</value>
>   <description>The default number of map tasks per job.  Typically set
>   to a prime several times greater than number of available hosts.
>   Ignored when mapred.job.tracker is "local".
>   </description>
> </property>
>
> But that's for all jobs. What if I just want each job has different
> NUM_OF_Mappers themselves? Thanks