Posted to common-user@hadoop.apache.org by Dan Milstein <dm...@hubteam.com> on 2009/05/05 16:44:22 UTC

What User Accounts Do People Use For Team Dev?

Best-practices-type question: when a single cluster is being used by a  
team of folks to run jobs, how do people on this list handle user  
accounts?

Many of the examples seem to show everything being run as root on the
master, which it is hard to imagine is a great idea.

Do you:

  - Create a distinct account for every developer who will need to run  
jobs?

  - Create a single hadoop-dev or hadoop-jobs account, have everyone  
use that?

  - Just stick with running it all as root?

Thanks,
-Dan Milstein

Re: PIG and Hive

Posted by asif md <as...@gmail.com>.
http://www.cloudera.com/hadoop-training-hive-introduction

http://www.cloudera.com/hadoop-training-pig-introduction

On Wed, May 6, 2009 at 1:17 AM, Ricky Ho <rh...@adobe.com> wrote:

> Are they competing technologies for providing a higher-level language
> for Map/Reduce programming?
>
> Or are they complementary?
>
> Any comparison between them?
>
> Rgds,
> Ricky
>

PIG and Hive

Posted by Ricky Ho <rh...@adobe.com>.
Are they competing technologies for providing a higher-level language for Map/Reduce programming?

Or are they complementary?

Any comparison between them?

Rgds,
Ricky

Re: What User Accounts Do People Use For Team Dev?

Posted by Owen O'Malley <ow...@gmail.com>.
Best is to use one user for map/reduce and another for HDFS. Neither
of them should be root or "real" users. With the setuid patch
(HADOOP-4490), it is possible to run jobs as the submitting user.
Note that if you do that, you will no doubt want to block certain
system uids (bin, mysql, etc.).
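
A minimal sketch of that split, assuming dedicated 'hdfs' and 'mapred'
accounts (those account names are my choice here, not something the
thread specifies):

  # start the HDFS daemons as the hdfs user
  sudo -u hdfs bin/start-dfs.sh
  # start the map/reduce daemons as the mapred user
  sudo -u mapred bin/start-mapred.sh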

-- Owen

Re: What User Accounts Do People Use For Team Dev?

Posted by Edward Capriolo <ed...@gmail.com>.
On Tue, May 5, 2009 at 10:44 AM, Dan Milstein <dm...@hubteam.com> wrote:
> Best-practices-type question: when a single cluster is being used by a team
> of folks to run jobs, how do people on this list handle user accounts?
>
> Many of the examples seem to show everything being run as root on the
> master, which it is hard to imagine is a great idea.
>
> Do you:
>
>  - Create a distinct account for every developer who will need to run jobs?
>
>  - Create a single hadoop-dev or hadoop-jobs account, have everyone use
> that?
>
>  - Just stick with running it all as root?
>
> Thanks,
> -Dan Milstein
>

This is an interesting issue. First, remember that the user who starts
hadoop is considered the hadoop 'superuser'.

You probably do not want to run hadoop as root, or someone could make
an 'rm -rf /' map/reduce application and execute it across your
cluster.

We run hadoop as the hadoop user, with SSH public-key authentication
backed by LDAP. Every user has their own account and their own home
directories: /usr/<username> on the local filesystem and
/user/<username> in HDFS.
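
For example, creating a user's HDFS home directory as the hadoop
superuser might look like this (paths as described above):

  sudo -u hadoop hadoop fs -mkdir /user/user1
  sudo -u hadoop hadoop fs -chown user1 /user/user1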

Now the fun begins: user1 runs a process that creates files owned by
'user1'. No surprise there. What happens when 'user2' needs to modify
one of those files?

This is not a hadoop-specific issue; the same scenario comes up
whenever people try to share any unix file system. On the unix side,
the sticky bit and umask help.
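
You can see the problem directly in a listing; the output below is
approximate, with 'supergroup' as the assumed default HDFS group:

  hadoop fs -ls /user/user1/output
  # -rw-r--r--  3 user1 supergroup  1234 2009-05-05 10:44 /user/user1/output/part-00000

With those permissions user2 can read the file but cannot modify it.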

What I do is give each user the ability to log in both as themselves
and as the team user:

passwd (user:group):
hadoop:hadoop
user1:user1
user2:user2
team1:team1

group (group:members):
team1:user1,user2
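
With that setup, producing team-owned files is just a matter of
switching to the team account before running a job (the jar and class
names below are hypothetical):

  su - team1
  hadoop jar myjob.jar com.example.MyJob input output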

In this way the burden falls on the user to make sure the file
ownership is correct. If user1 intends for user2 to see the work, they
have two options:
1) set liberal HDFS file permissions (see the commands just below), or
2) start the process as 'team1' rather than 'user1'.
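
Option 1 might look like the following, for a hypothetical shared
output directory:

  hadoop fs -chgrp -R team1 /user/user1/shared
  hadoop fs -chmod -R 775 /user/user1/shared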

This is suitable for a development-style cluster. Some people follow
the policy that a production environment should not allow direct user
access; in that case only one user would exist.

passwd (user:group):
hadoop:hadoop
mysoftware:mysoftware

Code that runs on that type of cluster would be deployed and run by an
automated process or a configuration management system; individual
users could not log in directly.
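
A minimal sketch of that kind of hands-off setup, assuming a cron
entry under the 'mysoftware' account (the jar, class, and paths are
hypothetical):

  # crontab for mysoftware: run the nightly job at 2am
  0 2 * * * hadoop jar /opt/mysoftware/jobs.jar com.example.Nightly /data/in /data/out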