Posted to common-user@hadoop.apache.org by stephen mulcahy <st...@deri.org> on 2011/11/09 19:13:07 UTC

hadoop 0.20.205.0 multi-user cluster

Hi,

I've just installed a small hadoop 0.20.205.0 cluster (4 nodes) and am 
trying to configure it such that different users can run jobs on it 
(rather than having everyone submit jobs as the superuser).

Using a pretty standard config, I've found I need to make the following 
changes before users can successfully submit jobs.

hadoop/bin/hadoop fs -chmod 777 /tmp/hadoop/mapred/staging

and

hadoop/bin/hadoop fs -chmod 777 /hadoop/mapred/system

where /hadoop/mapred/system is my mapred.system.dir

The second one seems to be required every time I restart the cluster.
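
[Editor's sketch] Because the second chmod has to be repeated after each restart, the two commands above could be wrapped in a small function and run once the cluster is back up. This is only a sketch of that workaround: the function name and the HADOOP variable are placeholders, the paths are the ones quoted in this post, and mode 777 leaves both directories world-writable.

```shell
# Sketch only: reapply the permissions from this post after a cluster restart.
# HADOOP and reapply_perms are placeholder names; adjust the paths to your
# own staging directory and mapred.system.dir.
HADOOP=${HADOOP:-hadoop/bin/hadoop}

reapply_perms() {
  for dir in /tmp/hadoop/mapred/staging /hadoop/mapred/system; do
    # Same command as in the post, one directory at a time.
    "$HADOOP" fs -chmod 777 "$dir" || return 1
  done
}
```

Setting HADOOP=echo before calling reapply_perms prints the commands instead of running them, which is a cheap way to check the paths first.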

Are these to be expected on a multi-user cluster or am I missing something?

If these aren't specified, user jobs fail with an Exception like

11/11/09 16:32:53 INFO mapred.FileInputFormat: Total input paths to process : 2
11/11/09 16:32:53 INFO mapred.JobClient: Running job: job_201111091731_0003
11/11/09 16:32:54 INFO mapred.JobClient:  map 0% reduce 0%
11/11/09 16:32:54 INFO mapred.JobClient: Job complete: job_201111091731_0003
11/11/09 16:32:54 INFO mapred.JobClient: Counters: 0
11/11/09 16:32:54 INFO mapred.JobClient: Job Failed: Job initialization failed:
org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=smulcahy, access=EXECUTE, inode="system":hadoop:supergroup:rwx------
.....
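
[Editor's sketch] The "Permission denied" line records who was refused, which access was requested, and the owner, group and mode of the directory involved. A minimal sketch for pulling those fields apart is shown here; parse_denied is a hypothetical helper, not part of Hadoop, and the pattern assumes the message layout seen in this post.

```shell
# Hypothetical helper: split a Hadoop "Permission denied" message into fields.
# Assumes the layout user=..., access=..., inode="name":owner:group:mode
parse_denied() {
  printf '%s\n' "$1" | sed -n \
    's/.*user=\([^,]*\), access=\([A-Z_]*\), inode="\([^"]*\)":\([^:]*\):\([^:]*\):\([-a-z]*\).*/user=\1 access=\2 inode=\3 owner=\4 group=\5 mode=\6/p'
}

parse_denied 'Permission denied: user=smulcahy, access=EXECUTE, inode="system":hadoop:supergroup:rwx------'
```

For the message in this post, the fields show that user smulcahy asked for EXECUTE on a directory owned by hadoop:supergroup with mode rwx------, which grants nothing to anyone other than the owner.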


Thanks,

-stephen

-- 
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie    http://webstar.deri.ie    http://sindice.com

hadoop 0.20.205.0 conf/ versus etc/hadoop

Posted by stephen mulcahy <st...@deri.org>.
Why is there a conf/ directory and an etc/hadoop directory in the 
distributed tar-file? They seem to contain the same files?

-stephen

-- 
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie    http://webstar.deri.ie    http://sindice.com

Re: hadoop 0.20.205.0 multi-user cluster

Posted by stephen mulcahy <st...@deri.org>.
On 16/11/11 14:52, stephen mulcahy wrote:
>
> So, digging further - hadoop seems to want to create a file
>
> <mapred.system.dir>/<job id>/jobToken
>
> for each job I submit.
>
> I assume this file is related to the new security stuff. Can I disable
> this activity until I require the security functionality or can I get
> this file created somewhere else?
>
> Or should the permissions enforced by the JobTracker on
> mapred.system.dir be changed if this is always required?

I've explicitly set the following

<property>
   <name>hadoop.security.authorization</name>
   <value>false</value>
</property>

<property>
   <name>hadoop.security.authentication</name>
   <value>simple</value>
</property>

and the jobToken file is still created, so every user seems to need
access to this directory.
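
[Editor's sketch] To confirm that a given configuration file really carries the two values above, the properties can be read back from it. The helper below is a hypothetical sketch, not a Hadoop tool; it assumes one tag per line, as in the snippet in this message, and the file to inspect is up to the reader.

```shell
# Hypothetical helper: print the <value> that follows a given <name> in a
# Hadoop-style *-site.xml file. Assumes one tag per line.
# usage: get_prop FILE PROPERTY_NAME
get_prop() {
  awk -v want="$2" '
    /<name>/  { n = $0; gsub(/.*<name>|<\/name>.*/, "", n) }
    /<value>/ { if (n == want) {
                  v = $0; gsub(/.*<value>|<\/value>.*/, "", v)
                  print v; exit
                } }
  ' "$1"
}
```

A call such as get_prop conf/core-site.xml hadoop.security.authentication would then print the configured value; the path conf/core-site.xml is an assumption, since the message does not say which file was edited.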

Should I open a bug about this behaviour or is what I'm trying to do 
unsupported?

-stephen

-- 
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie    http://webstar.deri.ie    http://sindice.com

Re: hadoop 0.20.205.0 multi-user cluster

Posted by stephen mulcahy <st...@deri.org>.
So, digging further - hadoop seems to want to create a file

<mapred.system.dir>/<job id>/jobToken

for each job I submit.

I assume this file is related to the new security stuff. Can I disable 
this activity until I require the security functionality or can I get 
this file created somewhere else?

Or should the permissions enforced by the JobTracker on 
mapred.system.dir be changed if this is always required?

-stephen

-- 
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie    http://webstar.deri.ie    http://sindice.com

Re: hadoop 0.20.205.0 multi-user cluster

Posted by stephen mulcahy <st...@deri.org>.
On 16/11/11 14:07, stephen mulcahy wrote:
> On 14/11/11 20:46, Raj V wrote:
>> Hi Stephen
>>
>> This is probably happening during jobtracker start. Can you provide
>> any relevant logs from the task tracker log file?
>
> You are correct, there is even a helpful message
>
> 2011-11-16 15:05:58,076 WARN org.apache.hadoop.mapred.JobTracker: Incorrect permissions on hdfs://testXXXX/hadoop/mapred/system. Setting it to rwx------
>
> Is there a reason for this policy? And how does that fit with multi-user
> hadoop?

Seems to have been introduced in 
https://issues.apache.org/jira/browse/MAPREDUCE-2219 .

-stephen

-- 
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie    http://webstar.deri.ie    http://sindice.com

Re: hadoop 0.20.205.0 multi-user cluster

Posted by stephen mulcahy <st...@deri.org>.
On 14/11/11 20:46, Raj V wrote:
> Hi Stephen
>
> This is probably happening during jobtracker start. Can you provide any relevant logs from the task tracker log file?

You are correct, there is even a helpful message

2011-11-16 15:05:58,076 WARN org.apache.hadoop.mapred.JobTracker: Incorrect permissions on hdfs://testXXXX/hadoop/mapred/system. Setting it to rwx------
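
[Editor's sketch] To see which directories the JobTracker reset at startup, its log can be searched for this warning. The helper below is a hypothetical sketch; it assumes the warning sits on a single line in the log file (the wrapping seen in email is not in the original log) and takes the log path as an argument, since the location varies by install.

```shell
# Hypothetical helper: list the paths named in "Incorrect permissions on ..."
# warnings in a JobTracker log. Assumes one warning per line.
# usage: reset_dirs LOGFILE
reset_dirs() {
  sed -n 's/.*Incorrect permissions on \([^ ]*\)\. Setting.*/\1/p' "$1"
}
```

Run against a log containing the warning above, this prints hdfs://testXXXX/hadoop/mapred/system.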

Is there a reason for this policy? And how does that fit with multi-user 
hadoop?

-stephen

-- 
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie    http://webstar.deri.ie    http://sindice.com

Re: hadoop 0.20.205.0 multi-user cluster

Posted by Raj V <ra...@yahoo.com>.
Hi Stephen

This is probably happening during jobtracker start. Can you provide any relevant logs from the task tracker log file?

Raj




RE: setting up eclipse env for hadoop

Posted by Uma Maheswara Rao G <ma...@huawei.com>.
You are right.
________________________________________
From: Tim Broberg [Tim.Broberg@exar.com]
Sent: Tuesday, November 15, 2011 1:02 AM
To: common-user@hadoop.apache.org
Subject: RE: setting up eclipse env for hadoop

The ant steps for building the eclipse plugin are replaced by "mvn eclipse:eclipse," for versions 0.23+, correct?


RE: setting up eclipse env for hadoop

Posted by Tim Broberg <Ti...@exar.com>.
The ant steps for building the eclipse plugin are replaced by "mvn eclipse:eclipse," for versions 0.23+, correct?

________________________________________
From: Uma Maheswara Rao G [maheswara@huawei.com]
Sent: Monday, November 14, 2011 10:11 AM
To: common-user@hadoop.apache.org
Subject: RE: setting up eclipse env for hadoop

Yes, you can follow that.
  mvn eclipse:eclipse will generate the Eclipse-related files. After that, import the projects directly into Eclipse.
Note: the repository links need to be updated; hdfs and mapreduce have moved inside the common folder.

Regards,
Uma

RE: setting up eclipse env for hadoop

Posted by Uma Maheswara Rao G <ma...@huawei.com>.
Yes, you can follow that.
  mvn eclipse:eclipse will generate the Eclipse-related files. After that, import the projects directly into Eclipse.
Note: the repository links need to be updated; hdfs and mapreduce have moved inside the common folder.

Regards,
Uma

setting up eclipse env for hadoop

Posted by Amir Sanjar <v1...@us.ibm.com>.
I am trying to build hadoop-trunk using eclipse, is this
http://wiki.apache.org/hadoop/EclipseEnvironment the latest document?

Best Regards
Amir Sanjar

Linux System Management Architect and Lead
IBM Senior Software Engineer
Phone# 512-286-8393
Fax#      512-838-8858

Re: hadoop 0.20.205.0 multi-user cluster

Posted by stephen mulcahy <st...@deri.org>.
On 14/11/11 15:31, Shi Jin wrote:
> I am guessing that /tmp is reset upon cluster restart. Maybe try to use
> a persistent directory.

Thanks for the suggestion but /tmp will only be reset on server reboot - 
not cluster restart (I'm talking about running stop-all.sh and 
start-all.sh, not a full reboot).

-stephen

-- 
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie    http://webstar.deri.ie    http://sindice.com

Re: hadoop 0.20.205.0 multi-user cluster

Posted by Shi Jin <ji...@gmail.com>.
I am guessing that /tmp is reset upon cluster restart. Maybe try to use
a persistent directory.

Shi



-- 
Shi Jin, Ph.D.

Re: hadoop 0.20.205.0 multi-user cluster

Posted by stephen mulcahy <st...@deri.org>.
On 14/11/11 09:38, stephen mulcahy wrote:
> Hi Raj,
>
> Thanks for your reply, comments below.
>
> On 09/11/11 18:45, Raj V wrote:
>> Can you try the following?
>>
>> - Change the permission to 775 for /hadoop/mapred/system

As per the previous problem, the permissions still get reset on cluster 
restart.

Am I the only one trying to use the cluster in this way?
Is everyone else submitting all jobs as a single user or using the full 
authentication support?

-stephen

-- 
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie    http://webstar.deri.ie    http://sindice.com

Re: hadoop 0.20.205.0 multi-user cluster

Posted by stephen mulcahy <st...@deri.org>.
Hi Raj,

Thanks for your reply, comments below.

On 09/11/11 18:45, Raj V wrote:
> Can you try the following?
>
> - Change the permission to 775 for /hadoop/mapred/system

Done.

> - Change the group to hadoop

Done.

> - Make all users who need to submit hadoop jobs a part of the hadoop group.

The users are remote users. Do I need to create accounts on the hadoop 
cluster for those users to add them to the hadoop group or how should 
this work?

Thanks,

-stephen

-- 
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie    http://webstar.deri.ie    http://sindice.com

Re: hadoop 0.20.205.0 multi-user cluster

Posted by Raj V <ra...@yahoo.com>.
Can you try the following?

- Change the permission to 775 for /hadoop/mapred/system
- Change the group to hadoop 
- Make all users who need to submit hadoop jobs a part of the hadoop group.
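
[Editor's sketch] The reasoning behind these three steps is the usual owner/group/other check: with mode 775 and group hadoop, any member of that group gets rwx on the directory, whereas rwx------ gives non-owners nothing. The function below only illustrates that check on a symbolic mode string. It is a hypothetical helper, not a Hadoop command, and it ignores the superuser bypass.

```shell
# Hypothetical helper: mimic the owner/group/other permission check on a
# symbolic mode string such as rwxrwxr-x. Superuser bypass is not modelled.
# usage: can_access MODE OWNER GROUP USER "USER_GROUPS" r|w|x
can_access() {
  mode=$1 owner=$2 group=$3 user=$4 user_groups=$5 want=$6
  if [ "$user" = "$owner" ]; then
    bits=$(printf '%s' "$mode" | cut -c1-3)      # owner bits
  elif printf ' %s ' "$user_groups" | grep -q " $group "; then
    bits=$(printf '%s' "$mode" | cut -c4-6)      # group bits
  else
    bits=$(printf '%s' "$mode" | cut -c7-9)      # other bits
  fi
  case $bits in
    *"$want"*) echo allowed ;;
    *)         echo denied ;;
  esac
}

# Situation from the original report: not the owner, not in supergroup.
can_access rwx------ hadoop supergroup smulcahy "users" x
# After the three steps above: mode 775, group hadoop, user added to hadoop.
can_access rwxrwxr-x hadoop hadoop smulcahy "users hadoop" x
```

The first call prints denied and the second prints allowed. The group list "users" is an invented example value.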




