Posted to common-user@hadoop.apache.org by stephen mulcahy <st...@deri.org> on 2011/11/09 19:13:07 UTC
hadoop 0.20.205.0 multi-user cluster
Hi,
I've just installed a small hadoop 0.20.205.0 cluster (4 nodes) and am
trying to configure it such that different users can run jobs on it
(rather than having everyone submit jobs as the superuser).
Using a pretty standard config, I've found I need to make the following
changes before users can successfully submit jobs.
hadoop/bin/hadoop fs -chmod 777 /tmp/hadoop/mapred/staging
and
hadoop/bin/hadoop fs -chmod 777 /hadoop/mapred/system
where /hadoop/mapred/system is my mapred.system.dir
The second one seems to be required every time I restart the cluster.
Are these changes to be expected on a multi-user cluster, or am I missing something?
If these aren't specified, user jobs fail with an Exception like
11/11/09 16:32:53 INFO mapred.FileInputFormat: Total input paths to
process : 2
11/11/09 16:32:53 INFO mapred.JobClient: Running job: job_201111091731_0003
11/11/09 16:32:54 INFO mapred.JobClient: map 0% reduce 0%
11/11/09 16:32:54 INFO mapred.JobClient: Job complete: job_201111091731_0003
11/11/09 16:32:54 INFO mapred.JobClient: Counters: 0
11/11/09 16:32:54 INFO mapred.JobClient: Job Failed: Job initialization
failed:
org.apache.hadoop.security.AccessControlException:
org.apache.hadoop.security.AccessControlException: Permission denied:
user=smulcahy, access=EXECUTE, inode="system":hadoop:supergroup:rwx------
.....
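To see why the job fails, here is a toy model (not Hadoop's actual code) of the POSIX-style permission check HDFS applies; the user, owner, group, and rwx------ (0700) mode are taken from the log above:

```python
# Toy model of an owner/group/other permission check. The names smulcahy,
# hadoop, and supergroup come from the exception; check_access is a
# hypothetical helper, not part of any Hadoop API.

def check_access(user, groups, owner, group, mode, want):
    """mode is an octal int like 0o700; want is one of 'r', 'w', 'x'."""
    bit = {"r": 4, "w": 2, "x": 1}[want]
    if user == owner:
        perms = (mode >> 6) & 7   # owner bits
    elif group in groups:
        perms = (mode >> 3) & 7   # group bits
    else:
        perms = mode & 7          # other bits
    return bool(perms & bit)

# rwx------ == 0o700: only the owner (hadoop) may traverse the directory,
# so smulcahy is denied EXECUTE and job initialization fails.
print(check_access("smulcahy", [], "hadoop", "supergroup", 0o700, "x"))  # False

# With mode 775 and smulcahy in the owning group, the check would pass.
print(check_access("smulcahy", ["hadoop"], "hadoop", "hadoop", 0o775, "x"))  # True
```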
Thanks,
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie http://webstar.deri.ie http://sindice.com
hadoop 0.20.205.0 conf/ versus etc/hadoop
Posted by stephen mulcahy <st...@deri.org>.
Why is there a conf/ directory and an etc/hadoop directory in the
distributed tar-file? They seem to contain the same files?
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie http://webstar.deri.ie http://sindice.com
Re: hadoop 0.20.205.0 multi-user cluster
Posted by stephen mulcahy <st...@deri.org>.
On 16/11/11 14:52, stephen mulcahy wrote:
>
> So, digging further - hadoop seems to want to create a file
>
> <mapred.system.dir>/<job id>/jobToken
>
> for each job I submit.
>
> I assume this file is related to the new security stuff. Can I disable
> this activity until I require the security functionality or can I get
> this file created somewhere else?
>
> Or should the permissions enforced by the JobTracker on
> mapred.system.dir be changed if this is always required?
I've explicitly set the following
<property>
  <name>hadoop.security.authorization</name>
  <value>false</value>
</property>
<property>
  <name>hadoop.security.authentication</name>
  <value>simple</value>
</property>
and the jobToken file is still created, so every user seems to need
access to this directory.
Should I open a bug about this behaviour or is what I'm trying to do
unsupported?
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie http://webstar.deri.ie http://sindice.com
Re: hadoop 0.20.205.0 multi-user cluster
Posted by stephen mulcahy <st...@deri.org>.
So, digging further - hadoop seems to want to create a file
<mapred.system.dir>/<job id>/jobToken
for each job I submit.
I assume this file is related to the new security stuff. Can I disable
this activity until I require the security functionality or can I get
this file created somewhere else?
Or should the permissions enforced by the JobTracker on
mapred.system.dir be changed if this is always required?
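As a sketch, the per-job path described above can be built like this (the directory and job id are taken from earlier in the thread; job_token_path is a hypothetical helper, not a Hadoop API):

```python
# Builds the <mapred.system.dir>/<job id>/jobToken path the JobTracker
# writes for each submitted job, per the message above.
import posixpath

def job_token_path(system_dir, job_id):
    # HDFS paths use POSIX separators regardless of the client OS.
    return posixpath.join(system_dir, job_id, "jobToken")

print(job_token_path("/hadoop/mapred/system", "job_201111091731_0003"))
# → /hadoop/mapred/system/job_201111091731_0003/jobToken
```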
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie http://webstar.deri.ie http://sindice.com
Re: hadoop 0.20.205.0 multi-user cluster
Posted by stephen mulcahy <st...@deri.org>.
On 16/11/11 14:07, stephen mulcahy wrote:
> On 14/11/11 20:46, Raj V wrote:
>> Hi Stephen
>>
>> This is probably happening during jobtracker start. Can you provide
>> any relevant logs from the task tracker log file?
>
> You are correct, there is even a helpful message
>
> 2011-11-16 15:05:58,076 WARN org.apache.hadoop.mapred.JobTracker:
> Incorrect permissions on hdfs://testXXXX/hadoop/mapred/system. Setting
> it to rwx------
>
> Is there a reason for this policy? And how does that fit with multi-user
> hadoop?
Seems to have been introduced in
https://issues.apache.org/jira/browse/MAPREDUCE-2219 .
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie http://webstar.deri.ie http://sindice.com
Re: hadoop 0.20.205.0 multi-user cluster
Posted by stephen mulcahy <st...@deri.org>.
On 14/11/11 20:46, Raj V wrote:
> Hi Stephen
>
> This is probably happening during jobtracker start. Can you provide any relevant logs from the task tracker log file?
You are correct, there is even a helpful message
2011-11-16 15:05:58,076 WARN org.apache.hadoop.mapred.JobTracker:
Incorrect permissions on hdfs://testXXXX/hadoop/mapred/system. Setting
it to rwx------
Is there a reason for this policy? And how does that fit with multi-user
hadoop?
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie http://webstar.deri.ie http://sindice.com
Re: hadoop 0.20.205.0 multi-user cluster
Posted by Raj V <ra...@yahoo.com>.
Hi Stephen
This is probably happening during jobtracker start. Can you provide any relevant logs from the task tracker log file?
Raj
RE: setting up eclipse env for hadoop
Posted by Uma Maheswara Rao G <ma...@huawei.com>.
You are right.
________________________________________
From: Tim Broberg [Tim.Broberg@exar.com]
Sent: Tuesday, November 15, 2011 1:02 AM
To: common-user@hadoop.apache.org
Subject: RE: setting up eclipse env for hadoop
The ant steps for building the eclipse plugin are replaced by "mvn eclipse:eclipse," for versions 0.23+, correct?
RE: setting up eclipse env for hadoop
Posted by Tim Broberg <Ti...@exar.com>.
The ant steps for building the eclipse plugin are replaced by "mvn eclipse:eclipse," for versions 0.23+, correct?
RE: setting up eclipse env for hadoop
Posted by Uma Maheswara Rao G <ma...@huawei.com>.
Yes, you can follow that.
mvn eclipse:eclipse will generate the Eclipse-related files. After that, import them directly into Eclipse.
Note: the repository links need updating, since hdfs and mapreduce have moved inside the common folder.
Regards,
Uma
setting up eclipse env for hadoop
Posted by Amir Sanjar <v1...@us.ibm.com>.
I am trying to build hadoop-trunk using eclipse, is this
http://wiki.apache.org/hadoop/EclipseEnvironment the latest document?
Best Regards
Amir Sanjar
Linux System Management Architect and Lead
IBM Senior Software Engineer
Phone# 512-286-8393
Fax# 512-838-8858
Re: hadoop 0.20.205.0 multi-user cluster
Posted by stephen mulcahy <st...@deri.org>.
On 14/11/11 15:31, Shi Jin wrote:
> I am guessing that /tmp is reset upon cluster restart. Maybe try to use
> a persistent directory.
Thanks for the suggestion but /tmp will only be reset on server reboot -
not cluster restart (I'm talking about running stop-all.sh and
start-all.sh, not a full reboot).
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie http://webstar.deri.ie http://sindice.com
Re: hadoop 0.20.205.0 multi-user cluster
Posted by Shi Jin <ji...@gmail.com>.
I am guessing that /tmp is reset upon cluster restart. Maybe try to use
a persistent directory.
Shi
--
Shi Jin, Ph.D.
Re: hadoop 0.20.205.0 multi-user cluster
Posted by stephen mulcahy <st...@deri.org>.
On 14/11/11 09:38, stephen mulcahy wrote:
> Hi Raj,
>
> Thanks for your reply, comments below.
>
> On 09/11/11 18:45, Raj V wrote:
>> Can you try the following?
>>
>> - Change the permission to 775 for /hadoop/mapred/system
As per the previous problem, the permissions still get reset on cluster
restart.
Am I the only one trying to use the cluster in this way?
Is everyone else submitting all jobs as a single user or using the full
authentication support?
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie http://webstar.deri.ie http://sindice.com
Re: hadoop 0.20.205.0 multi-user cluster
Posted by stephen mulcahy <st...@deri.org>.
Hi Raj,
Thanks for your reply, comments below.
On 09/11/11 18:45, Raj V wrote:
> Can you try the following?
>
> - Change the permission to 775 for /hadoop/mapred/system
Done.
> - Change the group to hadoop
Done.
> - Make all users who need to submit hadoop jobs a part of the hadoop group.
The users are remote users. Do I need to create accounts on the hadoop
cluster for those users in order to add them to the hadoop group, or how
should this work?
Thanks,
-stephen
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie http://webstar.deri.ie http://sindice.com
Re: hadoop 0.20.205.0 multi-user cluster
Posted by Raj V <ra...@yahoo.com>.
Can you try the following?
- Change the permission to 775 for /hadoop/mapred/system
- Change the group to hadoop
- Make all users who need to submit hadoop jobs a part of the hadoop group.
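A minimal sketch of the first two steps, shown on a local stand-in directory (on the actual cluster the equivalents would be run through "hadoop fs -chmod 775" and "hadoop fs -chgrp hadoop" against mapred.system.dir; /tmp/demo/... below is a hypothetical path for illustration only):

```shell
# Give the directory 775 permissions so members of the owning group can
# write into it, mirroring the suggestion above.
mkdir -p /tmp/demo/mapred/system
chmod 775 /tmp/demo/mapred/system

# Print the resulting octal mode (GNU stat).
stat -c '%a' /tmp/demo/mapred/system

# For the third step, a user would be added to the hadoop group on the
# relevant nodes (as root), e.g.: usermod -a -G hadoop smulcahy
```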