Posted to hdfs-user@hadoop.apache.org by Patai Sangbutsarakum <si...@gmail.com> on 2012/10/14 02:33:05 UTC

Fair scheduler.

Is there any way to control who can submit jobs to a pool?

E.g., pool1 can run jobs submitted by any user except userx.

Userx can submit jobs to poolx only, and can't submit to pool1.

Hope this makes sense.
Patai

Re: Fair scheduler.

Posted by Luke Lu <ll...@apache.org>.
You have a different issue (in addition to MAPREDUCE-4398). Setting
mapreduce.jobtracker.staging.root.dir to /user will solve the first
problem. The "magic" number of 4 is the default number of hard-coded
job init threads (mapred.jobinit.threads). You have to submit 4 or
more jobs as the jobtracker user at the same time to make sure the job
init threads are initialized as the system user, so they can access
mapred.system.dir (for security reasons, it must be 700). Otherwise,
some of the job init threads will be initialized as whatever user
first submits a job. This can lead to seemingly bizarre behavior:
sometimes it works (the job is initialized by one of the system
threads) and sometimes it doesn't (the job is initialized by one of the
user threads). Once you know the root cause, it's pretty trivial to
come up with a patch. The default FIFO scheduler and the capacity
scheduler do not have this bug.
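
For illustration, a minimal mapred-site.xml sketch of that fix; both
property names appear above, and 4 is simply the default thread count
made explicit:

  <property>
    <name>mapreduce.jobtracker.staging.root.dir</name>
    <value>/user</value>
  </property>
  <property>
    <!-- default number of job init threads; the "magic" 4 above -->
    <name>mapred.jobinit.threads</name>
    <value>4</value>
  </property>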

On Tue, Oct 16, 2012 at 4:52 PM, Patai Sangbutsarakum
<si...@gmail.com> wrote:
> Thanks everyone. Seems like I hit a dead end.
> It's kind of funny when I read that JIRA: run it 4 times and everything
> will work.. where does that magic number come from.. lol
>
> respects
>
> On Tue, Oct 16, 2012 at 4:12 PM, Arpit Gupta <ar...@hortonworks.com> wrote:
>> https://issues.apache.org/jira/browse/MAPREDUCE-4398
>>
>> is the bug that Robin is referring to.
>>
>> --
>> Arpit Gupta
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>> On Oct 16, 2012, at 3:51 PM, "Goldstone, Robin J." <go...@llnl.gov>
>> wrote:
>>
>> This is similar to issues I ran into with permissions/ownership of
>> mapred.system.dir when using the fair scheduler.  We are instructed to set
>> the ownership of mapred.system.dir to mapred:hadoop and then when the job
>> tracker starts up (running as user mapred) it explicitly sets the
>> permissions on this directory to 700.  Meanwhile when I go to run a job as
>> a regular user, it is trying to write stuff into mapred.system.dir but it
>> can't due to the ownership/permissions that have been established.
>>
>> Per discussion with Arpit Gupta, this is a bug with the fair scheduler and
>> it appears from your experience that there are similar issues with
>> hadoop.tmp.dir.  The whole idea of the fair scheduler is to run jobs under
>> the user's identity rather than as user mapred.  This is good from a
>> security perspective yet it seems no one bothered to account for this in
>> terms of the permissions that need to be set in the various directories to
>> enable this.
>>
>> Until this is sorted out by the Hadoop developers, I've put my attempts to
>> use the fair scheduler on hold…
>>
>> Regards,
>> Robin Goldstone, LLNL
>>
>> On 10/16/12 3:32 PM, "Patai Sangbutsarakum" <si...@gmail.com>
>> wrote:
>>
>> Hi Harsh,
>> Thanks for breaking it down clearly. I would say I am 98% successful
>> following the instructions.
>> The 2% is about hadoop.tmp.dir.
>>
>> Let's say I have 2 users:
>> userA is the user that starts HDFS and MapReduce
>> userB is a regular user
>>
>> If I use the default value of hadoop.tmp.dir,
>> /tmp/hadoop-${user.name},
>> I can submit jobs as userA but not as userB:
>> user=userB, access=WRITE, inode="/tmp/hadoop-userA/mapred/staging"
>> :userA:supergroup:drwxr-xr-x
>>
>> I googled around; someone recommended changing hadoop.tmp.dir to
>> /tmp/hadoop.
>> That almost works; the thing is:
>>
>> if I submit as userA, it will create /tmp/hadoop on the local machine,
>> owned by userA:userA,
>> and once I try to submit a job from the same machine as userB I
>> get "Error creating temp dir in hadoop.tmp.dir /tmp/hadoop due to
>> Permission denied"
>> (because /tmp/hadoop is owned by userA:userA). Vice versa, if I delete
>> /tmp/hadoop and let the directory be created by userB, userA will not
>> be able to submit jobs.
>>
>> Which is the right approach I should work with?
>> Please suggest.
>>
>> Patai
>>
>>
>> On Mon, Oct 15, 2012 at 3:18 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>> Hi Patai,
>>
>> Reply inline.
>>
>> On Tue, Oct 16, 2012 at 2:57 AM, Patai Sangbutsarakum
>> <si...@gmail.com> wrote:
>>
>> Thanks for input,
>>
>> I am reading the document; I forgot to mention that I am on CDH3u4.
>>
>>
>> That version should have the support for all of this.
>>
>> If you point your poolname property to mapred.job.queue.name, then you
>> can leverage the Per-Queue ACLs
>>
>>
>> Does that mean that if I plan on 3 fair scheduler pools, I have to
>> configure 3 capacity scheduler queues, so that each pool can
>> leverage each queue's per-queue ACL?
>>
>>
>> Queues are not hard-tied into CapacityScheduler. You can have generic
>> queues in MR. And FairScheduler can bind its Pool concept into the
>> Queue configuration.
>>
>> All you need to do is the following:
>>
>> 1. Map the FairScheduler pool name onto the queue name itself:
>>
>> mapred.fairscheduler.poolnameproperty set to 'mapred.job.queue.name'
>>
>> 2. Define your required queues:
>>
>> mapred.job.queues set to "default,foo,bar" for example, for 3 queues:
>> default, foo and bar.
>>
>> 3. Define Submit ACLs for each Queue:
>>
>> mapred.queue.default.acl-submit-job set to "patai,foobar users,adm"
>> (usernames groupnames)
>>
>> mapred.queue.foo.acl-submit-job set to "spam eggs"
>>
>> Likewise for remaining queues, as you need it…
>>
>> 4. Enable ACLs and restart JT.
>>
>> mapred.acls.enabled set to "true"
>>
>> 5. Users then use the right API to set queue names before submitting
>> jobs, or use -Dmapred.job.queue.name=value via CLI (if using Tool):
>>
>> http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobConf.html#setQueueName(java.lang.String)
>>
>> 6. Done.
>>
>> Let us know if this works!
>>
>> --
>> Harsh J
>>
>>
>>
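
A minimal mapred-site.xml sketch of Harsh's steps 1-4 above, using the
thread's example queue names and ACL entries. Note that the
cluster-level queue list property is mapred.queue.names; Harsh corrects
the "mapred.job.queues" name in his follow-up further down the thread:

  <property>
    <name>mapred.fairscheduler.poolnameproperty</name>
    <value>mapred.job.queue.name</value>
  </property>
  <property>
    <name>mapred.queue.names</name>
    <value>default,foo,bar</value>
  </property>
  <property>
    <!-- value format: "usernames groupnames", each list comma-separated -->
    <name>mapred.queue.default.acl-submit-job</name>
    <value>patai,foobar users,adm</value>
  </property>
  <property>
    <name>mapred.queue.foo.acl-submit-job</name>
    <value>spam eggs</value>
  </property>
  <property>
    <name>mapred.acls.enabled</name>
    <value>true</value>
  </property>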

Re: Fair scheduler.

Posted by Harsh J <ha...@cloudera.com>.
No, you're right - to define the queue names at the cluster level,
mapred.queue.names is the right config. To specify a queue at the job
level, mapred.job.queue.name is the right config.
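
As a quick sketch of that distinction (the queue name "foo" and the jar
name are illustrative; the -D form requires the job to use Tool, as
noted earlier):

  Cluster level, in mapred-site.xml:

    <property>
      <name>mapred.queue.names</name>
      <value>default,foo,bar</value>
    </property>

  Job level, per submission:

    hadoop jar myjob.jar -Dmapred.job.queue.name=foo <input> <output>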

On Wed, Oct 17, 2012 at 11:10 PM, Patai Sangbutsarakum
<si...@gmail.com> wrote:
> Harsh, I am testing it again according to your last instructions.
>
>>> 2. Define your required queues:
>>>mapred.job.queues set to "default,foo,bar" for example, for 3 queues:
>>>default, foo and bar.
>
> From http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u4/cluster_setup.html#Configuring+the+Environment+of+the+Hadoop+Daemons
> I couldn't find "mapred.job.queues" at that link, so I have been
> using mapred.queue.names, which might be where my mistake is.
>
> Please suggest
>
> On Wed, Oct 17, 2012 at 8:43 AM, Harsh J <ha...@cloudera.com> wrote:
>> Hey Robin,
>>
>> Thanks for the detailed post.
>>
>> Just looked at your older thread, and you're right, the JT does write
>> into its system dir for users' job info and token files when
>> initializing the job. The bug you ran into and the exception+trace you
>> got make sense now.
>>
>> I just didn't see it on the version Patai seems to be using. I think
>> if he specifies a proper staging directory, he'll get through, because
>> his trace is different from that of MAPREDUCE-4398 (i.e. system dir
>> vs. staging dir - you had the system dir, unfortunately).
>>
>> On Wed, Oct 17, 2012 at 8:39 PM, Goldstone, Robin J.
>> <go...@llnl.gov> wrote:
>>> Yes, you would think that users shouldn't need to write to
>>> mapred.system.dir, yet that seems to be the case.  I posted details about
>>> my configuration along with full stack traces last week.  I won't re-post
>>> everything but essentially I have mapred.system.dir defined as a directory
>>> in HDFS owned by mapred:hadoop.  I initially set the permissions to 755
>>> but when the job tracker started up it changed the permissions to 700.
>>> Then when I ran a job as a regular user I got this error:
>>>
>>> 12/10/09 16:27:03 INFO mapred.JobClient: Job Failed: Job initialization
>>> failed:
>>> org.apache.hadoop.security.AccessControlException:
>>> org.apache.hadoop.security.AccessControlException: Permission denied:
>>> user=robing, access=EXECUTE, inode="mapred":mapred:hadoop:rwx------
>>>
>>>
>>> I then manually changed the permissions back to 755 and ran again and got
>>> this error:
>>> 12/10/09 16:31:30 INFO mapred.JobClient: Job Failed: Job initialization
>>> failed:
>>> org.apache.hadoop.security.AccessControlException:
>>> org.apache.hadoop.security.AccessControlException: Permission denied:
>>> user=robing, access=WRITE, inode="mapred":mapred:hadoop:rwxr-xr-x
>>>
>>> I then changed the permissions to 777 and the job ran successfully.  This
>>> suggests that some process was trying to write to
>>> mapred.system.dir but did not have sufficient permissions.  The
>>> speculation is that this was being attempted under my uid instead of
>>> mapred.  Perhaps it is something else. I welcome your suggestions.
>>>
>>>
>>> For completeness, I also have mapred.jobtracker.staging.root.dir set to
>>> /user within HDFS.  I can verify the staging files are going there but
>>> something else is still trying to access mapred.system.dir.
>>>
>>> Robin Goldstone, LLNL
>>>
>>> On 10/17/12 12:00 AM, "Harsh J" <ha...@cloudera.com> wrote:
>>>
>>>>Hi,
>>>>
>>>>Regular users never write into mapred.system.dir AFAICT. That
>>>>directory is just for the JT to use to mark its presence and to
>>>>"expose" the distributed filesystem it will be relying on.
>>>>
>>>>Users write to their respective staging directories, which lie
>>>>elsewhere and are per-user.
>>>>
>>>>Let me post my environment:
>>>>
>>>>- mapred.system.dir (an HDFS dir for the JT to register itself) is set
>>>>to "/tmp/mapred/system". The /tmp/mapred and /tmp/mapred/system (or
>>>>whatever you configure it to) are to be owned by mapred:hadoop so that
>>>>the JT can feel free to reconfigure them.
>>>>
>>>>- mapreduce.jobtracker.staging.root.dir (an HDFS dir that is the
>>>>parent directory under which users write their per-user job staging
>>>>files (JARs, etc.)) is set to "/user". /user further contains each
>>>>user's home directory, each owned by that user. For example:
>>>>
>>>>drwxr-xr-x   - harsh    harsh 0 2012-09-27 15:51 /user/harsh
>>>>
>>>>All staging files from local user 'harsh' are hence written as the
>>>>proper user under /user/harsh/.staging, since that user does have
>>>>permission to write there. For any user to access HDFS, they'd need a
>>>>home directory created on HDFS by the admin first - and after that,
>>>>things users do under their own directory will work just fine. The JT
>>>>would not have to try to create per-user directories.
>>>>
>>>>On Wed, Oct 17, 2012 at 5:22 AM, Patai Sangbutsarakum
>>>><si...@gmail.com> wrote:
>>>>> Thanks everyone. Seems like I hit a dead end.
>>>>> It's kind of funny when I read that JIRA: run it 4 times and everything
>>>>> will work.. where does that magic number come from.. lol
>>>>>
>>>>> respects
>>>>>
>>>>> On Tue, Oct 16, 2012 at 4:12 PM, Arpit Gupta <ar...@hortonworks.com>
>>>>>wrote:
>>>>>> https://issues.apache.org/jira/browse/MAPREDUCE-4398
>>>>>>
>>>>>> is the bug that Robin is referring to.
>>>>>>
>>>>>> --
>>>>>> Arpit Gupta
>>>>>> Hortonworks Inc.
>>>>>> http://hortonworks.com/
>>>>>>
>>>>>> On Oct 16, 2012, at 3:51 PM, "Goldstone, Robin J."
>>>>>><go...@llnl.gov>
>>>>>> wrote:
>>>>>>
>>>>>> This is similar to issues I ran into with permissions/ownership of
>>>>>> mapred.system.dir when using the fair scheduler.  We are instructed to
>>>>>>set
>>>>>> the ownership of mapred.system.dir to mapred:hadoop and then when the
>>>>>>job
>>>>>> tracker starts up (running as user mapred) it explicitly sets the
>>>>>> permissions on this directory to 700.  Meanwhile when I go to run a
>>>>>>job as
>>>>>> a regular user, it is trying to write stuff into mapred.system.dir but
>>>>>>it
>>>>>> can't due to the ownership/permissions that have been established.
>>>>>>
>>>>>> Per discussion with Arpit Gupta, this is a bug with the fair scheduler
>>>>>>and
>>>>>> it appears from your experience that there are similar issues with
>>>>>> hadoop.tmp.dir.  The whole idea of the fair scheduler is to run jobs
>>>>>>under
>>>>>> the user's identity rather than as user mapred.  This is good from a
>>>>>> security perspective yet it seems no one bothered to account for this
>>>>>>in
>>>>>> terms of the permissions that need to be set in the various
>>>>>>directories to
>>>>>> enable this.
>>>>>>
>>>>>> Until this is sorted out by the Hadoop developers, I've put my
>>>>>>attempts to
>>>>>> use the fair scheduler on hold…
>>>>>>
>>>>>> Regards,
>>>>>> Robin Goldstone, LLNL
>>>>>>
>>>>>> On 10/16/12 3:32 PM, "Patai Sangbutsarakum" <si...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> Hi Harsh,
>>>>>> Thanks for breaking it down clearly. I would say I am 98% successful
>>>>>> following the instructions.
>>>>>> The 2% is about hadoop.tmp.dir.
>>>>>>
>>>>>> Let's say I have 2 users:
>>>>>> userA is the user that starts HDFS and MapReduce
>>>>>> userB is a regular user
>>>>>>
>>>>>> If I use the default value of hadoop.tmp.dir,
>>>>>> /tmp/hadoop-${user.name},
>>>>>> I can submit jobs as userA but not as userB:
>>>>>> user=userB, access=WRITE, inode="/tmp/hadoop-userA/mapred/staging"
>>>>>> :userA:supergroup:drwxr-xr-x
>>>>>>
>>>>>> I googled around; someone recommended changing hadoop.tmp.dir to
>>>>>> /tmp/hadoop.
>>>>>> That almost works; the thing is:
>>>>>>
>>>>>> if I submit as userA, it will create /tmp/hadoop on the local machine,
>>>>>> owned by userA:userA,
>>>>>> and once I try to submit a job from the same machine as userB I
>>>>>> get "Error creating temp dir in hadoop.tmp.dir /tmp/hadoop due to
>>>>>> Permission denied"
>>>>>> (because /tmp/hadoop is owned by userA:userA). Vice versa, if I delete
>>>>>> /tmp/hadoop and let the directory be created by userB, userA will not
>>>>>> be able to submit jobs.
>>>>>>
>>>>>> Which is the right approach I should work with?
>>>>>> Please suggest.
>>>>>>
>>>>>> Patai
>>>>>>
>>>>>>
>>>>>> On Mon, Oct 15, 2012 at 3:18 PM, Harsh J <ha...@cloudera.com> wrote:
>>>>>>
>>>>>> Hi Patai,
>>>>>>
>>>>>> Reply inline.
>>>>>>
>>>>>> On Tue, Oct 16, 2012 at 2:57 AM, Patai Sangbutsarakum
>>>>>> <si...@gmail.com> wrote:
>>>>>>
>>>>>> Thanks for input,
>>>>>>
>>>>>> I am reading the document; I forgot to mention that I am on CDH3u4.
>>>>>>
>>>>>>
>>>>>> That version should have the support for all of this.
>>>>>>
>>>>>> If you point your poolname property to mapred.job.queue.name, then you
>>>>>> can leverage the Per-Queue ACLs
>>>>>>
>>>>>>
>>>>>> Does that mean that if I plan on 3 fair scheduler pools, I have to
>>>>>> configure 3 capacity scheduler queues, so that each pool can
>>>>>> leverage each queue's per-queue ACL?
>>>>>>
>>>>>>
>>>>>> Queues are not hard-tied into CapacityScheduler. You can have generic
>>>>>> queues in MR. And FairScheduler can bind its Pool concept into the
>>>>>> Queue configuration.
>>>>>>
>>>>>> All you need to do is the following:
>>>>>>
>>>>>> 1. Map the FairScheduler pool name onto the queue name itself:
>>>>>>
>>>>>> mapred.fairscheduler.poolnameproperty set to 'mapred.job.queue.name'
>>>>>>
>>>>>> 2. Define your required queues:
>>>>>>
>>>>>> mapred.job.queues set to "default,foo,bar" for example, for 3 queues:
>>>>>> default, foo and bar.
>>>>>>
>>>>>> 3. Define Submit ACLs for each Queue:
>>>>>>
>>>>>> mapred.queue.default.acl-submit-job set to "patai,foobar users,adm"
>>>>>> (usernames groupnames)
>>>>>>
>>>>>> mapred.queue.foo.acl-submit-job set to "spam eggs"
>>>>>>
>>>>>> Likewise for remaining queues, as you need it…
>>>>>>
>>>>>> 4. Enable ACLs and restart JT.
>>>>>>
>>>>>> mapred.acls.enabled set to "true"
>>>>>>
>>>>>> 5. Users then use the right API to set queue names before submitting
>>>>>> jobs, or use -Dmapred.job.queue.name=value via CLI (if using Tool):
>>>>>>
>>>>>>
>>>>>> http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobConf.html#setQueueName(java.lang.String)
>>>>>>
>>>>>> 6. Done.
>>>>>>
>>>>>> Let us know if this works!
>>>>>>
>>>>>> --
>>>>>> Harsh J
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>>
>>>>--
>>>>Harsh J
>>>
>>
>>
>>
>> --
>> Harsh J



-- 
Harsh J
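
A sketch of the one-time admin step Harsh describes above (creating a
user's home directory on HDFS so per-user staging works); the username
"patai" is only an example:

  # run as the HDFS superuser
  hadoop fs -mkdir /user/patai
  hadoop fs -chown patai:patai /user/patai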

>>>>>>job
>>>>>> tracker starts up (running as user mapred) it explicitly sets the
>>>>>> permissions on this directory to 700.  Meanwhile when I go to run a
>>>>>>job as
>>>>>> a regular user, it is trying to write stuff into mapred.system.dir but
>>>>>>it
>>>>>> can't due to the ownership/permissions that have been established.
>>>>>>
>>>>>> Per discussion with Arpit Gupta, this is a bug with the fair scheduler
>>>>>>and
>>>>>> it appears from your experience that there are similar issues with
>>>>>> hadoop.tmp.dir.  The whole idea of the fair scheduler is to run jobs
>>>>>>under
>>>>>> the user's identity rather than as user mapred.  This is good from a
>>>>>> security perspective yet it seems no one bothered to account for this
>>>>>>in
>>>>>> terms of the permissions that need to be set in the various
>>>>>>directories to
>>>>>> enable this.
>>>>>>
>>>>>> Until this is sorted out by the Hadoop developers, I've put my
>>>>>>attempts to
>>>>>> use the fair scheduler on hold…
>>>>>>
>>>>>> Regards,
>>>>>> Robin Goldstone, LLNL
>>>>>>
>>>>>> On 10/16/12 3:32 PM, "Patai Sangbutsarakum" <si...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>> Hi Harsh,
>>>>>> Thanks for breaking it down clearly. I would say i am successful 98%
>>>>>> from the instruction.
>>>>>> The 2% is about hadoop.tmp.dir
>>>>>>
>>>>>> let's say i have 2 users
>>>>>> userA is a user that start hdfs and mapred
>>>>>> userB is a regular user
>>>>>>
>>>>>> if i use default value of  hadoop.tmp.dir
>>>>>> /tmp/hadoop-${user.name}
>>>>>> I can submit job as usersA but not by usersB
>>>>>> user=userB, access=WRITE, inode="/tmp/hadoop-userA/mapred/staging"
>>>>>> :userA:supergroup:drwxr-xr-x
>>>>>>
>>>>>> i googled around; someone recommended to change hadoop.tmp.dir to
>>>>>> /tmp/hadoop.
>>>>>> This way it is almost a yay way; the thing is
>>>>>>
>>>>>> if I submit as userA it will create /tmp/hadoop in local machine which
>>>>>> ownership will be userA.userA,
>>>>>> and once I tried to submit job from the same machine as userB I will
>>>>>> get  "Error creating temp dir in hadoop.tmp.dir /tmp/hadoop due to
>>>>>> Permission denied"
>>>>>> (as because /tmp/hadoop is own by userA.userA). vise versa if I delete
>>>>>> /tmp/hadoop and let the directory be created by userB, userA will not
>>>>>> be able to submit job.
>>>>>>
>>>>>> Which is the right approach i should work with?
>>>>>> Please suggest
>>>>>>
>>>>>> Patai
>>>>>>
>>>>>>
>>>>>> On Mon, Oct 15, 2012 at 3:18 PM, Harsh J <ha...@cloudera.com> wrote:
>>>>>>
>>>>>> Hi Patai,
>>>>>>
>>>>>> Reply inline.
>>>>>>
>>>>>> On Tue, Oct 16, 2012 at 2:57 AM, Patai Sangbutsarakum
>>>>>> <si...@gmail.com> wrote:
>>>>>>
>>>>>> Thanks for input,
>>>>>>
>>>>>> I am reading the document; i forget to mention that i am on cdh3u4.
>>>>>>
>>>>>>
>>>>>> That version should have the support for all of this.
>>>>>>
>>>>>> If you point your poolname property to mapred.job.queue.name, then you
>>>>>> can leverage the Per-Queue ACLs
>>>>>>
>>>>>>
>>>>>> Is that mean if i plan to 3 pools of fair scheduler, i have to
>>>>>> configure 3 queues of capacity scheduler. in order to have each pool
>>>>>> can leverage Per-Queue ACL of each queue.?
>>>>>>
>>>>>>
>>>>>> Queues are not hard-tied into CapacityScheduler. You can have generic
>>>>>> queues in MR. And FairScheduler can bind its Pool concept into the
>>>>>> Queue configuration.
>>>>>>
>>>>>> All you need to do is the following:
>>>>>>
>>>>>> 1. Map FairScheduler pool name to reuse queue names itself:
>>>>>>
>>>>>> mapred.fairscheduler.poolnameproperty set to 'mapred.job.queue.name'
>>>>>>
>>>>>> 2. Define your required queues:
>>>>>>
>>>>>> mapred.job.queues set to "default,foo,bar" for example, for 3 queues:
>>>>>> default, foo and bar.
>>>>>>
>>>>>> 3. Define Submit ACLs for each Queue:
>>>>>>
>>>>>> mapred.queue.default.acl-submit-job set to "patai,foobar users,adm"
>>>>>> (usernames groupnames)
>>>>>>
>>>>>> mapred.queue.foo.acl-submit-job set to "spam eggs"
>>>>>>
>>>>>> Likewise for remaining queues, as you need it…
>>>>>>
>>>>>> 4. Enable ACLs and restart JT.
>>>>>>
>>>>>> mapred.acls.enabled set to "true"
>>>>>>
>>>>>> 5. Users then use the right API to set queue names before submitting
>>>>>> jobs, or use -Dmapred.job.queue.name=value via CLI (if using Tool):
>>>>>>
>>>>>>
>>>>>> http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobConf.html#setQueueName(java.lang.String)
>>>>>>
>>>>>> 6. Done.
>>>>>>
>>>>>> Let us know if this works!
>>>>>>
>>>>>> --
>>>>>> Harsh J
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>>
>>>>
>>>>--
>>>>Harsh J
>>>
>>
>>
>>
>> --
>> Harsh J



-- 
Harsh J

Re: Fair scheduler.

Posted by Patai Sangbutsarakum <si...@gmail.com>.
Harsh, I am testing it again according to your last instructions.

>> 2. Define your required queues:
>>mapred.job.queues set to "default,foo,bar" for example, for 3 queues:
>>default, foo and bar.

From http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u4/cluster_setup.html#Configuring+the+Environment+of+the+Hadoop+Daemons
I couldn't find "mapred.job.queues" at that link, so I have been
using mapred.queue.names instead; that may be where my mistake is.

Please suggest
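
As a sanity check, the queue CLI can show what the JT actually loaded,
and a job can be aimed at a queue straight from the command line (the
jar name and paths below are only placeholders):

  hadoop queue -list        # configured queues and their scheduling info
  hadoop queue -showacls    # queue operations the current user may perform

  hadoop jar hadoop-examples.jar wordcount \
    -Dmapred.job.queue.name=foo /user/patai/input /user/patai/output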

On Wed, Oct 17, 2012 at 8:43 AM, Harsh J <ha...@cloudera.com> wrote:
> Hey Robin,
>
> Thanks for the detailed post.
>
> Just looked at your older thread, and you're right, the JT does write
> into its system dir for users' job info and token files when
> initializing the Job. The bug you ran into and the exception+trace you
> got makes sense now.
>
> I just didn't see it on version which Patai seems to be using. I think
> if he specifies a proper staging directory, he'll go through, cause
> his trace is different than that of MAPREDUCE-4398 (i.e. system dir
> vs. staging dir - you had system dir unfortunately).
>
> On Wed, Oct 17, 2012 at 8:39 PM, Goldstone, Robin J.
> <go...@llnl.gov> wrote:
>> Yes, you would think that users shouldn't need to write to
>> mapred.system.dir, yet that seems to be the case.  I posted details about
>> my configuration along with full stack traces last week.  I won't re-post
>> everything but essentially I have mapred.system.dir defined as a directory
>> in HDFS owned by mapred:hadoop.  I initially set the permissions to 755
>> but when the job tracker started up it changed the permissions to 700.
>> Then when I ran a job as a regular user I got this error:
>>
>> 12/10/09 16:27:03 INFO mapred.JobClient: Job Failed: Job initialization
>> failed:
>> org.apache.hadoop.security.AccessControlException:
>> org.apache.hadoop.security.AccessControlException: Permission denied:
>> user=robing, access=EXECUTE, inode="mapred":mapred:hadoop:rwx------
>>
>>
>> I then manually changed the permissions back to 755 and ran again and got
>> this error:
>> 12/10/09 16:31:30 INFO mapred.JobClient: Job Failed: Job initialization
>> failed:
>> org.apache.hadoop.security.AccessControlException:
>> org.apache.hadoop.security.AccessControlException: Permission denied:
>> user=robing, access=WRITE, inode="mapred":mapred:hadoop:rwxr-xr-x
>>
>> I then changed the permissions to 777 and the job ran successfully.  This
>> suggests that some process was trying to write to write to
>> mapred.system.dir but did not have sufficient permissions.  The
>> speculation is that this was being attempted under my uid instead of
>> mapred.  Perhaps it is something else. I welcome your suggestions.
>>
>>
>> For completeness, I also have mapred.jobtracker.staging.root.dir set to
>> /user within HDFS.  I can verify the staging files are going there but
>> something else is still trying to access mapred.system.dir.
>>
>> Robin Goldstone, LLNL
>>
>> On 10/17/12 12:00 AM, "Harsh J" <ha...@cloudera.com> wrote:
>>
>>>Hi,
>>>
>>>Regular users never write into the mapred.system.dir AFAICT. That
>>>directory, is just for the JT to use to mark its presence and to
>>>"expose" the distributed filesystem it will be relying on.
>>>
>>>Users write to their respective staging directories, which lies
>>>elsewhere and is per-user.
>>>
>>>Let me post my environment:
>>>
>>>- mapred.system.dir (A HDFS Dir for a JT to register itself) set to
>>>"/tmp/mapred/system". The /tmp/mapred and /tmp/mapred/system (or
>>>whatever you configure it to) is to be owned by mapred:hadoop so that
>>>the JT can feel free to reconfigure it.
>>>
>>>- mapreduce.jobtracker.staging.root.dir (A HDFS dir that represents
>>>the parent directory for user's to write their per-user job stage
>>>files (JARs, etc.)) is set to "/user". The /user further contains each
>>>user's home directories, owned all by them. For example:
>>>
>>>drwxr-xr-x   - harsh    harsh 0 2012-09-27 15:51 /user/harsh
>>>
>>>All staging files from local user 'harsh' are hence written as the
>>>proper user under /user/harsh/.staging since that user does have
>>>permissions to write there. For any user to access HDFS, they'd need a
>>>home directory created on the HDFS by the admin first - and after that
>>>things users do under their own directory, will work just fine. The JT
>>>would not have to try to create per-user directories.
>>>
>>>On Wed, Oct 17, 2012 at 5:22 AM, Patai Sangbutsarakum
>>><si...@gmail.com> wrote:
>>>> Thanks everyone, Seem like i hit the dead end.
>>>> It's kind of funny when i read that jira; run it 4 time and everything
>>>> will work.. where that magic number from..lol
>>>>
>>>> respects
>>>>
>>>> On Tue, Oct 16, 2012 at 4:12 PM, Arpit Gupta <ar...@hortonworks.com>
>>>>wrote:
>>>>> https://issues.apache.org/jira/browse/MAPREDUCE-4398
>>>>>
>>>>> is the bug that Robin is referring to.
>>>>>
>>>>> --
>>>>> Arpit Gupta
>>>>> Hortonworks Inc.
>>>>> http://hortonworks.com/
>>>>>
>>>>> On Oct 16, 2012, at 3:51 PM, "Goldstone, Robin J."
>>>>><go...@llnl.gov>
>>>>> wrote:
>>>>>
>>>>> This is similar to issues I ran into with permissions/ownership of
>>>>> mapred.system.dir when using the fair scheduler.  We are instructed to
>>>>>set
>>>>> the ownership of mapred.system.dir to mapred:hadoop and then when the
>>>>>job
>>>>> tracker starts up (running as user mapred) it explicitly sets the
>>>>> permissions on this directory to 700.  Meanwhile when I go to run a
>>>>>job as
>>>>> a regular user, it is trying to write stuff into mapred.system.dir but
>>>>>it
>>>>> can't due to the ownership/permissions that have been established.
>>>>>
>>>>> Per discussion with Arpit Gupta, this is a bug with the fair scheduler
>>>>>and
>>>>> it appears from your experience that there are similar issues with
>>>>> hadoop.tmp.dir.  The whole idea of the fair scheduler is to run jobs
>>>>>under
>>>>> the user's identity rather than as user mapred.  This is good from a
>>>>> security perspective yet it seems no one bothered to account for this
>>>>>in
>>>>> terms of the permissions that need to be set in the various
>>>>>directories to
>>>>> enable this.
>>>>>
>>>>> Until this is sorted out by the Hadoop developers, I've put my
>>>>>attempts to
>>>>> use the fair scheduler on hold…
>>>>>
>>>>> Regards,
>>>>> Robin Goldstone, LLNL
>>>>>
>>>>> On 10/16/12 3:32 PM, "Patai Sangbutsarakum" <si...@gmail.com>
>>>>> wrote:
>>>>>
>>>>> Hi Harsh,
>>>>> Thanks for breaking it down clearly. I would say i am successful 98%
>>>>> from the instruction.
>>>>> The 2% is about hadoop.tmp.dir
>>>>>
>>>>> let's say i have 2 users
>>>>> userA is a user that start hdfs and mapred
>>>>> userB is a regular user
>>>>>
>>>>> if i use default value of  hadoop.tmp.dir
>>>>> /tmp/hadoop-${user.name}
>>>>> I can submit job as usersA but not by usersB
>>>>> user=userB, access=WRITE, inode="/tmp/hadoop-userA/mapred/staging"
>>>>> :userA:supergroup:drwxr-xr-x
>>>>>
>>>>> i googled around; someone recommended to change hadoop.tmp.dir to
>>>>> /tmp/hadoop.
>>>>> This way it is almost a yay way; the thing is
>>>>>
>>>>> if I submit as userA it will create /tmp/hadoop in local machine which
>>>>> ownership will be userA.userA,
>>>>> and once I tried to submit job from the same machine as userB I will
>>>>> get  "Error creating temp dir in hadoop.tmp.dir /tmp/hadoop due to
>>>>> Permission denied"
>>>>> (as because /tmp/hadoop is own by userA.userA). vise versa if I delete
>>>>> /tmp/hadoop and let the directory be created by userB, userA will not
>>>>> be able to submit job.
>>>>>
>>>>> Which is the right approach i should work with?
>>>>> Please suggest
>>>>>
>>>>> Patai
>>>>>
>>>>>
>>>>> On Mon, Oct 15, 2012 at 3:18 PM, Harsh J <ha...@cloudera.com> wrote:
>>>>>
>>>>> Hi Patai,
>>>>>
>>>>> Reply inline.
>>>>>
>>>>> On Tue, Oct 16, 2012 at 2:57 AM, Patai Sangbutsarakum
>>>>> <si...@gmail.com> wrote:
>>>>>
>>>>> Thanks for input,
>>>>>
>>>>> I am reading the document; i forget to mention that i am on cdh3u4.
>>>>>
>>>>>
>>>>> That version should have the support for all of this.
>>>>>
>>>>> If you point your poolname property to mapred.job.queue.name, then you
>>>>> can leverage the Per-Queue ACLs
>>>>>
>>>>>
>>>>> Is that mean if i plan to 3 pools of fair scheduler, i have to
>>>>> configure 3 queues of capacity scheduler. in order to have each pool
>>>>> can leverage Per-Queue ACL of each queue.?
>>>>>
>>>>>
>>>>> Queues are not hard-tied into CapacityScheduler. You can have generic
>>>>> queues in MR. And FairScheduler can bind its Pool concept into the
>>>>> Queue configuration.
>>>>>
>>>>> All you need to do is the following:
>>>>>
>>>>> 1. Map FairScheduler pool name to reuse queue names itself:
>>>>>
>>>>> mapred.fairscheduler.poolnameproperty set to 'mapred.job.queue.name'
>>>>>
>>>>> 2. Define your required queues:
>>>>>
>>>>> mapred.job.queues set to "default,foo,bar" for example, for 3 queues:
>>>>> default, foo and bar.
>>>>>
>>>>> 3. Define Submit ACLs for each Queue:
>>>>>
>>>>> mapred.queue.default.acl-submit-job set to "patai,foobar users,adm"
>>>>> (usernames groupnames)
>>>>>
>>>>> mapred.queue.foo.acl-submit-job set to "spam eggs"
>>>>>
>>>>> Likewise for remaining queues, as you need it…
>>>>>
>>>>> 4. Enable ACLs and restart JT.
>>>>>
>>>>> mapred.acls.enabled set to "true"
>>>>>
>>>>> 5. Users then use the right API to set queue names before submitting
>>>>> jobs, or use -Dmapred.job.queue.name=value via CLI (if using Tool):
>>>>>
>>>>>
>>>>> http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobConf.html#setQueueName(java.lang.String)
>>>>>
>>>>> 6. Done.
>>>>>
>>>>> Let us know if this works!
>>>>>
>>>>> --
>>>>> Harsh J
>>>>>
>>>>>
>>>>>
>>>
>>>
>>>
>>>--
>>>Harsh J
>>
>
>
>
> --
> Harsh J

Re: Fair scheduler.

Posted by Harsh J <ha...@cloudera.com>.
Hey Robin,

Thanks for the detailed post.

Just looked at your older thread, and you're right, the JT does write
into its system dir for users' job info and token files when
initializing the Job. The bug you ran into and the exception+trace you
got makes sense now.

I just didn't see it on the version Patai seems to be using. I think
if he specifies a proper staging directory, he'll get through, because
his trace is different from that of MAPREDUCE-4398 (i.e. system dir
vs. staging dir; you hit the system dir case, unfortunately).
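
For anyone setting this up from scratch, the one-time HDFS preparation
for the layout I described in my earlier mail (quoted below) amounts to
something like this (a sketch only; 'userB' and the exact paths are
illustrative):

  # mapred.system.dir: owned by the JT user; the JT tightens perms itself
  hadoop fs -mkdir /tmp/mapred/system
  hadoop fs -chown -R mapred:hadoop /tmp/mapred

  # a per-user home under the staging root (/user), so job files are
  # staged as the submitting user, in a directory that user owns
  hadoop fs -mkdir /user/userB
  hadoop fs -chown userB:userB /user/userB

With mapreduce.jobtracker.staging.root.dir pointing at /user, userB's
staging files then land under /user/userB/.staging without touching
any shared, system-owned temp directory.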

On Wed, Oct 17, 2012 at 8:39 PM, Goldstone, Robin J.
<go...@llnl.gov> wrote:
> Yes, you would think that users shouldn't need to write to
> mapred.system.dir, yet that seems to be the case.  I posted details about
> my configuration along with full stack traces last week.  I won't re-post
> everything but essentially I have mapred.system.dir defined as a directory
> in HDFS owned by mapred:hadoop.  I initially set the permissions to 755
> but when the job tracker started up it changed the permissions to 700.
> Then when I ran a job as a regular user I got this error:
>
> 12/10/09 16:27:03 INFO mapred.JobClient: Job Failed: Job initialization
> failed:
> org.apache.hadoop.security.AccessControlException:
> org.apache.hadoop.security.AccessControlException: Permission denied:
> user=robing, access=EXECUTE, inode="mapred":mapred:hadoop:rwx------
>
>
> I then manually changed the permissions back to 755 and ran again and got
> this error:
> 12/10/09 16:31:30 INFO mapred.JobClient: Job Failed: Job initialization
> failed:
> org.apache.hadoop.security.AccessControlException:
> org.apache.hadoop.security.AccessControlException: Permission denied:
> user=robing, access=WRITE, inode="mapred":mapred:hadoop:rwxr-xr-x
>
> I then changed the permissions to 777 and the job ran successfully.  This
> suggests that some process was trying to write to write to
> mapred.system.dir but did not have sufficient permissions.  The
> speculation is that this was being attempted under my uid instead of
> mapred.  Perhaps it is something else. I welcome your suggestions.
>
>
> For completeness, I also have mapred.jobtracker.staging.root.dir set to
> /user within HDFS.  I can verify the staging files are going there but
> something else is still trying to access mapred.system.dir.
>
> Robin Goldstone, LLNL
>
> On 10/17/12 12:00 AM, "Harsh J" <ha...@cloudera.com> wrote:
>
>>Hi,
>>
>>Regular users never write into the mapred.system.dir AFAICT. That
>>directory, is just for the JT to use to mark its presence and to
>>"expose" the distributed filesystem it will be relying on.
>>
>>Users write to their respective staging directories, which lies
>>elsewhere and is per-user.
>>
>>Let me post my environment:
>>
>>- mapred.system.dir (A HDFS Dir for a JT to register itself) set to
>>"/tmp/mapred/system". The /tmp/mapred and /tmp/mapred/system (or
>>whatever you configure it to) is to be owned by mapred:hadoop so that
>>the JT can feel free to reconfigure it.
>>
>>- mapreduce.jobtracker.staging.root.dir (A HDFS dir that represents
>>the parent directory for user's to write their per-user job stage
>>files (JARs, etc.)) is set to "/user". The /user further contains each
>>user's home directories, owned all by them. For example:
>>
>>drwxr-xr-x   - harsh    harsh 0 2012-09-27 15:51 /user/harsh
>>
>>All staging files from local user 'harsh' are hence written as the
>>proper user under /user/harsh/.staging since that user does have
>>permissions to write there. For any user to access HDFS, they'd need a
>>home directory created on the HDFS by the admin first - and after that
>>things users do under their own directory, will work just fine. The JT
>>would not have to try to create per-user directories.
>>
>>On Wed, Oct 17, 2012 at 5:22 AM, Patai Sangbutsarakum
>><si...@gmail.com> wrote:
>>> Thanks everyone, Seem like i hit the dead end.
>>> It's kind of funny when i read that jira; run it 4 time and everything
>>> will work.. where that magic number from..lol
>>>
>>> respects
>>>
>>> On Tue, Oct 16, 2012 at 4:12 PM, Arpit Gupta <ar...@hortonworks.com>
>>>wrote:
>>>> https://issues.apache.org/jira/browse/MAPREDUCE-4398
>>>>
>>>> is the bug that Robin is referring to.
>>>>
>>>> --
>>>> Arpit Gupta
>>>> Hortonworks Inc.
>>>> http://hortonworks.com/
>>>>
>>>> On Oct 16, 2012, at 3:51 PM, "Goldstone, Robin J."
>>>><go...@llnl.gov>
>>>> wrote:
>>>>
>>>> This is similar to issues I ran into with permissions/ownership of
>>>> mapred.system.dir when using the fair scheduler.  We are instructed to
>>>>set
>>>> the ownership of mapred.system.dir to mapred:hadoop and then when the
>>>>job
>>>> tracker starts up (running as user mapred) it explicitly sets the
>>>> permissions on this directory to 700.  Meanwhile when I go to run a
>>>>job as
>>>> a regular user, it is trying to write stuff into mapred.system.dir but
>>>>it
>>>> can't due to the ownership/permissions that have been established.
>>>>
>>>> Per discussion with Arpit Gupta, this is a bug with the fair scheduler
>>>>and
>>>> it appears from your experience that there are similar issues with
>>>> hadoop.tmp.dir.  The whole idea of the fair scheduler is to run jobs
>>>>under
>>>> the user's identity rather than as user mapred.  This is good from a
>>>> security perspective yet it seems no one bothered to account for this
>>>>in
>>>> terms of the permissions that need to be set in the various
>>>>directories to
>>>> enable this.
>>>>
>>>> Until this is sorted out by the Hadoop developers, I've put my
>>>>attempts to
>>>> use the fair scheduler on hold…
>>>>
>>>> Regards,
>>>> Robin Goldstone, LLNL
>>>>
>>>> On 10/16/12 3:32 PM, "Patai Sangbutsarakum" <si...@gmail.com>
>>>> wrote:
>>>>
>>>> Hi Harsh,
>>>> Thanks for breaking it down clearly. I would say i am successful 98%
>>>> from the instruction.
>>>> The 2% is about hadoop.tmp.dir
>>>>
>>>> let's say i have 2 users
>>>> userA is a user that start hdfs and mapred
>>>> userB is a regular user
>>>>
>>>> if i use default value of  hadoop.tmp.dir
>>>> /tmp/hadoop-${user.name}
>>>> I can submit job as usersA but not by usersB
>>>> user=userB, access=WRITE, inode="/tmp/hadoop-userA/mapred/staging"
>>>> :userA:supergroup:drwxr-xr-x
>>>>
>>>> i googled around; someone recommended to change hadoop.tmp.dir to
>>>> /tmp/hadoop.
>>>> This way it is almost a yay way; the thing is
>>>>
>>>> if I submit as userA it will create /tmp/hadoop in local machine which
>>>> ownership will be userA.userA,
>>>> and once I tried to submit job from the same machine as userB I will
>>>> get  "Error creating temp dir in hadoop.tmp.dir /tmp/hadoop due to
>>>> Permission denied"
>>>> (as because /tmp/hadoop is own by userA.userA). vise versa if I delete
>>>> /tmp/hadoop and let the directory be created by userB, userA will not
>>>> be able to submit job.
>>>>
>>>> Which is the right approach i should work with?
>>>> Please suggest
>>>>
>>>> Patai
>>>>
>>>>
>>>> On Mon, Oct 15, 2012 at 3:18 PM, Harsh J <ha...@cloudera.com> wrote:
>>>>
>>>> Hi Patai,
>>>>
>>>> Reply inline.
>>>>
>>>> On Tue, Oct 16, 2012 at 2:57 AM, Patai Sangbutsarakum
>>>> <si...@gmail.com> wrote:
>>>>
>>>> Thanks for input,
>>>>
>>>> I am reading the document; i forget to mention that i am on cdh3u4.
>>>>
>>>>
>>>> That version should have the support for all of this.
>>>>
>>>> If you point your poolname property to mapred.job.queue.name, then you
>>>> can leverage the Per-Queue ACLs
>>>>
>>>>
>>>> Is that mean if i plan to 3 pools of fair scheduler, i have to
>>>> configure 3 queues of capacity scheduler. in order to have each pool
>>>> can leverage Per-Queue ACL of each queue.?
>>>>
>>>>
>>>> Queues are not hard-tied into CapacityScheduler. You can have generic
>>>> queues in MR. And FairScheduler can bind its Pool concept into the
>>>> Queue configuration.
>>>>
>>>> All you need to do is the following:
>>>>
>>>> 1. Map FairScheduler pool name to reuse queue names itself:
>>>>
>>>> mapred.fairscheduler.poolnameproperty set to 'mapred.job.queue.name'
>>>>
>>>> 2. Define your required queues:
>>>>
>>>> mapred.job.queues set to "default,foo,bar" for example, for 3 queues:
>>>> default, foo and bar.
>>>>
>>>> 3. Define Submit ACLs for each Queue:
>>>>
>>>> mapred.queue.default.acl-submit-job set to "patai,foobar users,adm"
>>>> (usernames groupnames)
>>>>
>>>> mapred.queue.foo.acl-submit-job set to "spam eggs"
>>>>
>>>> Likewise for remaining queues, as you need it…
>>>>
>>>> 4. Enable ACLs and restart JT.
>>>>
>>>> mapred.acls.enabled set to "true"
>>>>
>>>> 5. Users then use the right API to set queue names before submitting
>>>> jobs, or use -Dmapred.job.queue.name=value via CLI (if using Tool):
>>>>
>>>>
>>>> http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobConf.html#setQueueName(java.lang.String)
>>>>
>>>> 6. Done.
>>>>
>>>> Let us know if this works!
>>>>
>>>> --
>>>> Harsh J
>>>>
>>>>
>>>>
>>
>>
>>
>>--
>>Harsh J
>



-- 
Harsh J

Re: Fair scheduler.

Posted by Harsh J <ha...@cloudera.com>.
Hey Robin,

Thanks for the detailed post.

Just looked at your older thread, and you're right, the JT does write
into its system dir for users' job info and token files when
initializing the Job. The bug you ran into and the exception+trace you
got makes sense now.

I just didn't see it on the version Patai seems to be using. I think
if he specifies a proper staging directory, he'll get through, because
his trace is different from that of MAPREDUCE-4398 (staging dir vs.
system dir - you hit the system dir one, unfortunately).
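
For reference, a one-property mapred-site.xml sketch of that fix (the
0.20/1.x property carries the mapreduce. prefix, though this thread
spells it with both the mapred. and mapreduce. prefixes):

  <property>
    <name>mapreduce.jobtracker.staging.root.dir</name>
    <value>/user</value>
  </property>

With that set, each user's job files go under /user/<username>/.staging
instead of under hadoop.tmp.dir, so the ordinary per-user HDFS home
directory permissions apply.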

On Wed, Oct 17, 2012 at 8:39 PM, Goldstone, Robin J.
<go...@llnl.gov> wrote:
> Yes, you would think that users shouldn't need to write to
> mapred.system.dir, yet that seems to be the case.  I posted details about
> my configuration along with full stack traces last week.  I won't re-post
> everything but essentially I have mapred.system.dir defined as a directory
> in HDFS owned by mapred:hadoop.  I initially set the permissions to 755
> but when the job tracker started up it changed the permissions to 700.
> Then when I ran a job as a regular user I got this error:
>
> 12/10/09 16:27:03 INFO mapred.JobClient: Job Failed: Job initialization
> failed:
> org.apache.hadoop.security.AccessControlException:
> org.apache.hadoop.security.AccessControlException: Permission denied:
> user=robing, access=EXECUTE, inode="mapred":mapred:hadoop:rwx------
>
>
> I then manually changed the permissions back to 755 and ran again and got
> this error:
> 12/10/09 16:31:30 INFO mapred.JobClient: Job Failed: Job initialization
> failed:
> org.apache.hadoop.security.AccessControlException:
> org.apache.hadoop.security.AccessControlException: Permission denied:
> user=robing, access=WRITE, inode="mapred":mapred:hadoop:rwxr-xr-x
>
> I then changed the permissions to 777 and the job ran successfully.  This
> suggests that some process was trying to write to
> mapred.system.dir but did not have sufficient permissions.  The
> speculation is that this was being attempted under my uid instead of
> mapred.  Perhaps it is something else. I welcome your suggestions.
>
>
> For completeness, I also have mapred.jobtracker.staging.root.dir set to
> /user within HDFS.  I can verify the staging files are going there but
> something else is still trying to access mapred.system.dir.
>
> Robin Goldstone, LLNL
>
> On 10/17/12 12:00 AM, "Harsh J" <ha...@cloudera.com> wrote:
>
>>Hi,
>>
>>Regular users never write into the mapred.system.dir AFAICT. That
>>directory is just for the JT to use to mark its presence and to
>>"expose" the distributed filesystem it will be relying on.
>>
>>Users write to their respective staging directories, which lie
>>elsewhere and are per-user.
>>
>>Let me post my environment:
>>
>>- mapred.system.dir (A HDFS Dir for a JT to register itself) set to
>>"/tmp/mapred/system". The /tmp/mapred and /tmp/mapred/system (or
>>whatever you configure it to) are to be owned by mapred:hadoop so that
>>the JT can feel free to reconfigure it.
>>
>>- mapreduce.jobtracker.staging.root.dir (A HDFS dir that represents
>>the parent directory under which users write their per-user job stage
>>files (JARs, etc.)) is set to "/user". The /user further contains each
>>user's home directory, owned by that user. For example:
>>
>>drwxr-xr-x   - harsh    harsh 0 2012-09-27 15:51 /user/harsh
>>
>>All staging files from local user 'harsh' are hence written as the
>>proper user under /user/harsh/.staging since that user does have
>>permissions to write there. For any user to access HDFS, they'd need a
>>home directory created on the HDFS by the admin first - and after that
>>things users do under their own directory will work just fine. The JT
>>would not have to try to create per-user directories.
>>
>>On Wed, Oct 17, 2012 at 5:22 AM, Patai Sangbutsarakum
>><si...@gmail.com> wrote:
>>> Thanks everyone, Seem like i hit the dead end.
>>> It's kind of funny when i read that jira; run it 4 time and everything
>>> will work.. where that magic number from..lol
>>>
>>> respects
>>>
>>> On Tue, Oct 16, 2012 at 4:12 PM, Arpit Gupta <ar...@hortonworks.com>
>>>wrote:
>>>> https://issues.apache.org/jira/browse/MAPREDUCE-4398
>>>>
>>>> is the bug that Robin is referring to.
>>>>
>>>> --
>>>> Arpit Gupta
>>>> Hortonworks Inc.
>>>> http://hortonworks.com/
>>>>
>>>> On Oct 16, 2012, at 3:51 PM, "Goldstone, Robin J."
>>>><go...@llnl.gov>
>>>> wrote:
>>>>
>>>> This is similar to issues I ran into with permissions/ownership of
>>>> mapred.system.dir when using the fair scheduler.  We are instructed to
>>>>set
>>>> the ownership of mapred.system.dir to mapred:hadoop and then when the
>>>>job
>>>> tracker starts up (running as user mapred) it explicitly sets the
>>>> permissions on this directory to 700.  Meanwhile when I go to run a
>>>>job as
>>>> a regular user, it is trying to write stuff into mapred.system.dir but
>>>>it
>>>> can't due to the ownership/permissions that have been established.
>>>>
>>>> Per discussion with Arpit Gupta, this is a bug with the fair scheduler
>>>>and
>>>> it appears from your experience that there are similar issues with
>>>> hadoop.tmp.dir.  The whole idea of the fair scheduler is to run jobs
>>>>under
>>>> the user's identity rather than as user mapred.  This is good from a
>>>> security perspective yet it seems no one bothered to account for this
>>>>in
>>>> terms of the permissions that need to be set in the various
>>>>directories to
>>>> enable this.
>>>>
>>>> Until this is sorted out by the Hadoop developers, I've put my
>>>>attempts to
>>>> use the fair scheduler on hold…
>>>>
>>>> Regards,
>>>> Robin Goldstone, LLNL
>>>>
>>>> On 10/16/12 3:32 PM, "Patai Sangbutsarakum" <si...@gmail.com>
>>>> wrote:
>>>>
>>>> Hi Harsh,
>>>> Thanks for breaking it down clearly. I would say i am successful 98%
>>>> from the instruction.
>>>> The 2% is about hadoop.tmp.dir
>>>>
>>>> let's say i have 2 users
>>>> userA is a user that start hdfs and mapred
>>>> userB is a regular user
>>>>
>>>> if i use default value of  hadoop.tmp.dir
>>>> /tmp/hadoop-${user.name}
>>>> I can submit a job as userA but not as userB:
>>>> user=userB, access=WRITE, inode="/tmp/hadoop-userA/mapred/staging"
>>>> :userA:supergroup:drwxr-xr-x
>>>>
>>>> I googled around; someone recommended changing hadoop.tmp.dir to
>>>> /tmp/hadoop.
>>>> That almost works; the remaining problem is:
>>>>
>>>> if I submit as userA, it creates /tmp/hadoop on the local machine,
>>>> owned by userA:userA,
>>>> and when I then try to submit a job from the same machine as userB I
>>>> get  "Error creating temp dir in hadoop.tmp.dir /tmp/hadoop due to
>>>> Permission denied"
>>>> (because /tmp/hadoop is owned by userA:userA). Vice versa, if I delete
>>>> /tmp/hadoop and let the directory be created by userB, userA will not
>>>> be able to submit jobs.
>>>>
>>>> Which is the right approach to work with?
>>>> Please suggest
>>>>
>>>> Patai
>>>>
>>>>
>>>> On Mon, Oct 15, 2012 at 3:18 PM, Harsh J <ha...@cloudera.com> wrote:
>>>>
>>>> Hi Patai,
>>>>
>>>> Reply inline.
>>>>
>>>> On Tue, Oct 16, 2012 at 2:57 AM, Patai Sangbutsarakum
>>>> <si...@gmail.com> wrote:
>>>>
>>>> Thanks for input,
>>>>
>>>> I am reading the document; I forgot to mention that I am on cdh3u4.
>>>>
>>>>
>>>> That version should have the support for all of this.
>>>>
>>>> If you point your poolname property to mapred.job.queue.name, then you
>>>> can leverage the Per-Queue ACLs
>>>>
>>>>
>>>> Does that mean that if I plan on 3 fair scheduler pools, I have to
>>>> configure 3 capacity scheduler queues, so that each pool can
>>>> leverage the per-queue ACL of its queue?
>>>>
>>>>
>>>> Queues are not hard-tied into CapacityScheduler. You can have generic
>>>> queues in MR. And FairScheduler can bind its Pool concept into the
>>>> Queue configuration.
>>>>
>>>> All you need to do is the following:
>>>>
>>>> 1. Map FairScheduler pool name to reuse queue names itself:
>>>>
>>>> mapred.fairscheduler.poolnameproperty set to 'mapred.job.queue.name'
>>>>
>>>> 2. Define your required queues:
>>>>
>>>> mapred.queue.names set to "default,foo,bar" for example, for 3 queues:
>>>> default, foo and bar.
>>>>
>>>> 3. Define Submit ACLs for each Queue:
>>>>
>>>> mapred.queue.default.acl-submit-job set to "patai,foobar users,adm"
>>>> (usernames groupnames)
>>>>
>>>> mapred.queue.foo.acl-submit-job set to "spam eggs"
>>>>
>>>> Likewise for remaining queues, as you need it…
>>>>
>>>> 4. Enable ACLs and restart JT.
>>>>
>>>> mapred.acls.enabled set to "true"
>>>>
>>>> 5. Users then use the right API to set queue names before submitting
>>>> jobs, or use -Dmapred.job.queue.name=value via CLI (if using Tool):
>>>>
>>>>
>>>> http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobConf.html#setQueueName(java.lang.String)
>>>>
>>>> 6. Done.
>>>>
>>>> Let us know if this works!
>>>>
>>>> --
>>>> Harsh J
>>>>
>>>>
>>>>
>>
>>
>>
>>--
>>Harsh J
>



-- 
Harsh J

Re: Fair scheduler.

Posted by "Goldstone, Robin J." <go...@llnl.gov>.
Yes, you would think that users shouldn't need to write to
mapred.system.dir, yet that seems to be the case.  I posted details about
my configuration along with full stack traces last week.  I won't re-post
everything but essentially I have mapred.system.dir defined as a directory
in HDFS owned by mapred:hadoop.  I initially set the permissions to 755
but when the job tracker started up it changed the permissions to 700.
Then when I ran a job as a regular user I got this error:

12/10/09 16:27:03 INFO mapred.JobClient: Job Failed: Job initialization
failed:
org.apache.hadoop.security.AccessControlException:
org.apache.hadoop.security.AccessControlException: Permission denied:
user=robing, access=EXECUTE, inode="mapred":mapred:hadoop:rwx------


I then manually changed the permissions back to 755 and ran again and got
this error:
12/10/09 16:31:30 INFO mapred.JobClient: Job Failed: Job initialization
failed:
org.apache.hadoop.security.AccessControlException:
org.apache.hadoop.security.AccessControlException: Permission denied:
user=robing, access=WRITE, inode="mapred":mapred:hadoop:rwxr-xr-x

I then changed the permissions to 777 and the job ran successfully.  This
suggests that some process was trying to write to
mapred.system.dir but did not have sufficient permissions.  The
speculation is that this was being attempted under my uid instead of
mapred.  Perhaps it is something else. I welcome your suggestions.


For completeness, I also have mapred.jobtracker.staging.root.dir set to
/user within HDFS.  I can verify the staging files are going there but
something else is still trying to access mapred.system.dir.

Robin Goldstone, LLNL

On 10/17/12 12:00 AM, "Harsh J" <ha...@cloudera.com> wrote:

>Hi,
>
>Regular users never write into the mapred.system.dir AFAICT. That
>directory is just for the JT to use to mark its presence and to
>"expose" the distributed filesystem it will be relying on.
>
>Users write to their respective staging directories, which lie
>elsewhere and are per-user.
>
>Let me post my environment:
>
>- mapred.system.dir (A HDFS Dir for a JT to register itself) set to
>"/tmp/mapred/system". The /tmp/mapred and /tmp/mapred/system (or
>whatever you configure it to) are to be owned by mapred:hadoop so that
>the JT can feel free to reconfigure it.
>
>- mapreduce.jobtracker.staging.root.dir (A HDFS dir that represents
>the parent directory under which users write their per-user job stage
>files (JARs, etc.)) is set to "/user". The /user further contains each
>user's home directory, owned by that user. For example:
>
>drwxr-xr-x   - harsh    harsh 0 2012-09-27 15:51 /user/harsh
>
>All staging files from local user 'harsh' are hence written as the
>proper user under /user/harsh/.staging since that user does have
>permissions to write there. For any user to access HDFS, they'd need a
>home directory created on the HDFS by the admin first - and after that
>things users do under their own directory will work just fine. The JT
>would not have to try to create per-user directories.
>
>On Wed, Oct 17, 2012 at 5:22 AM, Patai Sangbutsarakum
><si...@gmail.com> wrote:
>> Thanks everyone, Seem like i hit the dead end.
>> It's kind of funny when i read that jira; run it 4 time and everything
>> will work.. where that magic number from..lol
>>
>> respects
>>
>> On Tue, Oct 16, 2012 at 4:12 PM, Arpit Gupta <ar...@hortonworks.com>
>>wrote:
>>> https://issues.apache.org/jira/browse/MAPREDUCE-4398
>>>
>>> is the bug that Robin is referring to.
>>>
>>> --
>>> Arpit Gupta
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>> On Oct 16, 2012, at 3:51 PM, "Goldstone, Robin J."
>>><go...@llnl.gov>
>>> wrote:
>>>
>>> This is similar to issues I ran into with permissions/ownership of
>>> mapred.system.dir when using the fair scheduler.  We are instructed to
>>>set
>>> the ownership of mapred.system.dir to mapred:hadoop and then when the
>>>job
>>> tracker starts up (running as user mapred) it explicitly sets the
>>> permissions on this directory to 700.  Meanwhile when I go to run a
>>>job as
>>> a regular user, it is trying to write stuff into mapred.system.dir but
>>>it
>>> can't due to the ownership/permissions that have been established.
>>>
>>> Per discussion with Arpit Gupta, this is a bug with the fair scheduler
>>>and
>>> it appears from your experience that there are similar issues with
>>> hadoop.tmp.dir.  The whole idea of the fair scheduler is to run jobs
>>>under
>>> the user's identity rather than as user mapred.  This is good from a
>>> security perspective yet it seems no one bothered to account for this
>>>in
>>> terms of the permissions that need to be set in the various
>>>directories to
>>> enable this.
>>>
>>> Until this is sorted out by the Hadoop developers, I've put my
>>>attempts to
>>> use the fair scheduler on hold…
>>>
>>> Regards,
>>> Robin Goldstone, LLNL
>>>
>>> On 10/16/12 3:32 PM, "Patai Sangbutsarakum" <si...@gmail.com>
>>> wrote:
>>>
>>> Hi Harsh,
>>> Thanks for breaking it down clearly. I would say i am successful 98%
>>> from the instruction.
>>> The 2% is about hadoop.tmp.dir
>>>
>>> let's say i have 2 users
>>> userA is a user that start hdfs and mapred
>>> userB is a regular user
>>>
>>> if i use default value of  hadoop.tmp.dir
>>> /tmp/hadoop-${user.name}
>>> I can submit a job as userA but not as userB:
>>> user=userB, access=WRITE, inode="/tmp/hadoop-userA/mapred/staging"
>>> :userA:supergroup:drwxr-xr-x
>>>
>>> I googled around; someone recommended changing hadoop.tmp.dir to
>>> /tmp/hadoop.
>>> That almost works; the remaining problem is:
>>>
>>> if I submit as userA, it creates /tmp/hadoop on the local machine,
>>> owned by userA:userA,
>>> and when I then try to submit a job from the same machine as userB I
>>> get  "Error creating temp dir in hadoop.tmp.dir /tmp/hadoop due to
>>> Permission denied"
>>> (because /tmp/hadoop is owned by userA:userA). Vice versa, if I delete
>>> /tmp/hadoop and let the directory be created by userB, userA will not
>>> be able to submit jobs.
>>>
>>> Which is the right approach to work with?
>>> Please suggest
>>>
>>> Patai
>>>
>>>
>>> On Mon, Oct 15, 2012 at 3:18 PM, Harsh J <ha...@cloudera.com> wrote:
>>>
>>> Hi Patai,
>>>
>>> Reply inline.
>>>
>>> On Tue, Oct 16, 2012 at 2:57 AM, Patai Sangbutsarakum
>>> <si...@gmail.com> wrote:
>>>
>>> Thanks for input,
>>>
>>> I am reading the document; I forgot to mention that I am on cdh3u4.
>>>
>>>
>>> That version should have the support for all of this.
>>>
>>> If you point your poolname property to mapred.job.queue.name, then you
>>> can leverage the Per-Queue ACLs
>>>
>>>
>>> Does that mean that if I plan on 3 fair scheduler pools, I have to
>>> configure 3 capacity scheduler queues, so that each pool can
>>> leverage the per-queue ACL of its queue?
>>>
>>>
>>> Queues are not hard-tied into CapacityScheduler. You can have generic
>>> queues in MR. And FairScheduler can bind its Pool concept into the
>>> Queue configuration.
>>>
>>> All you need to do is the following:
>>>
>>> 1. Map FairScheduler pool name to reuse queue names itself:
>>>
>>> mapred.fairscheduler.poolnameproperty set to 'mapred.job.queue.name'
>>>
>>> 2. Define your required queues:
>>>
>>> mapred.queue.names set to "default,foo,bar" for example, for 3 queues:
>>> default, foo and bar.
>>>
>>> 3. Define Submit ACLs for each Queue:
>>>
>>> mapred.queue.default.acl-submit-job set to "patai,foobar users,adm"
>>> (usernames groupnames)
>>>
>>> mapred.queue.foo.acl-submit-job set to "spam eggs"
>>>
>>> Likewise for remaining queues, as you need it…
>>>
>>> 4. Enable ACLs and restart JT.
>>>
>>> mapred.acls.enabled set to "true"
>>>
>>> 5. Users then use the right API to set queue names before submitting
>>> jobs, or use -Dmapred.job.queue.name=value via CLI (if using Tool):
>>>
>>> 
>>> http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobConf.html#setQueueName(java.lang.String)
>>>
>>> 6. Done.
>>>
>>> Let us know if this works!
>>>
>>> --
>>> Harsh J
>>>
>>>
>>>
>
>
>
>-- 
>Harsh J


Re: Fair scheduler.

Posted by "Goldstone, Robin J." <go...@llnl.gov>.
Yes, you would think that users shouldn't need to write to
mapred.system.dir, yet that seems to be the case.  I posted details about
my configuration along with full stack traces last week.  I won't re-post
everything but essentially I have mapred.system.dir defined as a directory
in HDFS owned by mapred:hadoop.  I initially set the permissions to 755
but when the job tracker started up it changed the permissions to 700.
Then when I ran a job as a regular user I got this error:

12/10/09 16:27:03 INFO mapred.JobClient: Job Failed: Job initialization
failed:
org.apache.hadoop.security.AccessControlException:
org.apache.hadoop.security.AccessControlException: Permission denied:
user=robing, access=EXECUTE, inode="mapred":mapred:hadoop:rwx------


I then manually changed the permissions back to 755 and ran again and got
this error:
12/10/09 16:31:30 INFO mapred.JobClient: Job Failed: Job initialization
failed:
org.apache.hadoop.security.AccessControlException:
org.apache.hadoop.security.AccessControlException: Permission denied:
user=robing, access=WRITE, inode="mapred":mapred:hadoop:rwxr-xr-x

I then changed the permissions to 777 and the job ran successfully.  This
suggests that some process was trying to write to write to
mapred.system.dir but did not have sufficient permissions.  The
speculation is that this was being attempted under my uid instead of
mapred.  Perhaps it is something else. I welcome your suggestions.


For completeness, I also have mapred.jobtracker.staging.root.dir set to
/user within HDFS.  I can verify the staging files are going there but
something else is still trying to access mapred.system.dir.

Robin Goldstone, LLNL

On 10/17/12 12:00 AM, "Harsh J" <ha...@cloudera.com> wrote:

>Hi,
>
>Regular users never write into the mapred.system.dir AFAICT. That
>directory, is just for the JT to use to mark its presence and to
>"expose" the distributed filesystem it will be relying on.
>
>Users write to their respective staging directories, which lies
>elsewhere and is per-user.
>
>Let me post my environment:
>
>- mapred.system.dir (A HDFS Dir for a JT to register itself) set to
>"/tmp/mapred/system". The /tmp/mapred and /tmp/mapred/system (or
>whatever you configure it to) is to be owned by mapred:hadoop so that
>the JT can feel free to reconfigure it.
>
>- mapreduce.jobtracker.staging.root.dir (A HDFS dir that represents
>the parent directory for user's to write their per-user job stage
>files (JARs, etc.)) is set to "/user". The /user further contains each
>user's home directories, owned all by them. For example:
>
>drwxr-xr-x   - harsh    harsh 0 2012-09-27 15:51 /user/harsh
>
>All staging files from local user 'harsh' are hence written as the
>proper user under /user/harsh/.staging since that user does have
>permissions to write there. For any user to access HDFS, they'd need a
>home directory created on the HDFS by the admin first - and after that
>things users do under their own directory, will work just fine. The JT
>would not have to try to create per-user directories.
>
>On Wed, Oct 17, 2012 at 5:22 AM, Patai Sangbutsarakum
><si...@gmail.com> wrote:
>> Thanks everyone, Seem like i hit the dead end.
>> It's kind of funny when i read that jira; run it 4 time and everything
>> will work.. where that magic number from..lol
>>
>> respects
>>
>> On Tue, Oct 16, 2012 at 4:12 PM, Arpit Gupta <ar...@hortonworks.com>
>>wrote:
>>> https://issues.apache.org/jira/browse/MAPREDUCE-4398
>>>
>>> is the bug that Robin is referring to.
>>>
>>> --
>>> Arpit Gupta
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>> On Oct 16, 2012, at 3:51 PM, "Goldstone, Robin J."
>>><go...@llnl.gov>
>>> wrote:
>>>
>>> This is similar to issues I ran into with permissions/ownership of
>>> mapred.system.dir when using the fair scheduler.  We are instructed to
>>>set
>>> the ownership of mapred.system.dir to mapred:hadoop and then when the
>>>job
>>> tracker starts up (running as user mapred) it explicitly sets the
>>> permissions on this directory to 700.  Meanwhile when I go to run a
>>>job as
>>> a regular user, it is trying to write stuff into mapred.system.dir but
>>>it
>>> can't due to the ownership/permissions that have been established.
>>>
>>> Per discussion with Arpit Gupta, this is a bug with the fair scheduler
>>>and
>>> it appears from your experience that there are similar issues with
>>> hadoop.tmp.dir.  The whole idea of the fair scheduler is to run jobs
>>>under
>>> the user's identity rather than as user mapred.  This is good from a
>>> security perspective yet it seems no one bothered to account for this
>>>in
>>> terms of the permissions that need to be set in the various
>>>directories to
>>> enable this.
>>>
>>> Until this is sorted out by the Hadoop developers, I've put my
>>>attempts to
>>> use the fair scheduler on holdŠ
>>>
>>> Regards,
>>> Robin Goldstone, LLNL
>>>
>>> On 10/16/12 3:32 PM, "Patai Sangbutsarakum" <si...@gmail.com>
>>> wrote:
>>>
>>> Hi Harsh,
>>> Thanks for breaking it down clearly. I would say i am successful 98%
>>> from the instruction.
>>> The 2% is about hadoop.tmp.dir
>>>
>>> let's say i have 2 users
>>> userA is a user that start hdfs and mapred
>>> userB is a regular user
>>>
>>> if i use default value of  hadoop.tmp.dir
>>> /tmp/hadoop-${user.name}
>>> I can submit job as usersA but not by usersB
>>> ser=userB, access=WRITE, inode="/tmp/hadoop-userA/mapred/staging"
>>> :userA:supergroup:drwxr-xr-x
>>>
>>> i googled around; someone recommended to change hadoop.tmp.dir to
>>> /tmp/hadoop.
>>> This way it is almost a yay way; the thing is
>>>
>>> if I submit as userA it will create /tmp/hadoop in local machine which
>>> ownership will be userA.userA,
>>> and once I tried to submit job from the same machine as userB I will
>>> get  "Error creating temp dir in hadoop.tmp.dir /tmp/hadoop due to
>>> Permission denied"
>>> (as because /tmp/hadoop is own by userA.userA). vise versa if I delete
>>> /tmp/hadoop and let the directory be created by userB, userA will not
>>> be able to submit job.
>>>
>>> Which is the right approach i should work with?
>>> Please suggest
>>>
>>> Patai
>>>
>>>
>>> On Mon, Oct 15, 2012 at 3:18 PM, Harsh J <ha...@cloudera.com> wrote:
>>>
>>> Hi Patai,
>>>
>>> Reply inline.
>>>
>>> On Tue, Oct 16, 2012 at 2:57 AM, Patai Sangbutsarakum
>>> <si...@gmail.com> wrote:
>>>
>>> Thanks for input,
>>>
>>> I am reading the document; i forget to mention that i am on cdh3u4.
>>>
>>>
>>> That version should have the support for all of this.
>>>
>>> If you point your poolname property to mapred.job.queue.name, then you
>>> can leverage the Per-Queue ACLs
>>>
>>>
>>> Is that mean if i plan to 3 pools of fair scheduler, i have to
>>> configure 3 queues of capacity scheduler. in order to have each pool
>>> can leverage Per-Queue ACL of each queue.?
>>>
>>>
>>> Queues are not hard-tied into CapacityScheduler. You can have generic
>>> queues in MR. And FairScheduler can bind its Pool concept into the
>>> Queue configuration.
>>>
>>> All you need to do is the following:
>>>
>>> 1. Map FairScheduler pool name to reuse queue names itself:
>>>
>>> mapred.fairscheduler.poolnameproperty set to 'mapred.job.queue.name'
>>>
>>> 2. Define your required queues:
>>>
>>> mapred.job.queues set to "default,foo,bar" for example, for 3 queues:
>>> default, foo and bar.
>>>
>>> 3. Define Submit ACLs for each Queue:
>>>
>>> mapred.queue.default.acl-submit-job set to "patai,foobar users,adm"
>>> (usernames groupnames)
>>>
>>> mapred.queue.foo.acl-submit-job set to "spam eggs"
>>>
>>> Likewise for remaining queues, as you need itŠ
>>>
>>> 4. Enable ACLs and restart JT.
>>>
>>> mapred.acls.enabled set to "true"
>>>
>>> 5. Users then use the right API to set queue names before submitting
>>> jobs, or use -Dmapred.job.queue.name=value via CLI (if using Tool):
>>>
>>> 
>>>http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobCon
>>>f
>>> .html#setQueueName(java.lang.String)
>>>
>>> 6. Done.
>>>
>>> Let us know if this works!
>>>
>>> --
>>> Harsh J
>>>
>>>
>>>
>
>
>
>-- 
>Harsh J


Re: Fair scheduler.

Posted by "Goldstone, Robin J." <go...@llnl.gov>.
Yes, you would think that users shouldn't need to write to
mapred.system.dir, yet that seems to be the case.  I posted details about
my configuration along with full stack traces last week.  I won't re-post
everything but essentially I have mapred.system.dir defined as a directory
in HDFS owned by mapred:hadoop.  I initially set the permissions to 755
but when the job tracker started up it changed the permissions to 700.
Then when I ran a job as a regular user I got this error:

12/10/09 16:27:03 INFO mapred.JobClient: Job Failed: Job initialization
failed:
org.apache.hadoop.security.AccessControlException:
org.apache.hadoop.security.AccessControlException: Permission denied:
user=robing, access=EXECUTE, inode="mapred":mapred:hadoop:rwx------


I then manually changed the permissions back to 755 and ran again and got
this error:
12/10/09 16:31:30 INFO mapred.JobClient: Job Failed: Job initialization
failed:
org.apache.hadoop.security.AccessControlException:
org.apache.hadoop.security.AccessControlException: Permission denied:
user=robing, access=WRITE, inode="mapred":mapred:hadoop:rwxr-xr-x

I then changed the permissions to 777 and the job ran successfully.  This
suggests that some process was trying to write to write to
mapred.system.dir but did not have sufficient permissions.  The
speculation is that this was being attempted under my uid instead of
mapred.  Perhaps it is something else. I welcome your suggestions.


For completeness, I also have mapred.jobtracker.staging.root.dir set to
/user within HDFS.  I can verify the staging files are going there but
something else is still trying to access mapred.system.dir.

Robin Goldstone, LLNL

On 10/17/12 12:00 AM, "Harsh J" <ha...@cloudera.com> wrote:

>Hi,
>
>Regular users never write into the mapred.system.dir AFAICT. That
>directory, is just for the JT to use to mark its presence and to
>"expose" the distributed filesystem it will be relying on.
>
>Users write to their respective staging directories, which lies
>elsewhere and is per-user.
>
>Let me post my environment:
>
>- mapred.system.dir (A HDFS Dir for a JT to register itself) set to
>"/tmp/mapred/system". The /tmp/mapred and /tmp/mapred/system (or
>whatever you configure it to) is to be owned by mapred:hadoop so that
>the JT can feel free to reconfigure it.
>
>- mapreduce.jobtracker.staging.root.dir (A HDFS dir that represents
>the parent directory for user's to write their per-user job stage
>files (JARs, etc.)) is set to "/user". The /user further contains each
>user's home directories, owned all by them. For example:
>
>drwxr-xr-x   - harsh    harsh 0 2012-09-27 15:51 /user/harsh
>
>All staging files from local user 'harsh' are hence written as the
>proper user under /user/harsh/.staging since that user does have
>permissions to write there. For any user to access HDFS, they'd need a
>home directory created on the HDFS by the admin first - and after that
>things users do under their own directory, will work just fine. The JT
>would not have to try to create per-user directories.
>
>On Wed, Oct 17, 2012 at 5:22 AM, Patai Sangbutsarakum
><si...@gmail.com> wrote:
>> Thanks everyone, Seem like i hit the dead end.
>> It's kind of funny when i read that jira; run it 4 time and everything
>> will work.. where that magic number from..lol
>>
>> respects
>>
>> On Tue, Oct 16, 2012 at 4:12 PM, Arpit Gupta <ar...@hortonworks.com>
>>wrote:
>>> https://issues.apache.org/jira/browse/MAPREDUCE-4398
>>>
>>> is the bug that Robin is referring to.
>>>
>>> --
>>> Arpit Gupta
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>> On Oct 16, 2012, at 3:51 PM, "Goldstone, Robin J."
>>><go...@llnl.gov>
>>> wrote:
>>>
>>> This is similar to issues I ran into with permissions/ownership of
>>> mapred.system.dir when using the fair scheduler.  We are instructed to
>>>set
>>> the ownership of mapred.system.dir to mapred:hadoop and then when the
>>>job
>>> tracker starts up (running as user mapred) it explicitly sets the
>>> permissions on this directory to 700.  Meanwhile when I go to run a
>>>job as
>>> a regular user, it is trying to write stuff into mapred.system.dir but
>>>it
>>> can't due to the ownership/permissions that have been established.
>>>
>>> Per discussion with Arpit Gupta, this is a bug with the fair scheduler
>>>and
>>> it appears from your experience that there are similar issues with
>>> hadoop.tmp.dir.  The whole idea of the fair scheduler is to run jobs
>>>under
>>> the user's identity rather than as user mapred.  This is good from a
>>> security perspective yet it seems no one bothered to account for this
>>>in
>>> terms of the permissions that need to be set in the various
>>>directories to
>>> enable this.
>>>
>>> Until this is sorted out by the Hadoop developers, I've put my
>>>attempts to
>>> use the fair scheduler on holdŠ
>>>
>>> Regards,
>>> Robin Goldstone, LLNL
>>>
>>> On 10/16/12 3:32 PM, "Patai Sangbutsarakum" <si...@gmail.com>
>>> wrote:
>>>
>>> Hi Harsh,
>>> Thanks for breaking it down clearly. I would say i am successful 98%
>>> from the instruction.
>>> The 2% is about hadoop.tmp.dir
>>>
>>> let's say i have 2 users
>>> userA is a user that start hdfs and mapred
>>> userB is a regular user
>>>
>>> if i use default value of  hadoop.tmp.dir
>>> /tmp/hadoop-${user.name}
>>> I can submit job as usersA but not by usersB
>>> ser=userB, access=WRITE, inode="/tmp/hadoop-userA/mapred/staging"
>>> :userA:supergroup:drwxr-xr-x
>>>
>>> i googled around; someone recommended to change hadoop.tmp.dir to
>>> /tmp/hadoop.
>>> This way it is almost a yay way; the thing is
>>>
>>> if I submit as userA it will create /tmp/hadoop in local machine which
>>> ownership will be userA.userA,
>>> and once I tried to submit job from the same machine as userB I will
>>> get  "Error creating temp dir in hadoop.tmp.dir /tmp/hadoop due to
>>> Permission denied"
>>> (as because /tmp/hadoop is own by userA.userA). vise versa if I delete
>>> /tmp/hadoop and let the directory be created by userB, userA will not
>>> be able to submit job.
>>>
>>> Which is the right approach i should work with?
>>> Please suggest
>>>
>>> Patai
>>>
>>>
>>> On Mon, Oct 15, 2012 at 3:18 PM, Harsh J <ha...@cloudera.com> wrote:
>>>
>>> Hi Patai,
>>>
>>> Reply inline.
>>>
>>> On Tue, Oct 16, 2012 at 2:57 AM, Patai Sangbutsarakum
>>> <si...@gmail.com> wrote:
>>>
>>> Thanks for input,
>>>
>>> I am reading the document; i forget to mention that i am on cdh3u4.
>>>
>>>
>>> That version should have the support for all of this.
>>>
>>> If you point your poolname property to mapred.job.queue.name, then you
>>> can leverage the Per-Queue ACLs
>>>
>>>
>>> Is that mean if i plan to 3 pools of fair scheduler, i have to
>>> configure 3 queues of capacity scheduler. in order to have each pool
>>> can leverage Per-Queue ACL of each queue.?
>>>
>>>
>>> Queues are not hard-tied into CapacityScheduler. You can have generic
>>> queues in MR. And FairScheduler can bind its Pool concept into the
>>> Queue configuration.
>>>
>>> All you need to do is the following:
>>>
>>> 1. Map FairScheduler pool name to reuse queue names itself:
>>>
>>> mapred.fairscheduler.poolnameproperty set to 'mapred.job.queue.name'
>>>
>>> 2. Define your required queues:
>>>
>>> mapred.job.queues set to "default,foo,bar" for example, for 3 queues:
>>> default, foo and bar.
>>>
>>> 3. Define Submit ACLs for each Queue:
>>>
>>> mapred.queue.default.acl-submit-job set to "patai,foobar users,adm"
>>> (usernames groupnames)
>>>
>>> mapred.queue.foo.acl-submit-job set to "spam eggs"
>>>
>>> Likewise for remaining queues, as you need itŠ
>>>
>>> 4. Enable ACLs and restart JT.
>>>
>>> mapred.acls.enabled set to "true"
>>>
>>> 5. Users then use the right API to set queue names before submitting
>>> jobs, or use -Dmapred.job.queue.name=value via CLI (if using Tool):
>>>
>>> 
>>>http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobCon
>>>f
>>> .html#setQueueName(java.lang.String)
>>>
>>> 6. Done.
>>>
>>> Let us know if this works!
>>>
>>> --
>>> Harsh J
>>>
>>>
>>>
>
>
>
>-- 
>Harsh J


Re: Fair scheduler.

Posted by "Goldstone, Robin J." <go...@llnl.gov>.
Yes, you would think that users shouldn't need to write to
mapred.system.dir, yet that seems to be the case.  I posted details about
my configuration along with full stack traces last week.  I won't re-post
everything but essentially I have mapred.system.dir defined as a directory
in HDFS owned by mapred:hadoop.  I initially set the permissions to 755
but when the job tracker started up it changed the permissions to 700.
Then when I ran a job as a regular user I got this error:

12/10/09 16:27:03 INFO mapred.JobClient: Job Failed: Job initialization
failed:
org.apache.hadoop.security.AccessControlException:
org.apache.hadoop.security.AccessControlException: Permission denied:
user=robing, access=EXECUTE, inode="mapred":mapred:hadoop:rwx------


I then manually changed the permissions back to 755 and ran again and got
this error:
12/10/09 16:31:30 INFO mapred.JobClient: Job Failed: Job initialization
failed:
org.apache.hadoop.security.AccessControlException:
org.apache.hadoop.security.AccessControlException: Permission denied:
user=robing, access=WRITE, inode="mapred":mapred:hadoop:rwxr-xr-x

I then changed the permissions to 777 and the job ran successfully.  This
suggests that some process was trying to write to write to
mapred.system.dir but did not have sufficient permissions.  The
speculation is that this was being attempted under my uid instead of
mapred.  Perhaps it is something else. I welcome your suggestions.


For completeness, I also have mapred.jobtracker.staging.root.dir set to
/user within HDFS.  I can verify the staging files are going there but
something else is still trying to access mapred.system.dir.

Robin Goldstone, LLNL

On 10/17/12 12:00 AM, "Harsh J" <ha...@cloudera.com> wrote:

>Hi,
>
>Regular users never write into the mapred.system.dir AFAICT. That
>directory, is just for the JT to use to mark its presence and to
>"expose" the distributed filesystem it will be relying on.
>
>Users write to their respective staging directories, which lies
>elsewhere and is per-user.
>
>Let me post my environment:
>
>- mapred.system.dir (A HDFS Dir for a JT to register itself) set to
>"/tmp/mapred/system". The /tmp/mapred and /tmp/mapred/system (or
>whatever you configure it to) is to be owned by mapred:hadoop so that
>the JT can feel free to reconfigure it.
>
>- mapreduce.jobtracker.staging.root.dir (A HDFS dir that represents
>the parent directory for user's to write their per-user job stage
>files (JARs, etc.)) is set to "/user". The /user further contains each
>user's home directories, owned all by them. For example:
>
>drwxr-xr-x   - harsh    harsh 0 2012-09-27 15:51 /user/harsh
>
>All staging files from local user 'harsh' are hence written as the
>proper user under /user/harsh/.staging since that user does have
>permissions to write there. For any user to access HDFS, they'd need a
>home directory created on the HDFS by the admin first - and after that
>things users do under their own directory, will work just fine. The JT
>would not have to try to create per-user directories.
>
>On Wed, Oct 17, 2012 at 5:22 AM, Patai Sangbutsarakum
><si...@gmail.com> wrote:
>> Thanks everyone, Seem like i hit the dead end.
>> It's kind of funny when i read that jira; run it 4 time and everything
>> will work.. where that magic number from..lol
>>
>> respects
>>
>> On Tue, Oct 16, 2012 at 4:12 PM, Arpit Gupta <ar...@hortonworks.com>
>>wrote:
>>> https://issues.apache.org/jira/browse/MAPREDUCE-4398
>>>
>>> is the bug that Robin is referring to.
>>>
>>> --
>>> Arpit Gupta
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>> On Oct 16, 2012, at 3:51 PM, "Goldstone, Robin J."
>>><go...@llnl.gov>
>>> wrote:
>>>
>>> This is similar to issues I ran into with permissions/ownership of
>>> mapred.system.dir when using the fair scheduler.  We are instructed to
>>>set
>>> the ownership of mapred.system.dir to mapred:hadoop and then when the
>>>job
>>> tracker starts up (running as user mapred) it explicitly sets the
>>> permissions on this directory to 700.  Meanwhile when I go to run a
>>>job as
>>> a regular user, it is trying to write stuff into mapred.system.dir but
>>>it
>>> can't due to the ownership/permissions that have been established.
>>>
>>> Per discussion with Arpit Gupta, this is a bug with the fair scheduler
>>>and
>>> it appears from your experience that there are similar issues with
>>> hadoop.tmp.dir.  The whole idea of the fair scheduler is to run jobs
>>>under
>>> the user's identity rather than as user mapred.  This is good from a
>>> security perspective yet it seems no one bothered to account for this
>>>in
>>> terms of the permissions that need to be set in the various
>>>directories to
>>> enable this.
>>>
>>> Until this is sorted out by the Hadoop developers, I've put my
>>>attempts to
>>> use the fair scheduler on holdŠ
>>>
>>> Regards,
>>> Robin Goldstone, LLNL
>>>
>>> On 10/16/12 3:32 PM, "Patai Sangbutsarakum" <si...@gmail.com>
>>> wrote:
>>>
>>> Hi Harsh,
>>> Thanks for breaking it down clearly. I would say I am 98% successful
>>> following the instructions.
>>> The remaining 2% is about hadoop.tmp.dir.
>>>
>>> Let's say I have 2 users:
>>> userA is the user that starts hdfs and mapred
>>> userB is a regular user
>>>
>>> If I use the default value of hadoop.tmp.dir,
>>> /tmp/hadoop-${user.name},
>>> I can submit a job as userA but not as userB:
>>> user=userB, access=WRITE, inode="/tmp/hadoop-userA/mapred/staging"
>>> :userA:supergroup:drwxr-xr-x
>>>
>>> I googled around; someone recommended changing hadoop.tmp.dir to
>>> /tmp/hadoop.
>>> This way it almost works; the thing is,
>>>
>>> if I submit as userA it will create /tmp/hadoop on the local machine, whose
>>> ownership will be userA.userA,
>>> and once I try to submit a job from the same machine as userB I will
>>> get "Error creating temp dir in hadoop.tmp.dir /tmp/hadoop due to
>>> Permission denied"
>>> (because /tmp/hadoop is owned by userA.userA). Vice versa: if I delete
>>> /tmp/hadoop and let the directory be created by userB, userA will not
>>> be able to submit a job.
>>>
>>> Which is the right approach I should work with?
>>> Please suggest.
>>>
>>> Patai
>>>
>>>
>>> On Mon, Oct 15, 2012 at 3:18 PM, Harsh J <ha...@cloudera.com> wrote:
>>>
>>> Hi Patai,
>>>
>>> Reply inline.
>>>
>>> On Tue, Oct 16, 2012 at 2:57 AM, Patai Sangbutsarakum
>>> <si...@gmail.com> wrote:
>>>
>>> Thanks for input,
>>>
>>> I am reading the document; I forgot to mention that I am on cdh3u4.
>>>
>>>
>>> That version should have the support for all of this.
>>>
>>> If you point your poolname property to mapred.job.queue.name, then you
>>> can leverage the Per-Queue ACLs
>>>
>>>
>>> Does that mean that if I plan on 3 pools in the fair scheduler, I have to
>>> configure 3 queues in the capacity scheduler, in order to have each pool
>>> leverage the per-queue ACL of its queue?
>>>
>>>
>>> Queues are not hard-tied into CapacityScheduler. You can have generic
>>> queues in MR. And FairScheduler can bind its Pool concept into the
>>> Queue configuration.
>>>
>>> All you need to do is the following:
>>>
>>> 1. Map FairScheduler pool name to reuse queue names itself:
>>>
>>> mapred.fairscheduler.poolnameproperty set to 'mapred.job.queue.name'
>>>
>>> 2. Define your required queues:
>>>
>>> mapred.queue.names set to "default,foo,bar" for example, for 3 queues:
>>> default, foo and bar.
>>>
>>> 3. Define Submit ACLs for each Queue:
>>>
>>> mapred.queue.default.acl-submit-job set to "patai,foobar users,adm"
>>> (usernames groupnames)
>>>
>>> mapred.queue.foo.acl-submit-job set to "spam eggs"
>>>
>>> Likewise for remaining queues, as you need it…
>>>
>>> 4. Enable ACLs and restart JT.
>>>
>>> mapred.acls.enabled set to "true"
>>>
>>> 5. Users then use the right API to set queue names before submitting
>>> jobs, or use -Dmapred.job.queue.name=value via CLI (if using Tool):
>>>
>>> 
>>> http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobConf.html#setQueueName(java.lang.String)
>>>
>>> 6. Done.
>>>
>>> Let us know if this works!
>>>
>>> --
>>> Harsh J
>>>
>>>
>>>
>
>
>
>-- 
>Harsh J
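
To put the steps quoted above in one place, here is a minimal mapred-site.xml
sketch of the pool-to-queue binding and the submit ACLs. The queue names
(default, foo, bar) and the user/group lists are the thread's illustrative
values, not required ones:

  <!-- 1. Reuse MapReduce queue names as FairScheduler pool names -->
  <property>
    <name>mapred.fairscheduler.poolnameproperty</name>
    <value>mapred.job.queue.name</value>
  </property>

  <!-- 2. Define the queues -->
  <property>
    <name>mapred.queue.names</name>
    <value>default,foo,bar</value>
  </property>

  <!-- 3. Per-queue submit ACLs, in "usernames groupnames" form -->
  <property>
    <name>mapred.queue.default.acl-submit-job</name>
    <value>patai,foobar users,adm</value>
  </property>
  <property>
    <name>mapred.queue.foo.acl-submit-job</name>
    <value>spam eggs</value>
  </property>

  <!-- 4. Enable ACL checking, then restart the JT -->
  <property>
    <name>mapred.acls.enabled</name>
    <value>true</value>
  </property>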


Re: Fair scheduler.

Posted by Harsh J <ha...@cloudera.com>.
Hi,

Regular users never write into mapred.system.dir, AFAICT. That
directory is just for the JT to use to mark its presence and to
"expose" the distributed filesystem it will be relying on.

Users write to their respective staging directories, which lie
elsewhere and are per-user.

Let me post my environment:

- mapred.system.dir (an HDFS dir for the JT to register itself) is set to
"/tmp/mapred/system". The /tmp/mapred and /tmp/mapred/system (or
whatever you configure it to) are to be owned by mapred:hadoop so that
the JT can feel free to reconfigure them.

- mapreduce.jobtracker.staging.root.dir (an HDFS dir that represents
the parent directory under which users write their per-user job staging
files (JARs, etc.)) is set to "/user". The /user directory contains each
user's home directory, each owned by its user. For example:

drwxr-xr-x   - harsh    harsh 0 2012-09-27 15:51 /user/harsh

All staging files from local user 'harsh' are hence written as the
proper user under /user/harsh/.staging, since that user does have
permission to write there. For any user to access HDFS, they'd need a
home directory created on HDFS by the admin first - and after that,
anything users do under their own directory will work just fine. The JT
would not have to try to create per-user directories.
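
Expressed as configuration, that environment would be a mapred-site.xml along
these lines (a sketch; the paths are the ones quoted above):

  <property>
    <name>mapred.system.dir</name>
    <!-- /tmp/mapred and /tmp/mapred/system owned by mapred:hadoop on HDFS -->
    <value>/tmp/mapred/system</value>
  </property>
  <property>
    <name>mapreduce.jobtracker.staging.root.dir</name>
    <!-- per-user staging then lands under /user/<name>/.staging -->
    <value>/user</value>
  </property>

plus the one-time home directory setup an admin does per user:

  # userB is an example username
  hadoop fs -mkdir /user/userB
  hadoop fs -chown userB:userB /user/userB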

On Wed, Oct 17, 2012 at 5:22 AM, Patai Sangbutsarakum
<si...@gmail.com> wrote:
> Thanks everyone. Seems like I hit a dead end.
> It's kind of funny when I read that jira: run it 4 times and everything
> will work... where's that magic number from? lol
>
> respects
>
> On Tue, Oct 16, 2012 at 4:12 PM, Arpit Gupta <ar...@hortonworks.com> wrote:
>> https://issues.apache.org/jira/browse/MAPREDUCE-4398
>>
>> is the bug that Robin is referring to.
>>
>> --
>> Arpit Gupta
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>> On Oct 16, 2012, at 3:51 PM, "Goldstone, Robin J." <go...@llnl.gov>
>> wrote:
>>
>> This is similar to issues I ran into with permissions/ownership of
>> mapred.system.dir when using the fair scheduler.  We are instructed to set
>> the ownership of mapred.system.dir to mapred:hadoop and then when the job
>> tracker starts up (running as user mapred) it explicitly sets the
>> permissions on this directory to 700.  Meanwhile when I go to run a job as
>> a regular user, it is trying to write stuff into mapred.system.dir but it
>> can't due to the ownership/permissions that have been established.
>>
>> Per discussion with Arpit Gupta, this is a bug with the fair scheduler and
>> it appears from your experience that there are similar issues with
>> hadoop.tmp.dir.  The whole idea of the fair scheduler is to run jobs under
>> the user's identity rather than as user mapred.  This is good from a
>> security perspective yet it seems no one bothered to account for this in
>> terms of the permissions that need to be set in the various directories to
>> enable this.
>>
>> Until this is sorted out by the Hadoop developers, I've put my attempts to
>> use the fair scheduler on hold…
>>
>> Regards,
>> Robin Goldstone, LLNL
>>
>> On 10/16/12 3:32 PM, "Patai Sangbutsarakum" <si...@gmail.com>
>> wrote:
>>
>> Hi Harsh,
>> Thanks for breaking it down clearly. I would say I am 98% successful
>> following the instructions.
>> The remaining 2% is about hadoop.tmp.dir.
>>
>> Let's say I have 2 users:
>> userA is the user that starts hdfs and mapred
>> userB is a regular user
>>
>> If I use the default value of hadoop.tmp.dir,
>> /tmp/hadoop-${user.name},
>> I can submit a job as userA but not as userB:
>> user=userB, access=WRITE, inode="/tmp/hadoop-userA/mapred/staging"
>> :userA:supergroup:drwxr-xr-x
>>
>> I googled around; someone recommended changing hadoop.tmp.dir to
>> /tmp/hadoop.
>> This way it almost works; the thing is,
>>
>> if I submit as userA it will create /tmp/hadoop on the local machine, whose
>> ownership will be userA.userA,
>> and once I try to submit a job from the same machine as userB I will
>> get "Error creating temp dir in hadoop.tmp.dir /tmp/hadoop due to
>> Permission denied"
>> (because /tmp/hadoop is owned by userA.userA). Vice versa: if I delete
>> /tmp/hadoop and let the directory be created by userB, userA will not
>> be able to submit a job.
>>
>> Which is the right approach I should work with?
>> Please suggest.
>>
>> Patai
>>
>>
>> On Mon, Oct 15, 2012 at 3:18 PM, Harsh J <ha...@cloudera.com> wrote:
>>
>> Hi Patai,
>>
>> Reply inline.
>>
>> On Tue, Oct 16, 2012 at 2:57 AM, Patai Sangbutsarakum
>> <si...@gmail.com> wrote:
>>
>> Thanks for input,
>>
>> I am reading the document; I forgot to mention that I am on cdh3u4.
>>
>>
>> That version should have the support for all of this.
>>
>> If you point your poolname property to mapred.job.queue.name, then you
>> can leverage the Per-Queue ACLs
>>
>>
>> Does that mean that if I plan on 3 pools in the fair scheduler, I have to
>> configure 3 queues in the capacity scheduler, in order to have each pool
>> leverage the per-queue ACL of its queue?
>>
>>
>> Queues are not hard-tied into CapacityScheduler. You can have generic
>> queues in MR. And FairScheduler can bind its Pool concept into the
>> Queue configuration.
>>
>> All you need to do is the following:
>>
>> 1. Map FairScheduler pool name to reuse queue names itself:
>>
>> mapred.fairscheduler.poolnameproperty set to 'mapred.job.queue.name'
>>
>> 2. Define your required queues:
>>
>> mapred.queue.names set to "default,foo,bar" for example, for 3 queues:
>> default, foo and bar.
>>
>> 3. Define Submit ACLs for each Queue:
>>
>> mapred.queue.default.acl-submit-job set to "patai,foobar users,adm"
>> (usernames groupnames)
>>
>> mapred.queue.foo.acl-submit-job set to "spam eggs"
>>
>> Likewise for remaining queues, as you need it…
>>
>> 4. Enable ACLs and restart JT.
>>
>> mapred.acls.enabled set to "true"
>>
>> 5. Users then use the right API to set queue names before submitting
>> jobs, or use -Dmapred.job.queue.name=value via CLI (if using Tool):
>>
>> http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobConf.html#setQueueName(java.lang.String)
>>
>> 6. Done.
>>
>> Let us know if this works!
>>
>> --
>> Harsh J
>>
>>
>>



-- 
Harsh J
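
For step 5 above, a command-line submission could look like the following
sketch, assuming the job's driver class implements Tool:

  # my-job.jar and com.example.MyJob are placeholders
  hadoop jar my-job.jar com.example.MyJob -Dmapred.job.queue.name=foo in out

From Java code, the equivalent is JobConf#setQueueName("foo"), per the API
link above.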

Re: Fair scheduler.

Posted by Patai Sangbutsarakum <si...@gmail.com>.
Thanks everyone. Seems like I hit a dead end.
It's kind of funny when I read that jira: run it 4 times and everything
will work... where's that magic number from? lol

respects

On Tue, Oct 16, 2012 at 4:12 PM, Arpit Gupta <ar...@hortonworks.com> wrote:
> https://issues.apache.org/jira/browse/MAPREDUCE-4398
>
> is the bug that Robin is referring to.
>
> --
> Arpit Gupta
> Hortonworks Inc.
> http://hortonworks.com/
>
> On Oct 16, 2012, at 3:51 PM, "Goldstone, Robin J." <go...@llnl.gov>
> wrote:
>
> This is similar to issues I ran into with permissions/ownership of
> mapred.system.dir when using the fair scheduler.  We are instructed to set
> the ownership of mapred.system.dir to mapred:hadoop and then when the job
> tracker starts up (running as user mapred) it explicitly sets the
> permissions on this directory to 700.  Meanwhile when I go to run a job as
> a regular user, it is trying to write stuff into mapred.system.dir but it
> can't due to the ownership/permissions that have been established.
>
> Per discussion with Arpit Gupta, this is a bug with the fair scheduler and
> it appears from your experience that there are similar issues with
> hadoop.tmp.dir.  The whole idea of the fair scheduler is to run jobs under
> the user's identity rather than as user mapred.  This is good from a
> security perspective yet it seems no one bothered to account for this in
> terms of the permissions that need to be set in the various directories to
> enable this.
>
> Until this is sorted out by the Hadoop developers, I've put my attempts to
> use the fair scheduler on hold…
>
> Regards,
> Robin Goldstone, LLNL
>
> On 10/16/12 3:32 PM, "Patai Sangbutsarakum" <si...@gmail.com>
> wrote:
>
> Hi Harsh,
> Thanks for breaking it down clearly. I would say I am 98% successful
> following the instructions.
> The remaining 2% is about hadoop.tmp.dir.
>
> Let's say I have 2 users:
> userA is the user that starts hdfs and mapred
> userB is a regular user
>
> If I use the default value of hadoop.tmp.dir,
> /tmp/hadoop-${user.name},
> I can submit a job as userA but not as userB:
> user=userB, access=WRITE, inode="/tmp/hadoop-userA/mapred/staging"
> :userA:supergroup:drwxr-xr-x
>
> I googled around; someone recommended changing hadoop.tmp.dir to
> /tmp/hadoop.
> This way it almost works; the thing is,
>
> if I submit as userA it will create /tmp/hadoop on the local machine, whose
> ownership will be userA.userA,
> and once I try to submit a job from the same machine as userB I will
> get "Error creating temp dir in hadoop.tmp.dir /tmp/hadoop due to
> Permission denied"
> (because /tmp/hadoop is owned by userA.userA). Vice versa: if I delete
> /tmp/hadoop and let the directory be created by userB, userA will not
> be able to submit a job.
>
> Which is the right approach I should work with?
> Please suggest.
>
> Patai
>
>
> On Mon, Oct 15, 2012 at 3:18 PM, Harsh J <ha...@cloudera.com> wrote:
>
> Hi Patai,
>
> Reply inline.
>
> On Tue, Oct 16, 2012 at 2:57 AM, Patai Sangbutsarakum
> <si...@gmail.com> wrote:
>
> Thanks for input,
>
> I am reading the document; I forgot to mention that I am on cdh3u4.
>
>
> That version should have the support for all of this.
>
> If you point your poolname property to mapred.job.queue.name, then you
> can leverage the Per-Queue ACLs
>
>
> Does that mean that if I plan on 3 pools in the fair scheduler, I have to
> configure 3 queues in the capacity scheduler, in order to have each pool
> leverage the per-queue ACL of its queue?
>
>
> Queues are not hard-tied into CapacityScheduler. You can have generic
> queues in MR. And FairScheduler can bind its Pool concept into the
> Queue configuration.
>
> All you need to do is the following:
>
> 1. Map FairScheduler pool name to reuse queue names itself:
>
> mapred.fairscheduler.poolnameproperty set to 'mapred.job.queue.name'
>
> 2. Define your required queues:
>
> mapred.queue.names set to "default,foo,bar" for example, for 3 queues:
> default, foo and bar.
>
> 3. Define Submit ACLs for each Queue:
>
> mapred.queue.default.acl-submit-job set to "patai,foobar users,adm"
> (usernames groupnames)
>
> mapred.queue.foo.acl-submit-job set to "spam eggs"
>
> Likewise for remaining queues, as you need it…
>
> 4. Enable ACLs and restart JT.
>
> mapred.acls.enabled set to "true"
>
> 5. Users then use the right API to set queue names before submitting
> jobs, or use -Dmapred.job.queue.name=value via CLI (if using Tool):
>
> http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobConf.html#setQueueName(java.lang.String)
>
> 6. Done.
>
> Let us know if this works!
>
> --
> Harsh J
>
>
>

Re: Fair scheduler.

Posted by Arpit Gupta <ar...@hortonworks.com>.
https://issues.apache.org/jira/browse/MAPREDUCE-4398

is the bug that Robin is referring to.

--
Arpit Gupta
Hortonworks Inc.
http://hortonworks.com/

On Oct 16, 2012, at 3:51 PM, "Goldstone, Robin J." <go...@llnl.gov> wrote:

> This is similar to issues I ran into with permissions/ownership of
> mapred.system.dir when using the fair scheduler.  We are instructed to set
> the ownership of mapred.system.dir to mapred:hadoop and then when the job
> tracker starts up (running as user mapred) it explicitly sets the
> permissions on this directory to 700.  Meanwhile when I go to run a job as
> a regular user, it is trying to write stuff into mapred.system.dir but it
> can't due to the ownership/permissions that have been established.
> 
> Per discussion with Arpit Gupta, this is a bug with the fair scheduler and
> it appears from your experience that there are similar issues with
> hadoop.tmp.dir.  The whole idea of the fair scheduler is to run jobs under
> the user's identity rather than as user mapred.  This is good from a
> security perspective yet it seems no one bothered to account for this in
> terms of the permissions that need to be set in the various directories to
> enable this. 
> 
> Until this is sorted out by the Hadoop developers, I've put my attempts to
> use the fair scheduler on hold…
> 
> Regards,
> Robin Goldstone, LLNL
> 
> On 10/16/12 3:32 PM, "Patai Sangbutsarakum" <si...@gmail.com>
> wrote:
> 
>> Hi Harsh,
>> Thanks for breaking it down clearly. I would say I am 98% successful
>> following the instructions.
>> The remaining 2% is about hadoop.tmp.dir.
>>
>> Let's say I have 2 users:
>> userA is the user that starts hdfs and mapred
>> userB is a regular user
>>
>> If I use the default value of hadoop.tmp.dir,
>> /tmp/hadoop-${user.name},
>> I can submit a job as userA but not as userB:
>> user=userB, access=WRITE, inode="/tmp/hadoop-userA/mapred/staging"
>> :userA:supergroup:drwxr-xr-x
>>
>> I googled around; someone recommended changing hadoop.tmp.dir to
>> /tmp/hadoop.
>> This way it almost works; the thing is,
>>
>> if I submit as userA it will create /tmp/hadoop on the local machine, whose
>> ownership will be userA.userA,
>> and once I try to submit a job from the same machine as userB I will
>> get "Error creating temp dir in hadoop.tmp.dir /tmp/hadoop due to
>> Permission denied"
>> (because /tmp/hadoop is owned by userA.userA). Vice versa: if I delete
>> /tmp/hadoop and let the directory be created by userB, userA will not
>> be able to submit a job.
>>
>> Which is the right approach I should work with?
>> Please suggest.
>> 
>> Patai
>> 
>> 
>> On Mon, Oct 15, 2012 at 3:18 PM, Harsh J <ha...@cloudera.com> wrote:
>>> Hi Patai,
>>> 
>>> Reply inline.
>>> 
>>> On Tue, Oct 16, 2012 at 2:57 AM, Patai Sangbutsarakum
>>> <si...@gmail.com> wrote:
>>>> Thanks for input,
>>>> 
>>>> I am reading the document; I forgot to mention that I am on cdh3u4.
>>> 
>>> That version should have the support for all of this.
>>> 
>>>>> If you point your poolname property to mapred.job.queue.name, then you
>>>>> can leverage the Per-Queue ACLs
>>>> 
>>>> Does that mean that if I plan on 3 pools in the fair scheduler, I have to
>>>> configure 3 queues in the capacity scheduler, in order to have each pool
>>>> leverage the per-queue ACL of its queue?
>>> 
>>> Queues are not hard-tied into CapacityScheduler. You can have generic
>>> queues in MR. And FairScheduler can bind its Pool concept into the
>>> Queue configuration.
>>> 
>>> All you need to do is the following:
>>> 
>>> 1. Map FairScheduler pool name to reuse queue names itself:
>>> 
>>> mapred.fairscheduler.poolnameproperty set to 'mapred.job.queue.name'
>>> 
>>> 2. Define your required queues:
>>> 
>>> mapred.queue.names set to "default,foo,bar" for example, for 3 queues:
>>> default, foo and bar.
>>> 
>>> 3. Define Submit ACLs for each Queue:
>>> 
>>> mapred.queue.default.acl-submit-job set to "patai,foobar users,adm"
>>> (usernames groupnames)
>>> 
>>> mapred.queue.foo.acl-submit-job set to "spam eggs"
>>> 
>>> Likewise for remaining queues, as you need it…
>>> 
>>> 4. Enable ACLs and restart JT.
>>> 
>>> mapred.acls.enabled set to "true"
>>> 
>>> 5. Users then use the right API to set queue names before submitting
>>> jobs, or use -Dmapred.job.queue.name=value via CLI (if using Tool):
>>> 
>>> http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobConf.html#setQueueName(java.lang.String)
>>> 
>>> 6. Done.
>>> 
>>> Let us know if this works!
>>> 
>>> --
>>> Harsh J
> 
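
For reference, the ownership setup Robin describes for mapred.system.dir
would be something like the following (using the /tmp/mapred/system path
from Harsh's environment above):

  # the JT itself then tightens the directory's mode to 700 on startup
  hadoop fs -chown mapred:hadoop /tmp/mapred
  hadoop fs -chown mapred:hadoop /tmp/mapred/system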


Re: Fair scheduler.

Posted by Arpit Gupta <ar...@hortonworks.com>.
https://issues.apache.org/jira/browse/MAPREDUCE-4398

is the bug that Robin is referring to.

--
Arpit Gupta
Hortonworks Inc.
http://hortonworks.com/

On Oct 16, 2012, at 3:51 PM, "Goldstone, Robin J." <go...@llnl.gov> wrote:

> This is similar to issues I ran into with permissions/ownership of
> mapred.system.dir when using the fair scheduler.  We are instructed to set
> the ownership of mapred.system.dir to mapred:hadoop and then when the job
> tracker starts up (running as user mapred) it explicitly sets the
> permissions on this directory to 700.  Meanwhile when I go to run a job as
> a regular user, it is trying to write stuff into mapred.system.dir but it
> can't due to the ownership/permissions that have been established.
> 
> Per discussion with Arpit Gupta, this is a bug with the fair scheduler and
> it appears from your experience that there are similar issues with
> hadoop.tmp.dir.  The whole idea of the fair scheduler is to run jobs under
> the user's identity rather than as user mapred.  This is good from a
> security perspective yet it seems no one bothered to account for this in
> terms of the permissions that need to be set in the various directories to
> enable this. 
> 
> Until this is sorted out by the Hadoop developers, I've put my attempts to
> use the fair scheduler on holdŠ
> 
> Regards,
> Robin Goldstone, LLNL
> 
> On 10/16/12 3:32 PM, "Patai Sangbutsarakum" <si...@gmail.com>
> wrote:
> 
>> Hi Harsh,
>> Thanks for breaking it down clearly. I would say i am successful 98%
>> from the instruction.
>> The 2% is about hadoop.tmp.dir
>> 
>> let's say i have 2 users
>> userA is a user that start hdfs and mapred
>> userB is a regular user
>> 
>> if i use default value of  hadoop.tmp.dir
>> /tmp/hadoop-${user.name}
>> I can submit job as usersA but not by usersB
>> ser=userB, access=WRITE, inode="/tmp/hadoop-userA/mapred/staging"
>> :userA:supergroup:drwxr-xr-x
>> 
>> i googled around; someone recommended to change hadoop.tmp.dir to
>> /tmp/hadoop.
>> This way it is almost a yay way; the thing is
>> 
>> if I submit as userA it will create /tmp/hadoop in local machine which
>> ownership will be userA.userA,
>> and once I tried to submit job from the same machine as userB I will
>> get  "Error creating temp dir in hadoop.tmp.dir /tmp/hadoop due to
>> Permission denied"
>> (as because /tmp/hadoop is own by userA.userA). vise versa if I delete
>> /tmp/hadoop and let the directory be created by userB, userA will not
>> be able to submit job.
>> 
>> Which is the right approach i should work with?
>> Please suggest
>> 
>> Patai
>> 
>> 
>> On Mon, Oct 15, 2012 at 3:18 PM, Harsh J <ha...@cloudera.com> wrote:
>>> Hi Patai,
>>> 
>>> Reply inline.
>>> 
>>> On Tue, Oct 16, 2012 at 2:57 AM, Patai Sangbutsarakum
>>> <si...@gmail.com> wrote:
>>>> Thanks for input,
>>>> 
>>>> I am reading the document; i forget to mention that i am on cdh3u4.
>>> 
>>> That version should have the support for all of this.
>>> 
>>>>> If you point your poolname property to mapred.job.queue.name, then you
>>>>> can leverage the Per-Queue ACLs
>>>> 
>>>> Is that mean if i plan to 3 pools of fair scheduler, i have to
>>>> configure 3 queues of capacity scheduler. in order to have each pool
>>>> can leverage Per-Queue ACL of each queue.?
>>> 
>>> Queues are not hard-tied into CapacityScheduler. You can have generic
>>> queues in MR. And FairScheduler can bind its Pool concept into the
>>> Queue configuration.
>>> 
>>> All you need to do is the following:
>>> 
>>> 1. Map FairScheduler pool name to reuse queue names itself:
>>> 
>>> mapred.fairscheduler.poolnameproperty set to 'mapred.job.queue.name'
>>> 
>>> 2. Define your required queues:
>>> 
>>> mapred.job.queues set to "default,foo,bar" for example, for 3 queues:
>>> default, foo and bar.
>>> 
>>> 3. Define Submit ACLs for each Queue:
>>> 
>>> mapred.queue.default.acl-submit-job set to "patai,foobar users,adm"
>>> (usernames groupnames)
>>> 
>>> mapred.queue.foo.acl-submit-job set to "spam eggs"
>>> 
>>> Likewise for remaining queues, as you need itŠ
>>> 
>>> 4. Enable ACLs and restart JT.
>>> 
>>> mapred.acls.enabled set to "true"
>>> 
>>> 5. Users then use the right API to set queue names before submitting
>>> jobs, or use -Dmapred.job.queue.name=value via CLI (if using Tool):
>>> 
>>> http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobConf
>>> .html#setQueueName(java.lang.String)
>>> 
>>> 6. Done.
>>> 
>>> Let us know if this works!
>>> 
>>> --
>>> Harsh J
> 


Re: Fair scheduler.

Posted by Arpit Gupta <ar...@hortonworks.com>.
https://issues.apache.org/jira/browse/MAPREDUCE-4398

is the bug that Robin is referring to.

--
Arpit Gupta
Hortonworks Inc.
http://hortonworks.com/

On Oct 16, 2012, at 3:51 PM, "Goldstone, Robin J." <go...@llnl.gov> wrote:

> This is similar to issues I ran into with permissions/ownership of
> mapred.system.dir when using the fair scheduler.  We are instructed to set
> the ownership of mapred.system.dir to mapred:hadoop and then when the job
> tracker starts up (running as user mapred) it explicitly sets the
> permissions on this directory to 700.  Meanwhile when I go to run a job as
> a regular user, it is trying to write stuff into mapred.system.dir but it
> can't due to the ownership/permissions that have been established.
> 
> Per discussion with Arpit Gupta, this is a bug with the fair scheduler and
> it appears from your experience that there are similar issues with
> hadoop.tmp.dir.  The whole idea of the fair scheduler is to run jobs under
> the user's identity rather than as user mapred.  This is good from a
> security perspective yet it seems no one bothered to account for this in
> terms of the permissions that need to be set in the various directories to
> enable this. 
> 
> Until this is sorted out by the Hadoop developers, I've put my attempts to
> use the fair scheduler on holdŠ
> 
> Regards,
> Robin Goldstone, LLNL
> 
> On 10/16/12 3:32 PM, "Patai Sangbutsarakum" <si...@gmail.com>
> wrote:
> 
>> Hi Harsh,
>> Thanks for breaking it down clearly. I would say i am successful 98%
>> from the instruction.
>> The 2% is about hadoop.tmp.dir
>> 
>> let's say i have 2 users
>> userA is a user that start hdfs and mapred
>> userB is a regular user
>> 
>> if i use default value of  hadoop.tmp.dir
>> /tmp/hadoop-${user.name}
>> I can submit job as usersA but not by usersB
>> ser=userB, access=WRITE, inode="/tmp/hadoop-userA/mapred/staging"
>> :userA:supergroup:drwxr-xr-x
>> 
>> i googled around; someone recommended to change hadoop.tmp.dir to
>> /tmp/hadoop.
>> This way it is almost a yay way; the thing is
>> 
>> if I submit as userA it will create /tmp/hadoop in local machine which
>> ownership will be userA.userA,
>> and once I tried to submit job from the same machine as userB I will
>> get  "Error creating temp dir in hadoop.tmp.dir /tmp/hadoop due to
>> Permission denied"
>> (as because /tmp/hadoop is own by userA.userA). vise versa if I delete
>> /tmp/hadoop and let the directory be created by userB, userA will not
>> be able to submit job.
>> 
>> Which is the right approach i should work with?
>> Please suggest
>> 
>> Patai
>> 
>> 
>> On Mon, Oct 15, 2012 at 3:18 PM, Harsh J <ha...@cloudera.com> wrote:
>>> Hi Patai,
>>> 
>>> Reply inline.
>>> 
>>> On Tue, Oct 16, 2012 at 2:57 AM, Patai Sangbutsarakum
>>> <si...@gmail.com> wrote:
>>>> Thanks for input,
>>>> 
>>>> I am reading the document; i forget to mention that i am on cdh3u4.
>>> 
>>> That version should have the support for all of this.
>>> 
>>>>> If you point your poolname property to mapred.job.queue.name, then you
>>>>> can leverage the Per-Queue ACLs
>>>> 
>>>> Is that mean if i plan to 3 pools of fair scheduler, i have to
>>>> configure 3 queues of capacity scheduler. in order to have each pool
>>>> can leverage Per-Queue ACL of each queue.?
>>> 
>>> Queues are not hard-tied into CapacityScheduler. You can have generic
>>> queues in MR. And FairScheduler can bind its Pool concept into the
>>> Queue configuration.
>>> 
>>> All you need to do is the following:
>>> 
>>> 1. Map FairScheduler pool name to reuse queue names itself:
>>> 
>>> mapred.fairscheduler.poolnameproperty set to 'mapred.job.queue.name'
>>> 
>>> 2. Define your required queues:
>>> 
>>> mapred.job.queues set to "default,foo,bar" for example, for 3 queues:
>>> default, foo and bar.
>>> 
>>> 3. Define Submit ACLs for each Queue:
>>> 
>>> mapred.queue.default.acl-submit-job set to "patai,foobar users,adm"
>>> (usernames groupnames)
>>> 
>>> mapred.queue.foo.acl-submit-job set to "spam eggs"
>>> 
>>> Likewise for remaining queues, as you need it…
>>> 
>>> 4. Enable ACLs and restart JT.
>>> 
>>> mapred.acls.enabled set to "true"
>>> 
>>> 5. Users then use the right API to set queue names before submitting
>>> jobs, or use -Dmapred.job.queue.name=value via CLI (if using Tool):
>>> 
>>> http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobConf
>>> .html#setQueueName(java.lang.String)
>>> 
>>> 6. Done.
>>> 
>>> Let us know if this works!
>>> 
>>> --
>>> Harsh J
> 


Re: Fair scheduler.

Posted by "Goldstone, Robin J." <go...@llnl.gov>.
This is similar to issues I ran into with permissions/ownership of
mapred.system.dir when using the fair scheduler.  We are instructed to set
the ownership of mapred.system.dir to mapred:hadoop and then when the job
tracker starts up (running as user mapred) it explicitly sets the
permissions on this directory to 700.  Meanwhile when I go to run a job as
a regular user, it is trying to write stuff into mapred.system.dir but it
can't due to the ownership/permissions that have been established.

Per discussion with Arpit Gupta, this is a bug with the fair scheduler and
it appears from your experience that there are similar issues with
hadoop.tmp.dir.  The whole idea of the fair scheduler is to run jobs under
the user's identity rather than as user mapred.  This is good from a
security perspective yet it seems no one bothered to account for this in
terms of the permissions that need to be set in the various directories to
enable this. 

Until this is sorted out by the Hadoop developers, I've put my attempts to
use the fair scheduler on hold…

Regards,
Robin Goldstone, LLNL

On 10/16/12 3:32 PM, "Patai Sangbutsarakum" <si...@gmail.com>
wrote:

>Hi Harsh,
>Thanks for breaking it down clearly. I would say i am successful 98%
>from the instruction.
>The 2% is about hadoop.tmp.dir
>
>let's say i have 2 users
>userA is a user that start hdfs and mapred
>userB is a regular user
>
>if i use default value of  hadoop.tmp.dir
>/tmp/hadoop-${user.name}
>I can submit job as usersA but not by usersB
>ser=userB, access=WRITE, inode="/tmp/hadoop-userA/mapred/staging"
>:userA:supergroup:drwxr-xr-x
>
>i googled around; someone recommended to change hadoop.tmp.dir to
>/tmp/hadoop.
>This way it is almost a yay way; the thing is
>
>if I submit as userA it will create /tmp/hadoop in local machine which
>ownership will be userA.userA,
>and once I tried to submit job from the same machine as userB I will
>get  "Error creating temp dir in hadoop.tmp.dir /tmp/hadoop due to
>Permission denied"
>(as because /tmp/hadoop is own by userA.userA). vise versa if I delete
>/tmp/hadoop and let the directory be created by userB, userA will not
>be able to submit job.
>
>Which is the right approach i should work with?
>Please suggest
>
>Patai
>
>
>On Mon, Oct 15, 2012 at 3:18 PM, Harsh J <ha...@cloudera.com> wrote:
>> Hi Patai,
>>
>> Reply inline.
>>
>> On Tue, Oct 16, 2012 at 2:57 AM, Patai Sangbutsarakum
>> <si...@gmail.com> wrote:
>>> Thanks for input,
>>>
>>> I am reading the document; i forget to mention that i am on cdh3u4.
>>
>> That version should have the support for all of this.
>>
>>>> If you point your poolname property to mapred.job.queue.name, then you
>>>> can leverage the Per-Queue ACLs
>>>
>>> Is that mean if i plan to 3 pools of fair scheduler, i have to
>>> configure 3 queues of capacity scheduler. in order to have each pool
>>> can leverage Per-Queue ACL of each queue.?
>>
>> Queues are not hard-tied into CapacityScheduler. You can have generic
>> queues in MR. And FairScheduler can bind its Pool concept into the
>> Queue configuration.
>>
>> All you need to do is the following:
>>
>> 1. Map FairScheduler pool name to reuse queue names itself:
>>
>> mapred.fairscheduler.poolnameproperty set to 'mapred.job.queue.name'
>>
>> 2. Define your required queues:
>>
>> mapred.job.queues set to "default,foo,bar" for example, for 3 queues:
>> default, foo and bar.
>>
>> 3. Define Submit ACLs for each Queue:
>>
>> mapred.queue.default.acl-submit-job set to "patai,foobar users,adm"
>> (usernames groupnames)
>>
>> mapred.queue.foo.acl-submit-job set to "spam eggs"
>>
>> Likewise for remaining queues, as you need it…
>>
>> 4. Enable ACLs and restart JT.
>>
>> mapred.acls.enabled set to "true"
>>
>> 5. Users then use the right API to set queue names before submitting
>> jobs, or use -Dmapred.job.queue.name=value via CLI (if using Tool):
>> 
>>http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobConf
>>.html#setQueueName(java.lang.String)
>>
>> 6. Done.
>>
>> Let us know if this works!
>>
>> --
>> Harsh J
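
For reference, mapred.system.dir is set in the JobTracker's mapred-site.xml;
a sketch of the kind of entry being discussed follows (the path value here is
a common convention, not something given in this thread):

  <!-- HDFS directory used by the MR framework; per the setup instructions it
       is chowned mapred:hadoop, and the JobTracker then forces its mode to
       700 at startup, which is what locks regular users out -->
  <property>
    <name>mapred.system.dir</name>
    <value>/mapred/system</value>
  </property>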


Re: Fair scheduler.

Posted by Patai Sangbutsarakum <si...@gmail.com>.
Hi Harsh,
Thanks for breaking it down clearly. I would say I am 98% successful
following the instructions.
The 2% is about hadoop.tmp.dir.

Let's say I have 2 users:
userA is the user that starts hdfs and mapred
userB is a regular user

If I use the default value of hadoop.tmp.dir,
/tmp/hadoop-${user.name},
I can submit a job as userA but not as userB:
ser=userB, access=WRITE, inode="/tmp/hadoop-userA/mapred/staging"
:userA:supergroup:drwxr-xr-x

I googled around; someone recommended changing hadoop.tmp.dir to /tmp/hadoop.
That way almost works; the thing is:

if I submit as userA, it will create /tmp/hadoop on the local machine,
owned by userA.userA,
and once I try to submit a job from the same machine as userB I
get "Error creating temp dir in hadoop.tmp.dir /tmp/hadoop due to
Permission denied"
(because /tmp/hadoop is owned by userA.userA). Vice versa, if I delete
/tmp/hadoop and let the directory be created by userB, userA will not
be able to submit a job.

Which is the right approach I should work with?
Please suggest.

Patai


On Mon, Oct 15, 2012 at 3:18 PM, Harsh J <ha...@cloudera.com> wrote:
> Hi Patai,
>
> Reply inline.
>
> On Tue, Oct 16, 2012 at 2:57 AM, Patai Sangbutsarakum
> <si...@gmail.com> wrote:
>> Thanks for input,
>>
>> I am reading the document; i forget to mention that i am on cdh3u4.
>
> That version should have the support for all of this.
>
>>> If you point your poolname property to mapred.job.queue.name, then you
>>> can leverage the Per-Queue ACLs
>>
>> Is that mean if i plan to 3 pools of fair scheduler, i have to
>> configure 3 queues of capacity scheduler. in order to have each pool
>> can leverage Per-Queue ACL of each queue.?
>
> Queues are not hard-tied into CapacityScheduler. You can have generic
> queues in MR. And FairScheduler can bind its Pool concept into the
> Queue configuration.
>
> All you need to do is the following:
>
> 1. Map FairScheduler pool name to reuse queue names itself:
>
> mapred.fairscheduler.poolnameproperty set to 'mapred.job.queue.name'
>
> 2. Define your required queues:
>
> mapred.job.queues set to "default,foo,bar" for example, for 3 queues:
> default, foo and bar.
>
> 3. Define Submit ACLs for each Queue:
>
> mapred.queue.default.acl-submit-job set to "patai,foobar users,adm"
> (usernames groupnames)
>
> mapred.queue.foo.acl-submit-job set to "spam eggs"
>
> Likewise for remaining queues, as you need it…
>
> 4. Enable ACLs and restart JT.
>
> mapred.acls.enabled set to "true"
>
> 5. Users then use the right API to set queue names before submitting
> jobs, or use -Dmapred.job.queue.name=value via CLI (if using Tool):
> http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobConf.html#setQueueName(java.lang.String)
>
> 6. Done.
>
> Let us know if this works!
>
> --
> Harsh J
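
For reference, the two settings being compared map to core-site.xml entries
like the following (a sketch of what is described above, annotated with the
failure mode each one hits per this thread):

  <!-- Option 1: the default, one dir per user; fails for userB because the
       HDFS staging path ends up under /tmp/hadoop-userA, per the error above -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop-${user.name}</value>
  </property>

  <!-- Option 2: the googled workaround, one shared dir; whichever user
       creates /tmp/hadoop first owns it and locks the other out -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/tmp/hadoop</value>
  </property>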

Re: Fair scheduler.

Posted by Harsh J <ha...@cloudera.com>.
Hi Patai,

Reply inline.

On Tue, Oct 16, 2012 at 2:57 AM, Patai Sangbutsarakum
<si...@gmail.com> wrote:
> Thanks for input,
>
> I am reading the document; i forget to mention that i am on cdh3u4.

That version should have the support for all of this.

>> If you point your poolname property to mapred.job.queue.name, then you
>> can leverage the Per-Queue ACLs
>
> Is that mean if i plan to 3 pools of fair scheduler, i have to
> configure 3 queues of capacity scheduler. in order to have each pool
> can leverage Per-Queue ACL of each queue.?

Queues are not hard-tied into CapacityScheduler. You can have generic
queues in MR. And FairScheduler can bind its Pool concept into the
Queue configuration.

All you need to do is the following:

1. Map FairScheduler pool name to reuse queue names itself:

mapred.fairscheduler.poolnameproperty set to 'mapred.job.queue.name'

2. Define your required queues:

mapred.job.queues set to "default,foo,bar" for example, for 3 queues:
default, foo and bar.

3. Define Submit ACLs for each Queue:

mapred.queue.default.acl-submit-job set to "patai,foobar users,adm"
(usernames groupnames)

mapred.queue.foo.acl-submit-job set to "spam eggs"

Likewise for remaining queues, as you need it…

4. Enable ACLs and restart JT.

mapred.acls.enabled set to "true"

5. Users then use the right API to set queue names before submitting
jobs, or use -Dmapred.job.queue.name=value via CLI (if using Tool):
http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/JobConf.html#setQueueName(java.lang.String)

6. Done.

Let us know if this works!

-- 
Harsh J
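
Pulled together, steps 1 through 4 come out to a mapred-site.xml fragment
roughly like this (a sketch only, reusing the example queue names and ACL
entries from this thread; note that on some 0.20.x releases the queue list
key is spelled mapred.queue.names rather than mapred.job.queues, so check
your version):

  <property>
    <name>mapred.fairscheduler.poolnameproperty</name>
    <value>mapred.job.queue.name</value>
  </property>
  <property>
    <name>mapred.job.queues</name>
    <value>default,foo,bar</value>
  </property>
  <!-- ACL format: "usernames groupnames", comma-separated within each list -->
  <property>
    <name>mapred.queue.default.acl-submit-job</name>
    <value>patai,foobar users,adm</value>
  </property>
  <property>
    <name>mapred.queue.foo.acl-submit-job</name>
    <value>spam eggs</value>
  </property>
  <property>
    <name>mapred.acls.enabled</name>
    <value>true</value>
  </property>

A job then lands in the right pool via -Dmapred.job.queue.name=foo on the
command line, or JobConf.setQueueName("foo") in code, as in step 5.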

Re: Fair scheduler.

Posted by Patai Sangbutsarakum <si...@gmail.com>.
Thanks for the input,

I am reading the document; I forgot to mention that I am on cdh3u4.

> If you point your poolname property to mapred.job.queue.name, then you
> can leverage the Per-Queue ACLs

Does that mean that if I plan on 3 pools in the fair scheduler, I have to
configure 3 queues in the capacity scheduler, in order for each pool
to leverage the per-queue ACL of its queue?


On Sat, Oct 13, 2012 at 8:30 PM, Harsh J <ha...@cloudera.com> wrote:
> If you point your poolname property to mapred.job.queue.name, then you
> can leverage the Per-Queue ACLs described at
> http://hadoop.apache.org/docs/stable/mapred_tutorial.html#Job+Authorization
> to do user/group based control.
>
> In addition, depending on the version/distribution of Apache Hadoop in
> use, you can set mapred.fairscheduler.allow.undeclared.pools to false
> in mapred-site.xml to disallow dynamic pool names (to box users to use
> specific poolnames and not get away with new ones).
>
> On Sun, Oct 14, 2012 at 6:03 AM, Patai Sangbutsarakum
> <si...@gmail.com> wrote:
>> Is that anyway to control who can submit job to a pool.?
>>
>> Eg. Pool1, can run jobs submitted from any users except userx.
>>
>> Userx can submit jobs to poolx only. Can't submit to pool1.
>>
>> Hope this make sense.
>> Patai
>
>
>
> --
> Harsh J

Re: Fair scheduler.

Posted by Harsh J <ha...@cloudera.com>.
If you point your poolname property to mapred.job.queue.name, then you
can leverage the Per-Queue ACLs described at
http://hadoop.apache.org/docs/stable/mapred_tutorial.html#Job+Authorization
to do user/group based control.

In addition, depending on the version/distribution of Apache Hadoop in
use, you can set mapred.fairscheduler.allow.undeclared.pools to false
in mapred-site.xml to disallow dynamic pool names (to box users to use
specific poolnames and not get away with new ones).

On Sun, Oct 14, 2012 at 6:03 AM, Patai Sangbutsarakum
<si...@gmail.com> wrote:
> Is that anyway to control who can submit job to a pool.?
>
> Eg. Pool1, can run jobs submitted from any users except userx.
>
> Userx can submit jobs to poolx only. Can't submit to pool1.
>
> Hope this make sense.
> Patai



-- 
Harsh J
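
Concretely, the two properties mentioned above would look like this in
mapred-site.xml (a sketch; as noted, availability of the second key depends
on the version/distribution in use):

  <property>
    <name>mapred.fairscheduler.poolnameproperty</name>
    <value>mapred.job.queue.name</value>
  </property>
  <property>
    <name>mapred.fairscheduler.allow.undeclared.pools</name>
    <value>false</value>
  </property>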
