Posted to mapreduce-user@hadoop.apache.org by Todd Lipcon <to...@cloudera.com> on 2011/05/06 20:49:47 UTC

Re: Problems with LinuxTaskController, LocalJobRunner, and localRunner directory

Hi Jeremy,

That's a good point - we don't currently do a good job of segregating the
configuration used by the LocalJobRunner (LJR) from the configuration used by
the TaskTracker. In particular, I think mapred.local.dir and mapred.system.dir
are each read by both.

You run into the same issue when trying to use LJR on a system with a
configured cluster, even if not using the LinuxTaskController features.

I'd recommend making a separate hadoop conf/ directory with a different
setting for mapred.local.dir.
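
As a rough sketch of that idea (an illustration under stated assumptions, not
a tested recipe), the script below copies an existing conf directory and
rewrites mapred.local.dir in the copy's mapred-site.xml. The function name
make_ljr_conf and every path passed to it are hypothetical; only the property
name mapred.local.dir and the file name mapred-site.xml come from Hadoop.

```python
import os
import shutil
import xml.etree.ElementTree as ET


def make_ljr_conf(src_conf, dst_conf, local_dir):
    """Copy src_conf to dst_conf and point mapred.local.dir at local_dir.

    dst_conf must not exist yet. Returns the path of the rewritten
    mapred-site.xml. XML comments in that file are not preserved.
    """
    shutil.copytree(src_conf, dst_conf)
    site = os.path.join(dst_conf, "mapred-site.xml")
    if os.path.exists(site):
        tree = ET.parse(site)
    else:
        # No mapred-site.xml in the source conf: start from an empty one.
        tree = ET.ElementTree(ET.Element("configuration"))
    root = tree.getroot()

    # Reuse the existing property if present, otherwise append a new one.
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == "mapred.local.dir":
            break
    else:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = "mapred.local.dir"

    value = prop.find("value")
    if value is None:
        value = ET.SubElement(prop, "value")
    value.text = local_dir

    tree.write(site, encoding="utf-8", xml_declaration=True)
    return site
```

A program started through the hadoop launcher script can then be pointed at
the copy, either by exporting HADOOP_CONF_DIR or by passing --config with the
new directory, so that local jobs write under a directory the submitting user
owns while the daemons keep their original conf/.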

-Todd

On Fri, May 6, 2011 at 11:45 AM, <je...@lewi.us> wrote:

> Hi,
>
> I'm running Hadoop (Cloudera release 3) in pseudo-distributed mode, with
> the LinuxTaskController so that jobs will run as the user who submitted
> them.
>
> My program (which uses Cascading on Hadoop) fires off a job using
> LocalJobRunner (I think to read data from the local filesystem). So far so
> good.
> The job creates the directory
> /var/lib/hadoop-0.20/cache/pseudo/localRunner
> (/var/lib/hadoop-0.20/cache/pseudo being the value of mapred.local.dir)
>
> The problem is that localRunner isn't owned by the user mapred. Instead it's
> owned by the user who submitted the job. The next time I restart the
> daemons, the task tracker will fail because it can't rename
> /var/lib/hadoop-0.20/cache/pseudo/localRunner.
>
> Does anybody have suggestions on how to fix this?
>
> Thanks
> Jeremy
>
>
>

-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Problems with LinuxTaskController, LocalJobRunner, and localRunner directory

Posted by je...@lewi.us.

Thanks Todd.

Unfortunately, I'm using Cascading on Hadoop, so I'm not sure if there's
an easy mechanism to force the local jobs it fires off to use a different
configuration. I'll talk to the Cascading folks and find out.

J

