Posted to mapreduce-user@hadoop.apache.org by Joris Poort <gp...@gmail.com> on 2011/09/26 19:50:37 UTC

Execution directory for child process within mapper

As part of my Java mapper I have a command that executes some standalone
code on a local slave node. When I run the code it executes fine, unless
it tries to access some local files, in which case I get an error
that it cannot locate those files.

Digging a little deeper it seems to be executing from the following directory:

    /data/hadoop/mapred/local/taskTracker/{user}/jobcache/job_201109261253_0023/attempt_201109261253_0023_m_000001_0/work

But I intend to execute from a local directory where the
relevant files are located:

    /home/users/{user}/input/jobname

Is there a way in Java/Hadoop to force execution from the local
directory, instead of the jobcache directory that Hadoop creates
automatically?

Is there perhaps a better way to go about this?

Any help on this would be greatly appreciated!

Cheers,

Joris

Re: Execution directory for child process within mapper

Posted by Steve Lewis <lo...@gmail.com>.
I had a similar issue. When I needed the same file for each reduce (or map)
task, I simply added Java code to the setup method to write the file to ".".
When every map needed different files, I wrote the files before calling the
executable. The trick also works when the code writes to a file rather than
to stdout.
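
A minimal sketch of that approach (the mapper class and the file name
"config.dat" are hypothetical, not from the original code):

    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class ExecMapper extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        protected void setup(Context context)
                throws IOException, InterruptedException {
            // Write the file the executable expects into the task's current
            // working directory (".") before any map() calls run.
            FileWriter writer = new FileWriter(new File(".", "config.dat"));
            try {
                writer.write("...whatever contents the executable expects...");
            } finally {
                writer.close();
            }
        }
    }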

-- 
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com

RE: Execution directory for child process within mapper

Posted by Devaraj k <de...@huawei.com>.
The localized distributed cache can also be helpful here, if you can make the necessary changes to your code. It is located in the local directory ${mapred.local.dir}/taskTracker/archive/.
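
A minimal sketch of the distributed cache route, using the
org.apache.hadoop.filecache.DistributedCache API of this era (the HDFS path
and link name are placeholders); with symlinks enabled, each cached file
shows up in the task's working directory under the name after the '#':

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;

    public class CacheSetup {
        // Call this on the job's Configuration before submitting the job.
        public static void addInputFile(Configuration conf) throws Exception {
            // Make cached files appear as symlinks in the task work directory.
            DistributedCache.createSymlink(conf);
            // The fragment after '#' names the symlink in the work directory.
            DistributedCache.addCacheFile(
                new URI("/user/joris/input/jobname/config.dat#config.dat"),
                conf);
        }
    }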

As per your explanation, I feel you can write the mapper in such a way that it copies the files from your customized location (/home/users/{user}/input/jobname) to the current working directory and then starts the executable.
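
A minimal sketch of that copy-then-execute idea (the executable name
"mytool" and the helper class are hypothetical; ProcessBuilder is plain
Java, and its directory() method is also one way to force where the child
process runs):

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    public class RunLocalTool {
        // Copy everything the executable needs from the shared local
        // directory into the task's working directory, then run it there.
        public static int copyAndRun(String user)
                throws IOException, InterruptedException {
            File src = new File("/home/users/" + user + "/input/jobname");
            File workDir = new File(".").getAbsoluteFile();
            for (File f : src.listFiles()) {
                copyFile(f, new File(workDir, f.getName()));
            }
            ProcessBuilder pb = new ProcessBuilder("./mytool"); // placeholder
            pb.directory(workDir);        // force the child's working directory
            pb.redirectErrorStream(true); // fold stderr into stdout
            Process p = pb.start();
            // Drain output so the child never blocks on a full pipe.
            InputStream out = p.getInputStream();
            byte[] buf = new byte[4096];
            while (out.read(buf) >= 0) {
                // discard, or log as needed
            }
            return p.waitFor();
        }

        private static void copyFile(File src, File dst) throws IOException {
            InputStream in = new FileInputStream(src);
            OutputStream out = new FileOutputStream(dst);
            try {
                byte[] b = new byte[8192];
                int n;
                while ((n = in.read(b)) > 0) {
                    out.write(b, 0, n);
                }
            } finally {
                in.close();
                out.close();
            }
        }
    }

If the executable only reads those files and never writes next to them,
pointing pb.directory(...) straight at /home/users/{user}/input/jobname
avoids the copy entirely.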

I hope this helps. :)


Thanks
Devaraj

Re: Execution directory for child process within mapper

Posted by Joris Poort <gp...@gmail.com>.
Hi Devaraj,

Thanks for your help - that makes sense.  Is there any way to copy the
local files needed for execution to the mapred.local.dir?
Unfortunately I'm running local code which I cannot edit - and this
code is the one that assumes these files are available in the same
directory.

Thanks!

Joris

RE: Execution directory for child process within mapper

Posted by Devaraj k <de...@huawei.com>.
Hi Joris,

You cannot configure the work directory directly. You can configure the local directory with the property 'mapred.local.dir', and it will be used to create the work directory, e.g. '${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid/work'. Based on this, you can refer to your local command relative to that directory when executing it.

I hope this page will help you to understand the directory structure clearly. http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#Directory+Structure
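
For what it's worth, a task can log where it is actually running from,
which makes the layout above easy to verify (a small diagnostic sketch,
not from the original thread; call it from a mapper's setup method):

    import java.io.File;

    import org.apache.hadoop.conf.Configuration;

    public class WhereAmI {
        // Print the configured local dir(s) and the actual working
        // directory from inside a running task.
        public static void report(Configuration conf) {
            System.out.println("mapred.local.dir = " + conf.get("mapred.local.dir"));
            System.out.println("work dir         = " + new File(".").getAbsolutePath());
        }
    }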


Thanks
Devaraj