Posted to common-user@hadoop.apache.org by Chris Carman <kr...@redlab.ee> on 2009/05/19 15:21:24 UTC
Access to local filesystem working folder in map task
hi users,
I have started writing my first project on Hadoop and am now seeking some
guidance from more experienced members.
The project involves running some CPU-intensive computations in parallel and
should be a straightforward application for MapReduce, as the input dataset
can easily be partitioned into independent jobs and the final aggregation is a
low-cost step. The application, however, relies on a legacy command-line exe
file (which runs fine under Wine). It reads about 10 small files (5 MB) from its
working folder and produces another 10 as a result.
I can easily send those files and the app to all nodes via DistributedCache, so
that they get stored read-only on the local file system. I now need to get a
local working folder for the task attempt, where I could copy or symlink the
relevant inputs, execute the legacy exe, and read off the output. As I
understand it, the task returns an HDFS location when I ask for
FileOutputFormat.getWorkOutputPath(job);

I read in the docs that there should be a task-attempt local working folder, but
I'm struggling to find a way to get its filesystem path, so that I could copy
files there and pass it to my app for local processing.
Tell me it's an easy one that I've missed.
Many Thanks,
Chris
Re: Access to local filesystem working folder in map task
Posted by Tom White <to...@cloudera.com>.
Hi Chris,
The task-attempt local working folder is actually just the current
working directory of your map or reduce task. You should be able to
pass your legacy command-line exe and other files using the -files
option (assuming you are using the Java interface to write your job
and are implementing Tool; Streaming also supports the -files option),
and they will appear in the local working folder. You shouldn't have
to use the DistributedCache class directly at all.
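The pattern above can be sketched as follows. This is a minimal illustration, not Hadoop API: the class name LegacyRunner and the "wine legacy.exe" command are hypothetical stand-ins. Inside a map task, the executable shipped with -files sits in the current working directory, so you can launch it with ProcessBuilder from "." and then read its output files from the same place.

```java
import java.io.File;
import java.io.IOException;

// Sketch: run a legacy executable from the task-attempt's current working
// directory, where files shipped via -files are materialized.
public class LegacyRunner {

    // Launches the given command with the task's cwd as its working folder
    // and returns the process exit code.
    public static int runInCwd(String... command)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.directory(new File("."));  // task-attempt working folder
        pb.inheritIO();               // surface the tool's output in the task logs
        return pb.start().waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Inside the mapper this would be e.g.
        //   runInCwd("wine", "legacy.exe");   // hypothetical command
        // after which the ~10 output files can be read from new File(".").
        int rc = runInCwd("sh", "-c", "ls > /dev/null");
        System.out.println("exit code: " + rc);
    }
}
```

After the process exits, the mapper can open the produced files with ordinary java.io against relative paths and emit their contents as key/value pairs.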
Cheers,
Tom
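For illustration, a -files invocation might look like the following (the jar name, class name, and file names are hypothetical; the job class must implement Tool / use GenericOptionsParser for the generic options to be parsed):

```shell
hadoop jar myjob.jar com.example.MyJob \
  -files legacy.exe,input1.dat,input2.dat \
  /hdfs/input /hdfs/output
```

The files listed after -files are copied to each node and appear in every task attempt's working directory.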