Posted to common-user@hadoop.apache.org by "W.P. McNeill" <bi...@gmail.com> on 2012/01/09 19:30:57 UTC

Adding a soft-linked archive file to the distributed cache doesn't work as advertised

I am trying to add a zip file to the distributed cache and have it unzipped
on the task nodes with a softlink to the unzipped directory placed in the
working directory of my mapper process. I think I'm doing everything the
way the documentation tells me to, but it's not working.

On the client, in the run() function where I create the job, I first call:

fs.copyFromLocalFile(new Path("gate-app.zip"), new Path("/tmp/gate-app.zip"));

As expected, this copies the archive file gate-app.zip to the HDFS
directory /tmp.

Then I call

DistributedCache.addCacheArchive(new URI("/tmp/gate-app.zip#gate-app"),
configuration);

I expect this to add "/tmp/gate-app.zip" to the distributed cache and put a
softlink to it called gate-app in the working directory of each task.
However, when I call job.waitForCompletion(), I see the following error:

Exception in thread "main" java.io.FileNotFoundException: File does not
exist: /tmp/gate-app.zip#gate-app.
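
To make the sequence concrete, here is the driver-side code condensed into one
place. The class and method names are just placeholders for this message, not
the actual Hadoop-GATE code:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CacheSetupSketch {
    // Condensed version of the two driver-side steps described above.
    static void setUpCache(Configuration configuration) throws Exception {
        FileSystem fs = FileSystem.get(configuration);

        // Step 1: copy the local archive into HDFS.
        fs.copyFromLocalFile(new Path("gate-app.zip"),
                new Path("/tmp/gate-app.zip"));

        // Step 2: register the archive with the distributed cache; the URI
        // fragment asks for a "gate-app" symlink in each task's working
        // directory.
        DistributedCache.addCacheArchive(new URI("/tmp/gate-app.zip#gate-app"),
                configuration);
    }
}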

It appears that the distributed cache mechanism is interpreting the entire
URI as the literal name of the file, instead of treating the fragment as
the name of the softlink.
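
For what it's worth, plain java.net.URI splits that string the way I expect the
distributed cache to, so the fragment syntax itself looks well formed:

import java.net.URI;

public class FragmentCheck {
    public static void main(String[] args) throws Exception {
        URI cacheUri = new URI("/tmp/gate-app.zip#gate-app");
        System.out.println(cacheUri.getPath());     // prints /tmp/gate-app.zip
        System.out.println(cacheUri.getFragment()); // prints gate-app
    }
}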

As far as I can tell, I'm doing this correctly according to the API
documentation:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html

The full project in which I'm doing this is up on github:
https://github.com/wpm/Hadoop-GATE.
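
For completeness, this is roughly how the mapper expects to find the unpacked
archive. Again, the class name is a placeholder rather than the actual
Hadoop-GATE code:

import java.io.File;
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class GateAppMapper extends Mapper<LongWritable, Text, Text, Text> {

    @Override
    protected void setup(Context context) throws IOException {
        // If the symlinking works, "gate-app" in the task's working directory
        // should point at the unzipped archive.
        File appDir = new File("gate-app");
        if (!appDir.isDirectory()) {
            throw new IOException("Symlinked archive directory not found: "
                    + appDir.getAbsolutePath());
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // The actual GATE processing of each record would go here.
    }
}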

Can someone tell me what I'm doing wrong?

Re: Adding a soft-linked archive file to the distributed cache doesn't work as advertised

Posted by "W.P. McNeill" <bi...@gmail.com>.
I added a DistributedCache.createSymlink(configuration) call right after
the addCacheArchive() call, but I still see the same error.
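
Concretely, the registration now reads like this (same placeholder helper as in
my first message, not the actual Hadoop-GATE code):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;

public class CacheSetupSketch {
    static void registerArchive(Configuration configuration) throws Exception {
        // Register the archive with a "gate-app" symlink in the task's
        // working directory...
        DistributedCache.addCacheArchive(new URI("/tmp/gate-app.zip#gate-app"),
                configuration);
        // ...and, immediately afterwards, enable symlink creation as suggested.
        DistributedCache.createSymlink(configuration);
    }
}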

On Mon, Jan 9, 2012 at 11:05 AM, Alejandro Abdelnur <tu...@cloudera.com> wrote:

> Bill,
>
> In addition you must call DistributedCache.createSymlink(configuration);
> that should do it.
>
> Thxs.
>
> Alejandro
>

Re: Adding a soft-linked archive file to the distributed cache doesn't work as advertised

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
Bill,

In addition you must call DistributedCache.createSymlink(configuration);
that should do it.

Thxs.

Alejandro
