Posted to common-user@hadoop.apache.org by Jamie Cockrill <ja...@gmail.com> on 2010/07/16 10:58:30 UTC

Problem with DistributedCache after upgrading to CDH3b2

Dear All,

We recently upgraded from CDH3b1 to CDH3b2, and ever since, all of our
MapReduce jobs that use the DistributedCache have failed. Typically,
we add files to the cache before job startup with
addCacheFile(URI, conf), and then retrieve them on the task side with
getLocalCacheFiles(conf). I believe the hadoop-core versions for these
releases are 0.20.2+228 and 0.20.2+320 respectively.
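
For reference, the driver-side call looks roughly like this (a
simplified sketch; the class name and the HDFS path are just
illustrative):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;

    public class CacheSetup {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Register an HDFS file with the cache; each TaskTracker
            // localises a copy onto the node before tasks run.
            DistributedCache.addCacheFile(
                new URI("/path/to/my/file/filename.txt"), conf);
            // ... then build the JobConf from this conf and submit
            // the job as usual.
        }
    }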

We then open and read the files with a standard java.io.FileReader,
passing the toString() of the returned Path object as the constructor
argument. This worked fine until now; since the upgrade, however, we
get a FileNotFoundException when the FileReader tries to open the file.
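
On the task side, the pattern is roughly as follows (again a sketch:
this sits inside our Mapper implementation, using the old mapred API,
and the variable names are illustrative):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    // configure() runs once per task, before any calls to map().
    public void configure(JobConf conf) {
        try {
            Path[] cached = DistributedCache.getLocalCacheFiles(conf);
            // toString() yields the local-filesystem path of the
            // localised copy; this FileReader is what now throws the
            // FileNotFoundException.
            BufferedReader in =
                new BufferedReader(new FileReader(cached[0].toString()));
            String line;
            while ((line = in.readLine()) != null) {
                // ... load each line into an in-memory structure ...
            }
            in.close();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }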

Unfortunately the cluster is on an air-gapped network, so I can't
paste the exact stack trace, but the FileNotFoundException message
comes out like this:

java.io.FileNotFoundException:
/tmp/hadoop-hadoop/mapred/local/taskTracker/archive/master/path/to/my/file/filename.txt/filename.txt

Note that the duplicated filename.txt is not a typo; that is exactly
how the path appears in the exception. I'm not sure whether that is
significant, as this previously worked absolutely fine. Has anyone
else experienced this? Apologies if it's a known issue; I've only
just joined the list.

Many thanks,

Jamie

Re: Problem with DistributedCache after upgrading to CDH3b2

Posted by Patrick Angeles <pa...@cloudera.com>.
Kim, Jamie,

This might be an issue particular to the Cloudera distro, specifically
to the AsyncDiskService-related patches that were applied to
0.20.2+320 (aka CDH3b2).

I've created an issue here:

https://issues.cloudera.org/browse/DISTRO-39

I encourage you (and anyone else reading this) to continue the discussion on
that JIRA, or at

https://groups.google.com/a/cloudera.org/group/cdh-user/topics

Regards,

- Patrick

Re: Problem with DistributedCache after upgrading to CDH3b2

Posted by Kim Vogt <ki...@simplegeo.com>.
Hey Jamie,

Thanks for the reply. I asked about it on the Cloudera IRC channel, so
maybe they'll look into it. In the meantime, I'm going to go ahead and
copy that file over to my datanodes :-)

-Kim

Re: Problem with DistributedCache after upgrading to CDH3b2

Posted by Jamie Cockrill <ja...@gmail.com>.
Hi Kim,

We didn't fix it in the end. I just ended up manually writing the
files to the cluster using the FileSystem class, and then reading them
back out again on the other side. It's not terribly efficient: the
point of DistributedCache is that the files get distributed to every
node, whereas I'm only writing to the two or three nodes that hold the
blocks, and every map task then reads back from those same two or
three nodes.
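
For what it's worth, the workaround looks roughly like this (a sketch;
the paths are illustrative and error handling is omitted):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Driver side: push the side file into HDFS before the job starts.
    fs.copyFromLocalFile(new Path("/local/filename.txt"),
                         new Path("/jobdata/filename.txt"));

    // Task side: open the file straight from HDFS, so every map task
    // reads from the two or three datanodes holding the file's blocks.
    FSDataInputStream in = fs.open(new Path("/jobdata/filename.txt"));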

Unfortunately I didn't have the time or inclination to investigate it
any further, as I had some pretty tight deadlines to keep to, and it
hasn't caused me any significant problems yet...

Thanks,

Jamie

Re: Problem with DistributedCache after upgrading to CDH3b2

Posted by Kim Vogt <ki...@simplegeo.com>.
I'm experiencing the same problem. I was hoping there would be a reply
to this. Anyone? Bueller?

-Kim