Posted to hdfs-user@hadoop.apache.org by Terry Healy <th...@bnl.gov> on 2013/01/25 18:48:39 UTC

TT nodes distributed cache failure

Running hadoop-0.20.2 on a 20 node cluster.

When running a Map/Reduce job that uses several .jars loaded into the
Distributed cache, several (~4) nodes have their map tasks fail because
of ClassNotFoundException. All the other nodes proceed through the job
normally and the job completes. But this is wasting 20-25% of my TT nodes.

Can anyone explain why some nodes might fail to read all the .jars from
the Distributed cache?

Thanks

Re: TT nodes distributed cache failure

Posted by Hemanth Yamijala <yh...@thoughtworks.com>.
Could you post the stack trace from the job logs? Also, looking at the task
tracker logs on the failed nodes may help.

Thanks
Hemanth

On Friday, January 25, 2013, Terry Healy wrote:

> Running hadoop-0.20.2 on a 20 node cluster.
>
> When running a Map/Reduce job that uses several .jars loaded into the
> Distributed cache, several (~4) nodes have their map tasks fail because
> of ClassNotFoundException. All the other nodes proceed through the job
> normally and the job completes. But this is wasting 20-25% of my TT nodes.
>
> Can anyone explain why some nodes might fail to read all the .jars from
> the Distributed cache?
>
> Thanks
>
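
One check that may help on a failing node is whether the local copies of
the cached jars actually contain the class named in the
ClassNotFoundException. The Python sketch below is an illustration only
and does not come from this thread: the function name jars_containing,
the script name find_class.py and the class name in the usage line are
hypothetical, and where the localized jars live depends on the node's
mapred.local.dir setting.

```python
import sys
import zipfile


def jars_containing(class_name, jar_paths):
    """Return the subset of jar_paths whose archive holds class_name."""
    # A class a.b.C is stored inside a jar as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    found = []
    for path in jar_paths:
        try:
            with zipfile.ZipFile(path) as jar:
                if entry in jar.namelist():
                    found.append(path)
        except (OSError, zipfile.BadZipFile):
            # Missing, unreadable, or truncated jar: report it and keep going.
            print("cannot read " + path, file=sys.stderr)
    return found


if __name__ == "__main__" and len(sys.argv) >= 3:
    # Hypothetical usage:
    #   python find_class.py com.example.MyMapper /path/to/local/jars/*.jar
    for hit in jars_containing(sys.argv[1], sys.argv[2:]):
        print(hit)
```

Running this on a good node and on a failing node and comparing the
output would show whether a jar is absent or unreadable on the failing
node, or whether the jars are present and the problem lies elsewhere
(for example in the task classpath).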
