Posted to mapreduce-user@hadoop.apache.org by Marco Didonna <m....@gmail.com> on 2011/10/27 18:43:11 UTC

Weird NPE at TaskLogAppender.flush()

Hello everybody,
I am working on Terrier (www.terrier.org), an IR toolkit that leverages
Hadoop for indexing large amounts of data (i.e. documents). I am working
both locally with a small subset of the whole dataset and on Amazon EC2
with the full-size dataset. I am experiencing a weird (at least to me)
exception which always occurs at 66% of the map phase. Here's the log:
http://pastebin.com/XtUkHFYE. I really have no idea where the problem
could be.
From the original Terrier 3.5 I've only modified the InputFormat which
is used to read the collection of documents: I use a custom
SequenceFileInputFormat in order to process a custom sequence file
made up of all the tiny documents of the TREC collection (a standard
document collection used in IR).
I guess the problem is not there, since even using an unmodified version
of Terrier I get the same error. In that case, however, there is no
failure, maybe because the authors of Terrier use MultiFileCollection.
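To make the packing concrete, here is a minimal, Hadoop-free Java sketch of
the idea: many tiny documents stored as consecutive (docno, text) records in
a single file, so there is no per-document file overhead. The class name, the
framing, and the sample DOCNOs below are illustrative assumptions only; the
actual implementation sits on top of Hadoop's SequenceFile.Writer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: a Hadoop-free mock of the packing idea. The real
// input format is built on Hadoop's SequenceFile, not this framing.
public class TinyDocPacker {

    // Pack (docno, text) records back-to-back into one byte stream.
    static byte[] pack(Map<String, String> docs) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            for (Map.Entry<String, String> e : docs.entrySet()) {
                out.writeUTF(e.getKey());   // document id (e.g. a TREC DOCNO)
                out.writeUTF(e.getValue()); // document body
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Stream the records back out, preserving order.
    static Map<String, String> unpack(byte[] data) {
        Map<String, String> docs = new LinkedHashMap<>();
        try (DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(data))) {
            while (in.available() > 0) {
                docs.put(in.readUTF(), in.readUTF());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return docs;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("WSJ870101-0001", "First tiny document.");
        docs.put("WSJ870101-0002", "Second tiny document.");
        System.out.println(unpack(pack(docs)).size()); // prints 2
    }
}
```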

I'd love to hear from somebody, since when running the indexing job on
the whole dataset the job fails because this error happens more than
once. In pseudo-distributed mode the job completes after a failure; on
the cloud it doesn't.

Thanks for your time

Marco Didonna

PS: Both locally and on the cloud I use the latest version of the
Cloudera distribution of Hadoop.

Re: Weird NPE at TaskLogAppender.flush()

Posted by Marco Didonna <m....@gmail.com>.
Dear Eric,
thanks for your answer, but if the problem were a mismatch with the Hadoop
distribution used, Hadoop would have complained in a very specific way,
saying "protocol mismatch ...": been there :). The problem was more
subtle, and thanks to the extremely kind Vinod Kumar we figured out (over
IRC) that bug TR-111 was causing that NPE. The bug is marked as unimportant
and trivial, but to me it was extremely important :)

Marco



On 28 October 2011 14:47, Eric Fiala <er...@fiala.ca> wrote:

> Marco,
> I'm not familiar with Terrier - however, I do notice that the download
> package includes [ hadoop-0.20.2+228-core.jar ] - try swapping that out for
> the jar provided in your distribution.
> If that doesn't fix it, look into the other jars provided (or make sure
> the ones from your Hadoop distro are being sourced prior to those) - your
> error on pastebin feels a lot like a slight version mismatch.
>
> hth
>
> EF
>
> On Thu, Oct 27, 2011 at 10:43 AM, Marco Didonna <m....@gmail.com> wrote:
>
>> Hello everybody,
>> I am working on Terrier (www.terrier.org), an IR toolkit that leverages
>> Hadoop for indexing large amounts of data (i.e. documents). I am working
>> both locally with a small subset of the whole dataset and on Amazon EC2
>> with the full-size dataset. I am experiencing a weird (at least to me)
>> exception which always occurs at 66% of the map phase. Here's the log:
>> http://pastebin.com/XtUkHFYE. I really have no idea where the problem
>> could be.
>> From the original Terrier 3.5 I've only modified the InputFormat which
>> is used to read the collection of documents: I use a custom
>> SequenceFileInputFormat in order to process a custom sequence file
>> made up of all the tiny documents of the TREC collection (a standard
>> document collection used in IR).
>> I guess the problem is not there, since even using an unmodified version
>> of Terrier I get the same error. In that case, however, there is no
>> failure, maybe because the authors of Terrier use MultiFileCollection.
>>
>> I'd love to hear from somebody, since when running the indexing job on
>> the whole dataset the job fails because this error happens more than
>> once. In pseudo-distributed mode the job completes after a failure; on
>> the cloud it doesn't.
>>
>> Thanks for your time
>>
>> Marco Didonna
>>
>> PS: Both locally and on the cloud I use the latest version of the
>> Cloudera distribution of Hadoop.
>>
>
>
>
> --
> *Eric Fiala*
> *Fiala Consulting*
> T: 403.828.1117
> E: eric@fiala.ca
> http://www.fiala.ca
>
>

Re: Weird NPE at TaskLogAppender.flush()

Posted by Eric Fiala <er...@fiala.ca>.
Marco,
I'm not familiar with Terrier - however, I do notice that the download
package includes [ hadoop-0.20.2+228-core.jar ] - try swapping that out for
the jar provided in your distribution.
If that doesn't fix it, look into the other jars provided (or make sure the
ones from your Hadoop distro are being sourced prior to those) - your error
on pastebin feels a lot like a slight version mismatch.
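One quick way to check which jar actually wins on the classpath is a tiny,
hypothetical diagnostic like the one below - it is not part of Hadoop, and
the default class name is only a stand-in; on a task node you would pass a
Hadoop class name such as org.apache.hadoop.mapred.TaskLogAppender instead.

```java
import java.security.CodeSource;

// Hypothetical diagnostic (not part of Hadoop): print which jar a class
// is loaded from, to verify the distro's jars win over bundled ones.
public class WhichJar {
    static String locate(String className) throws ClassNotFoundException {
        CodeSource src = Class.forName(className)
                .getProtectionDomain().getCodeSource();
        // Core JDK classes have no CodeSource; classes from a jar or
        // directory on the classpath report their location URL.
        return src == null ? "bootstrap classpath"
                           : src.getLocation().toString();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // e.g. java WhichJar org.apache.hadoop.mapred.TaskLogAppender
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " -> " + locate(name));
    }
}
```

If the printed location is Terrier's bundled hadoop-0.20.2+228-core.jar
rather than the jar from your distro, you have found the mismatch.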

hth

EF

On Thu, Oct 27, 2011 at 10:43 AM, Marco Didonna <m....@gmail.com> wrote:

> Hello everybody,
> I am working on Terrier (www.terrier.org), an IR toolkit that leverages
> Hadoop for indexing large amounts of data (i.e. documents). I am working
> both locally with a small subset of the whole dataset and on Amazon EC2
> with the full-size dataset. I am experiencing a weird (at least to me)
> exception which always occurs at 66% of the map phase. Here's the log:
> http://pastebin.com/XtUkHFYE. I really have no idea where the problem
> could be.
> From the original Terrier 3.5 I've only modified the InputFormat which
> is used to read the collection of documents: I use a custom
> SequenceFileInputFormat in order to process a custom sequence file
> made up of all the tiny documents of the TREC collection (a standard
> document collection used in IR).
> I guess the problem is not there, since even using an unmodified version
> of Terrier I get the same error. In that case, however, there is no
> failure, maybe because the authors of Terrier use MultiFileCollection.
>
> I'd love to hear from somebody, since when running the indexing job on
> the whole dataset the job fails because this error happens more than
> once. In pseudo-distributed mode the job completes after a failure; on
> the cloud it doesn't.
>
> Thanks for your time
>
> Marco Didonna
>
> PS: Both locally and on the cloud I use the latest version of the
> Cloudera distribution of Hadoop.
>



-- 
*Eric Fiala*
*Fiala Consulting*
T: 403.828.1117
E: eric@fiala.ca
http://www.fiala.ca