Posted to user@nutch.apache.org by George Smith <ge...@gmail.com> on 2012/05/25 22:00:19 UTC

nutchgora NullPointerException during parse at NutchJob.waitForCompletion / avro.util.Utf8.

I've only been using the nutchgora branch for a few months, so I'm still
fairly new to it. I've been able to find information on the user or dev
lists, JIRA, or regular web searches for most of the issues I've
encountered, except one.

This error occurs frequently when parsing, but not always, and there
doesn't seem to be a common element among the pages it errors out on.

I have tried a number of revisions of the nutchgora branch, all of which
built fine with ant/ivy and also with Eclipse. I also tried the prebuilt
copies from Jenkins. I've run them on Debian 6.0.5, Ubuntu 10, and Ubuntu 12
with both OpenJDK and the Sun JDK, and I always receive the same error.

Can someone shed some light on this or point me in the right direction?
Thanks.

The error output to stdout is:

Parsing http://www.site.com/dir/page.html
Exception in thread "main" java.lang.RuntimeException: job failed: name=parse, jobid=job_local_0001
        at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:47)
        at org.apache.nutch.parse.ParserJob.run(ParserJob.java:242)
        at org.apache.nutch.parse.ParserJob.parse(ParserJob.java:257)
        at org.apache.nutch.parse.ParserJob.run(ParserJob.java:300)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.parse.ParserJob.main(ParserJob.java:304)

The error output in the hadoop.log is:

2012-05-25 13:50:29,635 INFO  parse.ParserJob - Parsing http://www.site.com/dir/page.html
2012-05-25 13:50:29,638 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
2012-05-25 13:50:29,639 WARN  mapred.LocalJobRunner - job_local_0001
java.lang.NullPointerException
        at org.apache.avro.util.Utf8.<init>(Utf8.java:37)
        at org.apache.nutch.parse.ParseUtil.process(ParseUtil.java:212)
        at org.apache.nutch.parse.ParserJob$ParserMapper.map(ParserJob.java:123)
        at org.apache.nutch.parse.ParserJob$ParserMapper.map(ParserJob.java:76)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
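
A note for anyone reading the hadoop.log trace above: the top frame is the
Utf8 constructor, called from ParseUtil.process, which suggests that a null
value may have reached that constructor for some pages. The trace does not
say which value it was. Below is a minimal Java sketch of that general
failure mode and of a null guard. The class NullSafeText and its methods
encode and encodeOrEmpty are hypothetical names made up for illustration.
This is not Nutch or Avro code, and it is not a description of the
NUTCH-1379 patch.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration only; not Nutch or Avro code.
class NullSafeText {

    // Encodes its argument eagerly, so a null argument fails with a
    // NullPointerException (the general failure mode assumed here).
    public static byte[] encode(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Guarded variant: substitutes an empty string for a missing value.
    public static byte[] encodeOrEmpty(String value) {
        return encode(value == null ? "" : value);
    }

    public static void main(String[] args) {
        // Assumption: stands for a field that could not be extracted.
        String missingField = null;

        try {
            encode(missingField);
        } catch (NullPointerException e) {
            System.out.println("unguarded call failed: NullPointerException");
        }

        int length = encodeOrEmpty(missingField).length;
        System.out.println("guarded call produced " + length + " bytes");
    }
}
```

Whether an empty string is the right substitute depends on how the stored
value is used later. Skipping the field altogether is another option.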

Re: nutchgora NullPointerException during parse at NutchJob.waitForCompletion / avro.util.Utf8.

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Thanks Ferdy

On Thu, May 31, 2012 at 9:58 PM, George Smith <ge...@gmail.com> wrote:
> Hi Ferdy.
>
> That patch is fantastic. I applied the change yesterday morning and
> everything is parsing smoothly. Thanks for your help.

-- 
Lewis

Re: nutchgora NullPointerException during parse at NutchJob.waitForCompletion / avro.util.Utf8.

Posted by George Smith <ge...@gmail.com>.
Hi Ferdy.

That patch is fantastic. I applied the change yesterday morning and
everything is parsing smoothly. Thanks for your help.



On Wed, May 30, 2012 at 5:30 AM, Ferdy Galema <fe...@kalooga.com> wrote:

> I already stumbled upon this some time ago. I just created the following
> issue with a work-around for it. (Already committed it so you can update
> your workspace as an alternative to applying the patch).
>
> https://issues.apache.org/jira/browse/NUTCH-1379

Re: nutchgora NullPointerException during parse at NutchJob.waitForCompletion / avro.util.Utf8.

Posted by Ferdy Galema <fe...@kalooga.com>.
I already stumbled upon this some time ago. I just created the following
issue with a work-around for it. (Already committed it so you can update
your workspace as an alternative to applying the patch).

https://issues.apache.org/jira/browse/NUTCH-1379


On Tue, May 29, 2012 at 3:57 PM, Lewis John Mcgibbney <
lewis.mcgibbney@gmail.com> wrote:

> Hi George,
>
> How were you executing parsing on this page?
>
> The toArgMap method in ToolUtil can throw a runtime exception,
> however this doesn't look like the one you're getting.
>
> What other kind of logging do you have around here? Specifically
> related to when the parse method kicks in? This might give us a bit
> more idea where exactly this is happening.
>
> Thanks
>
> Lewis

Re: nutchgora NullPointerException during parse at NutchJob.waitForCompletion / avro.util.Utf8.

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi George,

How were you executing parsing on this page?

The toArgMap method in ToolUtil can throw a runtime exception,
however this doesn't look like the one you're getting.

What other kind of logging do you have around here? Specifically
related to when the parse method kicks in? This might give us a bit
more idea where exactly this is happening.

Thanks

Lewis

-- 
Lewis