Posted to user@nutch.apache.org by Manoj Bist <ma...@gmail.com> on 2008/01/13 04:39:13 UTC
Exception in DeleteDuplicates.java
Hi,
I am getting the following exception when I do a crawl using Nutch, and I am
stuck because of it. I would really appreciate any pointers toward
resolving it. I found a related mail thread here
<http://www.mail-archive.com/nutch-user@lucene.apache.org/msg07745.htm>, but
it doesn't describe a solution to the problem.
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
        at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
I looked at hadoop.log and it has the following stack trace.
mapred.TaskTracker - Error running child
java.lang.ArrayIndexOutOfBoundsException: -1
        at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
        at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
        at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
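For what it's worth, an ArrayIndexOutOfBoundsException of -1 from MultiReader.isDeleted() means a negative document id reached the reader's deletions lookup, which can happen when one of the index parts being deduplicated is empty. A minimal, hypothetical sketch of the kind of range guard that avoids this follows; the class and method names are illustrative only, not the actual Nutch or Lucene code:

```java
// Hypothetical illustration of the failure mode and its guard; NOT the
// actual Nutch/Lucene source. A negative doc id (e.g. derived from an
// empty index part) indexing into a deletions array throws
// ArrayIndexOutOfBoundsException: -1, as in the stack trace above.
public class DedupGuardSketch {

    // Count live (non-deleted) documents starting at doc id `next`.
    // Without the range check, next == -1 would throw when used to
    // index the `deleted` array.
    static int countLive(boolean[] deleted, int next) {
        if (next < 0 || next >= deleted.length) {
            return 0; // empty or exhausted index part: nothing to scan
        }
        int live = 0;
        for (int doc = next; doc < deleted.length; doc++) {
            if (!deleted[doc]) {
                live++;
            }
        }
        return live;
    }

    public static void main(String[] args) {
        boolean[] deletions = {false, true, false};
        System.out.println(countLive(deletions, 0));  // prints 2
        System.out.println(countLive(deletions, -1)); // prints 0, no exception
    }
}
```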
Thanks,
Manoj.
--
Tired of reading blogs? Listen to your favorite blogs at
http://www.blogbard.com !!!!
Re: Exception in DeleteDuplicates.java
Posted by Manoj Bist <ma...@gmail.com>.
Hi Ismael, thanks a lot for the response. I did not build Nutch from
source; I simply used the nutch-0.9 release.
Would you recommend building from the nightly build or from nutch-0.9?
Thanks,
Manoj.
On Jan 13, 2008 4:43 AM, Ismael <kr...@gmail.com> wrote:
> Hello. I apparently had a similar problem when trying to dedup; I
> solved it by updating Nutch with the following patch:
>
> http://www.mail-archive.com/nutch-dev@lucene.apache.org/msg06705.html
>
> I hope this will help you, good luck!
Re: Exception in DeleteDuplicates.java
Posted by Manoj Bist <ma...@gmail.com>.
Thanks a lot, Ismael. I applied this patch to the release-0.9 sources,
recompiled, and it worked. I can finally run Nutch successfully.
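In case anyone else lands here: applying a unified diff from the list archive is the standard patch(1) workflow. Here is a tiny self-contained demonstration of that workflow; the file names below are placeholders, not the real patch, and for the actual fix you would run `patch -p0 < fix.patch` inside the nutch-0.9 source tree and then rebuild with `ant`:

```shell
# Self-contained demo of the patch(1) workflow: create a file, write a
# one-line unified diff against it, apply it, and check the result.
printf 'old line\n' > Example.txt
printf -- '--- Example.txt\n+++ Example.txt\n@@ -1 +1 @@\n-old line\n+new line\n' > fix.patch
patch -p0 < fix.patch
cat Example.txt   # prints: new line
```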
Thanks,
- Manoj.
Re: Exception in DeleteDuplicates.java
Posted by Ismael <kr...@gmail.com>.
Hello. I apparently had a similar problem when trying to dedup; I
solved it by updating Nutch with the following patch:
http://www.mail-archive.com/nutch-dev@lucene.apache.org/msg06705.html
I hope this will help you, good luck!