Posted to user@nutch.apache.org by Manoj Bist <ma...@gmail.com> on 2008/01/13 04:39:13 UTC

Exception in DeleteDuplicates.java

Hi,

I am getting the following exception when I do a crawl using Nutch, and I
am stuck because of it. I would really appreciate any pointers toward
resolving it. I found a related mail thread here
<http://www.mail-archive.com/nutch-user@lucene.apache.org/msg07745.htm>, but
it doesn't describe a solution to the problem.

Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
        at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
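
(The top-level IOException is generic: in Hadoop's old mapred API,
JobClient.runJob throws a bare "Job failed!" whenever any task fails, so
the real cause only shows up in the task-level logs. A minimal sketch of
that flow; the class and job names here are placeholders, not anything
from this crawl:)

import java.io.IOException;

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class RunJobSketch {
    public static void main(String[] args) {
        // Skeleton configuration; a real job would also set input/output
        // paths, the mapper, the reducer, and so on.
        JobConf conf = new JobConf(RunJobSketch.class);
        conf.setJobName("dedup-sketch"); // placeholder job name

        try {
            JobClient.runJob(conf); // blocks until the job completes
        } catch (IOException e) {
            // Any task failure surfaces here as a bare "Job failed!";
            // the interesting stack trace lives in hadoop.log.
            e.printStackTrace();
        }
    }
}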

I looked at hadoop.log and it has the following stack trace.

mapred.TaskTracker - Error running child
java.lang.ArrayIndexOutOfBoundsException: -1
        at org.apache.lucene.index.MultiReader.isDeleted(MultiReader.java:113)
        at org.apache.nutch.indexer.DeleteDuplicates$InputFormat$DDRecordReader.next(DeleteDuplicates.java:176)
        at org.apache.hadoop.mapred.MapTask$1.next(MapTask.java:157)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:46)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:175)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1445)
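
From this trace it looks like DDRecordReader.next hands
MultiReader.isDeleted a document number of -1, which is outside the valid
range [0, maxDoc()) and makes Lucene's internal sub-reader lookup index an
array at -1. A minimal sketch (my guess only, not Nutch code; it assumes
the Lucene 2.x API that ships with nutch-0.9) that triggers the identical
exception:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.MultiReader;
import org.apache.lucene.store.RAMDirectory;

public class MultiReaderOutOfRange {
    public static void main(String[] args) throws Exception {
        // Build a tiny in-memory index with one placeholder document.
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
        Document doc = new Document();
        doc.add(new Field("url", "http://example.com/", Field.Store.YES,
                Field.Index.UN_TOKENIZED));
        writer.addDocument(doc);
        writer.close();

        MultiReader multi =
                new MultiReader(new IndexReader[] { IndexReader.open(dir) });

        // Document numbers must lie in [0, maxDoc()); -1 falls below every
        // sub-reader's range, so the internal lookup returns index -1:
        multi.isDeleted(-1); // throws ArrayIndexOutOfBoundsException: -1
    }
}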


Thanks,

Manoj.

-- 
Tired of reading blogs? Listen to your favorite blogs at
http://www.blogbard.com !!!!

Re: Exception in DeleteDuplicates.java

Posted by Manoj Bist <ma...@gmail.com>.
Hi Ismael, thanks a lot for the response. I did not build Nutch from
source; I simply copied the nutch-0.9 release.
Would you recommend building from a nightly Nutch build or from the
nutch-0.9 release?

Thanks,

Manoj.

On Jan 13, 2008 4:43 AM, Ismael <kr...@gmail.com> wrote:

> Hello. I apparently had a similar problem when trying to dedup; I
> solved it by updating Nutch with the following patch:
>
> http://www.mail-archive.com/nutch-dev@lucene.apache.org/msg06705.html
>
> I hope this helps. Good luck!
>
> 2008/1/13, Manoj Bist <ma...@gmail.com>:
> > [original message quoted in full; trimmed -- see the first post above]




Re: Exception in DeleteDuplicates.java

Posted by Manoj Bist <ma...@gmail.com>.
Thanks a lot, Ismael. I applied the patch to release-0.9, recompiled, and
it worked. I can finally try out Nutch successfully.

Thanks,

- Manoj.

On Jan 13, 2008 4:43 AM, Ismael <kr...@gmail.com> wrote:

> [quoted message trimmed -- identical to the quote in the previous reply]




Re: Exception in DeleteDuplicates.java

Posted by Ismael <kr...@gmail.com>.
Hello. I apparently had a similar problem when trying to dedup; I
solved it by updating Nutch with the following patch:

http://www.mail-archive.com/nutch-dev@lucene.apache.org/msg06705.html

I hope this helps. Good luck!

2008/1/13, Manoj Bist <ma...@gmail.com>:
> [original message quoted in full; trimmed -- see the first post above]
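
(The linked patch isn't quoted in this thread, but given the trace above,
any fix has to keep the record reader's document number inside
[0, maxDoc()) before calling MultiReader.isDeleted. An illustrative guard
of that shape; a sketch only, with a made-up class name, not the actual
patched Nutch source:)

import java.io.IOException;

import org.apache.lucene.index.IndexReader;

class SafeDocCursor {
    private final IndexReader reader; // e.g. a MultiReader over the indexes
    private int doc = 0;              // next document number to visit

    SafeDocCursor(IndexReader reader) {
        this.reader = reader;
    }

    /** Returns the next live (non-deleted) document number, or -1 when done. */
    int next() throws IOException {
        while (doc < reader.maxDoc()) {   // bounds check first...
            if (!reader.isDeleted(doc)) { // ...then it is safe to ask Lucene
                return doc++;
            }
            doc++; // skip deleted documents
        }
        return -1; // exhausted; Lucene is never handed an out-of-range number
    }
}

Checking doc against reader.maxDoc() before every isDeleted call is what
rules out the out-of-range lookup seen in the trace.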