Posted to user@nutch.apache.org by zhao <25...@qq.com> on 2011/08/29 04:24:52 UTC

a question about job failed

Dear all,
I am using Nutch 0.9 and have run into a problem. A detailed description of the problem
is:
       Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
        at org.apache.nutch.indexer.DeleteDuplicates.dedup(DeleteDuplicates.java:439)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:135)
Thank you for your help
 zhao

--
View this message in context: http://lucene.472066.n3.nabble.com/a-question-about-job-failed-tp3291669p3291669.html
Sent from the Nutch - User mailing list archive at Nabble.com.

Re: a question about job failed

Posted by Markus Jelsma <ma...@openindex.io>.
Yes, I have this less-than-descriptive exception too:

https://issues.apache.org/jira/browse/NUTCH-1100


On Tuesday 30 August 2011 02:28:18 Markus Jelsma wrote:
> [...]

-- 
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350

Re: a question about job failed

Posted by Markus Jelsma <ma...@openindex.io>.
Thanks for the reminder, as I believe this is an actual issue! I've got some
indices that cannot be deduplicated from Nutch; the job dies without giving a
proper clue.


I'll reproduce and report back on it. I know it's not a problem of not having
the correct fields marked as STORED, since that one index has all fields used
by dedup marked as STORED.

Strange.
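The STORED precondition mentioned above can be sketched as a simple check: dedup can only compare values it can read back out of the index, so every field it uses must be stored. This is a minimal illustration, not Nutch code, and the field names and helper below are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class DedupPrecheck {
    // Return the dedup fields that are NOT marked as stored in the
    // index schema. A non-empty result means dedup cannot work,
    // because it has nothing to read back for those fields.
    static List<String> missingStoredFields(Map<String, Boolean> fieldStored,
                                            List<String> dedupFields) {
        List<String> missing = new ArrayList<>();
        for (String field : dedupFields) {
            if (!Boolean.TRUE.equals(fieldStored.get(field))) {
                missing.add(field);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Illustrative schema: "digest" is indexed but not stored.
        Map<String, Boolean> fieldStored =
            Map.of("url", true, "digest", false, "boost", true);
        System.out.println(
            missingStoredFields(fieldStored, List.of("url", "digest")));
        // prints [digest]
    }
}
```

A check like this would have turned Markus's situation into an immediate answer instead of a mystery: either the list is empty and STORED fields are ruled out, or it names the offending field.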

> [...]

Re: a question about job failed

Posted by lewis john mcgibbney <le...@gmail.com>.
Hi Zhao,

Do you have any more verbose log info from hadoop.log? I have never worked
with Nutch 0.9, but could you at least indicate whether you get something
like

LOG: info Dedup: starting ... blah blah blah

Taking this to a larger context, I am not particularly happy with the
verbosity of logging when there are errors in the indexing commands. When we
experience an error during any of the index-related commands, all we get back
is "Job failed!". It would be nice to get back a reason for the job failing
that is clearer than a stack trace.
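For what it's worth, the uninformative message is baked into the old mapred driver pattern: the synchronous job runner throws a generic IOException when the job does not complete, and the real cause stays behind in the task logs (hadoop.log). A simplified sketch of that pattern, as an illustration rather than the actual Hadoop source:

```java
import java.io.IOException;

public class JobFailedSketch {
    // Stand-in for the old JobClient.runJob behavior: wait for the
    // job, and if it did not succeed, throw a generic exception
    // that carries no detail about the root cause.
    static void runJob(boolean jobSucceeded) throws IOException {
        if (!jobSucceeded) {
            throw new IOException("Job failed!");
        }
    }

    public static void main(String[] args) {
        try {
            runJob(false);
        } catch (IOException e) {
            // All the driver ever surfaces, regardless of what
            // actually went wrong in the tasks:
            System.out.println(e.getMessage());
        }
    }
}
```

This is why the advice in this thread is always the same: the stack trace in the console tells you nothing, so go read hadoop.log on the node where the task ran.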

Finally, and this is from a personal point of view, I would highly recommend
that you upgrade to a newer version of Nutch (1.3) if you are using this in
production. There are significant improvements in functionality.

Lewis

On Mon, Aug 29, 2011 at 3:24 AM, zhao <25...@qq.com> wrote:

> [...]



-- 
*Lewis*