Posted to user@nutch.apache.org by ca...@globo.com on 2007/06/14 20:25:06 UTC

Indexing problems in nutch-nightly

I have been experimenting with the last four nightly builds, using
the intranet crawl method with about 400 seed sites, a depth of 4, a topN
of 600, and one computer.  Every time, I got the following error message:

Indexing [http://200.0.198.11/Biblioteca/p-periodicas/index.htm] with analyzer
org.apache.nutch.analysis.NutchDocumentAnalyzer@10d4f27 (null)
Indexer: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:604)
        at org.apache.nutch.indexer.Indexer.index(Indexer.java:279)
        at org.apache.nutch.indexer.Indexer.run(Indexer.java:301)
        at org.apache.hadoop.util.ToolBase.doMain(ToolBase.java:189)
        at org.apache.nutch.indexer.Indexer.main(Indexer.java:284)

[root@localhost nutch-2007-06-14_07-21-27]#

I had seen some "Job failed!" messages before (though never at this point
of indexing), and restarting the computer and indexing again had solved the
problem. I did the same this time, but with no results.

On the other hand, the same task with the 0.9 release was always successful,
on every try, with the same crawl settings.  So I am just pointing this out
because I think there may be a problem in the indexing phase of the newer
nightly versions.

Thanks