Posted to user@nutch.apache.org by Magnús Skúlason <ma...@gmail.com> on 2009/10/18 13:39:56 UTC

Nutch indexer failing

Hi,
I am getting the following exception when indexing (right after the segments
are added):
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /home/user/nutch/crawl/indexes already exists
        at org.apache.hadoop.mapred.OutputFormatBase.checkOutputSpecs(OutputFormatBase.java:96)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:329)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:543)
        at org.apache.nutch.indexer.Indexer.index(Indexer.java:273)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:134)

The strange thing is that it only happens sometimes (roughly every second run),
even though I delete the folder /home/user/nutch/crawl before starting the
crawler.

Does anyone know what could be causing this and how I can fix it?
I am running a year-old Nutch 0.9 installation, and the problem only started
recently.

Best regards,
Magnus