Posted to user@nutch.apache.org by "McGibbney, Lewis John" <Le...@gcu.ac.uk> on 2011/01/10 18:48:37 UTC
FileAlreadyExistsException
Hello List,
Only material I could find on this was a post by myself (some time ago) which addressed a slightly different problem case.
During the indexing stage of a recrawl, my Hadoop log reads as follows
Indexer: starting at 2011-01-10 16:40:42
Indexer: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory file:/C:/Downloads/Apache/nutch-1.2/crawl/indexes already exists
at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:111)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:772)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
at org.apache.nutch.indexer.Indexer.index(Indexer.java:76)
at org.apache.nutch.indexer.Indexer.run(Indexer.java:97)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.indexer.Indexer.main(Indexer.java:106)
My quick question: is it necessary to delete or remove the existing indexes before I can index freshly fetched web data?
Thank you
Lewis
Glasgow Caledonian University is a registered Scottish charity, number SC021474
Winner: Times Higher Education's Widening Participation Initiative of the Year 2009 and Herald Society's Education Initiative of the Year 2009
http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html
Re: FileAlreadyExistsException
Posted by Charan K <ch...@gmail.com>.
Yes, it is required to either remove the existing index directory or use a different directory when creating the index.
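A minimal sketch of both options, assuming the default crawl layout shown in the log above (crawl/indexes under the Nutch working directory); the bin/nutch arguments are based on the Nutch 1.x Indexer usage and should be checked against your install:

```shell
# Hypothetical layout for illustration, mirroring the path in the exception above.
CRAWL_DIR=crawl
mkdir -p "$CRAWL_DIR/indexes"   # stand-in for the stale index left by the last run

# Option 1: remove the stale output directory so Hadoop's
# FileOutputFormat existence check passes on the next run.
rm -rf "$CRAWL_DIR/indexes"

# Option 2: point the Indexer at a fresh, timestamped directory instead, e.g.:
#   bin/nutch index "$CRAWL_DIR/indexes-$(date +%Y%m%d%H%M%S)" \
#       "$CRAWL_DIR/crawldb" "$CRAWL_DIR/linkdb" "$CRAWL_DIR"/segments/*
```

Option 2 has the advantage that the previous index stays queryable until the new one is ready.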
Sent from my iPhone
On Jan 10, 2011, at 9:48 AM, "McGibbney, Lewis John" <Le...@gcu.ac.uk> wrote: