Posted to user@nutch.apache.org by Ilya Vishnevsky <Il...@e-legion.com> on 2007/04/26 14:03:32 UTC

Adding documents to already created distributed index

Hi all!
As I understand it, Nutch creates a distributed index in Hadoop called
"Indexes" while indexing fetched segments. It then merges these Indexes
into one Index in the local file system.
We use parts of Nutch in our project. We want to use only the distributed
index ("Indexes"). The problem is that we want to refresh the index every
time we fetch a number of documents, but I do not know how to add newly
fetched documents to it. I wrote my own class to replace Indexer. The
only difference is in this instantiation:
IndexWriter(fs.startLocalOutput(perm, temp).toString(),
                          new NutchDocumentAnalyzer(job), true);
I changed the "create" parameter from true to false.
Nutch still throws a FileAlreadyExistsException, coming from

org.apache.hadoop.mapred.OutputFormatBase.checkOutputSpecs(OutputFormatBase.java:96)
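
The stack trace suggests that the failure happens in the output check at
job submission, before any IndexWriter is created, so the "create" flag
would never be reached. Below is a minimal sketch of that ordering. It
uses plain java.nio.file instead of the Hadoop classes, and every name in
it (OutputSpecCheckSketch, submitJob, the directory names) is a
hypothetical stand-in, not Nutch or Hadoop API:

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class OutputSpecCheckSketch {

    // Stand-in for the output check done at job submission:
    // an output directory that already exists is rejected.
    // Note: this is java.nio's FileAlreadyExistsException, not Hadoop's.
    static void checkOutputSpecs(Path outputDir) throws FileAlreadyExistsException {
        if (Files.exists(outputDir)) {
            throw new FileAlreadyExistsException(outputDir.toString());
        }
    }

    // Stand-in for job submission: the check runs first, and the
    // writer (with its "create" flag) would only be opened afterwards.
    static String submitJob(Path outputDir, boolean create) {
        try {
            checkOutputSpecs(outputDir);
        } catch (FileAlreadyExistsException e) {
            return "rejected before writer was opened";
        }
        return "writer opened with create=" + create;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a temp directory plays the role of the existing "Indexes" output.
        Path existing = Files.createTempDirectory("indexes-sketch");
        // Assumption: a not-yet-existing directory plays the role of a fresh per-batch output.
        Path fresh = existing.resolve("indexes-batch-2");

        System.out.println(submitJob(existing, false));
        System.out.println(submitJob(fresh, false));
    }
}
```

In this sketch the value of "create" makes no difference for the existing
directory, which matches the behaviour described above. Only the run with
a fresh output directory gets as far as opening the writer.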

Is it possible to add new documents to "Indexes" without fully
rewriting them?