
Posted to java-user@lucene.apache.org by David Saile <da...@uni-koblenz.de> on 2011/02/21 09:39:17 UTC

ParallelReader

Hello everybody,

I was wondering if someone could point me to what I need to be aware of when using a ParallelReader.

My intention is to modify Nutch (http://nutch.apache.org/) so that only the documents for changed websites are updated in the Lucene index that Nutch uses.

However, due to the existing scoring algorithms, the page scores of most pages will change. After doing some research on updating single fields in a Lucene index, I found http://wiki.apache.org/lucene-java/ParallelIncrementalIndexing
This gave me the idea of creating a separate index for the page scores.

Are there any other approaches that I overlooked? What do I need to be aware of when using two parallel indices?


Thanks for any help!

David

 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

RE: ParallelReader

Posted by Uwe Schindler <uw...@thetaphi.de>.

Hi David,

With current Lucene versions, it is very complicated to keep the indexes
behind a ParallelReader in sync. The problem is how merges occur. For
ParallelReader to work, all internal document ids (the integers) must be
parallel across the indexes. As the new MergePolicies now work on the size
of documents and may also run concurrently, it is almost impossible to have
all merges done in parallel so that the internal doc ids stay the same.
ParallelReader, as it is, therefore currently only works with carefully
optimized indexes, and it is not really usable for your use case at the
moment.
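
To illustrate the doc id problem, here is a toy model in plain Java. It is
not Lucene code: the class ParallelDocIdSketch and its methods flatten,
mergeAll and readParallel are made up for this sketch. A segment is just a
list, a deleted document is a null entry, and a doc id is a position in the
flattened list. The only assumption taken over from Lucene is that a merge
drops deleted documents and renumbers the ones after them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model, not Lucene code: shows why parallel indexes need identical doc ids. */
public class ParallelDocIdSketch {

    /** Flattens the segments in order; the position in the result acts as the doc id. */
    static List<String> flatten(List<List<String>> segments) {
        List<String> docs = new ArrayList<>();
        for (List<String> segment : segments) {
            docs.addAll(segment);
        }
        return docs;
    }

    /** Hypothetical merge: joins all segments into one and drops deleted (null) docs. */
    static List<List<String>> mergeAll(List<List<String>> segments) {
        List<String> merged = new ArrayList<>();
        for (List<String> segment : segments) {
            for (String doc : segment) {
                if (doc != null) {
                    merged.add(doc);
                }
            }
        }
        List<List<String>> result = new ArrayList<>();
        result.add(merged);
        return result;
    }

    /** Reads the same doc id from both indexes, as a parallel reader would. */
    static String readParallel(List<List<String>> contentIndex,
                               List<List<String>> scoreIndex,
                               int docId) {
        return flatten(contentIndex).get(docId) + " -> " + flatten(scoreIndex).get(docId);
    }

    public static void main(String[] args) {
        // Content index: two segments, pageB already deleted (null).
        List<List<String>> content = Arrays.asList(
                Arrays.asList("pageA", null),
                Arrays.asList("pageC", "pageD"));
        // Score index: one segment, scoreB deleted at the same doc id.
        List<List<String>> scores = Arrays.asList(
                Arrays.asList("scoreA", null, "scoreC", "scoreD"));

        // Doc ids line up before any merge.
        System.out.println(readParallel(content, scores, 2));

        // Only the content index is merged: its doc ids shift, the scores' do not.
        List<List<String>> mergedContent = mergeAll(content);
        System.out.println(readParallel(mergedContent, scores, 2));

        // Merging both indexes the same way restores the alignment.
        System.out.println(readParallel(mergedContent, mergeAll(scores), 2));
    }
}
```

In this model the three lines printed are "pageC -> scoreC", then
"pageD -> scoreC" (the wrong score for the page), then "pageD -> scoreD".
The middle case is what happens when one index merges on its own schedule.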

There are approaches in Lucene trunk to support updateable fields (so-called
parallel indexing), but this is not working yet. Please search JIRA for the
corresponding issues.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de



