Posted to java-user@lucene.apache.org by Yatir Ben Shlomo <ya...@outbrain.com> on 2010/07/22 16:54:13 UTC

Question to the writer of MultiPassIndexSplitter

Hi,
I heard work is being done on rewriting MultiPassIndexSplitter so that it makes a single pass and runs faster.
I was wondering whether this is already done, or when it is due?
Thanks


RE: Question to the writer of MultiPassIndexSplitter

Posted by "Burton-West, Tom" <tb...@umich.edu>.
The work on MultiPassIndexSplitter is being done by Andrzej Bialecki, the creator of Luke.
See http://lucene-eurocon.org/sessions-track1-day1.html#3

http://lucene-eurocon.org/slides/Munching-&-crunching-Lucene-index-post-processing-and-applications_Andrzej-Bialecki.pdf

The slides say "SinglePassSplitter work started, to be contributed soon."

You might try asking him directly or posting to the java-dev list.

Tom
www.hathitrust.org/blogs

-----Original Message-----
From: Christopher Condit [mailto:condit@sdsc.edu] 
Sent: Thursday, August 05, 2010 12:08 PM
To: Yatir Ben Shlomo
Cc: java-user@lucene.apache.org
Subject: RE: Question to the writer of MultiPassIndexSplitter 

> > > I heard work is being done on re-writing MultiPassIndexSplitter so it
> > > will be a single pass and work quicker.

> > Because that was so slow I just wrote a utility class to create a list of N
> > IndexWriters and round robin documents to them as the index is created.
> > Then we use a ParallelMultiSearcher for retrieval. I can send you the code if
> > you're interested...

> Yes, it would be great if you could send me this code.

Here's some code: http://pastie.org/1077591
We re-index everything offline from scratch. You'll need to modify the code to support reopening and updating documents if that's a requirement...
-Chris




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org




RE: Question to the writer of MultiPassIndexSplitter

Posted by Christopher Condit <co...@sdsc.edu>.
> > > I heard work is being done on re-writing MultiPassIndexSplitter so it
> > > will be a single pass and work quicker.

> > Because that was so slow I just wrote a utility class to create a list of N
> > IndexWriters and round robin documents to them as the index is created.
> > Then we use a ParallelMultiSearcher for retrieval. I can send you the code if
> > you're interested...

> Yes, it would be great if you could send me this code.

Here's some code: http://pastie.org/1077591
We re-index everything offline from scratch. You'll need to modify the code to support reopening and updating documents if that's a requirement...
-Chris






RE: Question to the writer of MultiPassIndexSplitter

Posted by Christopher Condit <co...@sdsc.edu>.
> I heard work is being done on re-writing MultiPassIndexSplitter so it will be a
> single pass and work quicker.

Because that was so slow, I wrote a utility class that creates a list of N IndexWriters and distributes documents to them round-robin as the index is built. We then use a ParallelMultiSearcher for retrieval. I can send you the code if you're interested...

-Chris
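
The round-robin idea described above can be sketched without any Lucene dependency. This is a minimal illustration, not the code behind the pastie link: the class name RoundRobinDistributor and its add method are hypothetical, and each Consumer is an assumed stand-in for a wrapper around one IndexWriter (for example, a lambda that calls IndexWriter.addDocument and handles its exceptions).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper: hands each item to one of N sinks in rotating order.
 * Assumption: in a real setup each sink would wrap one Lucene IndexWriter,
 * so that document i ends up in index (i mod N).
 */
class RoundRobinDistributor<T> {
    private final List<Consumer<T>> sinks;
    private int next = 0; // index of the sink that receives the next item

    RoundRobinDistributor(List<Consumer<T>> sinks) {
        if (sinks.isEmpty()) {
            throw new IllegalArgumentException("need at least one sink");
        }
        this.sinks = new ArrayList<>(sinks); // defensive copy
    }

    /** Passes the item to the current sink, then advances to the next one. */
    void add(T item) {
        sinks.get(next).accept(item);
        next = (next + 1) % sinks.size();
    }

    public static void main(String[] args) {
        // Three in-memory lists stand in for three index shards.
        List<List<String>> shards = new ArrayList<>();
        List<Consumer<String>> sinks = new ArrayList<>();
        for (int i = 0; i < 3; i++) {
            List<String> shard = new ArrayList<>();
            shards.add(shard);
            sinks.add(shard::add);
        }

        RoundRobinDistributor<String> distributor = new RoundRobinDistributor<>(sinks);
        for (String doc : new String[] {"a", "b", "c", "d", "e"}) {
            distributor.add(doc);
        }
        System.out.println(shards);
    }
}
```

Because documents are spread evenly as they arrive, each shard ends up with roughly 1/N of the collection and no separate splitting pass is needed. As noted earlier in the thread, this only covers building the indexes from scratch; updating or deleting a document would also require knowing which shard it went to.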
