Posted to dev@lucene.apache.org by Daniel Creão <ld...@gmail.com> on 2007/05/28 21:10:39 UTC

Splitting index

I'd like to split my Lucene index into smaller segments, each one holding
all terms starting with the same character.

I started writing the Terms and TermInfos, but I'm worried about the other
files and especially the pointers.

What should I be careful about when splitting an index?

- Daniel

Re: Splitting index

Posted by Doug Cutting <cu...@apache.org>.
You can implement a FilterIndexReader that returns only a subset of an 
index.  Then use IndexWriter#addIndexes() to add this to a new, empty 
index.  Do this for each range of terms.

This is somewhat similar to Nutch's IndexSorter:

http://svn.apache.org/viewvc/lucene/nutch/trunk/src/java/org/apache/nutch/indexer/IndexSorter.java?view=markup

Note that IndexWriter#addIndexes() doesn't require that all IndexReader 
methods be implemented.
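
As a rough illustration of the approach above, here is a minimal,
self-contained sketch in plain Java. It does not use Lucene: ToyIndex,
ToyReader, FirstCharReader and TermRangeSplit are hypothetical names invented
for this example. FirstCharReader stands in for a FilterIndexReader subclass
that exposes only one range of terms, and ToyIndex.addAll() stands in for
IndexWriter#addIndexes() copying whatever the reader exposes into a new,
empty index. The toy model tracks only terms and document ids and ignores
all other per-document data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy stand-in for an index: sorted term -> list of document ids.
// None of these are Lucene classes; all names are hypothetical.
final class ToyIndex {
    final TreeMap<String, List<Integer>> postings = new TreeMap<>();

    void add(String term, int docId) {
        postings.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
    }

    // Plays the role of IndexWriter#addIndexes(): copy everything the reader exposes.
    void addAll(ToyReader reader) {
        for (Map.Entry<String, List<Integer>> e : reader.terms().entrySet()) {
            for (int docId : e.getValue()) {
                add(e.getKey(), docId);
            }
        }
    }
}

// The only view of an index that the copy step needs.
interface ToyReader {
    Map<String, List<Integer>> terms();
}

// Plays the role of a FilterIndexReader subclass: wraps a source index
// and exposes only the terms that start with one given character.
final class FirstCharReader implements ToyReader {
    private final ToyIndex source;
    private final char first;

    FirstCharReader(ToyIndex source, char first) {
        this.source = source;
        this.first = first;
    }

    @Override
    public Map<String, List<Integer>> terms() {
        Map<String, List<Integer>> visible = new TreeMap<>();
        String prefix = String.valueOf(first);
        // Terms are sorted, so the matching terms form one contiguous run.
        for (Map.Entry<String, List<Integer>> e
                : source.postings.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            visible.put(e.getKey(), e.getValue());
        }
        return visible;
    }
}

final class TermRangeSplit {
    // Build one new index per distinct first character of the source terms.
    static Map<Character, ToyIndex> splitByFirstChar(ToyIndex source) {
        Map<Character, ToyIndex> parts = new TreeMap<>();
        for (String term : source.postings.keySet()) {
            if (term.isEmpty()) {
                continue; // no first character to group by
            }
            char c = term.charAt(0);
            if (!parts.containsKey(c)) {
                ToyIndex part = new ToyIndex();          // new, empty index
                part.addAll(new FirstCharReader(source, c)); // filtered copy
                parts.put(c, part);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        ToyIndex idx = new ToyIndex();
        idx.add("apache", 1);
        idx.add("apple", 2);
        idx.add("lucene", 1);
        idx.add("nutch", 3);
        for (Map.Entry<Character, ToyIndex> e : splitByFirstChar(idx).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue().postings);
        }
    }
}
```

The point of the pattern is that the copy step never needs to know about the
filtering: it just consumes whatever terms the wrapping reader chooses to
expose, so the file layout and pointers are rebuilt by the writer rather
than by hand.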

Doug


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Splitting index

Posted by Chris Hostetter <ho...@fucit.org>.
: I'd wanna split my lucene index in smaller segments, each one holding all
: terms starting with the same char.

I can't even begin to give you any advice on how to implement this, but
perhaps I can save you some work by asking "why?"


-Hoss

