Posted to java-user@lucene.apache.org by Joe Shaw <jo...@novell.com> on 2006/10/05 22:17:49 UTC

Multiple indexes or many small documents?

Hi,

I'm in the process of moving away from Lucene-as-the-data-store to using
Lucene solely for text indexing, with a lot of (frequently changing)
metadata stored in a database instead.

At present, we have two indexes which we search.  The primary index
contains the static data -- data that changes only when the content of
the underlying file changes; this is the expensive stuff to index.  The
secondary index contains the mutable data, which are often small
properties that are easily changed by the user.  Right now we search the
two indexes and combine the results (we don't use ParallelReader because
this code predates its creation).
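
For illustration only, here is a minimal sketch of combining hits from two
separately searched indexes. It is plain Java with no Lucene calls, and it
assumes (this is not stated above) that every hit in either index carries a
shared key identifying the underlying file. The class and method names are
hypothetical.

```java
import java.util.*;

// Hypothetical helper: merges hit sets from a "primary" (static data) index
// and a "secondary" (mutable data) index, joined on an assumed shared key.
final class TwoIndexMerge {

    // Files that matched in both indexes (AND across the two indexes).
    public static Set<String> matchBoth(Set<String> primaryHits,
                                        Set<String> secondaryHits) {
        Set<String> out = new LinkedHashSet<>(primaryHits);
        out.retainAll(secondaryHits);
        return out;
    }

    // Files that matched in at least one index (OR across the two indexes).
    public static Set<String> matchEither(Set<String> primaryHits,
                                          Set<String> secondaryHits) {
        Set<String> out = new LinkedHashSet<>(primaryHits);
        out.addAll(secondaryHits);
        return out;
    }
}
```

A real merge would also have to reconcile scores from the two indexes, which
this sketch leaves out.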

Even once we move to the database, we'll still have mutable data which
we may want to index as text.  One possibility is to keep the basic
design the same: have two indexes, one of which holds the data more
likely to change, and just recreate that small document whenever
something in it changes.  The other approach, which I am leaning more
towards, is creating a separate document for essentially each field.
That way, when a single field changes, we only have to reindex one very
small document.
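
To make the one-document-per-field idea concrete, here is a toy in-memory
model in plain Java. It is not Lucene code: the class, its methods, and the
whitespace tokenization are all assumptions made for illustration. Each
(file, field) pair is its own small "document", so changing one field
replaces only that entry and leaves the expensive body text untouched.

```java
import java.util.*;

// Toy model of "one small document per field" (hypothetical, not Lucene).
final class PerFieldIndex {
    // fileId -> fieldName -> lowercased tokens of that field's small document
    private final Map<String, Map<String, Set<String>>> docs = new HashMap<>();

    // Adds or replaces the single small document for (fileId, field).
    public void put(String fileId, String field, String text) {
        Set<String> tokens =
            new HashSet<>(Arrays.asList(text.toLowerCase().split("\\s+")));
        docs.computeIfAbsent(fileId, k -> new HashMap<>()).put(field, tokens);
    }

    // Returns the ids of files whose document for `field` contains `term`.
    public Set<String> search(String field, String term) {
        Set<String> hits = new TreeSet<>();
        for (Map.Entry<String, Map<String, Set<String>>> e : docs.entrySet()) {
            Set<String> tokens = e.getValue().get(field);
            if (tokens != null && tokens.contains(term.toLowerCase())) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    // Total number of small per-field documents currently held.
    public int documentCount() {
        int n = 0;
        for (Map<String, Set<String>> fields : docs.values()) {
            n += fields.size();
        }
        return n;
    }
}
```

Calling put("file1", "tag", "final") after an earlier put for the same file
and field rewrites only the tag entry; the document count grows with the
number of fields per file, which is the trade-off in question.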

If I go with the latter approach, it will at least triple the number of
documents in the index, although each document will contain
substantially less content.  I could also keep everything in one index
rather than searching two indexes ParallelReader-style.  What are
people's gut feelings on how
this approach will impact the indexing and search performance in terms
of both speed and memory used?

Thanks,
Joe
