Posted to java-user@lucene.apache.org by Øyvind Stegard <oy...@usit.uio.no> on 2006/08/11 15:15:59 UTC

Re: Will storing docs affect Lucene's search performance?

On Friday 11 August 2006 15:07, Prasenjit Mukherjee wrote:
> I have a requirement (using the highlighter) to store the doc content
> somewhere, and I am not allowed to use an RDBMS. I am thinking of using
> Lucene's Field with (Field.Store.YES and Field.Index.NO) to store the
> doc content. Will it have any negative effect on my search performance?
>
> I think I have read somewhere that Lucene shouldn't be used (or
> misused) to provide RDBMS-like storage.
We are using a stored binary version of every field we index in our content 
repository implementation (mostly just primitive data types, though). I asked 
a similar question earlier on this list. I'll just quote the reply I got 
here:
> On 3/9/06, Øyvind Stegard <oy...@usit.uio.no> wrote:
> > - How do many stored fields affect indexing/query performance
> > compared to storing no fields at all (only indexing them)?
>
> Additional stored fields should have no effect on querying (the
> internal information about a field is looked up in a hashmap).
>
> Additional stored fields that are used have an impact on indexing since
> that data must be copied every time segments are merged.
>
> Additional stored fields that are not used in most documents (sparse)
> should have very little performance impact on indexing.  The field
> list is walked a few times linearly (in-memory) during a segment
> merge, which should be very fast, but it's still O(n), so don't go
> crazy and have a million stored field types.
>
> > - Are there any known scalability issues with a large number of distinct
> > fields in an index (not necessarily the same set of fields for every doc)?
>
> If they are indexed fields, yes.
> Each indexed field has a 1 byte norm *per document*, regardless of whether
> the document contains that field.  In the current version of Lucene,
> there is a way to omit these norms on a per field basis (see
> Field.setOmitNorms()) if you don't need length normalization or
> index-time field boosting.
>
> -Yonik
> http://incubator.apache.org/solr Solr, The Open Source Lucene Search Server
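
To put a rough number on the norms point above, here is a minimal sketch of
the arithmetic, assuming exactly one norm byte per document for every indexed
field that keeps its norms, as described in the reply. NormsCost and
normBytes are hypothetical names used only for illustration; they are not
part of Lucene.

```java
// Back-of-the-envelope estimate only. NormsCost and normBytes are
// hypothetical helpers, not Lucene API.
final class NormsCost {

    private NormsCost() {
    }

    // Assumes one norm byte per document for each indexed field that keeps
    // norms, whether or not a given document actually contains that field.
    static long normBytes(long numDocs, int indexedFieldsWithNorms) {
        if (numDocs < 0 || indexedFieldsWithNorms < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        // multiplyExact throws ArithmeticException instead of silently
        // overflowing on very large inputs.
        return Math.multiplyExact(numDocs, (long) indexedFieldsWithNorms);
    }
}
```

For example, NormsCost.normBytes(10000000L, 100) gives 1,000,000,000 bytes
of norm data for 10 million documents and 100 distinct indexed fields. Fields
whose norms are omitted would simply not be counted in the second argument.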

Øyvind
-- 
< Øyvind Stegard < oyvind stegard at usit uio no >
 < SAUS/USIT, UiO
