Posted to java-user@lucene.apache.org by Samuru Jackson <sa...@googlemail.com> on 2006/03/09 14:55:50 UTC

Store object into Lucene index

Is there a way to save an object to a Lucene index?

In my project I noticed that the performance bottleneck is my
database. Lucene returns results in no time, but retrieving the
corresponding data sets from the backend database can take a long
time, especially if you need to execute many queries.

What I thought about is storing as much data as possible in the
indexed document; ideally I would save the whole model
(including its 1:n relations and the like).

Is there a way to do that?

Thanks!

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Store object into Lucene index

Posted by Øyvind Stegard <oy...@usit.uio.no>.
On Thursday 09 March 2006 15:54, Yonik Seeley wrote:
> On 3/9/06, Øyvind Stegard <oy...@usit.uio.no> wrote:
> > - How do many stored fields affect indexing/query performance
> > compared to having no stored fields (indexed only)?
>
> Additional stored fields should have no effect on querying (the
> internal information about a field is looked up in a hashmap).
>
> Additional stored fields that are used have an impact on indexing since
> that data must be copied every time segments are merged.
>
> Additional stored fields that are not used in most documents (sparse)
> should have very little performance impact on indexing.  The field
> list is walked a few times linearly (in-memory) during a segment
> merge, which should be very fast, but it's still O(n), so don't go
> crazy and have a million stored field types.
>
> > - Are there any known scalability issues with a large number of distinct
> > fields in an index (not necessarily the same set of fields for every doc)?
>
> If they are indexed fields, yes.
> Each indexed field has a 1 byte norm *per document*, regardless of
> whether the document contains that field.  In the current version of
> Lucene, there is a way to omit these norms on a per-field basis (see
> Field.setOmitNorms()) if you don't need length normalization or
> index-time field boosting.
Thanks for the quick and informative reply! I will look further into
the possibility of omitting norm data from fields (we typically do very
exact searches, and don't use score data much yet).

Øyvind
-- 
< Øyvind Stegard < oyviste at usit uio no >>



Re: Store object into Lucene index

Posted by Yonik Seeley <ys...@gmail.com>.
On 3/9/06, Øyvind Stegard <oy...@usit.uio.no> wrote:
> - How do many stored fields affect indexing/query performance
> compared to having no stored fields (indexed only)?

Additional stored fields should have no effect on querying (the
internal information about a field is looked up in a hashmap).

Additional stored fields that are used have an impact on indexing since
that data must be copied every time segments are merged.

Additional stored fields that are not used in most documents (sparse)
should have very little performance impact on indexing.  The field
list is walked a few times linearly (in-memory) during a segment
merge, which should be very fast, but it's still O(n), so don't go
crazy and have a million stored field types.

> - Are there any known scalability issues with a large number of distinct
> fields in an index (not necessarily the same set of fields for every doc)?

If they are indexed fields, yes.
Each indexed field has a 1 byte norm *per document*, regardless of
whether the document contains that field.  In the current version of
Lucene, there is a way to omit these norms on a per-field basis (see
Field.setOmitNorms()) if you don't need length normalization or
index-time field boosting.
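
As a rough illustration of the arithmetic above, the amount of norm data
grows as (number of documents) x (number of indexed fields that keep
norms), at one byte each. The sketch below only does that multiplication;
the class name, method name, and the sample numbers are made up for the
example and are not part of Lucene.

```java
final class NormsEstimate {

    /**
     * Estimated norm data in bytes: one byte per document for every
     * indexed field that has not had its norms omitted.
     */
    static long normBytes(long numDocs, long indexedFieldsWithNorms) {
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(numDocs, indexedFieldsWithNorms);
    }

    public static void main(String[] args) {
        long docs = 10_000_000L; // hypothetical index size
        long fields = 200L;      // hypothetical number of distinct indexed fields
        long bytes = normBytes(docs, fields);
        System.out.println(bytes + " bytes of norms for "
                + docs + " docs and " + fields + " indexed fields");
    }
}
```

Fields whose norms are omitted would simply not be counted in the second
argument, which is why omitting norms matters most when there are many
distinct indexed fields.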

-Yonik
http://incubator.apache.org/solr Solr, The Open Source Lucene Search Server



Re: Store object into Lucene index

Posted by Øyvind Stegard <oy...@usit.uio.no>.
On Thursday 09 March 2006 14:55, Samuru Jackson wrote:
> Is there a way to save an object to a Lucene index?
>
> In my project I noticed that the performance bottleneck is my
> database. Lucene returns results in no time, but retrieving the
> corresponding data sets from the backend database can take a long
> time, especially if you need to execute many queries.
>
> What I thought about is to store as much data as possible to the
> indexed document and the best thing would be to save the model
> (including corresponding relations 1:n and stuff like that).
>
> Is there a way to do that?
We are using Lucene in a similar way: almost all fields are stored, and
documents mostly contain properties from objects in our data model. We don't,
however, store entire objects in serialized form. Instead, we serialize
property data types (fields) in our own way (one that makes them
suitable for sorting and indexing, for instance). We are using Lucene as the
search backend of our content repository implementation.
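
The post does not say how the property values are encoded, so the following
is only a guess at what "suitable for sorting" can mean in practice. Index
terms compare as strings, so one common trick is to zero-pad numbers to a
fixed width so that string order matches numeric order. The class and method
names are hypothetical, and negative values are deliberately left out of
this sketch.

```java
import java.util.Locale;

final class SortableLong {

    // Long.MAX_VALUE has 19 decimal digits, so 19 characters fit any non-negative long.
    private static final int WIDTH = 19;

    /** Encodes a non-negative long as a fixed-width, zero-padded decimal string. */
    static String encode(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values are not handled in this sketch");
        }
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT, "%0" + WIDTH + "d", value);
    }

    /** Decodes a string produced by encode back into the original long. */
    static long decode(String term) {
        return Long.parseLong(term);
    }
}
```

With plain strings, "9" sorts after "10"; with the padded form, the encoding
of 9 sorts before the encoding of 10, which is what a string-based sort or
range query needs.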

If you want to store an arbitrary Java object in an index document, you could 
always serialize it and store it as a binary field, see 
http://lucene.apache.org/java/docs/api/org/apache/lucene/document/Field.html
and
http://java.sun.com/j2se/1.4.2/docs/api/java/io/ByteArrayOutputStream.html
http://java.sun.com/j2se/1.4.2/docs/api/java/io/ObjectOutputStream.html
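
A minimal sketch of the serialization half of that suggestion, using only
java.io. The ObjectBytes class and its method names are made up for this
example, and it uses try-with-resources, which is newer than the J2SE 1.4.2
docs linked above. The resulting byte[] is what you would hand to a stored
binary field; how exactly that field is constructed depends on your Lucene
version, so check the Field JavaDoc.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class ObjectBytes {

    /** Serializes an object into a byte array using standard Java serialization. */
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the ObjectOutputStream flushes everything into the buffer.
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Restores an object from bytes previously produced by toBytes. */
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            // The class of the stored object is not on the classpath of the reader.
            throw new IllegalStateException(e);
        }
    }
}
```

Keep in mind that a field stored this way is an opaque blob: it can be read
back from a hit, but it cannot be searched or sorted on, and the stored bytes
are tied to the serialized class's shape.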

This actually reminds me of a few questions I've been meaning to ask this 
list:
- How do many stored fields affect indexing/query performance
compared to having no stored fields (indexed only)?

- Are there any known scalability issues with a large number of distinct
fields in an index (not necessarily the same set of fields for every doc)?

Please ask me to rephrase or elaborate if any of my questions are unclear or
lack information.

Øyvind
-- 
< Øyvind Stegard < oyviste at usit uio no >>



Re: Store object into Lucene index

Posted by Yonik Seeley <ys...@gmail.com>.
On 3/9/06, Samuru Jackson <sa...@googlemail.com> wrote:
> Is there a way to save an object to a Lucene index?

Any field may be stored as well as indexed, or stored and not indexed.
If a field is stored only (not indexed), you can opt to store it as
binary or compressed binary.

See the JavaDoc for the Lucene Field class.

--
-Yonik
http://incubator.apache.org/solr Solr, The Open Source Lucene Search Server
