Posted to solr-user@lucene.apache.org by Andre Parodi <an...@gmail.com> on 2010/01/26 19:40:16 UTC

To store or not to store serialized objects in solr

Hi,

We currently store all of our data in an SQL database and use Solr
for indexing. We get a list of IDs from Solr and retrieve the data from
the db.

We are considering storing all the data in Solr to simplify
administration and remove the need for synchronisation. The options
are:

1. Store the data in individual fields in Solr (indexed=true, stored=true).
2. Store the data in serialized form in a binary field in Solr
(using Google Protocol Buffers or similar) and keep the rest of the Solr
fields as indexed=true, stored=*false*.
3. Keep as is: data stored in the db, with the Solr fields as
indexed=true, stored=false.
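
To make option 2 concrete, here is a minimal Python sketch of packing a
record into one opaque blob that sits next to the searchable fields. It
uses JSON plus zlib purely as a stand-in for Protocol Buffers, and the
field names (id, title, payload) and the sample record are hypothetical.
The blob is Base64-encoded on the assumption that the binary field is
fed over an XML or JSON update, where binary values travel as Base64
strings.

```python
import base64
import json
import zlib


def pack_record(record):
    """Serialize a record to a Base64 string for one stored field."""
    raw = json.dumps(record, sort_keys=True).encode("utf-8")
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


def unpack_record(blob):
    """Reverse pack_record: Base64 string back to the original dict."""
    raw = zlib.decompress(base64.b64decode(blob))
    return json.loads(raw.decode("utf-8"))


def to_solr_doc(record, indexed_fields):
    """Build a document: searchable fields plus one opaque payload."""
    doc = {name: record[name] for name in indexed_fields}
    doc["payload"] = pack_record(record)  # hypothetical field name
    return doc


# Hypothetical record, for illustration only.
record = {"id": "42", "title": "example", "price": 9.99}
doc = to_solr_doc(record, indexed_fields=["id", "title"])
restored = unpack_record(doc["payload"])
```

The application only ever decodes the payload field, so the other
fields can stay indexed-only.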

Can anyone offer some advice on the performance of the different
approaches? Are there any obvious pitfalls with options 1 and 2 that I
need to be mindful of?

I think option 2 would be the fastest, as it would read the data in
one contiguous block. I will be running some performance tests to
verify this soon.
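
A timing harness for such a test could look roughly like the sketch
below. The function names are hypothetical, and the dictionary lookup
is only a stand-in for a real retrieval path (a Solr request or an SQL
query), so the number it produces says nothing about either system.

```python
import statistics
import time


def time_fetch(fetch, ids, repeats=5):
    """Median wall-clock seconds to fetch every id, over several runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        for doc_id in ids:
            fetch(doc_id)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Stand-in data: 1000 records of 750 bytes each (hypothetical).
store = {i: b"x" * 750 for i in range(1000)}


def fetch_from_dict(doc_id):
    """Placeholder for a real lookup against Solr or the db."""
    return store[doc_id]


median_seconds = time_fetch(fetch_from_dict, list(store))
```

Passing a different fetch function for each option keeps the
measurement loop identical across the three approaches.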

FYI we are looking at 5-10M records; a serialised object is 500 to 1000
bytes and we index approx. 20 fields.
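
As a rough back-of-the-envelope check on those figures, the raw
serialized payload alone works out as follows. This is only the
arithmetic on the numbers above; it ignores the index structures
themselves and any compression.

```python
def stored_payload_bytes(records, bytes_per_record):
    """Raw size of the serialized objects, excluding index overhead."""
    return records * bytes_per_record


low = stored_payload_bytes(5_000_000, 500)     # smallest case
high = stored_payload_bytes(10_000_000, 1000)  # largest case

summary = f"{low / 1e9:.1f} GB to {high / 1e9:.1f} GB"
print(summary)  # prints "2.5 GB to 10.0 GB"
```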

Thanks for any advice.
andre

Re: To store or not to store serialized objects in solr

Posted by Markus Jelsma <ma...@buyways.nl>.
Hello Andre,


We have used this approach before. We kept all our data in an RDBMS but
added serialized objects to the index so we could simply query the record
and display it as is, without any hassle or SQL connections.

Although storing this data sounds a bit strange, it actually works well
and keeps things a bit simpler.

Query performance on the index is the same (or differs only by a tiny
amount). However, the stored data does take additional disk space and
extra transfer time to reach your application. On the other hand,
performance would surely be worse if you had to connect to and query an
SQL server to transfer the same data (even though it would not come in
such a verbose XML format).


Cheers,
