Posted to solr-user@lucene.apache.org by Amit Nithian <an...@gmail.com> on 2012/07/04 09:54:08 UTC

Use of Solr as primary store for search engine

Hello all,

I am curious to know how people are using Solr in conjunction with
other data stores when building search engines to power web sites (say,
an e-commerce site). My question for the group: given an architecture
where the primary (transactional) data store is MySQL (or Oracle,
PostgreSQL, whatever) with periodic indexing into Solr, when your front
end issues a search query to Solr and gets results back, do you join
against the primary Oracle/MySQL store to help render the results?

Basically I guess my question is whether or not you store enough in
Solr so that when your front end renders the results page, it never
has to hit the database. The other option is that your search engine
only returns primary keys that your front end then uses to hit the DB
to fetch data to display to your end user.

With Solr 4.0 and Solr moving towards the NoSQL direction, I am
curious what people are doing and what application architectures with
Solr look like.

Thanks!
Amit

Re: Use of Solr as primary store for search engine

Posted by Paul Libbrecht <pa...@hoplahup.net>.
On 4 Jul 2012, at 21:17, Amit Nithian wrote:
> Thanks for your response! Were you using the SQL database as an object
> store to pull XWiki objects or did you have to execute several queries
> to reconstruct these objects?

The first. It's all fairly transparent.
There are "XWiki Classes" and XWiki objects, which are what gets rendered; they live as composites of the XWiki Java objects, which are persisted through Hibernate.

> I don't know much about them sorry..
> Also for those responding, can you provide a few basic metrics for me?
> 1) Number of nodes receiving queries
> 2) Approximate queries per second
> 3) Approximate latency per query

I admire those that have this at hand.

> I know some of this may be sensitive depending on where you work, so
> reasonable ranges would be nice (e.g. "sub-second" isn't hugely helpful,
> since 50, 100, and 200 ms have very different impacts depending on your site).

I think caching plays a very strong role here, so these measurements are fairly difficult to establish. One Solr instance I run, in particular, ranges from 100 ms (uncached queries) down to 9 ms (cached queries).
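
To make the point about caching concrete, here is a minimal sketch of how an "average latency" figure shifts with the cache hit ratio. The 100 ms and 9 ms values are the ones quoted above; the hit ratios are made-up values for illustration, not measurements from any real deployment.

```python
# Expected latency as a weighted average of cached and uncached queries.
# 100 ms and 9 ms come from the message above; the hit ratios are hypothetical.

def expected_latency_ms(hit_ratio, cached_ms=9.0, uncached_ms=100.0):
    """Average latency for a given cache hit ratio (0.0 to 1.0)."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cached_ms + (1.0 - hit_ratio) * uncached_ms


if __name__ == "__main__":
    for ratio in (0.0, 0.5, 0.9, 1.0):
        print(f"hit ratio {ratio:.0%}: {expected_latency_ms(ratio):.1f} ms")
```

The same instance can report anything between the two extremes depending on how warm its caches are, which is why a single latency number says little on its own.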

Paul

Re: Use of Solr as primary store for search engine

Posted by Amit Nithian <an...@gmail.com>.
Paul,

Thanks for your response! Were you using the SQL database as an object
store to pull XWiki objects or did you have to execute several queries
to reconstruct these objects? I don't know much about them sorry..
Also for those responding, can you provide a few basic metrics for me?
1) Number of nodes receiving queries
2) Approximate queries per second
3) Approximate latency per query

I know some of this may be sensitive depending on where you work, so
reasonable ranges would be nice (e.g. "sub-second" isn't hugely helpful,
since 50, 100, and 200 ms have very different impacts depending on your site).

Thanks again!
Amit

On Wed, Jul 4, 2012 at 1:09 AM, Paul Libbrecht <pa...@hoplahup.net> wrote:
> Amit,
>
> Not exactly a response to your question, but doing this with a Lucene index on i2geo.net has resulted in a considerable performance boost (reading from stored fields instead of reading from the XWiki objects, which pull from the SQL database). However, it meant that we had to rewrite everything needed for the rendering, so the rendering has not reused much existing code.
>
> Paul

Re: Use of Solr as primary store for search engine

Posted by Paul Libbrecht <pa...@hoplahup.net>.
Amit,

Not exactly a response to your question, but doing this with a Lucene index on i2geo.net has resulted in a considerable performance boost (reading from stored fields instead of reading from the XWiki objects, which pull from the SQL database). However, it meant that we had to rewrite everything needed for the rendering, so the rendering has not reused much existing code.

Paul



Re: Use of Solr as primary store for search engine

Posted by William Bell <bi...@gmail.com>.
For the search results, we actually put just the small amount of data needed in the core.

For the detailed view shown once someone clicks a result, we created
another core with a stored XML string field and an ID. The ID is
indexed, and the string field is only stored.

So we have:

productsearch core
product core

This has been in production and working fantastically for the last 4 months.

Our index is about 3M records.
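
A minimal sketch of how a front end might address the two cores described above. The core names come from the message; the base URL and the field names (`name`, `price`, `xml`) are assumptions made for illustration, and the functions only build request URLs rather than calling Solr.

```python
from urllib.parse import urlencode

# Hypothetical host and port; core names are the ones mentioned in the message.
SOLR_BASE = "http://localhost:8983/solr"


def search_url(user_query, rows=10):
    """Build a query against the small 'productsearch' core for the result list."""
    # Field names in 'fl' are assumed, not taken from the original schema.
    params = {"q": user_query, "fl": "id,name,price", "rows": rows, "wt": "json"}
    return f"{SOLR_BASE}/productsearch/select?{urlencode(params)}"


def detail_url(product_id):
    """Build a lookup of one stored XML blob in the 'product' core by its ID."""
    params = {"q": f'id:"{product_id}"', "fl": "id,xml", "rows": 1, "wt": "json"}
    return f"{SOLR_BASE}/product/select?{urlencode(params)}"


if __name__ == "__main__":
    print(search_url("red shoes"))
    print(detail_url("SKU-42"))
```

The split keeps the searched core small, while the detail core behaves like a key-value lookup: one indexed ID, one stored blob.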


On Thu, Jul 5, 2012 at 4:44 AM, Sohail Aboobaker <sa...@gmail.com> wrote:
> On many e-commerce sites, most of the data that we display (except images),
> especially in grids and lists, is minimal. We were inclined to use Solr as
> the data store just for displaying the information in grids. We stopped only
> because joins are not available in Solr 3.5. Since our data (like that in any
> other relational store) is split across multiple tables, we would have needed to
> denormalize it to use Solr as a store. We decided against that because it
> would mean potentially heavy updates to the indexes whenever related data is
> updated. With Solr 4.0, we might have decided differently and implemented the
> grids using joins within Solr.
>
> We are too new to Solr to have any insights into it.
>
> Regards,
> Sohail



-- 
Bill Bell
billnbell@gmail.com
cell 720-256-8076

Re: Use of Solr as primary store for search engine

Posted by Sohail Aboobaker <sa...@gmail.com>.
On many e-commerce sites, most of the data that we display (except images),
especially in grids and lists, is minimal. We were inclined to use Solr as
the data store just for displaying the information in grids. We stopped only
because joins are not available in Solr 3.5. Since our data (like that in any
other relational store) is split across multiple tables, we would have needed to
denormalize it to use Solr as a store. We decided against that because it
would mean potentially heavy updates to the indexes whenever related data is
updated. With Solr 4.0, we might have decided differently and implemented the
grids using joins within Solr.
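
The update cost of denormalizing can be sketched as follows. The tables, field names, and data are entirely hypothetical; the point is only that a change to one related row forces every document embedding it to be rebuilt and re-sent.

```python
# Hypothetical relational rows: each product references a category by ID.
categories = {1: {"name": "Shoes"}, 2: {"name": "Hats"}}
products = [
    {"id": 101, "title": "Trail runner", "category_id": 1},
    {"id": 102, "title": "Sandal", "category_id": 1},
    {"id": 103, "title": "Beanie", "category_id": 2},
]


def denormalize(product, categories):
    """Flatten one product row plus its category into a single flat document."""
    category = categories[product["category_id"]]
    return {
        "id": str(product["id"]),
        "title": product["title"],
        "category_name": category["name"],
    }


def docs_to_reindex(category_id, products, categories):
    """Every document that embeds the category must be rebuilt when it changes."""
    return [
        denormalize(p, categories)
        for p in products
        if p["category_id"] == category_id
    ]


if __name__ == "__main__":
    categories[1]["name"] = "Footwear"  # one row changes in the database
    for doc in docs_to_reindex(1, products, categories):
        print(doc)  # two documents have to be re-sent to the index
```

With three products the fan-out is trivial, but a category shared by a large part of the catalog turns a single-row update into a large batch of index updates.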

We are too new to Solr to have any insights into it.

Regards,
Sohail

Re: Use of Solr as primary store for search engine

Posted by Savvas Andreas Moysidis <sa...@gmail.com>.
Hello,

We've used both approaches in the past and have concluded that Solr is
best not used as a data store. The reason for this is that as the UI
matures the team may be asked to include information which is not
searchable (therefore not indexed) so you may find yourselves adding
fields for the sole purpose of rendering convenience. This, of course,
has the consequence of you needing to re-index every time a new
non-searchable field is added and as we (painfully..) found out this
is not always an easy decision..
In addition you will have to update (or better replace) a specific
document when a non-searchable part is modified with all the "heavy"
consequences this has (commit, newSearcher etc).

So, the lesson for us was that we should completely separate the
data storage and searching concerns and store only ids in Solr, which,
as you mention, we then use to retrieve the displayable information from
our db.
After all, as has been mentioned many times by the committers on
this list, Solr is _not_ a data store technology. :)
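
A minimal sketch of the "ids only" approach, assuming the front end has already parsed a ranked list of ids out of a Solr response. The table, columns, and data are hypothetical, and SQLite stands in for whatever database is actually used.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with a hypothetical product table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product (id INTEGER PRIMARY KEY, title TEXT, price REAL)"
    )
    conn.executemany(
        "INSERT INTO product VALUES (?, ?, ?)",
        [(1, "Sandal", 19.0), (2, "Boot", 89.0), (3, "Sneaker", 59.0)],
    )
    return conn


def hydrate(conn, ranked_ids):
    """Fetch display rows for the given ids, keeping the search ranking order."""
    if not ranked_ids:
        return []
    placeholders = ",".join("?" for _ in ranked_ids)
    rows = conn.execute(
        f"SELECT id, title, price FROM product WHERE id IN ({placeholders})",
        ranked_ids,
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # IN (...) does not preserve order, so re-sort by the ranked id list.
    # Ids that no longer exist in the database are silently dropped.
    return [by_id[i] for i in ranked_ids if i in by_id]


if __name__ == "__main__":
    solr_ids = [3, 1, 99]  # stand-in for ids parsed from a Solr response
    print(hydrate(make_demo_db(), solr_ids))
```

Two details tend to matter in practice: the database does not return rows in relevance order, and the index can lag behind the database, so ids for deleted rows have to be tolerated.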

One case where I could see Solr being used as a storage medium is when it
is not easy or practical to reconstitute the source data every time a
hit is made (e.g. if you've indexed a number of PDF documents, it might
not be practical to load a PDF when a result is selected).

Regards,
Savvas

On 5 July 2012 06:28, Shawn Heisey <so...@elyograg.org> wrote:
> On 7/4/2012 1:54 AM, Amit Nithian wrote:
>>
>> I am curious to know how people are using Solr in conjunction with
>> other data stores when building search engines to power web sites (say,
>> an e-commerce site). My question for the group: given an architecture
>> where the primary (transactional) data store is MySQL (or Oracle,
>> PostgreSQL, whatever) with periodic indexing into Solr, when your front
>> end issues a search query to Solr and gets results back, do you join
>> against the primary Oracle/MySQL store to help render the results?
>
>
> We used to pull almost everything from our previous search engine. Shortly
> after we switched to Solr, we began deploying a new version of our website
> which pulls more from the original data source.  The current goal is to
> store just enough data in Solr to render a search result grid (pulling
> thumbnails from the filesystem), but go to the database and the filesystem
> for detail pages.  We'd like to reduce the index size to the point where the
> whole thing will fit in RAM, which we hope will also reduce the amount of
> time required for a full reindex.
>
> What I hope to gain out of upgrading to Solr 4: Use the NRT features so that
> we can index item popularity and purchase data fast enough to make it
> actually useful.
>
> Thanks,
> Shawn
>

Re: Use of Solr as primary store for search engine

Posted by Shawn Heisey <so...@elyograg.org>.
On 7/4/2012 1:54 AM, Amit Nithian wrote:
> I am curious to know how people are using Solr in conjunction with
> other data stores when building search engines to power web sites (say,
> an e-commerce site). My question for the group: given an architecture
> where the primary (transactional) data store is MySQL (or Oracle,
> PostgreSQL, whatever) with periodic indexing into Solr, when your front
> end issues a search query to Solr and gets results back, do you join
> against the primary Oracle/MySQL store to help render the results?

We used to pull almost everything from our previous search engine.
Shortly after we switched to Solr, we began deploying a new version of
our website which pulls more from the original data source.  The current
goal is to store just enough data in Solr to render a search result
grid (pulling thumbnails from the filesystem), but go to the database and
the filesystem for detail pages.  We'd like to reduce the index size to
the point where the whole thing will fit in RAM, which we hope will also
reduce the amount of time required for a full reindex.

What I hope to gain out of upgrading to Solr 4: Use the NRT features so 
that we can index item popularity and purchase data fast enough to make 
it actually useful.
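
A rough sketch of what a frequent popularity update might look like with Solr 4, assuming its atomic update syntax (`set`, `inc`) and the `commitWithin` request parameter. The core name and the field names `popularity` and `purchase_count` are hypothetical, and the function only builds the request rather than sending it; check the update handler setup of your own Solr version before relying on this.

```python
import json
from urllib.parse import urlencode

# Hypothetical core name; the body would be POSTed with a JSON content type.
SOLR_UPDATE = "http://localhost:8983/solr/products/update"


def popularity_update(doc_id, popularity, purchases_delta, commit_within_ms=1000):
    """Build (url, body) for an atomic update of two hypothetical fields."""
    doc = {
        "id": doc_id,
        "popularity": {"set": popularity},          # overwrite the value
        "purchase_count": {"inc": purchases_delta},  # add to the current value
    }
    # commitWithin asks Solr to make the change visible within this many ms.
    url = f"{SOLR_UPDATE}?{urlencode({'commitWithin': commit_within_ms})}"
    return url, json.dumps([doc])


if __name__ == "__main__":
    url, body = popularity_update("101", 7, 1)
    print(url)
    print(body)
```

Sending only the changed fields avoids rebuilding the whole document from the database for every popularity change, although Solr still rewrites the full document internally.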

Thanks,
Shawn