Posted to java-user@lucene.apache.org by ilwes <on...@mailinator.com> on 2009/01/30 16:08:10 UTC

Best Practice for Lucene Search

Hello,

I googled, searched this forum and read the manual, but I'm still not sure
what the best practice for Lucene search is.

I have an e-commerce application with about 10 MySQL tables for my products,
and an index (which is working fine) with about 10 fields for every product.
Is it common to keep the same data (title, description, tags, paths to
pictures, sold_counter, etc.) redundantly in both the MySQL DB and the index,
and to save every new product to both? Doesn't doing everything twice hurt
performance?

What would be the best practice?
1) Save everything to both the index and the MySQL DB (as I'm doing right now).
2) Save only the searchable fields (title, description and tags) plus a
product_id to the index, and use the product_id to query everything else from
the DB?
3) ..?

I'd be thankful for some hints and your experience.

Thx,
ilwes

P.S. I'm working with Zend/PHP, but this shouldn't have any impact on this
question.
-- 
View this message in context: http://www.nabble.com/Best-Practice-for-Lucene-Search-tp21748839p21748839.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Best Practice for Lucene Search

Posted by ilwes <on...@mailinator.com>.
I like the point about doing things the easiest way possible until it starts
to become a problem.
Thank you very much for your answers and for the insight into how you handle
this issue. You helped me a lot.

Ilwes


Re: Best Practice for Lucene Search

Posted by "Karsten F." <ka...@fiz-technik.de>.
Hi ilwes,

Did you notice the thread
http://www.nabble.com/Lucene-vs.-Database-td19755932.html
?

I think it is useful for the question of whether to use Lucene stored fields
even if you already have the information in the DB.

Best regards
  Karsten





Best Practice for Lucene Search

Posted by Konstantyn Smirnov <in...@yahoo.com>.
At the beginning of development I also faced the choice of whether to mirror
the documents in both the DB and the index.

But when the number of rows reached 7 million and a query like

        "select count(id) from documentz"

(using PostgreSQL) took ages (OK, about 10 minutes!), it became clear to me
that something was not right with that approach.

Other reasons, to name a few, were the need to run the data import (almost)
twice, once for the index and once for the DB, and then to keep the two
synchronized whenever something changes.

At the moment I have a set-up of 6 physical indices of 15 GB each, about
46 million documents in total, and I can say that I'm pretty happy with the
search performance.

Ah, almost forgot! Those 46 million documents come from around 100 different
sources (field structures) and would need to be persisted in 100 different
DB tables. A whole lot of new sources are also expected, and they have to be
added to the stack ONLINE, without a server restart!

With a mixed DB/index solution that would be a nightmare to maintain, but a
single Lucene index copes with the task quickly and easily.

So, for text-only data, my vote clearly goes to a Lucene-only solution :)


RE: Best Practice for Lucene Search

Posted by Uwe Schindler <uw...@thetaphi.de>.
We do it the same way. We use our RDBMS to administer our metadata/data. The
search frontend for end users works completely with Lucene/panFMP
(www.pangaea.de). We marshal all our relational data to XML files and index
their contents using Lucene, but each XML file is also stored in Lucene as a
stored field. The search results are displayed to the end user using the hits
from Lucene together with the stored XML content (rendered using XSL). This is
much faster and better decoupled from the database.
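
The data flow described above can be sketched roughly as follows. This is
not panFMP or Lucene code: the class StoredXmlSketch, its methods, and the
sample record and stylesheet are hypothetical stand-ins that only show the
idea of keeping the XML next to the indexed terms and rendering hits from it
with XSL (via the JDK's javax.xml.transform), without touching the database.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical in-memory stand-in for an index that also stores the XML record.
// It mimics the data flow only; it is NOT the Lucene or panFMP API.
final class StoredXmlSketch {
    private final List<String> storedXml = new ArrayList<>();           // doc number -> stored XML
    private final Map<String, Set<Integer>> postings = new HashMap<>(); // term -> doc numbers

    // Index the searchable text and store the XML record next to it.
    void add(String searchableText, String xml) {
        int doc = storedXml.size();
        storedXml.add(xml);
        for (String term : searchableText.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new LinkedHashSet<>()).add(doc);
            }
        }
    }

    // Hits come back with their stored XML, so no database lookup is needed.
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        for (int doc : postings.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of())) {
            hits.add(storedXml.get(doc));
        }
        return hits;
    }

    // Render one stored record with an XSL stylesheet.
    static String render(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSL rendering failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up stylesheet: print only the title of a record as plain text.
        String xsl = "<xsl:stylesheet version=\"1.0\""
                + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
                + "<xsl:output method=\"text\"/>"
                + "<xsl:template match=\"/record\"><xsl:value-of select=\"title\"/></xsl:template>"
                + "</xsl:stylesheet>";
        StoredXmlSketch index = new StoredXmlSketch();
        index.add("Sample record one", "<record><title>Sample record one</title></record>");
        for (String xml : index.search("sample")) {
            System.out.println(render(xml, xsl));
        }
    }
}
```

In a real Lucene index the stored XML would live in a stored field of the
document, and the search would go through Lucene's own query API rather than
a hand-rolled term map.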

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Ian Lea [mailto:ian.lea@gmail.com]
> Sent: Friday, January 30, 2009 4:57 PM
> To: java-user@lucene.apache.org
> Subject: Re: Best Practice for Lucene Search
> 
> That answer is fine, but there are others.  We store denormalized data
> in lucene, as you are doing, for display on web pages because we can
> get it out of lucene much faster than we can get it out of the various
> tables in the database.  The database is not as fast as it might be,
> quite possibly slower than yours.  And yes, there is overhead in terms
> of space and time in having 2 copies of the data but space is cheap
> and there aren't that many writes and they happen offline so we don't
> really care if they take a bit longer.  We don't store everything in
> lucene by any means - just what is returned for product searches.
> 
> Overall I don't think there is a single best practice recommendation.
> As so often, it depends on your setup, requirements and preferences.
> 
> 
> --
> Ian.


Re: Best Practice for Lucene Search

Posted by Ian Lea <ia...@gmail.com>.
That answer is fine, but there are others.  We store denormalized data
in Lucene, as you are doing, for display on web pages because we can
get it out of Lucene much faster than we can get it out of the various
tables in the database.  The database is not as fast as it might be,
quite possibly slower than yours.  And yes, there is overhead in terms
of space and time in having 2 copies of the data, but space is cheap,
there aren't that many writes, and they happen offline, so we don't
really care if they take a bit longer.  We don't store everything in
Lucene by any means - just what is returned for product searches.
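
To make "denormalized" concrete, here is a minimal sketch of flattening
rows from several normalised tables into one display-ready document before
it is handed to the indexer. The class DenormalizeSketch, the field names
and the sample values are all invented for illustration; the real tables
and fields are not given in this thread.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: combine values that would normally live in separate
// tables (product, tags, pictures) into one flat field -> value document.
final class DenormalizeSketch {

    static Map<String, String> toFlatDocument(int productId,
                                              String title,
                                              String description,
                                              List<String> tags,
                                              List<String> picturePaths) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("product_id", Integer.toString(productId));
        doc.put("title", title);
        doc.put("description", description);
        // One-to-many relations are collapsed into single field values.
        doc.put("tags", String.join(" ", tags));
        doc.put("pictures", String.join(",", picturePaths));
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> doc = toFlatDocument(
                7, "Red bicycle", "A fast bike",
                List.of("sport", "outdoor"),
                List.of("/img/7a.jpg", "/img/7b.jpg"));
        doc.forEach((field, value) -> System.out.println(field + " = " + value));
    }
}
```

Only the fields that a search result page needs would go into such a
document; everything else stays in the database.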

Overall I don't think there is a single best practice recommendation.
As so often, it depends on your setup, requirements and preferences.


--
Ian.


On Fri, Jan 30, 2009 at 3:13 PM, Nilesh Thatte <ni...@yahoo.com> wrote:
> Hello
>
> I would store normalised data in MySQL and index only searchable content in Lucene.
>
> Regards
> Nilesh


Re: Best Practice for Lucene Search

Posted by Nilesh Thatte <ni...@yahoo.com>.
Hello 

I would store normalised data in MySQL and index only searchable content in Lucene.
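
A rough sketch of that split, assuming a two-step lookup: the index
answers with product ids only, and everything needed for display is then
fetched from the database by id. The class IdOnlyIndexSketch and its maps
are hypothetical in-memory stand-ins, not Lucene or MySQL API, and the
sample products are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy stand-in for "index only searchable content, keep the rest in the DB".
final class IdOnlyIndexSketch {
    // Stand-in for the index: term -> ids of products containing that term.
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    // Stand-in for the product table: product_id -> full row.
    private final Map<Integer, Map<String, String>> database = new HashMap<>();

    void save(int productId, Map<String, String> row) {
        database.put(productId, row); // the full row goes to the "DB" only
        // Only title, description and tags are tokenised into the "index".
        for (String field : List.of("title", "description", "tags")) {
            String text = row.getOrDefault(field, "");
            for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(term, t -> new TreeSet<>()).add(productId);
                }
            }
        }
    }

    // Step 1: the index returns matching product ids, nothing else.
    List<Integer> searchIds(String term) {
        return new ArrayList<>(
                postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>()));
    }

    // Step 2: the rows for display are read from the "DB" by id.
    List<Map<String, String>> search(String term) {
        List<Map<String, String>> rows = new ArrayList<>();
        for (int id : searchIds(term)) {
            rows.add(database.get(id));
        }
        return rows;
    }

    public static void main(String[] args) {
        IdOnlyIndexSketch shop = new IdOnlyIndexSketch();
        shop.save(1, Map.of("title", "Red bicycle", "description", "A fast bike",
                "tags", "sport", "picture", "/img/1.jpg"));
        shop.save(2, Map.of("title", "Blue kettle", "description", "Boils fast",
                "tags", "kitchen", "picture", "/img/2.jpg"));
        System.out.println(shop.searchIds("fast"));
        System.out.println(shop.search("kettle").get(0).get("picture"));
    }
}
```

The price of this layout is one extra database round trip per result page;
the benefit is that the index stays small and the database remains the
single source of truth.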

Regards
Nilesh


 




Re: Best Practice for Lucene Search

Posted by Erick Erickson <er...@gmail.com>.
Do you have a reasonable expectation that performance is going
to be a problem? The reason I ask is that I'm always suspicious
of efficiency arguments when "things are working fine". Unless and
until you can confidently predict that you're going to hit a
performance issue, do it the easiest way possible.

The same goes for space. Who cares if your data takes
up a Gig of extra space by storing things twice? Of course you
*do* care if you take up an extra 100G of space.

It's hard to make recommendations that mean anything
unless you fill in some of the space/time details you have/
expect to have, because the answer varies depending
upon what you need/expect.

Best
Erick

