Posted to java-user@lucene.apache.org by Oscar Picasso <os...@yahoo.com> on 2006/04/12 15:07:46 UTC

Using lucene to speed up queries on a relational db

Hi,

I have a postgresql table which is expected to hold around 20 million rows.

- The structure is the following:

code int              -- can take one of 100 values
property varchar(250) -- can take one of 5000 values
param01 char(10)      -- can take one of 10 values
param02 char(10)      -- can take one of 10 values
...
[ 20 similar columns ]
...
param20 char(10)      -- can take one of 10 values
keywords text         -- 0 to 15 keywords (any word from a human language like English can be a keyword)

- The queries will involve anywhere from 1 to all of the table's columns, combined with an AND operator.

I find it very difficult to optimize this kind of query in the relational database because there are too many possible field combinations to create useful indexes for.

As the columns each use a small set of values, I thought it would be more efficient to use Lucene to perform this kind of query.
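For reference, here is a minimal sketch of how I index each row as a Lucene document (Lucene 2.x Field API and a RAMDirectory assumed; the sample values "42", "color", "A", "red glossy metallic" are made up for illustration):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.RAMDirectory;

public class RowIndexer {
    // Index one sample row and return the number of documents written.
    public static int indexSampleRow() throws Exception {
        RAMDirectory dir = new RAMDirectory(); // on disk this would be an FSDirectory
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);

        Document doc = new Document();
        // Exact-match columns are indexed untokenized so a TermQuery matches them directly.
        doc.add(new Field("code", "42", Field.Store.YES, Field.Index.UN_TOKENIZED));
        doc.add(new Field("property", "color", Field.Store.YES, Field.Index.UN_TOKENIZED));
        doc.add(new Field("param01", "A", Field.Store.YES, Field.Index.UN_TOKENIZED));
        // ... param02 through param20 would be added the same way ...
        // The free-text keywords column is tokenized so individual words are searchable.
        doc.add(new Field("keywords", "red glossy metallic", Field.Store.NO, Field.Index.TOKENIZED));

        writer.addDocument(doc);
        int count = writer.docCount();
        writer.close();
        return count;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(indexSampleRow() + " document(s) indexed");
    }
}
```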

Initial tests with around 200,000 documents/rows look good.

But here is my concern.

What would the performance be for queries over 20 million documents/rows, using up to 20 fields in the boolean query (each with Occur.MUST)?
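Concretely, the query I have in mind looks roughly like this (Lucene 2.x API assumed; field names follow the table structure above, the values are made up):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class ConjunctiveQuery {
    // AND across columns = one MUST clause per constrained field.
    public static BooleanQuery build() {
        BooleanQuery q = new BooleanQuery();
        q.add(new TermQuery(new Term("code", "42")), BooleanClause.Occur.MUST);
        q.add(new TermQuery(new Term("param01", "A")), BooleanClause.Occur.MUST);
        q.add(new TermQuery(new Term("keywords", "glossy")), BooleanClause.Occur.MUST);
        // ... up to 20 such clauses, one per field used in the search;
        // well under the default maxClauseCount of 1024.
        return q;
    }

    public static void main(String[] args) {
        // Prints the query in Lucene's query syntax, e.g. +code:42 +param01:A ...
        System.out.println(build().toString());
    }
}
```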

Any idea?

Thanks

Oscar



