Posted to java-user@lucene.apache.org by Oscar Picasso <os...@yahoo.com> on 2006/04/12 15:07:46 UTC
Using lucene to speed up queries on a relational db
Hi,
I have a postgresql table which is expected to hold around 20 million rows.
- The structure is the following:
code int -- can take one of 100 values
property varchar(250) -- can take one of 5000 values
param01 char(10) -- can take one of 10 values
param02 char(10) -- can take one of 10 values
...
[20 similar columns]
...
param20 char(10) -- can take one of 10 values
keywords text -- 0 to 15 keywords (any word from a human language like English can be a keyword)
- The queries will involve anywhere from 1 to all of the table's columns, combined with an AND operator.
I find it very difficult to optimize this kind of query in the relational database because there are too many possible column combinations to create useful indexes for.
Since the columns each draw from a small set of values, I thought it would be more efficient to use Lucene to perform this kind of query.
Initial tests with around 200,000 documents/rows look good.
But here is my concern.
What would the performance be for queries over 20 million documents/rows using up to 20 fields in the boolean query (with Occur.MUST)?
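For reference, here is roughly how I build the query, as a sketch against the Lucene API of that era (BooleanQuery with one MUST clause per constrained column). The field names come from the schema above; the values "42", "A" and "english" are just illustrative:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;

public class QuerySketch {
    public static void main(String[] args) {
        // Each row becomes a Lucene document with one untokenized field
        // per column; every constrained column adds a MUST clause, so
        // all clauses are ANDed together.
        BooleanQuery query = new BooleanQuery();
        query.add(new TermQuery(new Term("code", "42")), Occur.MUST);
        query.add(new TermQuery(new Term("param01", "A")), Occur.MUST);
        query.add(new TermQuery(new Term("keywords", "english")), Occur.MUST);

        // Up to 20 of the param clauses would be added the same way.
        System.out.println(query.toString());
    }
}
```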
Any idea?
Thanks
Oscar