Posted to java-user@lucene.apache.org by sp...@gmx.eu on 2008/02/13 18:21:40 UTC

design: merging resultset from RDBMS with lucene search results

Hi,

I have the following scenario:

RDBMS which contains the metadata for documents (ID, customer number,
doctype etc.).
Now I want to add fulltext search support.

So I will index the documents' content in Lucene and add each document's ID as
a stored field in Lucene.

Now somebody wants to search like this: customer number 1234 AND content
"foo bar".

So I go to Lucene, search for content:"foo bar" and get back a hit list
containing the document IDs.

Now, how do I merge these IDs with the result set of the RDBMS's search for
customer number 1234?

1) select ... from ... where customer=1234 and ID in (<Lucene's ID list>).

or

2) select ... from ... where customer=1234 and then join both result sets in
the application.

or

3) no idea :)

What is best practice here?

Thank you.
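
For illustration, options 1 and 2 above could be sketched roughly as follows.
This is only a sketch under assumptions: the table and column names
(documents, id, customer), the chunk size, and the class and method names are
made up for the example, and no real Lucene or JDBC calls are made. The
chunking is there because some databases limit how many values an IN list may
hold; check the limit for your RDBMS.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class MergeSketch {

    // Option 1: build one parameterized "... AND id IN (?, ?, ...)" statement
    // per chunk of Lucene IDs. Table and column names are placeholders.
    static List<String> buildInQueries(int idCount, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<String> queries = new ArrayList<>();
        for (int start = 0; start < idCount; start += chunkSize) {
            int n = Math.min(chunkSize, idCount - start);
            String placeholders = String.join(", ", Collections.nCopies(n, "?"));
            queries.add("SELECT id FROM documents WHERE customer = ? AND id IN ("
                    + placeholders + ")");
        }
        return queries;
    }

    // Option 2: intersect in the application. Iterating over the Lucene hits
    // keeps their ranking order; the SQL IDs are only used for membership tests.
    static List<Long> intersect(List<Long> luceneIdsByRank, Set<Long> sqlIds) {
        List<Long> merged = new ArrayList<>();
        for (Long id : luceneIdsByRank) {
            if (sqlIds.contains(id)) {
                merged.add(id);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical data: IDs from a Lucene hit list, best hit first,
        // and IDs returned by the customer=1234 SQL query.
        List<Long> luceneIds = Arrays.asList(42L, 7L, 19L, 3L);
        Set<Long> sqlIds = new HashSet<>(Arrays.asList(3L, 7L, 100L));

        System.out.println(buildInQueries(luceneIds.size(), 3));
        System.out.println(intersect(luceneIds, sqlIds)); // prints [7, 3]
    }
}
```

Option 1 pushes the join into the database, so the access-control SQL can stay
in the same statement. Option 2 avoids long IN lists but pulls every ID for the
customer into the application, which may be a lot of rows.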


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: design: merging resultset from RDBMS with lucene search results

Posted by sp...@gmx.eu.
The metadata is altered quite often and there are millions of documents.
Also, document access is secured by complex SQL statements which Lucene might
not support.
So I don't think this is an option.

> -----Original Message-----
> From: John Byrne [mailto:john.byrne@propylon.com] 
> Sent: Mittwoch, 13. Februar 2008 18:44
> To: java-user@lucene.apache.org
> Subject: Re: design: merging resultset from RDBMS with lucene search results
> 
> Hi,
> 
> You might consider avoiding this problem altogether, by simply adding 
> the meta data to your Lucene index. Lucene can handle untokenized 
> fields, which is ideal for meta data. It might not be as quick as the 
> RDB, but you could perhaps optimize by only searching in the RDB when 
> you only need to search meta data, and using Lucene when you need both.
> 
> Regards,
> JB
> 


Re: design: merging resultset from RDBMS with lucene search results

Posted by John Byrne <jo...@propylon.com>.
Hi,

You might consider avoiding this problem altogether, by simply adding 
the meta data to your Lucene index. Lucene can handle untokenized 
fields, which is ideal for meta data. It might not be as quick as the 
RDB, but you could perhaps optimize by only searching in the RDB when 
you only need to search meta data, and using Lucene when you need both.

Regards,
JB



Re: design: merging resultset from RDBMS with lucene search results

Posted by Erick Erickson <er...@gmail.com>.
Another possibility is to do it backwards; whether that pays off depends on
how expensive the SQL query is, I suppose. The idea would be to go ahead and
do your SQL query *first*, then construct a Lucene Filter to use with your
query using TermDocs/TermEnum.

I'd guess (without knowing much about your problem space) that *if* you
can spin through your SQL query and extract all of the unique doc IDs
acceptably quickly, you won't notice the time spent constructing the filter.

Also, search through the user list archive for embedding Lucene in a
database. I confess that that entire discussion flew right over my head, so
I don't know if it's applicable, but...

Erick
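
As a rough model of the SQL-first idea above, the following sketch builds a
bit set of allowed documents from the IDs returned by the SQL query and then
applies it to a hit list. It does not call Lucene: the docNumberById map is a
hypothetical stand-in for the per-ID index lookup that TermDocs/TermEnum would
provide, and the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SqlFirstFilterSketch {

    // Turn the IDs returned by the SQL query into a bit set over internal
    // document numbers. docNumberById is an assumption standing in for the
    // index lookup (stored document ID -> internal doc number).
    static BitSet buildAllowedBits(Iterable<String> sqlIds,
                                   Map<String, Integer> docNumberById,
                                   int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (String id : sqlIds) {
            Integer docNumber = docNumberById.get(id);
            if (docNumber != null) {
                bits.set(docNumber); // IDs missing from the index are skipped
            }
        }
        return bits;
    }

    // Keep only the hits whose doc number is set in the bit set,
    // preserving the order of the hit list.
    static List<Integer> applyFilter(List<Integer> hitDocNumbers, BitSet allowed) {
        List<Integer> kept = new ArrayList<>();
        for (Integer doc : hitDocNumbers) {
            if (allowed.get(doc)) {
                kept.add(doc);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical index with three documents.
        Map<String, Integer> docNumberById = new HashMap<>();
        docNumberById.put("A", 0);
        docNumberById.put("B", 1);
        docNumberById.put("C", 2);

        // IDs the SQL query allows; "Z" is not in the index.
        List<String> sqlIds = Arrays.asList("A", "C", "Z");
        BitSet allowed = buildAllowedBits(sqlIds, docNumberById, 3);

        // Full-text hits as doc numbers, best hit first.
        System.out.println(applyFilter(Arrays.asList(2, 1, 0), allowed)); // prints [2, 0]
    }
}
```

The cost of this approach is one lookup per ID returned by SQL, which is why
it only makes sense if that ID list can be produced acceptably quickly.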
