Posted to java-user@lucene.apache.org by Jochen Hebbrecht <jo...@gmail.com> on 2012/07/03 09:56:00 UTC

Mapping Lucene search results with a relational database

Hi all,

I have an application which holds a list of documents. These documents are
indexed using Lucene.
I can search on keywords of the documents. I loop over the TopDocs and get the
ID field (of each Lucene doc), which corresponds to the ID column in my
relational database. From all these IDs, I create a list.
After building the list of IDs, I run a database query which executes
the following SELECT statement (JPA):

<<
SELECT d FROM Document d WHERE d.id IN (##list of IDs retrieved from Lucene##)
>>

The resulting list of documents is sent to the view (GUI).
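
One detail worth noting about this mapping step: a query with an IN list is
not guaranteed to return rows in the order the IDs were supplied, so the
Lucene ranking may need to be restored afterwards. A minimal sketch in plain
Java, assuming the IDs and the fetched rows are already in memory (RankOrder,
inRankOrder and idOf are hypothetical names, not part of Lucene or JPA):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class RankOrder {
    /**
     * Returns the database rows in the order of rankedIds (the Lucene ranking).
     * IDs with no matching row (for example, rows excluded by the query) are skipped.
     */
    static <K, R> List<R> inRankOrder(List<K> rankedIds, List<R> rows, Function<R, K> idOf) {
        // Index the rows by ID so each lookup is constant time.
        Map<K, R> byId = new HashMap<>();
        for (R row : rows) {
            byId.put(idOf.apply(row), row);
        }
        // Walk the IDs in Lucene's order and pick up the matching rows.
        List<R> ordered = new ArrayList<>();
        for (K id : rankedIds) {
            R row = byId.get(id);
            if (row != null) {
                ordered.add(row);
            }
        }
        return ordered;
    }
}
```

Here rankedIds would be the list built from the TopDocs loop, rows the result
of the SELECT, and idOf a getter for the entity's ID.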


But some documents are private and should not be in the list. Therefore,
we have some extra conditions in the SELECT query to do some security
checks:

<<
SELECT d FROM Document d WHERE d.id IN (##list of IDs retrieved from Lucene##)
AND d.rule1 = foo
AND d.rule2 = bar
>>

But now I'm wondering: I'm using the speed of Lucene to quickly search
documents, but I still have to run the SELECT query. So I'm losing
performance on this one :-( ...
Does Lucene have some component which does this mapping for you? Or are
there any best practices on this issue? How do big projects map the Lucene
results to the relational database, given that the view has to render the
results?

Many thanks!
Jochen

Re: Mapping Lucene search results with a relational database

Posted by Jochen Hebbrecht <jo...@gmail.com>.
Hi Feng,

Hmmm, but Document is a Java object? It holds all kinds of other objects
like Sets, Lists, Maps, Strings, Doubles, ...
Can we store Java objects in a Lucene index?

Jochen


2012/7/3 feng lu <am...@gmail.com>

> And you can add d as a stored, not-analyzed field (Field.Store.YES,
> Field.Index.NOT_ANALYZED) if d in Document is not large.

Re: Mapping Lucene search results with a relational database

Posted by feng lu <am...@gmail.com>.
And you can add d as a stored, not-analyzed field (Field.Store.YES,
Field.Index.NOT_ANALYZED) if d in Document is not large.

On Tue, Jul 3, 2012 at 4:04 PM, Chris Lu <ch...@gmail.com> wrote:

> Can you index the rule1 and rule2 fields into the documents, and when
> searching with the keywords, also append rule1:foo and rule2:bar to the
> query?

-- 
Don't Grow Old, Grow Up... :-)

Re: Mapping Lucene search results with a relational database

Posted by Jochen Hebbrecht <jo...@gmail.com>.
Hi Chris,

Yeah! That's a possibility, thanks :-)!

Jochen


2012/7/3 Chris Lu <ch...@gmail.com>

> Can you index the rule1 and rule2 fields into the documents, and when
> searching with the keywords, also append rule1:foo and rule2:bar to the
> query?

Re: Mapping Lucene search results with a relational database

Posted by Chris Lu <ch...@gmail.com>.
Can you index the rule1 and rule2 fields into the documents, and when 
searching with the keywords, also append rule1:foo and rule2:bar to the 
query?

Chris
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes: 
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) 
got 2.6 Million Euro funding!
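
A rough sketch of this suggestion, assuming rule1 and rule2 are indexed as
single-token fields and the classic Lucene query parser syntax is used, where
a leading + marks a clause as required. SecuredQuery and withRules are
hypothetical helper names; the code only builds the query string and does not
call Lucene:

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SecuredQuery {
    /** Builds "+(keywords) +field:value ..." so that every clause is required. */
    static String withRules(String keywords, Map<String, String> rules) {
        // Group the user's keywords so the + applies to the whole keyword query.
        StringBuilder q = new StringBuilder("+(").append(keywords).append(")");
        // Append each security rule as a required field:value clause.
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            q.append(" +").append(rule.getKey()).append(':').append(rule.getValue());
        }
        return q.toString();
    }

    public static void main(String[] args) {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("rule1", "foo");
        rules.put("rule2", "bar");
        System.out.println(withRules("annual report", rules));
    }
}
```

With the keywords "annual report" and the two rules above, this yields
+(annual report) +rule1:foo +rule2:bar. Values that come from user input
would need escaping (for example with QueryParser.escape) before being
appended.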


Re: Mapping Lucene search results with a relational database

Posted by mark harwood <ma...@yahoo.co.uk>.
Many considerations here - I find the technical concerns you present typically open a can of worms for any businesses worried about security.
It gets political quickly.
 
In environments where security is paramount, software must be formally accredited, which is a costly exercise.

Often the chosen database (e.g. Oracle) has been formally accredited but Lucene has not. Consequently, all search results have to be run past the database (as in your example) for a trusted judgement call.
Even in less demanding situations, it may just be that the database team still wants to keep some control over who sees what once they have delegated search responsibilities to an external search engine.
 
In these scenarios it is often a mistake to have a naive Lucene implementation which returns many results, regardless of security rules, only to have many of them filtered out by the database.
In the worst case, the top million Lucene results may all be filtered out by the database, and the search has to be repeated for the top 2 million and so on until the desired number of results is returned.
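
The worst case described above can be sketched as a loop that repeats the search with a doubled window until enough permitted results are found or the index has no more matches. This is an illustration only: search stands in for a Lucene top-n search and permitted for the database security check; both, like OverFetch and firstPermitted, are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.IntFunction;

final class OverFetch {
    /**
     * search.apply(n) returns up to n IDs in rank order (stand-in for a top-n search).
     * permitted.apply(ids) returns the subset the database allows, in the same order.
     */
    static <K> List<K> firstPermitted(int wanted, int initialFetch,
                                      IntFunction<List<K>> search,
                                      Function<List<K>, List<K>> permitted) {
        int n = Math.max(wanted, initialFetch);
        while (true) {
            List<K> hits = search.apply(n);
            List<K> allowed = permitted.apply(hits);
            // Fewer hits than requested means the index has no more matches.
            boolean exhausted = hits.size() < n;
            if (allowed.size() >= wanted || exhausted) {
                int count = Math.min(wanted, allowed.size());
                return new ArrayList<>(allowed.subList(0, count));
            }
            // Not enough permitted results: repeat the whole search with a larger window.
            n *= 2;
        }
    }
}
```

Each round repeats the full search and the full database check, which is the cost that mirroring the security rules in the index is meant to avoid.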

For these reasons, Lucene searches should attempt to mirror the security logic implemented by the database (e.g. rules like "doc must be from same dept as current user" etc).
Businesses don't tend to like duplicating rules in this way, or the latency involved in seeing upstream security changes in the database or security domain reflected in the search index.
Another consequence of duplicating the rules in Lucene is that while the solution as a whole is accredited to yield no false positives (returning a document as safe when it should be secured) there is a danger that Lucene could yield a false negative (an inconsistent Lucene filter or doc denies access to a document that the database would have permitted). This may be seen as equally bad.

The reality is, however, that for performance reasons this is the way things have to be, and various business stakeholders have to be convinced of this.
Not sure if this describes your scenario but it is one that I've encountered many times.


Cheers
Mark



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org