Posted to java-user@lucene.apache.org by Steve Skillcorn <ss...@yahoo.com> on 2004/12/19 23:05:58 UTC

Optimising A Security Filter

Hello All;

I bought the Lucene in Action ebook, which is
excellent and I can strongly recommend.  One question
that has arisen from the book though is custom
filters.

I have the situation where the text of my docs is in
Lucene, but the permissions are in my RDBMS.  I can
write a filter (in fact have done so) that loops
through the documents in the passed IndexReader and
queries the DB to detect if the user is permissioned
for them, setting the relevant BitSet.  My results are
then paged (< last | next >) to a web page.

Does the IndexReader that is passed to the "bits"
method of the filter represent the entire index, or
just the results that match the query?

Is not worrying about filters and simply checking the
returned Hit List before presenting a sensible
approach?

I can see the point to filters as presented in the
Lucene in Action ISBN example, but are they a good
approach where they could end up laboriously marking
the entire index as True?

All help greatly appreciated.  Thanks to the authors
for Lucene in Action, it's given me the high level
best practices I was needing.

Steve




---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Optimising A Security Filter

Posted by Paul Elschot <pa...@xs4all.nl>.
On Sunday 19 December 2004 23:05, Steve Skillcorn wrote:
> Hello All;
> 
> I bought the Lucene in Action ebook, which is
> excellent and I can strongly recommend.  One question
> that has arisen from the book though is custom
> filters.
> 
> I have the situation where the text of my docs is in
> Lucene, but the permissions are in my RDBMS.  I can
> write a filter (in fact have done so) that loops
> through the documents in the passed IndexReader and
> queries the DB to detect if the user is permissioned
> for them, setting the relevant BitSet.  My results are
> then paged (< last | next >) to a web page.
> 
> Does the IndexReader that is passed to the “bits”
> method of the filter represent the entire index, or
> just the results that match the query?

The IndexReader represents the entire index.

> Is not worrying about filters and simply checking the
> returned Hit List before presenting a sensible
> approach?

That is done by the IndexSearcher.search() methods
that take a filter argument.
 
> I can see the point to filters as presented in the
> Lucene in Action ISBN example, but are they a good
> approach where they could end up laboriously marking
> the entire index as True?

The query runs over the whole index, and the filter is
checked only for the documents that match the query.

The bit filters generally work well, except when you need
a lot of very sparse filters and memory is a concern.

Regards,
Paul Elschot
 




Re: Optimising A Security Filter

Posted by Erik Hatcher <er...@ehatchersolutions.com>.
Paul already replied, but I'll add my thoughts below to the thread 
also...

On Dec 19, 2004, at 5:05 PM, Steve Skillcorn wrote:
> I bought the Lucene in Action ebook, which is
> excellent and I can strongly recommend.

Thank you!!!!

> Does the IndexReader that is passed to the “bits”
> method of the filter represent the entire index, or
> just the results that match the query?

It represents the entire index at the time it was instantiated.  This 
is important to know in case documents are later added to the index.

> Is not worrying about filters and simply checking the
> returned Hit List before presenting a sensible
> approach?

It depends.  Is the performance of checking a relational database for 
the results being shown to the user acceptable?  Is the security risk 
of a new piece of code forgetting to check the results of a search 
worth it?
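
To make the trade-off concrete, here is a minimal sketch of the "check the hits before presenting" approach, assuming the ranked hits are available as an array of document ids. It uses plain Java rather than Lucene classes; PagedPermissionCheck, page, and the isPermitted predicate (standing in for the RDBMS lookup) are hypothetical names, not part of any library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical helper, not a Lucene class: filters ranked hits one page at a time.
final class PagedPermissionCheck {
    // Returns the permitted doc ids for the given zero-based page, scanning
    // ranked hits in order and stopping as soon as the page is full.
    static List<Integer> page(int[] rankedDocIds, IntPredicate isPermitted,
                              int pageIndex, int pageSize) {
        List<Integer> result = new ArrayList<>();
        int toSkip = pageIndex * pageSize; // permitted hits shown on earlier pages
        for (int docId : rankedDocIds) {
            if (!isPermitted.test(docId)) continue;     // assumed DB-backed check
            if (toSkip > 0) { toSkip--; continue; }     // belongs to an earlier page
            result.add(docId);
            if (result.size() == pageSize) break;       // page is full, stop checking
        }
        return result;
    }
}
```

Only the hits needed to fill the requested page are checked, but every deeper page re-checks the hits that came before it, so the database cost grows with the page number unless the results of earlier checks are remembered.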

> I can see the point to filters as presented in the
> Lucene in Action ISBN example, but are they a good
> approach where they could end up laboriously marking
> the entire index as True?

Iterating through every document in the index certainly is time 
consuming and not something you should do for every search.  However, 
filters are designed to be long-lived.  Write your filter to simply do 
the logic of checking each document against the database, then wrap 
your filter with the caching wrapper.  Be sure to use the same 
IndexReader for each search.  When the index changes, rebuild the 
filter.
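
The caching idea described above can be sketched roughly as follows. This is a stand-in in plain Java, not Lucene code: CachedPermissionBits is a hypothetical class, "reader" is any object identifying one open view of the index, maxDoc is its document count, and isPermitted stands for the database check. In a real application the permission loop would sit inside a Filter's bits method and the caching would come from the caching wrapper; this only illustrates why reusing the same reader matters.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.IntPredicate;

// Hypothetical stand-in, not a Lucene class: caches one permission BitSet per reader.
final class CachedPermissionBits {
    // Weak keys let a cached BitSet be collected once its reader is discarded.
    private final Map<Object, BitSet> cache = new WeakHashMap<>();
    private final IntPredicate isPermitted; // assumed DB-backed check per doc id

    CachedPermissionBits(IntPredicate isPermitted) {
        this.isPermitted = isPermitted;
    }

    // Returns the permission bits for this reader, computing them at most once
    // for as long as the same reader object is passed in.
    synchronized BitSet bits(Object reader, int maxDoc) {
        BitSet cached = cache.get(reader);
        if (cached != null) {
            return cached;
        }
        BitSet bits = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc++) {
            if (isPermitted.test(doc)) {
                bits.set(doc);
            }
        }
        cache.put(reader, bits);
        return bits;
    }
}
```

With this shape the expensive loop runs once per reader. Opening a new reader after the index changes produces a new cache key, so the bits are rebuilt, which matches the "when the index changes, rebuild the filter" advice. A per-user filter would need one such instance (or cache entry) per user.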

I don't believe there is a clear best way to do this type of result 
filtering.  Both approaches have details to consider.

> All help greatly appreciated.  Thanks to the authors
> for Lucene in Action, it's given me the high level
> best practices I was needing.

Steve - I really appreciate hearing this.  Putting this work to public 
scrutiny opens the possibilities of opinion.  Your comments hearten me.

	Erik

