Posted to java-user@lucene.apache.org by Murali <mu...@gmail.com> on 2005/12/21 18:32:42 UTC

searching portions of an index

Hi,

    I am new to Lucene. We need to provide search to several users of a
system. Each user has access to a (different) set of documents. The same
document might be accessible by different users. I want to implement this
without indexing a document multiple times. The approach I thought of was to
use a field that is indexed, as well as stored in the index, which contains
the ids of all the users that can access the document. I could then use
boolean queries to search for documents accessible by a particular user. I
figured that I would have to delete and re-add the whole document if a new
user is to be given access to an already indexed document (and I expect
this to happen frequently in the system). Is there a better approach
that I can take?

Thanks,
Murali

RE: searching portions of an index

Posted by Dmitry Goldenberg <dm...@weblayers.com>.
You can implement a security filter, kind of like what the book Lucene in Action describes.  It is a class that extends org.apache.lucene.search.Filter; you're required to implement the following method:
 
public BitSet bits(IndexReader reader) throws IOException
 
In it, you can decide whether a particular document may be viewed by the user.  What I do is pass an instance of my Filter class to the searcher when I execute a search for a particular user:
 
Hits hits = is.search(executableQuery, (Filter) filter, getSort());
 
The Filter has a condition interface registered with it which knows how to check whether the user in question has specific access rights.  This condition is checked at runtime when I get to read from IndexReader in the bits(IndexReader reader) method.  This way, the BitSet returned by the Filter only contains the items viewable by the user in question.
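A minimal, self-contained sketch of the loop such a bits() method might run, assuming a hypothetical AccessCondition interface (the post does not show Dmitry's actual interface) and a plain array docIds standing in for the identifier of each Lucene document number. In a real Lucene 1.x Filter subclass the identifiers would be read from the IndexReader; here they are an array so the sketch runs without Lucene.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the "condition interface" registered with the Filter.
interface AccessCondition {
    boolean canView(String userId, String docId);
}

// Models the body of Filter.bits(IndexReader): one bit per document number,
// set only when the condition says the user may see that document.
class SecurityBits {
    static BitSet bits(String[] docIds, String userId, AccessCondition condition) {
        BitSet bits = new BitSet(docIds.length);
        for (int doc = 0; doc < docIds.length; doc++) {
            // The condition is consulted at runtime for every document.
            if (docIds[doc] != null && condition.canView(userId, docIds[doc])) {
                bits.set(doc);
            }
        }
        return bits;
    }
}

// Hypothetical in-memory ACL (e.g. loaded from a DB) implementing the condition.
class MapAccessCondition implements AccessCondition {
    private final Map<String, Set<String>> acl = new HashMap<>();

    void grant(String userId, String docId) {
        acl.computeIfAbsent(userId, k -> new HashSet<>()).add(docId);
    }

    public boolean canView(String userId, String docId) {
        Set<String> docs = acl.get(userId);
        return docs != null && docs.contains(docId);
    }
}
```

Because canView() runs once per document on every search, its cost multiplies by the index size, which is the performance concern raised in the next paragraph.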
 
I think this is much better than indexing your access control lists along with the document data.  Any access change can cause a significant amount of reindexing, as you pointed out.  The only thing to watch out for is to make sure that your authorization checking mechanism is optimized enough performance-wise so as not to clog up the results filtering process...
 
Hope this helps,
- Dmitry


Re: searching portions of an index

Posted by Murali <mu...@gmail.com>.
On 12/22/05, Chris Hostetter <ho...@fucit.org> wrote:
> If the set of documents viewable by a given person is truly an arbitrary
> list of document identifiers stored in a DB somewhere, then build a Filter
> that knows how to access that list, and sets the bits only on the document
> identifiers in that list.


This is indeed the case, except that since the system is being written from
scratch, the DB itself doesn't exist yet and I am free to make any design
choice on this. Also, when the system goes live, I expect both documents
and users to be added to the system continually.

Thanks,
Murali

Re: searching portions of an index

Posted by Chris Hostetter <ho...@fucit.org>.
: document might be accessible by different users. I want to implement this
: without indexing a document multiple times. The approach I thought of was to
: use a field that is indexed, as well as stored in the index, which contains
: the ids of all the users that can access the document. I could then use

The "best" solution really depends on what determines whether a user can view a
document.  If the set of documents visible to user "Bob" is based on
a set of document properties (e.g. Bob can view any documents in the
"reports" category that have an access level of "5" or less), then you can
store the properties Bob is restricted to anywhere, and at search time
look them up and make a Filter out of them to apply to Bob's searches
(e.g. a ChainedFilter containing a simple TermFilter on the category field
and a RangeFilter on the level field).

If the set of documents viewable by a given person is truly an arbitrary
list of document identifiers stored in a DB somewhere, then build a Filter
that knows how to access that list, and sets the bits only on the document
identifiers in that list.

As always: accessing the stored fields of every document in your index
from a Filter is not a good idea -- make sure the identifier field is
indexed, and use the FieldCache to loop over all of the keys for all of
the docs.
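A minimal sketch of the DB-list approach, assuming the identifier keys are already in an array indexed by document number. In Lucene that array would come from FieldCache.DEFAULT.getStrings(reader, "id") (the field name "id" is hypothetical, and the field must be indexed with one untokenized term per document); here it is a plain array so the sketch runs on its own. The allowedIds set stands in for the user's list fetched from the DB.

```java
import java.util.BitSet;
import java.util.Set;

class DbListBits {
    // keys[n] is the identifier of document number n, as a FieldCache lookup
    // would return it; null means the document has no identifier term.
    private final String[] keys;

    DbListBits(String[] keys) {
        this.keys = keys;
    }

    // One pass over the cached keys: no stored fields are loaded, only a
    // hash lookup per document against the user's allowed identifiers.
    BitSet bitsFor(Set<String> allowedIds) {
        BitSet bits = new BitSet(keys.length);
        for (int doc = 0; doc < keys.length; doc++) {
            String key = keys[doc];
            if (key != null && allowedIds.contains(key)) bits.set(doc);
        }
        return bits;
    }
}
```

The key array is built once per IndexReader and shared by every user's filter, so granting a user access to a document only changes the DB list, not the index.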


The key elements to any approach are:

  1) Don't store the list of users in the docs; that seems like a real
     waste.
  2) Use a Filter for each user (or, if possible, for each group of users
     that have access permissions in common).
  3) Cache those Filters using something like CachingWrapperFilter (if
     your index doesn't change very often, but you have lots of users, you
     may want a different caching approach that lets you expire Filters
     without closing the IndexReader).
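One possible shape for the "different caching approach" in point 3, sketched under assumptions: filter bits are cached per user group (group names are hypothetical), entries expire after a fixed time-to-live, and a single group can be invalidated when its permissions change, all without touching the IndexReader. The compute function stands in for whatever builds the bits for a group, and the clock is injectable so expiry can be tested. This is not CachingWrapperFilter, whose cache, as the post implies, lasts as long as the IndexReader it was built for.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

class ExpiringFilterCache {
    private static final class Entry {
        final BitSet bits;
        final long createdAt;

        Entry(BitSet bits, long createdAt) {
            this.bits = bits;
            this.createdAt = createdAt;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();
    private final Function<String, BitSet> compute; // builds bits for a group
    private final long ttlMillis;
    private final LongSupplier clock;               // e.g. System::currentTimeMillis

    ExpiringFilterCache(Function<String, BitSet> compute, long ttlMillis, LongSupplier clock) {
        this.compute = compute;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    // Returns cached bits, rebuilding them if missing or older than the TTL.
    // Callers must treat the returned BitSet as read-only, since it is shared.
    synchronized BitSet get(String group) {
        long now = clock.getAsLong();
        Entry e = entries.get(group);
        if (e == null || now - e.createdAt >= ttlMillis) {
            e = new Entry(compute.apply(group), now);
            entries.put(group, e);
        }
        return e.bits;
    }

    // Drop one group's bits after an ACL change; other groups stay cached.
    synchronized void invalidate(String group) {
        entries.remove(group);
    }
}
```

When the index itself changes (new documents added), document numbers shift, so the whole cache would still need clearing along with reopening the reader.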




-Hoss
