Posted to solr-user@lucene.apache.org by kchellappa <ka...@gmail.com> on 2013/11/15 00:02:33 UTC

Document Security Model Question

I had earlier posted a similar discussion on LinkedIn, and David Smiley
rightly advised me that solr-user is a better place for technical
discussions.

----------------------------------

Our hosted product supports searching educational resources. Our
customers can choose to make specific resources unavailable to their users,
and availability also depends on licensing. Our current solution uses the
full-text search support in the database and handles availability as part
of the SQL query.

My task is to move the search from the database full-text search into Solr.
I searched through posts and found some that were related, and I am
thinking along the following lines:

a) Use the authorization model. I can add fields like allow and/or deny
   to the index, each containing a list of customers. At query time, I can
   add a constraint based on the customer ID. I am concerned about
   performance if there are a lot of values in these fields, and it also
   requires constant reindexing if a value in this field changes.

b) Use a query-time join. Keep the resource-to-customer availability in
   separate documents. We are planning to deploy on SolrCloud, and I have
   read about some challenges with query-time joins on SolrCloud, so this
   may not work for us.

c) Other ideas?
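
For illustration, options (a) and (b) might translate into Solr filter
queries along the following lines. This is only a sketch: the field names
(allow, deny, resource_id, customer_id, type) and the customer ID are
assumptions, not something from our actual schema. The {!join from=... to=...}
form is Solr's join query parser.

```python
# Sketch only: every field name below is a hypothetical schema choice.

def quote(value: str) -> str:
    """Wrap a value as a Solr phrase, escaping backslashes and quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def allow_deny_filter(customer_id: str) -> str:
    """Option (a): each resource document carries multi-valued
    allow/deny fields listing customer IDs."""
    cid = quote(customer_id)
    return f"+allow:{cid} -deny:{cid}"


def join_filter(customer_id: str) -> str:
    """Option (b): availability lives in separate documents that point
    back to the resource through a resource_id field."""
    inner = f"+type:availability +customer_id:{quote(customer_id)}"
    return "{!join from=resource_id to=id}" + inner


# The filter would be sent as an fq parameter next to the user's query.
params = {"q": "photosynthesis", "fq": allow_deny_filter("cust42")}
print(params["fq"])
print(join_filter("cust42"))
```

Sending the constraint as fq rather than folding it into q keeps it out of
relevance scoring and lets Solr cache the filter per customer.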
 
Excerpts from David Smiley's response

You're right that there may be some re-indexing as security rules change. If
many Lucene/Solr documents share identical access control with other
documents, then it may make more sense to externally determine which unique
set of access-control sets the user has access to, then finally search by id
-- which will hopefully not be a huge number. I've seen this done both
externally and with a Solr core to join on.
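
One way to read this suggestion, as a rough sketch: documents that share
the same access rules are tagged with one ACL-set ID, the customer is
resolved to ACL-set IDs outside of Solr, and the search is filtered by
those IDs. The acl_set_id field, the ACL_SETS mapping, and the customer
names are all made up for illustration.

```python
from typing import Dict, FrozenSet, List

# Hypothetical external store: ACL-set ID -> customers allowed to see
# every document tagged with that set.
ACL_SETS: Dict[str, FrozenSet[str]] = {
    "s1": frozenset({"custA", "custB"}),
    "s2": frozenset({"custB"}),
    "s3": frozenset({"custC"}),
}


def acl_sets_for(customer_id: str, acl_sets: Dict[str, FrozenSet[str]]) -> List[str]:
    """Return the IDs of all ACL sets that include this customer."""
    return sorted(sid for sid, members in acl_sets.items() if customer_id in members)


def acl_set_filter(customer_id: str, acl_sets: Dict[str, FrozenSet[str]]) -> str:
    """Build a filter query on the (assumed) acl_set_id field."""
    ids = acl_sets_for(customer_id, acl_sets)
    if not ids:
        # No access at all: use a filter meant to match nothing,
        # rather than dropping the filter and matching everything.
        return "-*:*"
    return "acl_set_id:(" + " OR ".join(ids) + ")"


print(acl_set_filter("custB", ACL_SETS))
```

With this layout, a rule change for a customer only touches the external
mapping. Documents need reindexing only when they move to a different
ACL set.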






--
View this message in context: http://lucene.472066.n3.nabble.com/Document-Security-Model-Question-tp4101078.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Document Security Model Question

Posted by kchellappa <ka...@gmail.com>.
Thanks, Rajani, for the response.

I agree that if the changes are frequent, the first option wouldn't work
efficiently. The other challenge is that, in our case, it is easy and
efficient to get a list of changes for each resource since the last
checkpoint (because of how our customer databases are deployed), rather
than a snapshot of what is allowed or disallowed across all customers for
each resource.


In your PostFilter implementation, do you cache the ACLs in memory, have
them updated periodically outside of Solr, and have the post filter just
use the cache, or something along these lines?
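
To make the question concrete, here is a minimal sketch of the kind of
cache I have in mind. It is not a description of anyone's actual
implementation. The change-record shape (sequence number, resource ID,
customer ID, allowed flag) and the class name are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple

# Hypothetical change record: (sequence number, resource ID, customer ID, allowed)
Change = Tuple[int, str, str, bool]


@dataclass
class AclCache:
    """In-memory deny list, kept current by applying changes since a checkpoint."""
    checkpoint: int = 0
    denied: Dict[str, Set[str]] = field(default_factory=dict)

    def apply(self, changes: Iterable[Change]) -> None:
        """Apply changes in sequence order, skipping ones already seen."""
        for seq, resource_id, customer_id, allowed in sorted(changes):
            if seq <= self.checkpoint:
                continue  # already applied in an earlier refresh
            customers = self.denied.setdefault(resource_id, set())
            if allowed:
                customers.discard(customer_id)
            else:
                customers.add(customer_id)
            self.checkpoint = seq

    def is_allowed(self, resource_id: str, customer_id: str) -> bool:
        """A resource is visible unless the customer is on its deny list."""
        return customer_id not in self.denied.get(resource_id, set())


cache = AclCache()
cache.apply([(1, "r1", "c1", False), (2, "r1", "c1", True), (3, "r2", "c1", False)])
print(cache.is_allowed("r1", "c1"), cache.is_allowed("r2", "c1"))
```

A periodic job would call apply() with the changes since the stored
checkpoint, and the filter would only ever call is_allowed().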





--
View this message in context: http://lucene.472066.n3.nabble.com/Document-Security-Model-Question-tp4101078p4102664.html

Re: Document Security Model Question

Posted by Rajani Maski <ra...@gmail.com>.
Hi,

For the case "it requires constant reindexing if a value in this field
changes":
If the ACLs for documents keep changing, a Solr PostFilter is one option.
We use it in our system, which has close to a billion documents and
approximately 5,000 users.


But it is important to check whether the ACL changes are frequent and
decide on a solution based on that. The first option in your list works
efficiently without affecting search performance. If the values change
infrequently, re-indexing only the affected documents should not be a
concern. If the changes are frequent, a post filter can be used, though it
will add some delay.
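
The idea behind a post filter can be sketched outside of Solr as follows.
In Solr itself this would be a Java class implementing the PostFilter
interface and returning a DelegatingCollector. The plain-Python version
below only imitates the control flow, and its toy index, function names,
and deny map are assumptions for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Set


def post_filter(matches: Iterable[str], is_allowed: Callable[[str], bool]) -> Iterator[str]:
    """Run the ACL check only on documents that already matched the query."""
    for doc_id in matches:
        if is_allowed(doc_id):
            yield doc_id


def search(index: Dict[str, str], term: str, customer_id: str,
           denied: Dict[str, Set[str]]) -> List[str]:
    """Toy search: match on the term first, then apply the ACL check last."""
    matches = (doc_id for doc_id, text in index.items() if term in text)
    return list(post_filter(matches, lambda d: customer_id not in denied.get(d, set())))


index = {"d1": "solar energy", "d2": "solar panel", "d3": "wind turbine"}
denied = {"d2": {"c1"}}
print(search(index, "solar", "c1", denied))
```

Because the check runs once per matching document, its cost grows with the
size of the result set rather than the size of the index, which is where
the added delay comes from.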


Thanks











