Posted to solr-user@lucene.apache.org by Liam O'Boyle <li...@intelligencebank.com> on 2011/04/13 03:52:35 UTC

Re: Solr and Permissions

ManifoldCF sounds like it might be the right solution, so long as it's
not secretly building a filter query in the back end; if it is, it
will hit the same limits.
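For comparison, here is a rough sketch of the index-time approach Jan
describes further down the thread: each document carries its ACL tokens,
so the filter grows with the number of groups a user belongs to rather
than with the number of items they may see. The field name acl_tokens,
the token format and the helper names are all assumptions for
illustration; this is not ManifoldCF's actual schema or API.

```python
def quote_term(term):
    """Wrap a token in double quotes, escaping backslashes and quotes."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def build_acl_filter(user_tokens, field="acl_tokens"):
    """Build an fq string matching documents that carry any of the user's tokens.

    `field` is a hypothetical multi-valued string field holding the ACL
    tokens that were indexed with each document.
    """
    tokens = sorted(set(user_tokens))
    if not tokens:
        # No tokens means no access; the caller should return an empty result
        # rather than run the search without a filter.
        raise ValueError("user has no ACL tokens")
    return f"{field}:({' OR '.join(quote_term(t) for t in tokens)})"


# A document as it might be sent to the index (hypothetical fields).
doc = {"id": "42", "title": "Budget", "acl_tokens": ["group:finance", "user:liam"]}

# The filter depends only on the user's own tokens.
fq = build_acl_filter(["user:liam", "group:finance"])
print(fq)  # acl_tokens:("group:finance" OR "user:liam")
```

The trade-off is the one already raised in this thread: a permission
change means re-indexing every affected document.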

In the meantime, I have made a minor improvement to my filter query:
it now scans the permitted IDs and collapses consecutive runs into
ranges (e.g. instead of 1 OR 2 OR 3 it filters using [1 TO 3]), which
will hopefully keep me going for now.
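
The range collapsing can be sketched roughly as follows. This is a
minimal illustration, not the code we actually run: the function names
are made up, the IDs are assumed to be non-negative integers, and the id
field is assumed to be a numeric type (on a plain string field, range
queries compare lexicographically, so [1 TO 3] would not mean the same
thing).

```python
def collapse_to_ranges(ids):
    """Collapse integer IDs into sorted, inclusive (start, end) runs."""
    runs = []
    for n in sorted(set(ids)):
        if runs and n == runs[-1][1] + 1:
            # n extends the current run by one.
            runs[-1] = (runs[-1][0], n)
        else:
            # Gap found: start a new run.
            runs.append((n, n))
    return runs


def build_id_filter(ids, field="id"):
    """Build an fq string using ranges for runs and plain terms for singletons."""
    runs = collapse_to_ranges(ids)
    if not runs:
        # No permitted IDs means no access; the caller should return an
        # empty result rather than search without a filter.
        raise ValueError("no permitted IDs")
    clauses = [str(a) if a == b else f"[{a} TO {b}]" for a, b in runs]
    return f"{field}:({' OR '.join(clauses)})"


print(build_id_filter([5, 1, 2, 3, 9, 10]))
# id:([1 TO 3] OR 5 OR [9 TO 10])
```

This only helps when the permitted IDs cluster into runs; a user with
scattered IDs still produces one clause per ID, so it reduces the
clause count without removing the limit.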

Liam

On 12 March 2011 01:46, go canal <go...@yahoo.com> wrote:
> Thank you Jan, I will take a look at ManifoldCF.
> So it seems that the solution is basically to implement something outside of
> Solr for permission control.
> thanks,
> canal
>
>
>
>
> ________________________________
> From: Jan Høydahl <ja...@cominvent.com>
> To: solr-user@lucene.apache.org
> Sent: Fri, March 11, 2011 4:17:22 PM
> Subject: Re: Solr and Permissions
>
> Hi,
>
> Talk to the ManifoldCF guys - they have successfully implemented support for
> document level security for many repositories including CMS/ECMs and may have
> some hints for you to write your own Authority connector against your system,
> which will fetch the ACL for the document and index it with the document itself.
> This eliminates long query-time filters.
>
> Re-indexing content for which ACLs have changed is a very common way of doing
> this, and you should not worry too much about performance implications before
> there is a real issue. In the real world, you don't change folder permissions very
> often, and that will be a cost you'll have to live with. If you worry that this
> lag between repository state and index state may cause people to see content
> they are not entitled to, it is possible to do late binding filtering of the
> result set as well, but I would avoid that if possible.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> On 11. mars 2011, at 06.48, go canal wrote:
>
>> To be fair, I think there is a slight difference between a Content Management
>> system and a Search Engine.
>>
>> Access control at per-document level or per-type level, supporting dynamic role
>> changes, etc. are more like content management use cases, whereas a search
>> solution like Solr focuses on a different set of use cases.
>>
>> But in the real world, any content management system needs full text search, so
>> the question is how to support search with permission control.
>>
>> JackRabbit is integrated with Lucene/Tika; this could be one solution, but I do
>> not know its performance and scalability.
>>
>> CouchDB also integrates with Lucene/Tika; another option?
>>
>> I have yet to see a Search Engine that provides the sort of Content Management
>> features we are discussing here (Solr, Elastic Search?).
>>
>>
>> Then the last option is probably to build an application that works with a
>> document repository with all necessary content management features and Solr,
>> which provides search capability, and handle the permissions outside Solr?
>> thanks,
>> canal
>>
>>
>>
>>
>> ________________________________
>> From: Liam O'Boyle <li...@intelligencebank.com>
>> To: solr-user@lucene.apache.org
>> Cc: go canal <go...@yahoo.com>
>> Sent: Fri, March 11, 2011 2:28:19 PM
>> Subject: Re: Solr and Permissions
>>
>> As Canal points out, grouping into types is not always possible.
>>
>> In our case, permissions are not on a per-type level, but either on a per
>> "folder" (of which there can be hundreds) or per item in some cases (of
>> which there can be... any number at all).
>>
>> Reindexing is also too slow to really be an option; some of the items use
>> Tika to extract content, which means that we need to re-extract the content
>> (variable length of time; average is about half a second, but on some
>> documents it will sit there until the connection times out). Querying it,
>> modifying then resubmitting without rerunning content extraction is still
>> faster, but involves sending even more data over the network; either way is
>> relatively slow.
>>
>> Liam
>>
>> On 11 March 2011 16:24, go canal <go...@yahoo.com> wrote:
>>
>>> I have similar requirements.
>>>
>>> Content type is one solution, but there are also other use cases where this
>>> is not enough.
>>>
>>> Another requirement is that, when the access permission is changed, we need to
>>> update the field - my understanding is that we cannot do so without
>>> re-indexing the whole document. Am I correct?
>>> thanks,
>>> canal
>>>
>>>
>>>
>>>
>>> ________________________________
>>> From: Sujit Pal <su...@comcast.net>
>>> To: solr-user@lucene.apache.org
>>> Sent: Fri, March 11, 2011 10:39:27 AM
>>> Subject: Re: Solr and Permissions
>>>
>>> How about assigning content types to documents in the index, and map
>>> users to a set of content types they are allowed to access? That way you
>>> will pass in fewer parameters in the fq.
>>>
>>> -sujit
>>>
>>> On Fri, 2011-03-11 at 11:53 +1100, Liam O'Boyle wrote:
>>>> Morning,
>>>>
>>>> We use Solr to index a range of content to which, within our application,
>>>> access is restricted by a system of user groups and permissions.  In order
>>>> to ensure that search results don't reveal information about items which the
>>>> user doesn't have access to, we need to somehow filter the results; this
>>>> needs to be done within Solr itself, rather than after retrieval, so that
>>>> the facet and result counts are correct.
>>>>
>>>> Currently we do this by creating a filter query which specifies all of the
>>>> items which may be allowed to match (e.g. id: (foo OR bar OR blarg OR ...)),
>>>> but this has definite scalability issues - we're starting to run into
>>>> issues, as this can be a set of ORs of potentially unlimited size (and
>>>> practically, we're hitting the low thousands sometimes).  While we can
>>>> adjust maxBooleanClauses upwards, I understand that this has performance
>>>> implications...
>>>>
>>>> So, has anyone had to implement something similar in the past?  Any
>>>> suggestions for a more scalable approach?  Any advice on safe and sensible
>>>> limits on how far I can push maxBooleanClauses?
>>>>
>>>> Thanks for your advice,
>>>>
>>>> Liam
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> Liam O'Boyle
>>
>> IntelligenceBank Pty Ltd
>> Level 1, 31 Coventry Street Southbank, Victoria 3006, Australia
>> P:   +613 8618 7810   F:   +613 8618 7899   M: +61 403 88 66 44
>>
>> *Awarded 2010 "Best New Business" and "Business of the Year" - Business3000
>> Awards*
>>
>> This email and any attachments are confidential and may contain legally
>> privileged information or copyright material. If you are not an intended
>> recipient, please contact us at once by return email and then delete both
>> messages. We do not accept liability in connection with transmission of
>> information using the internet.
>>
>>
>>
>
>
>


