
Posted to solr-user@lucene.apache.org by Paul Carey <pa...@gmail.com> on 2010/10/23 10:03:44 UTC

Modelling Access Control

Hi

My domain model is made of users that have access to projects which
are composed of items. I'm hoping to use Solr and would like to make
sure that searches only return results for items that users have
access to.

I've looked over some of the older posts on this mailing list about
access control and saw a suggestion along the lines of
acl:<user_id> AND (actual query).

While this obviously works, there are a couple of niggles. Every item
must have a list of valid user ids (typically fewer than 100 in my
case). Every time a collaborator is added to or removed from a
project, I need to update every item in that project. This will
typically be fewer than 1000 items, so I guess it is no big deal.
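To make the update cost of the per-user scheme concrete, here is a minimal sketch. It assumes a hypothetical in-memory model (the `Project` dataclass and the `acl` field name are illustrative, not a Solr API) and only builds the documents that would have to be re-sent to Solr; it does not talk to a server.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    # Hypothetical domain model: a project with collaborators and items.
    project_id: str
    collaborators: set[str] = field(default_factory=set)
    item_ids: list[str] = field(default_factory=list)


def item_docs_with_user_acl(project: Project) -> list[dict]:
    """Denormalize the collaborator ids onto every item document.

    Each document gets a multivalued 'acl' field of user ids, which is what
    a query like acl:<user_id> AND (actual query) relies on.
    """
    acl = sorted(project.collaborators)
    return [{"id": item_id, "project": project.project_id, "acl": acl}
            for item_id in project.item_ids]


def add_collaborator(project: Project, user_id: str) -> list[dict]:
    """One new collaborator changes the ACL of every item, so every item
    document in the project has to be re-indexed."""
    project.collaborators.add(user_id)
    return item_docs_with_user_acl(project)


if __name__ == "__main__":
    p = Project("p1", {"alice"}, [f"item-{i}" for i in range(1000)])
    docs = add_collaborator(p, "bob")
    print(len(docs), docs[0]["acl"])
```

With ~1000 items per project, each membership change becomes one batch of about 1000 document updates, in line with the estimate above.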

I wondered if the following might be a reasonable alternative,
assuming the number of projects to which a user has access is lower
than a certain bound.
(acl:<project_id> OR acl:<project_id> OR ... ) AND (actual query)
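A minimal sketch of building the project-based clause, assuming item documents carry their project id in an `acl` field as in the query above. One detail bears on the outlier question: Solr limits the number of clauses in a single boolean query (`maxBooleanClauses` in solrconfig.xml, 1024 by default), so a user with 2000 projects may hit that limit unless it is raised. The `max_clauses` parameter below is only a client-side guard for that, not a Solr setting.

```python
def quote_term(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes,
    so ids containing query syntax (e.g. ':' or '-') are matched literally."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def project_acl_filter(project_ids, field="acl", max_clauses=1024):
    """Build acl:("p1" OR "p2" OR ...) for the projects a user can see."""
    ids = sorted(set(project_ids))
    if not ids:
        # No readable projects: let the caller skip the search entirely
        # instead of sending a query with no restriction.
        raise ValueError("user has no readable projects")
    if len(ids) > max_clauses:
        raise ValueError(f"{len(ids)} projects exceeds {max_clauses} clauses")
    return f"{field}:(" + " OR ".join(quote_term(p) for p in ids) + ")"


print(project_acl_filter(["p2", "p1"]))  # acl:("p1" OR "p2")
```

Sorting and de-duplicating the ids keeps the generated string stable for the same user, which helps if it is sent as a cached filter query.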

When the numbers are small - e.g. each user has access to ~20 projects
and each project has ~20 collaborators - is one approach preferable
to the other? And when outliers exist - e.g. a project with 2000
collaborators, or a user with access to 2000 projects - is one
approach more liable to fail than the other?

Many thanks

Paul

Re: Modelling Access Control

Posted by Israel Ekpo <is...@gmail.com>.

On Mon, Oct 25, 2010 at 8:16 AM, Paul Carey <pa...@gmail.com> wrote:

> Many thanks for all the responses. I now plan on benchmarking and
> validating both the filter query approach, and maintaining the ACL
> entirely outside of Solr. I'll decide from there.
>
> Paul
>


Great.

I am looking forward to some feedback on the benchmarks.
-- 
°O°
"Good Enough" is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.
http://www.israelekpo.com/

Re: Modelling Access Control

Posted by Dennis Gearon <ge...@sbcglobal.net>.

I'll also be interested in how that works for you. Bringing out the whole dataset, not filtered for any kind of access control, means that you will then have to filter the result set in your server-side/command-line program.

So the speed comparison between the filter query and the outside-language environment will be very interesting :-)

I will also do this, but in about 3-5 months. I will report it then.

Dennis Gearon

Signature Warning
----------------
It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036'

EARTH has a Right To Life,
  otherwise we all die.


Re: Modelling Access Control

Posted by Paul Carey <pa...@gmail.com>.

Many thanks for all the responses. I now plan on benchmarking and
validating both the filter query approach, and maintaining the ACL
entirely outside of Solr. I'll decide from there.

Paul

Re: Modelling Access Control

Posted by Peter Sturge <pe...@gmail.com>.

Hi,

See SOLR-1872 for a way of providing access control, whilst placing
the ACL configuration itself outside of Solr, which is generally a
good idea.
   http://www.lucidimagination.com/search/out?u=http://issues.apache.org/jira/browse/SOLR-1872

There are a number of ways to approach Access Control, but you will
need to take a number of factors into account that aren't issues if
you're doing non-acl Solr queries.
You can use this patch to achieve authentication and authorization, or
use it as a template for similar techniques.

Peter




Re: Modelling Access Control

Posted by Israel Ekpo <is...@gmail.com>.

Hi All,

I think using filter queries will be a good option to consider for
the following reasons:

* The filter query does not affect the score of the items in the result set.
If the ACL logic is part of the main query, it could influence the scores of
the items in the result set.

* Using a filter query could lead to better performance in complex queries
because the results from the query specified with fq are cached
independently from those of the main query. Since the result of a filter
query is cached, it can be used to filter the primary query result using
set intersection, without having to fetch the ids of the matching documents
a second time.

I think this will be useful because we can assume that the ACL portion of
the fq is relatively constant, since the permissions for each user do not
change frequently.

http://wiki.apache.org/solr/FilterQueryGuidance
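A toy model of both points, offered as a sketch rather than Solr's implementation: the dict below stands in for the filter cache, keyed by the filter string, and document ids are plain Python sets. It shows that the ACL set is computed once and then reused via set intersection, and that filtering leaves the main query's scores untouched. The sample data and the `acl:<project>` string format are made up.

```python
from typing import Callable

Docs = frozenset[int]


class FilterCacheModel:
    """Toy stand-in for a filter cache: maps a filter string to its doc-id set."""

    def __init__(self, evaluate: Callable[[str], Docs]):
        self._evaluate = evaluate  # runs the filter against the "index"
        self._cache: dict[str, Docs] = {}
        self.misses = 0

    def docs_for(self, fq: str) -> Docs:
        if fq not in self._cache:
            self.misses += 1
            self._cache[fq] = self._evaluate(fq)
        return self._cache[fq]

    def search(self, scored_hits: dict[int, float], fq: str) -> list[tuple[int, float]]:
        """Keep only hits in the filter set; scores come from the main query unchanged."""
        allowed = self.docs_for(fq)
        kept = [(doc, score) for doc, score in scored_hits.items() if doc in allowed]
        return sorted(kept, key=lambda pair: pair[1], reverse=True)


index_acl = {1: {"p1"}, 2: {"p2"}, 3: {"p1"}}  # doc id -> projects (made up)


def evaluate(fq: str) -> Docs:
    project = fq.split(":", 1)[1]
    return frozenset(d for d, ps in index_acl.items() if project in ps)


cache = FilterCacheModel(evaluate)
print(cache.search({1: 0.9, 2: 0.8, 3: 0.5}, "acl:p1"))  # [(1, 0.9), (3, 0.5)]
print(cache.search({3: 0.7}, "acl:p1"))                  # [(3, 0.7)]
print(cache.misses)                                      # 1
```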


On Sat, Oct 23, 2010 at 2:58 PM, Dennis Gearon <ge...@sbcglobal.net>wrote:

> why use filter queries?
>
> Wouldn't reducing the set headed into the filters by putting it in the main
> query be faster? (A question to learn, since I do NOT know :-)

Re: Modelling Access Control

Posted by Otis Gospodnetic <ot...@gmail.com>.

Hi,

I think the best place to look is ManifoldCF. Those guys have already
figured it all out, so I'm guessing what's done in ManifoldCF works well.

Have a look at http://search-lucene.com/?q=acl&fc_project=ManifoldCF

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html



Re: Modelling Access Control

Posted by hupadhyay <hu...@asite.com>.

Hello All,

I am also trying to model ACL on Solr search. In my case both the data
and the user base are very large. Putting the ACL inside Solr gives quite
good response times, but keeping the ACL outside Solr seems to be a
nightmare.

Keeping the ACL inside Solr puts a heavy load on keeping the Solr index up
to date, because adding a single user to a project with 30000 entities in it
requires updating all of them in the Solr index. And we have approximately
500 user additions per day.
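A quick worst-case estimate of that indexing load, assuming every one of the ~500 daily user additions lands in a project of 30000 entities (real traffic will usually be lower):

```python
user_additions_per_day = 500
entities_per_project = 30_000  # worst case: every addition hits a project this size

docs_to_reindex = user_additions_per_day * entities_per_project
per_second = docs_to_reindex / (24 * 60 * 60)

print(docs_to_reindex)       # 15000000
print(round(per_second, 1))  # 173.6
```

Sustaining roughly 170 document updates per second around the clock, in the worst case, is the kind of load that makes keeping membership out of the documents attractive.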

Can anybody please explain how to implement ACL outside Solr?

One more thing: in my case *search should return in < 1sec*

Thanks in advance



--
View this message in context: http://lucene.472066.n3.nabble.com/Modelling-Access-Control-tp1756817p4017479.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Modelling Access Control

Posted by Dennis Gearon <ge...@sbcglobal.net>.

Ah haaa. I see now. :-) 

I didn't make that connection. Hopefully I would have before I ever tried to implement that :-)

Kind of like user names and icons on a windows login :-)

Dennis Gearon


Re: Modelling Access Control

Posted by Erick Erickson <er...@gmail.com>.

If that's in response to Lance's comment, the answer is that if you return
autosuggest possibilities you effectively allow users to see data they
shouldn't. Imagine you have a field of the real names of spies. You only
want the people way high up in the security chain to access these names and
you control that on a document level.

Allowing autocomplete on that field would be...er...very tough on your
spies' health...

HTH
Erick

On Tue, Oct 26, 2010 at 2:24 PM, Dennis Gearon <ge...@sbcglobal.net>wrote:

> "Son, don't touch that stove . . . .",
>
> "OUCH! Hey Dad, I BURNED my hand on that stove, why didn't you tell me
> that?!?#! You know I need to know WHY, not just DON'T!"
>
> Dennis Gearon
>
> > Very important: do not make a spelling or autosuggest index
> > from a
> > text field which some people can see and other people
> > can't.
> >
>
>
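One way to respect this, sketched under the assumption that each document carries a `roles` list and that a hypothetical `public` role marks content everyone may see: build the suggestion vocabulary only from documents visible to every user, so a restricted term can never be offered as a completion. The field names and sample documents are illustrative.

```python
def build_suggestions(docs, field="name", public_role="public"):
    """Collect suggestion terms only from documents everyone can see."""
    terms = set()
    for doc in docs:
        if public_role in doc.get("roles", []):
            terms.update(doc.get(field, "").lower().split())
    return sorted(terms)


def suggest(terms, prefix):
    """Return vocabulary entries starting with the typed prefix."""
    prefix = prefix.lower()
    return [t for t in terms if t.startswith(prefix)]


docs = [
    {"name": "Annual report", "roles": ["public"]},
    {"name": "Agent Smirnov", "roles": ["top_secret"]},
]
vocab = build_suggestions(docs)
print(suggest(vocab, "a"))  # ['annual']
```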

Re: Modelling Access Control

Posted by Dennis Gearon <ge...@sbcglobal.net>.

"Son, don't touch that stove . . . .",

"OUCH! Hey Dad, I BURNED my hand on that stove, why didn't you tell me that?!?#! You know I need to know WHY, not just DON'T!"

Dennis Gearon

> Very important: do not make a spelling or autosuggest index
> from a
> text field which some people can see and other people
> can't.
>

Re: Modelling Access Control

Posted by Lance Norskog <go...@gmail.com>.

The idea of ACL-based queries is: each document carries all of the
groups or roles that are allowed to see it. Each user search includes
all of the groups or roles the user has.

The roles are stored as multivalued string fields. Each ACL-based
query passes in "roles:A OR roles:B OR roles:C" and if any of A,B,C
are in the stored ACL field, you have a match.
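The matching rule in a few lines of Python, as a model rather than Solr code: a document matches when its multivalued `roles` field shares at least one value with the user's roles, which is what the OR of `roles:` clauses expresses. The sample documents are made up, and role names are assumed to be simple tokens that need no query escaping.

```python
def early_binding_filter(docs, user_roles):
    """Keep documents whose 'roles' field intersects the user's roles,
    i.e. the semantics of fq=roles:A OR roles:B OR roles:C."""
    wanted = set(user_roles)
    return [doc["id"] for doc in docs if wanted & set(doc["roles"])]


def roles_fq(user_roles):
    """Render the same condition as a Solr filter query string."""
    return " OR ".join(f"roles:{role}" for role in sorted(set(user_roles)))


docs = [
    {"id": "d1", "roles": ["A", "editors"]},
    {"id": "d2", "roles": ["X"]},
    {"id": "d3", "roles": ["C"]},
]
print(early_binding_filter(docs, ["A", "B", "C"]))  # ['d1', 'd3']
print(roles_fq(["C", "A", "B"]))                   # roles:A OR roles:B OR roles:C
```

Sorting the roles keeps the fq string identical from request to request for the same user, which is what lets a cached filter be reused.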

This is called "early binding". "Late binding" is when you return
everything and the app calls LDAP and says "can she see this? or
this?". This is slow and puts a monster load on the ACL server.
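For contrast, a sketch of late binding. `can_read` is a hypothetical callback standing in for the LDAP (or other ACL service) lookup; the returned counter shows that the number of ACL calls grows with the number of candidate hits, and that you may have to walk well past one page of results to fill a page.

```python
from typing import Callable, Iterable


def late_binding_page(candidates: Iterable[str],
                      can_read: Callable[[str], bool],
                      page_size: int) -> tuple[list[str], int]:
    """Walk search results in rank order, asking the ACL service about each
    one until a full page of readable documents is collected.
    Returns the page and the number of ACL checks performed."""
    page: list[str] = []
    checks = 0
    for doc_id in candidates:
        checks += 1
        if can_read(doc_id):
            page.append(doc_id)
            if len(page) == page_size:
                break
    return page, checks


readable = {"d3", "d7", "d9"}  # what this user may see (made-up data)
hits = [f"d{i}" for i in range(1, 11)]
page, checks = late_binding_page(hits, readable.__contains__, page_size=2)
print(page, checks)  # ['d3', 'd7'] 7
```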

Very important: do not make a spelling or autosuggest index from a
text field which some people can see and other people can't.



-- 
Lance Norskog
goksron@gmail.com

Re: Modelling Access Control

Posted by Lance Norskog <go...@gmail.com>.

Filter queries are a set of bits which is ANDed against query results
at a very early stage of query processing. They are very useful.  Note
that they are stored (I think) in parsed query order, so you have to
pass in the same filter query string each time.
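To illustrate the "set of bits ANDed against query results" idea, here is a sketch that uses Python integers as bitsets, where bit i set means document i matches. Solr's actual document-set classes are more elaborate, so treat this only as a model of the operation.

```python
def to_bits(doc_ids):
    """Encode a collection of document numbers as an int bitset."""
    bits = 0
    for doc in doc_ids:
        bits |= 1 << doc
    return bits


def from_bits(bits):
    """Decode an int bitset back into sorted document numbers."""
    docs = []
    doc = 0
    while bits:
        if bits & 1:
            docs.append(doc)
        bits >>= 1
        doc += 1
    return docs


acl_filter = to_bits([0, 2, 3, 5])  # docs the user may see (computed once, reusable)
query_hits = to_bits([1, 2, 5, 6])  # docs matching the main query
print(from_bits(query_hits & acl_filter))  # [2, 5]
```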


Re: Modelling Access Control

Posted by Dennis Gearon <ge...@sbcglobal.net>.

Thanks for that insight, a lot.

Dennis Gearon


Re: Modelling Access Control

Posted by Jonathan Rochkind <ro...@jhu.edu>.

Dennis Gearon wrote:
> why use filter queries?
>
> Wouldn't reducing the set headed into the filters by putting it in the main query be faster? (A question to learn, since I do NOT know :-)
>
>   
No, at least as I understand it. In the best case, the filter query will
be a lot faster, because filter queries are cached separately in the
filter cache. So if the existing filter query can be found in the
cache, it'll be a lot faster. If it's not in the cache, the performance
should be pretty much the same as if you had included it as an
additional clause in the main q query.

The reasons to put it in a fq filter are:

1) The caching behavior. You can have that certain part of the query be
cached on its own, speeding up any subsequent queries that use that
same fq.

2) Simplification of client code. You can leave your 'q' however you
want it, using whatever kind of query parser you want to (dismax,
whatever), and just add on the 'fq' without touching the 'q'. This is
a lot easier to do and, especially when you're using it for access
control like this, makes it a lot harder for a bug to creep in.
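A sketch of point 2 using only the standard library: the user's `q` is passed through untouched and the ACL restriction is always appended as its own `fq`. The `/solr/select` path, the `acl` field, the `defType=dismax` choice and the helper name are assumptions for illustration.

```python
from typing import Optional
from urllib.parse import urlencode


def build_select_params(user_query: str, acl_fq: str,
                        extra: Optional[dict[str, str]] = None) -> str:
    """Return a query string for a Solr select request.

    The user's query goes into q exactly as typed; access control lives
    only in fq, so changing parsers or query syntax cannot drop it.
    """
    params: list[tuple[str, str]] = [("q", user_query)]
    params += sorted((extra or {}).items())
    params.append(("fq", acl_fq))  # always added, never merged into q
    return urlencode(params)


qs = build_select_params("solar panels", 'acl:("p1" OR "p2")', {"defType": "dismax"})
print("/solr/select?" + qs)
```

Keeping the ACL in a single helper like this also gives you one place in code that every search has to pass through.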

Jonathan

Re: Modelling Access Control

Posted by Dennis Gearon <ge...@sbcglobal.net>.

why use filter queries?

Wouldn't reducing the set headed into the filters by putting it in the main query be faster? (A question to learn, since I do NOT know :-)

Dennis Gearon


Re: Modelling Access Control

Posted by Israel Ekpo <is...@gmail.com>.

Hi Paul,

Regardless of how you implement it, I would recommend you use filter queries
for the permissions check rather than making it part of the main query.


Re: Modelling Access Control

Posted by Savvas-Andreas Moysidis <sa...@googlemail.com>.

Pushing ACL logic outside Solr sounds like a prudent choice indeed as, in my
opinion, all of the business rules/conceptual logic should reside only
within the code boundaries. This way your domain will be easier to model and
your code easier to read, understand and maintain.

More information on Filter Queries, when they should be used and how they
affect performance can be found here:
http://wiki.apache.org/solr/FilterQueryGuidance


Re: Modelling Access Control

Posted by Dennis Gearon <ge...@sbcglobal.net>.

Forgot to add,
3/ The external application code selects the GROUPS that the user has permission to read (so Solr will only serve up what may be read?), then searches on those groups.
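Point 3/ is close to Paul's per-project `(acl:<project_id> OR ...)` idea, and could be sketched in Python roughly as below. This is a hedged illustration, not code from the thread: the `acl` field name, the helper names `acl_filter` and `select_params`, and the choice to put the restriction in Solr's `fq` (filter query) parameter instead of `q` are assumptions. The 1024 constant mirrors Lucene's default maximum number of clauses in one boolean query (configurable via `<maxBooleanClauses>` in solrconfig.xml), which is where a user with access to 2000 projects would likely fail unless that limit is raised.

```python
from urllib.parse import urlencode

# Lucene/Solr default cap on clauses in a single boolean query
# (raise it via <maxBooleanClauses> in solrconfig.xml if needed).
MAX_BOOLEAN_CLAUSES = 1024


def acl_filter(group_ids, field="acl"):
    """Build a filter query like acl:("p1" OR "p2") for the groups a user may read.

    `field` defaults to a hypothetical multi-valued "acl" field on each item.
    """
    # De-duplicate and sort so the same set of groups always yields the same string.
    ids = sorted(set(group_ids))
    if not ids:
        raise ValueError("user has no readable groups")
    if len(ids) > MAX_BOOLEAN_CLAUSES:
        raise ValueError("too many groups for a single boolean query")
    # Quote each id and escape backslashes and quotes so ids cannot inject query syntax.
    quoted = ['"%s"' % g.replace("\\", "\\\\").replace('"', '\\"') for g in ids]
    return "%s:(%s)" % (field, " OR ".join(quoted))


def select_params(user_query, group_ids):
    """Query string for a /select request: q is the user's query, fq carries the ACL."""
    return urlencode({"q": user_query, "fq": acl_filter(group_ids)})
```

Because the ids are sorted and de-duplicated, identical group sets produce identical `fq` strings, which should give Solr's filter cache a better chance of reusing the cached result across requests.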


Dennis Gearon



--- On Sat, 10/23/10, Dennis Gearon <ge...@sbcglobal.net> wrote:

> From: Dennis Gearon <ge...@sbcglobal.net>
> Subject: Re: Modelling Access Control
> To: solr-user@lucene.apache.org
> Date: Saturday, October 23, 2010, 11:49 AM
> Two things will lessen the Solr
> administrative load:
> 
> 1/ Follow the example of databases and *nix OSs. Give each
> user their own group, or set up groups that don't have
> regular users as OWNERS but can have users assigned to
> them to grant particular permissions, i.e. roles such as
> publishers, reviewers, friends, etc.
> 
> 2/ Put your ACL outside of Solr, using the object-oriented
> features of your server-side/command-line language. Force
> all searches to come from a single location in code (not
> sure how to do that), and make that piece of code check
> authentication and authorization.
> 
> This is how my research shows others do it, and how I
> plan to do it. I really want to hear ANY insight others
> have on this.
> 
> Dennis Gearon
> 
> 
> 
>

Re: Modelling Access Control

Posted by Dennis Gearon <ge...@sbcglobal.net>.

Two things will lessen the Solr administrative load:

1/ Follow the example of databases and *nix OSs. Give each user their own group, or set up groups that don't have regular users as OWNERS but can have users assigned to them to grant particular permissions, i.e. roles such as publishers, reviewers, friends, etc.

2/ Put your ACL outside of Solr, using the object-oriented features of your server-side/command-line language. Force all searches to come from a single location in code (not sure how to do that), and make that piece of code check authentication and authorization.

This is how my research shows others do it, and how I plan to do it. I really want to hear ANY insight others have on this.
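Points 1/ and 2/ together might look something like the following minimal Python sketch, assuming role-style groups (user → groups → readable projects) held in the application rather than in Solr. `User`, `Acl`, `search` and the injected `run_solr_search` callable are hypothetical names, not an existing API; `run_solr_search` stands in for whatever client actually sends the query plus a project filter to Solr, and is assumed to return hits that carry a `project_id` field.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class User:
    user_id: str
    authenticated: bool


@dataclass
class Acl:
    # Hypothetical role-style model kept outside Solr, in the spirit of *nix groups:
    # users belong to groups, and each group may read a set of projects.
    user_groups: Dict[str, Set[str]] = field(default_factory=dict)
    group_projects: Dict[str, Set[str]] = field(default_factory=dict)

    def readable_projects(self, user_id: str) -> Set[str]:
        projects: Set[str] = set()
        for group in self.user_groups.get(user_id, set()):
            projects |= self.group_projects.get(group, set())
        return projects


def search(user: User, query: str, acl: Acl,
           run_solr_search: Callable[[str, List[str]], List[dict]]) -> List[dict]:
    """The single entry point for searches: authenticate, authorize, then query."""
    if not user.authenticated:
        raise PermissionError("not authenticated")
    projects = sorted(acl.readable_projects(user.user_id))
    if not projects:
        return []  # nothing readable, so Solr is never called
    hits = run_solr_search(query, projects)
    # Re-check each hit against the ACL in case the index and ACL have drifted apart.
    allowed = set(projects)
    return [hit for hit in hits if hit.get("project_id") in allowed]
```

Passing the Solr call in as a parameter keeps this one function as the only route to the index, which is one possible answer to the "single location in code" question; changing a user's role then only touches the application-side ACL, not the indexed items.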

Dennis Gearon


