Posted to solr-user@lucene.apache.org by John Bickerstaff <jo...@johnbickerstaff.com> on 2016/10/18 21:00:33 UTC

Public/Private data in Solr :: Metadata or ?

I have a question that I suspect I'll need to answer very soon in my
current position.

How can I "segregate data" in Solr (and is it even wise to) so that some
data can be seen by some users and other data cannot?

Taking the case of "public / private" as a (hopefully) simple, binary
example...

Let's imagine I have a data set that can be seen by a user.  Some of that
data can be seen ONLY by the user (this would be the private data) and some
of it can be seen by others (assume the user gave permission for this in
some way).

What is a best practice for handling this type of situation?  I can see
putting metadata in Solr of course, but the instant I do that, I create the
obligation to keep it updated (Document-level CRUD?) and I start using Solr
more like a DB than a search engine.

(Assume the user can change this public/private setting on any one piece of
"their" data at any time).

Of course, I can also see some kind of post-results massaging of data to
remove private data based on IDs that are stored in a database or similar
datastore...

How have others solved this and is there a consensus on whether to keep it
out of Solr, or how best to handle it in Solr?

Are there clever implementations of "secondary" collections in Solr for
this purpose?

Any advice / hard-won experience is greatly appreciated...

Re: Public/Private data in Solr :: Metadata or ?

Posted by John Bickerstaff <jo...@johnbickerstaff.com>.
Thanks Erick - also very helpful.

On Wed, Oct 19, 2016 at 1:24 PM, Erick Erickson <er...@gmail.com>
wrote:

> And for hairy ACL processing, consider a post-filter. It's custom code
> that only evaluates a document _after_ it has made it through the
> primary query and any "lower cost" filters. See:
> http://yonik.com/advanced-filter-caching-in-solr/.
>
> NOTE: this isn't the thing I would do first; it's much more efficient
> to implement some of the suggestions above. Any time you can trade
> query-time work for index-time work, it's almost always better to do
> the work up front at index time.
>
> Best,
> Erick
>
> On Wed, Oct 19, 2016 at 12:07 PM, John Bickerstaff
> <jo...@johnbickerstaff.com> wrote:
> > Thank you both!  Very helpful.
> >
> > On Wed, Oct 19, 2016 at 8:48 AM, Shawn Heisey <ap...@elyograg.org> wrote:
> >
> >> On 10/18/2016 3:00 PM, John Bickerstaff wrote:
> >> > How (or is it even wise) to "segregate data" in Solr so that some data
> >> > can be seen by some users and some data not be seen?
> >>
> >> IMHO, security like this isn't really Solr's job ... but with the right
> >> data in the index, the system that DOES handle the security can include
> >> a filter with each user's query to restrict them to only the data they
> >> are allowed to see.  There are many ways to put data in the index for
> >> efficient use by a filter.  The simplest would be a boolean field with a
> >> name like isPublic or isPrivate, where true and false are mapped as
> >> necessary to public and private.
> >>
> >> Naturally, the users must not be able to reach Solr directly ... they
> >> must be restricted to the software that connects to Solr on their
> >> behalf.  Blocking end users from direct network access to Solr is a good
> >> idea even if there are no other security needs.
> >>
> >> There are more comprehensive solutions available, as you will notice
> >> from other replies, but the idea of simple filtering, controlled by your
> >> application, should work.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>

Re: Public/Private data in Solr :: Metadata or ?

Posted by Erick Erickson <er...@gmail.com>.
And for hairy ACL processing, consider a post-filter. It's custom code
that only evaluates a document _after_ it has made it through the
primary query and any "lower cost" filters. See:
http://yonik.com/advanced-filter-caching-in-solr/.

NOTE: this isn't the thing I would do first; it's much more efficient
to implement some of the suggestions above. Any time you can trade
query-time work for index-time work, it's almost always better to do
the work up front at index time.
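
To make the post-filter idea concrete, here is a rough sketch of what the
request side might look like. Solr treats a filter as a post-filter only
when it is sent with cache=false and a cost of 100 or more, and only if the
query class behind it implements the PostFilter interface. The "acl" parser
name and its "user" parameter are hypothetical (they stand in for whatever
custom plugin you write), as is the docType field.

```python
def post_filter_fq(parser, user_id, cost=100):
    """Build an fq string that Solr may run as a post-filter.

    `parser` names a custom query parser plugin (hypothetical here);
    `user_id` is assumed to be a simple token with no spaces or braces.
    """
    if cost < 100:
        raise ValueError("post-filters need cost >= 100")
    # cache=false plus cost>=100 asks Solr to evaluate this filter last,
    # only on documents that already matched q and the cheaper filters.
    return f"{{!{parser} cache=false cost={cost} user={user_id}}}"


params = [
    ("q", "vacation photos"),
    ("fq", "docType:record"),                 # ordinary cached filter
    ("fq", post_filter_fq("acl", "userB")),   # expensive ACL check, run last
]
for name, value in params:
    print(name, "=", value)
```

The cheap, cacheable filter does most of the narrowing, so the custom ACL
code only ever sees the documents that survive it.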

Best,
Erick

On Wed, Oct 19, 2016 at 12:07 PM, John Bickerstaff
<jo...@johnbickerstaff.com> wrote:
> Thank you both!  Very helpful.
>
> On Wed, Oct 19, 2016 at 8:48 AM, Shawn Heisey <ap...@elyograg.org> wrote:
>
>> On 10/18/2016 3:00 PM, John Bickerstaff wrote:
>> > How (or is it even wise) to "segregate data" in Solr so that some data
>> > can be seen by some users and some data not be seen?
>>
>> IMHO, security like this isn't really Solr's job ... but with the right
>> data in the index, the system that DOES handle the security can include
>> a filter with each user's query to restrict them to only the data they
>> are allowed to see.  There are many ways to put data in the index for
>> efficient use by a filter.  The simplest would be a boolean field with a
>> name like isPublic or isPrivate, where true and false are mapped as
>> necessary to public and private.
>>
>> Naturally, the users must not be able to reach Solr directly ... they
>> must be restricted to the software that connects to Solr on their
>> behalf.  Blocking end users from direct network access to Solr is a good
>> idea even if there are no other security needs.
>>
>> There are more comprehensive solutions available, as you will notice
>> from other replies, but the idea of simple filtering, controlled by your
>> application, should work.
>>
>> Thanks,
>> Shawn
>>
>>

Re: Public/Private data in Solr :: Metadata or ?

Posted by John Bickerstaff <jo...@johnbickerstaff.com>.
Thank you both!  Very helpful.

On Wed, Oct 19, 2016 at 8:48 AM, Shawn Heisey <ap...@elyograg.org> wrote:

> On 10/18/2016 3:00 PM, John Bickerstaff wrote:
> > How (or is it even wise) to "segregate data" in Solr so that some data
> > can be seen by some users and some data not be seen?
>
> IMHO, security like this isn't really Solr's job ... but with the right
> data in the index, the system that DOES handle the security can include
> a filter with each user's query to restrict them to only the data they
> are allowed to see.  There are many ways to put data in the index for
> efficient use by a filter.  The simplest would be a boolean field with a
> name like isPublic or isPrivate, where true and false are mapped as
> necessary to public and private.
>
> Naturally, the users must not be able to reach Solr directly ... they
> must be restricted to the software that connects to Solr on their
> behalf.  Blocking end users from direct network access to Solr is a good
> idea even if there are no other security needs.
>
> There are more comprehensive solutions available, as you will notice
> from other replies, but the idea of simple filtering, controlled by your
> application, should work.
>
> Thanks,
> Shawn
>
>

Re: Public/Private data in Solr :: Metadata or ?

Posted by Hrishikesh Gadre <ga...@gmail.com>.
As part of Cloudera Search, we have integrated with Apache Sentry for
document-level authorization. Currently we use a custom search
component to implement the filtering. Please refer to this blog post for
details:
http://blog.cloudera.com/blog/2014/07/new-in-cdh-5-1-document-level-security-for-cloudera-search/

I am currently working on a Sentry-based plugin implementation that can be
hooked into the Solr authorization framework. Currently the Solr
authorization framework doesn't implement document-level security. I filed
SOLR-9578 <https://issues.apache.org/jira/browse/SOLR-9578> to add the
relevant document-level security support to Solr.

The main drawback of the mechanism based on a custom search component is
that it requires a special solrconfig.xml file (one that uses these custom
search components). Once Solr provides hooks to implement document-level
security as part of the authorization framework, this restriction
will go away.

If you have any ideas about (or concerns with) this feature, please feel
free to comment on the JIRA issue.

Thanks
Hrishikesh

On Wed, Oct 19, 2016 at 7:48 AM, Shawn Heisey <ap...@elyograg.org> wrote:

> On 10/18/2016 3:00 PM, John Bickerstaff wrote:
> > How (or is it even wise) to "segregate data" in Solr so that some data
> > can be seen by some users and some data not be seen?
>
> IMHO, security like this isn't really Solr's job ... but with the right
> data in the index, the system that DOES handle the security can include
> a filter with each user's query to restrict them to only the data they
> are allowed to see.  There are many ways to put data in the index for
> efficient use by a filter.  The simplest would be a boolean field with a
> name like isPublic or isPrivate, where true and false are mapped as
> necessary to public and private.
>
> Naturally, the users must not be able to reach Solr directly ... they
> must be restricted to the software that connects to Solr on their
> behalf.  Blocking end users from direct network access to Solr is a good
> idea even if there are no other security needs.
>
> There are more comprehensive solutions available, as you will notice
> from other replies, but the idea of simple filtering, controlled by your
> application, should work.
>
> Thanks,
> Shawn
>
>

Re: Public/Private data in Solr :: Metadata or ?

Posted by Shawn Heisey <ap...@elyograg.org>.
On 10/18/2016 3:00 PM, John Bickerstaff wrote:
> How (or is it even wise) to "segregate data" in Solr so that some data
> can be seen by some users and some data not be seen? 

IMHO, security like this isn't really Solr's job ... but with the right
data in the index, the system that DOES handle the security can include
a filter with each user's query to restrict them to only the data they
are allowed to see.  There are many ways to put data in the index for
efficient use by a filter.  The simplest would be a boolean field with a
name like isPublic or isPrivate, where true and false are mapped as
necessary to public and private.
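
As a minimal sketch of that idea (not a definitive implementation), the
application could append a filter query (fq) to every request it sends on a
user's behalf. The isPublic field is the boolean described above; the
ownerId field, the "records" collection name, and the helper name are
assumptions made up for illustration.

```python
from urllib.parse import urlencode


def build_select_params(user_query, user_id):
    """Return Solr /select parameters restricted to what user_id may see."""
    # Escape backslashes and double quotes so the id stays inside the phrase.
    safe_id = user_id.replace("\\", "\\\\").replace('"', '\\"')
    return [
        ("q", user_query),
        # fq narrows the result set without affecting relevance scoring:
        # public documents, plus anything this user owns.
        ("fq", f'isPublic:true OR ownerId:"{safe_id}"'),
    ]


params = build_select_params("vacation photos", "userB")
# "records" is a hypothetical collection name.
print("/solr/records/select?" + urlencode(params))
```

The filter is added by the application, never by the end user, which is why
the point below about blocking direct access to Solr matters.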

Naturally, the users must not be able to reach Solr directly ... they
must be restricted to the software that connects to Solr on their
behalf.  Blocking end users from direct network access to Solr is a good
idea even if there are no other security needs.

There are more comprehensive solutions available, as you will notice
from other replies, but the idea of simple filtering, controlled by your
application, should work.

Thanks,
Shawn


Re: Public/Private data in Solr :: Metadata or ?

Posted by John Bickerstaff <jo...@johnbickerstaff.com>.
Thanks Jan --

I did a quick scan on the wiki and here:
http://www.slideshare.net/lucenerevolution/wright-nokia-manifoldcfeurocon-2011
and couldn't find the answer to the following question in the 5 or 10
minutes I spent looking.  Admittedly I'm being lazy and hoping you have
enough experience with the project to answer easily...

Do you know if ManifoldCF helps with a use case where the security token
needs to be changed arbitrarily and a re-index of the collection is not
practical?  Or is ManifoldCF an index-time only kind of thing?


Use Case:  User A changes "record A" from private to public so a friend
(User B) can see it.  User B logs in and expects to see what User A changed
to public a few minutes earlier.

The security token on "record A" would need to be changed immediately, and
that change would have to occur in Solr - yes?
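
For the simple boolean case, one possible way to make that change without
re-sending the whole record is a Solr atomic update, sketched below. This
assumes an isPublic field (as suggested elsewhere in this thread) and a
schema whose other fields are stored or have docValues, which atomic
updates need; the helper name and "recordA" id are illustrative only.

```python
import json


def visibility_update(doc_id, is_public):
    """Build an atomic-update payload that changes only the isPublic field."""
    # {"set": value} replaces the field's value and leaves other fields alone.
    return json.dumps([{"id": doc_id, "isPublic": {"set": is_public}}])


payload = visibility_update("recordA", True)
# The application would POST this to the collection's /update handler with
# Content-Type: application/json, typically with a commitWithin parameter.
print(payload)
```

User B would see the change only after the update has been committed, so
the commit settings decide how "immediate" the change really is.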



On Tue, Oct 18, 2016 at 3:32 PM, Jan Høydahl <ja...@cominvent.com> wrote:

> https://wiki.apache.org/solr/SolrSecurity#Document_Level_Security
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > On 18 Oct 2016, at 23:00, John Bickerstaff <john@johnbickerstaff.com> wrote:
> >
> > I have a question that I suspect I'll need to answer very soon in my
> > current position.
> >
> > How (or is it even wise) to "segregate data" in Solr so that some data can
> > be seen by some users and some data not be seen?
> >
> > Taking the case of "public / private" as a (hopefully) simple, binary
> > example...
> >
> > Let's imagine I have a data set that can be seen by a user.  Some of that
> > data can be seen ONLY by the user (this would be the private data) and some
> > of it can be seen by others (assume the user gave permission for this in
> > some way)
> >
> > What is a best practice for handling this type of situation?  I can see
> > putting metadata in Solr of course, but the instant I do that, I create the
> > obligation to keep it updated (Document-level CRUD?) and I start using Solr
> > more like a DB than a search engine.
> >
> > (Assume the user can change this public/private setting on any one piece of
> > "their" data at any time).
> >
> > Of course, I can also see some kind of post-results massaging of data to
> > remove private data based on ID's which are stored in a database or similar
> > datastore...
> >
> > How have others solved this and is there a consensus on whether to keep it
> > out of Solr, or how best to handle it in Solr?
> >
> > Are there clever implementations of "secondary" collections in Solr for
> > this purpose?
> >
> > Any advice / hard-won experience is greatly appreciated...
>
>

Re: Public/Private data in Solr :: Metadata or ?

Posted by Jan Høydahl <ja...@cominvent.com>.
https://wiki.apache.org/solr/SolrSecurity#Document_Level_Security

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 18 Oct 2016, at 23:00, John Bickerstaff <jo...@johnbickerstaff.com> wrote:
> 
> I have a question that I suspect I'll need to answer very soon in my
> current position.
> 
> How (or is it even wise) to "segregate data" in Solr so that some data can
> be seen by some users and some data not be seen?
> 
> Taking the case of "public / private" as a (hopefully) simple, binary
> example...
> 
> Let's imagine I have a data set that can be seen by a user.  Some of that
> data can be seen ONLY by the user (this would be the private data) and some
> of it can be seen by others (assume the user gave permission for this in
> some way)
> 
> What is a best practice for handling this type of situation?  I can see
> putting metadata in Solr of course, but the instant I do that, I create the
> obligation to keep it updated (Document-level CRUD?) and I start using Solr
> more like a DB than a search engine.
> 
> (Assume the user can change this public/private setting on any one piece of
> "their" data at any time).
> 
> Of course, I can also see some kind of post-results massaging of data to
> remove private data based on ID's which are stored in a database or similar
> datastore...
> 
> How have others solved this and is there a consensus on whether to keep it
> out of Solr, or how best to handle it in Solr?
> 
> Are there clever implementations of "secondary" collections in Solr for
> this purpose?
> 
> Any advice / hard-won experience is greatly appreciated...


Re: Public/Private data in Solr :: Metadata or ?

Posted by Doug Turnbull <dt...@opensourceconnections.com>.
You might want to talk to Kevin Waters or look at some of the work being
done with the graph plugin. It's being used to model permissions with Solr.
It's a bit of normalization within Solr whereby you could localize updates
to a user's shared-with document. Kevin can probably talk more intelligently
than I can about it.
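
To picture the normalization, here is a rough sketch that uses Solr's
built-in join query parser rather than the graph plugin itself, so treat it
as an analogy and not as how that plugin works. Each grant lives in its own
small "share" document, so changing who can see a record touches only that
document. All field names (sharedDocId, sharedWith, docType) are
hypothetical, and the join assumes the share documents and the records live
in the same single-shard index.

```python
def share_doc(record_id, user_id):
    """A small permission document: record_id is shared with user_id."""
    return {
        "id": f"share-{record_id}-{user_id}",
        "docType": "share",
        "sharedDocId": record_id,
        "sharedWith": user_id,
    }


def shared_with_fq(user_id, from_field="sharedDocId", to_field="id"):
    """Filter to records that have a share document naming user_id."""
    # {!join} finds share docs matching sharedWith:<user>, then maps their
    # from_field values onto the to_field of the records to return.
    return f"{{!join from={from_field} to={to_field}}}sharedWith:{user_id}"


print(share_doc("recordA", "userB"))
print(shared_with_fq("userB"))
```

Revoking access is then a delete of one share document, with no update to
the record itself.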

-Doug
On Tue, Oct 18, 2016 at 5:00 PM John Bickerstaff <jo...@johnbickerstaff.com>
wrote:

> I have a question that I suspect I'll need to answer very soon in my
> current position.
>
> How (or is it even wise) to "segregate data" in Solr so that some data can
> be seen by some users and some data not be seen?
>
> Taking the case of "public / private" as a (hopefully) simple, binary
> example...
>
> Let's imagine I have a data set that can be seen by a user.  Some of that
> data can be seen ONLY by the user (this would be the private data) and some
> of it can be seen by others (assume the user gave permission for this in
> some way)
>
> What is a best practice for handling this type of situation?  I can see
> putting metadata in Solr of course, but the instant I do that, I create the
> obligation to keep it updated (Document-level CRUD?) and I start using Solr
> more like a DB than a search engine.
>
> (Assume the user can change this public/private setting on any one piece of
> "their" data at any time).
>
> Of course, I can also see some kind of post-results massaging of data to
> remove private data based on ID's which are stored in a database or similar
> datastore...
>
> How have others solved this and is there a consensus on whether to keep it
> out of Solr, or how best to handle it in Solr?
>
> Are there clever implementations of "secondary" collections in Solr for
> this purpose?
>
> Any advice / hard-won experience is greatly appreciated...
>