Posted to solr-user@lucene.apache.org by jo...@aol.com on 2015/03/03 15:32:19 UTC

Access permission

Hi,


I'm indexing data off a DB.  The data is secured with access permissions.  That is, record-A can be seen by users-x, record-B can be seen by users-y, and record-C can be seen by both users x and y.  On top of that, the group access permissions can change over time.


The question I have is this: how to handle this in Solr?  Is there anything I can do during index and / or search time?  What's the best practice to handle access permission in search?


Thanks!


- MJ


RE: Cores and and ranking (search quality)

Posted by jo...@aol.com.
Help me understand this better (regarding ranking).

Suppose I have two docs that are 100% identical with the exception of uid (which is stored but not indexed).  In a single-core setup, I search for "xyz" and those 2 docs end up ranking as #1 and #2.  When I switch over to a two-core setup, doc-A goes to core-A (which has 10 records) and doc-B goes to core-B (which has 100,000 records).

Now, are you saying that in the two-core setup, if I search for "xyz" (just like in the single-core setup), I will no longer see doc-A and doc-B as #1 and #2 in the ranking?  That is, are you saying doc-A may now rank far away from doc-B?  If so, which will be #1: doc-A off core-A (that has 10 records) or doc-B off core-B (that has 100,000 records)?

If I got all this right, are you saying SOLR-1632 will fix this issue such that the end result will now be as if I had 1 core?

- MJ


-----Original Message-----
From: Toke Eskildsen [mailto:te@statsbiblioteket.dk] 
Sent: Thursday, March 5, 2015 9:06 AM
To: solr-user@lucene.apache.org
Subject: Re: Cores and and ranking (search quality)

On Thu, 2015-03-05 at 14:34 +0100, johnmunir@aol.com wrote:
> My question is this: if I put my data in multiple cores and use 
> distributed search will the ranking be different if I had all my data 
> in a single core?

Yes, it will be different. The practical impact depends on how homogeneous your data are across the shards and how large your shards are. If you have small and dissimilar shards, your ranking will suffer a lot.

Work is being done to remedy this:
https://issues.apache.org/jira/browse/SOLR-1632
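
A minimal sketch of why the ranking differs, assuming the classic Lucene TF-IDF similarity and assuming each core scores with its own local term statistics. The document frequencies below are hypothetical numbers chosen only for illustration; they are not taken from this thread.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Classic Lucene TF-IDF inverse document frequency:
    1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical statistics for the term "xyz" (assumptions, not measured data):
# it occurs in 1 of 10 docs in core-A and in 50 of 100,000 docs in core-B.
idf_core_a = classic_idf(doc_freq=1, num_docs=10)
idf_core_b = classic_idf(doc_freq=50, num_docs=100_000)

# The same data merged into one core gives a third, different value.
idf_single = classic_idf(doc_freq=51, num_docs=100_010)

if __name__ == "__main__":
    print(f"core-A idf:      {idf_core_a:.3f}")
    print(f"core-B idf:      {idf_core_b:.3f}")
    print(f"single-core idf: {idf_single:.3f}")
```

With these made-up numbers the term looks rarer in the large core than in the small one, so two otherwise identical docs would receive different scores depending on which core holds them.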

> Also, will facet and more-like-this quality / result be the same?

It is not formally guaranteed, but for most practical purposes, faceting on multi-shards will give you the same results as single-shards.

I don't know about more-like-this. My guess is that it will be affected in the same way that standard searches are.

> Also, reading the distributed search wiki
> (http://wiki.apache.org/solr/DistributedSearch) it looks like Solr 
> does the search and result merging (all I have to do is issue a 
> search), is this correct?

Yes. From a user-perspective, searches are no different.

- Toke Eskildsen, State and University Library, Denmark


Re: Access permission

Posted by John Maker <jo...@aol.com>.
Option #2 is far better.


I found this: https://wiki.apache.org/solr/SolrSecurity#Document_Level_Security but this solution requires that I use ManifoldCF, which I cannot.  Does anyone know how ManifoldCF does it and whether it can be adapted to Solr?


Another idea I'm wondering about: what if I create two cores, where one core holds the indexed docs and the other core holds doc-id + the user-ids that have access to each doc?  Then I could do a join between those two cores.  I have not given this enough thought to know if it will work.  If it does, will ranking be impacted (given that I'm now searching across two cores)?
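
A rough sketch of what the two-core join idea could look like, using Solr's {!join} query parser with its fromIndex parameter. The core name "acl" and the field names "doc_id", "id" and "user_ids" are hypothetical; they only illustrate the shape of the filter.

```python
import re

# Only plain tokens are accepted so no query-syntax characters need escaping.
_SAFE_TOKEN = re.compile(r"^[A-Za-z0-9_]+$")


def build_acl_join_fq(user_id: str,
                      acl_core: str = "acl",        # hypothetical core name
                      from_field: str = "doc_id",   # hypothetical field in the ACL core
                      to_field: str = "id",         # hypothetical key field in the docs core
                      user_field: str = "user_ids"  # hypothetical field in the ACL core
                      ) -> str:
    """Build an fq value that keeps only docs whose id appears in the
    ACL core on a record listing this user."""
    if not _SAFE_TOKEN.match(user_id):
        raise ValueError(f"unexpected characters in user id: {user_id!r}")
    return (f"{{!join from={from_field} to={to_field} "
            f"fromIndex={acl_core}}}{user_field}:{user_id}")


if __name__ == "__main__":
    print(build_acl_join_fq("x"))
```

Sent as an fq, the clause only restricts the result set; filter queries do not contribute to the score.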


- MJ



-----Original Message-----
From: Erick Erickson <er...@gmail.com>
To: solr-user <so...@lucene.apache.org>
Sent: Tue, Mar 3, 2015 6:46 pm
Subject: Re: Access permission



You really have two choices:

1> index tokens with each doc of those (usually groups) that are
   authorized to see them.
   Then when a user signs on, the front end assembles the list of groups
   that the user belongs to and appends a filter query to each request like
   &fq=auth:(group1 group5 group89)
   This starts to break down if any particular user can belong to many
   hundreds of groups, although if you construct the fq clause _exactly_
   the same way each time, requests 2-n will use the filterCache.
   The other way this breaks down is if you have to grant individual
   user/doc rights.
   The user changing groups isn't really a problem, since the fq clause
   you assemble will just change.
   The big downside here is if the doc/group permissions change. Say
   group1 suddenly gets or loses permissions to docs 1, 4, 90, 108. You
   must then re-index (or use atomic updates) to update the auth tokens
   in each of those docs.

2> use a "post filter", see:
   http://heliosearch.org/advanced-filter-caching-in-solr/. The advantage
   here is that the filter is run _only_ on docs that make it through the
   original query _and_ all less costly filters.


HTH,
Erick
 
On Tue, Mar 3, 2015 at 6:32 AM,  <jo...@aol.com> wrote:
>
> Hi,
>
>
> I'm indexing data off a DB.  The data is secured with access permission.  That is record-A can be seen by users-x, while record-B can be seen by users-y and yet record-C can be seen by users x and y.  Even more, the group access permission can change over time.
>
>
> The question I have is this: how to handle this in Solr?  Is there anything I can do during index and / or search time?  What's the best practice to handle access permission in search?
> 
> 
> Thanks!
> 
> 
> - MJ
> 

 

Re: Access permission

Posted by Erick Erickson <er...@gmail.com>.
You really have two choices:

1> index tokens with each doc of those (usually groups) that are
   authorized to see them.
   Then when a user signs on, the front end assembles the list of groups
   that the user belongs to and appends a filter query to each request like
   &fq=auth:(group1 group5 group89)
   This starts to break down if any particular user can belong to many
   hundreds of groups, although if you construct the fq clause _exactly_
   the same way each time, requests 2-n will use the filterCache.
   The other way this breaks down is if you have to grant individual
   user/doc rights.
   The user changing groups isn't really a problem, since the fq clause
   you assemble will just change.
   The big downside here is if the doc/group permissions change. Say
   group1 suddenly gets or loses permissions to docs 1, 4, 90, 108. You
   must then re-index (or use atomic updates) to update the auth tokens
   in each of those docs.

2> use a "post filter", see:
   http://heliosearch.org/advanced-filter-caching-in-solr/. The advantage
   here is that the filter is run _only_ on docs that make it through the
   original query _and_ all less costly filters.
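
A minimal sketch of option 1 on the front-end side, assuming an indexed field named "auth" as in the example fq above and assuming group names are plain alphanumeric tokens. The helper names are hypothetical. Sorting and de-duplicating the groups is one way to get the fq string built _exactly_ the same way each time, so repeated requests can hit the filterCache.

```python
import re
from urllib.parse import urlencode

# Assumption: group names are simple tokens that need no query escaping.
_SAFE_GROUP = re.compile(r"^[A-Za-z0-9_]+$")


def build_auth_fq(groups, field="auth"):
    """Return a filter query such as auth:(group1 group5 group89).

    Groups are de-duplicated and sorted so the same set of groups always
    yields the same string. The clause relies on the default OR operator,
    as the example fq in the message does.
    """
    tokens = sorted(set(groups))
    if not tokens:
        raise ValueError("user belongs to no groups; refuse to build a filter")
    for token in tokens:
        if not _SAFE_GROUP.match(token):
            raise ValueError(f"unexpected characters in group name: {token!r}")
    return f"{field}:({' '.join(tokens)})"


def build_query_string(user_query, groups):
    """URL-encode q and fq for a select request (hypothetical helper)."""
    return urlencode({"q": user_query, "fq": build_auth_fq(groups)})


if __name__ == "__main__":
    # The order in which the front end collects the groups does not matter.
    print(build_auth_fq(["group89", "group1", "group5"]))
    print(build_query_string("xyz", ["group5", "group1", "group89"]))
```

If group names can contain spaces or query-syntax characters, they would need proper escaping instead of the strict check used here.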

HTH,
Erick

On Tue, Mar 3, 2015 at 6:32 AM,  <jo...@aol.com> wrote:
>
> Hi,
>
>
> I'm indexing data off a DB.  The data is secured with access permission.  That is record-A can be seen by users-x, while record-B can be seen by users-y and yet record-C can be seen by users x and y.  Even more, the group access permission can change over time.
>
>
> The question I have is this: how to handle this in Solr?  Is there anything I can do during index and / or search time?  What's the best practice to handle access permission in search?
>
>
> Thanks!
>
>
> - MJ
>