Posted to solr-user@lucene.apache.org by Lisheng Zhang <lz...@gmail.com> on 2016/06/01 19:34:43 UTC

Re: Using solr with increasing complicated access control

Eric: thanks very much for your quick response (somehow the message was sent
to spam initially, sorry about that).

Yes, the rules have to be complicated; that is beyond my control. We also
tried filtering after the search, but as the amount of data grows it becomes
slow.

Right now Lucene has features like document blocks and joins to simulate
relational database behavior. Does Lucene implement a join by:

1/ internally flattening the documents to generate one new document,
2/ searching more than once and then merging the results,
3/ or some better way I have not seen?

For now I only need a high-level understanding. Thanks for your help,
Lisheng


On Mon, May 23, 2016 at 6:23 PM, Erick Erickson <er...@gmail.com>
wrote:

> I know this seems facetious, but.... Talk to your
> clients about _why_ they want such increasingly
> complex access requirements. Often the logic
> is pretty flawed for the complexity. Things like
> "allow user X to see document Y if they're part of
> groups A, B, C but not D or E unless they are
> also part of sub-group F and it's raining outside"...
>
> If the rules _must_ be complicated, that's what
> post-filters were actually invented for. Pretty often
> I'll build in some "bailout" because whatever you
> build has, eventually, to deal with the system
> admin searching all documents, i.e. doing the
> ACL calcs for every document.
>
> Best,
> Erick
>
> On Mon, May 23, 2016 at 6:02 PM, Lisheng Zhang <lz...@gmail.com>
> wrote:
> > Hi, I have been using Solr for many years and it is VERY helpful.
> >
> > My problem is that our app has increasingly complicated access
> > control to satisfy client requirements. In Solr/Lucene this means we
> > need to add more and more fields to each document and use more and more
> > complicated filter conditions, so the code is hard to maintain and
> > indexing becomes a serious issue because we want search to be as close
> > to real time as possible.
> >
> > I would appreciate high-level guidance on how to deal with this issue.
> > Recently I investigated MySQL full-text search (our app uses MySQL);
> > using MySQL means we simply reuse the DB for access control, but MySQL
> > full-text search performance is far from ideal compared to Solr.
> >
> > Thanks very much for your help, Lisheng
>

Re: Using solr with increasing complicated access control

Posted by Erick Erickson <er...@gmail.com>.
Lisheng:

I'm not too up on the details of Lucene block join, but I don't
think it really applies to access control. You'd have to
have documents grouped by access control (i.e. every
child doc of doc X has the same access control). If you
can do that, you can put an "authorization token" in the
doc (or more than one) and just use simple fq clauses.
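
As a rough illustration of the "authorization token plus simple fq" idea,
here is a minimal Python sketch that turns a user's tokens into an fq value.
The field name acl_token and the helper names are hypothetical, not anything
Solr defines; the sketch assumes each document stores its tokens in one
multi-valued string field.

```python
def quote_term(value):
    """Wrap a token in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_acl_fq(tokens, field="acl_token"):
    """Return an fq value matching documents that carry any of the tokens.

    `field` is a hypothetical multi-valued field holding the tokens.
    """
    if not tokens:
        # Let the caller decide what a user without tokens may see.
        raise ValueError("user has no authorization tokens")
    # Deduplicate and sort so the same user always yields the same string.
    terms = " OR ".join(quote_term(t) for t in sorted(set(tokens)))
    return f"{field}:({terms})"


fq = build_acl_fq(["sales", "eu-managers"])
```

The resulting string would be sent as the fq parameter next to the main
query, so the access check stays out of the scoring part of the request.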

Here's one technique I've seen: Implement a custom
post-filter that computes access rights when each doc
comes through (see:
http://qaware.blogspot.com/2014/11/how-to-write-postfilter-for-solr-49.html
it's a little old but it'll give you an idea of how to do this).

Then, in the collect method, quit calculating these after
N docs (where N is "how many you can compute quickly enough
to satisfy your SLA") and after reaching N, fail all other docs.
Then return some indicator about "please refine your search"
so the user knows they may not have seen the best docs,
but there were a _lot_ of docs that matched.
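
The bailout logic can be sketched independently of Solr. The Python below is
only a simulation of the idea, not the Solr post-filter API (a real
post-filter would be written in Java); the class name, the is_allowed
callback and max_checks are all hypothetical.

```python
class BailoutAclCollector:
    """Check access for at most `max_checks` docs, then reject the rest."""

    def __init__(self, is_allowed, max_checks):
        self.is_allowed = is_allowed    # hypothetical (possibly slow) ACL check
        self.max_checks = max_checks    # N: how many checks fit the SLA
        self.checked = 0
        self.accepted = []
        self.truncated = False          # True once any doc was skipped unchecked

    def collect(self, doc_id):
        if self.checked >= self.max_checks:
            # Budget used up: fail this doc without computing its ACL.
            self.truncated = True
            return
        self.checked += 1
        if self.is_allowed(doc_id):
            self.accepted.append(doc_id)


# Toy run: ten matching docs, even ids are visible, budget of five checks.
collector = BailoutAclCollector(is_allowed=lambda d: d % 2 == 0, max_checks=5)
for doc_id in range(10):
    collector.collect(doc_id)
```

After the run, the truncated flag is what the application would translate
into a "please refine your search" hint in the response.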

It's not perfect, but it often suffices. You certainly don't want
to be in the situation where you have to calculate the access
privileges for every doc in the corpus or, as you indicated,
it gets really slow.

Or get the principals to have more reasonable access rules ;)

Best,
Erick


Re: Using solr with increasing complicated access control

Posted by Lisheng Zhang <lz...@gmail.com>.
Erick, very sorry that I misspelled your name earlier! Later I read more
and found that Lucene seems to implement approach 2/ (search a few times
and combine the results). I guess performance may suffer when the join
becomes complicated? I will try to study this more later.

Thanks for your help, Lisheng
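
For a high-level picture of approach 2/, a two-pass join can be sketched in
plain Python over in-memory dictionaries. This is only an illustration of
"search once to collect join keys, then search again with those keys"; it is
not Lucene code, and the function name, field names and sample data are
hypothetical. Block join works differently, since it relies on a parent and
its children being indexed together as one block.

```python
def two_pass_join(from_docs, to_docs, from_query, from_field, to_field):
    """Join two document sets by searching twice.

    Pass 1 collects the join keys of the "from" docs that match from_query.
    Pass 2 keeps the "to" docs whose join field holds one of those keys.
    """
    keys = {
        doc[from_field]
        for doc in from_docs
        if from_field in doc and from_query(doc)
    }
    return [doc for doc in to_docs if doc.get(to_field) in keys]


# Hypothetical data: ACL entries grant a group access to a folder.
acl_entries = [
    {"group": "sales", "folder_id": "f1"},
    {"group": "hr", "folder_id": "f2"},
    {"group": "sales", "folder_id": "f3"},
]
content = [
    {"id": "d1", "folder_id": "f1"},
    {"id": "d2", "folder_id": "f2"},
    {"id": "d3", "folder_id": "f3"},
]

visible = two_pass_join(
    acl_entries, content,
    from_query=lambda doc: doc["group"] == "sales",
    from_field="folder_id", to_field="folder_id",
)
visible_ids = [doc["id"] for doc in visible]
```

The cost of the first pass grows with the number of distinct keys it has to
collect, which is one reason such a join may slow down as the rules multiply.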
