Posted to solr-user@lucene.apache.org by Sujatha Arun <su...@gmail.com> on 2011/06/14 09:18:56 UTC

Document Level Security (SOLR-1872 ,SOLR,SOLR-1834)

Hello,


Our use case is as follows:

We run several Solr webapps in one JVM, each webapp catering to one client.
Each client has its own users, who can purchase products from the site. Once
they purchase a product, they have full access to it; otherwise they can only
view its details.

The products are not tied to the user at the document level, simply because
once the purchase duration of a product expires, the user no longer has
access to that product.

So when a logged-in user searches only within the products he has access to,
the search translates to something like the query below. The product ids are
obtained from the DB for that particular user, and there can be any number
(n) of them.

<search term> &fq=product_id(100 10001  ......n number)

However, we are currently running into the "too many Boolean clauses" error.
We are also unable to tie users to roles, as a user is essentially anyone
who comes to the site and purchases a product.
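
To illustrate why this approach hits the limit: each product id in the filter
becomes one Boolean clause, so the clause count grows with the number of
purchased products. The following is a minimal sketch, not code from the
thread. The class and method names are hypothetical, the field name
product_id is taken from the query above, the field:(...) syntax is an
assumption, and the clause limit is passed in rather than assumed.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Solr: builds a filter query like the one above.
class ProductFilterQuery {

    // Joins the ids into a single fq value such as "product_id:(100 10001)".
    static String build(List<Integer> productIds) {
        String ids = productIds.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(" "));
        return "product_id:(" + ids + ")";
    }

    // Each id expands to one Boolean clause, so the query is rejected once
    // the number of ids exceeds the configured clause limit.
    static boolean exceedsClauseLimit(List<Integer> productIds, int maxBooleanClauses) {
        return productIds.size() > maxBooleanClauses;
    }
}
```

Raising the limit only moves the point at which the check fails; it does not
change the fact that the query grows linearly with the user's purchases.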

Of the two solutions above, SOLR-1872 requires us to specify the users in an
ACL file, and querying for allow and deny appears to translate to the same
thing we are trying to do above.

With SOLR-1834, we are required to use a crawler (Apache ManifoldCF) to
index the permissions (and the data) into the document and then query on
them. This will also not work in our scenario: we have n webapps with the
same requirement, so it would be tedious to set this up for each one, and we
also have the requirement that once a user's permission for a product is
revoked, he should no longer be able to search on it within his subscribed
products.

Any pointers would be helpful, and sorry about the lengthy description.

Regards
Sujatha

Re: Document Level Security (SOLR-1872 ,SOLR,SOLR-1834)

Posted by Sujatha Arun <su...@gmail.com>.
Peter,

Thanks for the clarification.

The reason I specifically asked is that we have many search instances
(200+) on a single JVM.

Each of these instances could have <n> users, and each user can subscribe to
<n> products. According to your suggestion, I need to maintain an
in-memory list of all users and their subscribed products for each of the
instances and use this list to filter a given query. We are maintaining
the user and subscription details in a DB.

I was wondering whether it would instead make more sense (with respect to
memory) to dynamically fetch the subscribed product ids whenever a user
logs in (as access is only for the user session) and use this data to
filter the query?
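
The login-time approach asked about here could be sketched roughly as follows.
This is only an illustration under stated assumptions: the class name, the
method names, and the loader function standing in for the DB lookup are all
hypothetical, and nothing here uses Solr classes.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical sketch: holds product ids only for users with an active session.
class SessionProductCache {
    private final Map<String, Set<Integer>> bySession = new ConcurrentHashMap<>();
    // Stand-in for the database query that returns a user's subscribed product ids.
    private final Function<String, Set<Integer>> loadFromDb;

    SessionProductCache(Function<String, Set<Integer>> loadFromDb) {
        this.loadFromDb = loadFromDb;
    }

    // Called at login: fetch the user's subscribed product ids once.
    void onLogin(String userId) {
        bySession.put(userId, Set.copyOf(loadFromDb.apply(userId)));
    }

    // Called for each search: returns an empty set if the user has no session.
    Set<Integer> productsFor(String userId) {
        return bySession.getOrDefault(userId, Set.of());
    }

    // Called at logout or session timeout: release the memory.
    void onLogout(String userId) {
        bySession.remove(userId);
    }

    // Number of users currently held in memory.
    int size() {
        return bySession.size();
    }
}
```

With this shape, memory use follows the number of logged-in users rather than
the total number of users in the DB.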

We really do not have the budget and hence won't be able to contract LI for
this, though I will certainly need to get help from some Java experts within
my org.

Thanks for your time

Regards
Sujatha



On Wed, Jun 15, 2011 at 11:29 PM, Peter Sturge <pe...@gmail.com>wrote:

> Hi,
>
> By in-memory, I mean you hold a list of users (+ some other parameters
> like order number, expiry, whatever else you need) in one of those
> Greek HashMaps, and use this list to determine what query
> parameters/results will be processed for a given search request
> (SOLR-1872 reads an ACL file to populate such a list). So if you had
> 500 users who had purchased stuff at a given moment, you'd have 500
> entries in the table that hold the relevant data to filter/not filter
> searches/results.
> This won't cause a memory problem unless you have a million users and
> store their autobiography in each entry.
> I wouldn't call this sort of thing a novice or even a journeyman's task;
> you would definitely need to know about using and maintaining tables
> etc.
> Would you be able to contract someone to do the work on your behalf?
> There are some excellent resources around, and Lucid would certainly
> do a great job, but of course you'd need budget for this approach.
> Alternatively, maybe you can tap some Java expertise within your
> organization to help out?
>
> HTH,
> Peter

Re: Document Level Security (SOLR-1872 ,SOLR,SOLR-1834)

Posted by Peter Sturge <pe...@gmail.com>.
Hi,

By in-memory, I mean you hold a list of users (+ some other parameters
like order number, expiry, whatever else you need) in one of those
Greek HashMaps, and use this list to determine what query
parameters/results will be processed for a given search request
(SOLR-1872 reads an ACL file to populate such a list). So if you had
500 users who had purchased stuff at a given moment, you'd have 500
entries in the table that hold the relevant data to filter/not filter
searches/results.
This won't cause a memory problem unless you have a million users and
store their autobiography in each entry.
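
A minimal sketch of such an in-memory table, for readers less familiar with
Java collections. The class names, the fields, and the order-number format are
assumptions made for illustration; they are not taken from SOLR-1872.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical entry type holding what is known about one user.
class UserEntry {
    final String orderNumber;
    final Set<Integer> productIds;

    UserEntry(String orderNumber, Set<Integer> productIds) {
        this.orderNumber = orderNumber;
        this.productIds = new HashSet<>(productIds);
    }
}

// Hypothetical in-memory table: one entry per user, keyed by username.
class UserTable {
    private final Map<String, UserEntry> entries = new HashMap<>();

    void put(String username, UserEntry entry) {
        entries.put(username, entry);
    }

    // Product ids the user may search; empty if the user is unknown.
    Set<Integer> allowedProducts(String username) {
        UserEntry entry = entries.get(username);
        return entry == null ? Collections.<Integer>emptySet() : entry.productIds;
    }

    // Number of users currently in the table.
    int size() {
        return entries.size();
    }
}
```

A plain HashMap is not safe for concurrent updates, so a server handling
parallel searches would more likely use ConcurrentHashMap or external locking.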

I wouldn't call this sort of thing a novice or even a journeyman's task;
you would definitely need to know about using and maintaining tables
etc.
Would you be able to contract someone to do the work on your behalf?
There are some excellent resources around, and Lucid would certainly
do a great job, but of course you'd need budget for this approach.
Alternatively, maybe you can tap some Java expertise within your
organization to help out?

HTH,
Peter


On Wed, Jun 15, 2011 at 6:17 PM, Sujatha Arun <su...@gmail.com> wrote:
> Thanks, Peter.
>
> I am not a Java programmer, and hence the code seems all Greek and Latin to
> me. I do have basic knowledge, but all this Map, HashMap,
> Hashlist, NamedList I don't understand.
>
> However, I would like to implement the solution that you have mentioned, so
> any pointers you have for me would be great. I will also try to dig
> deeper into Java.
>
> What is meant by in-memory? Is it RAM? If I have <n>
> concurrent users, each having <n> products subscribed, what would be the
> impact on memory?
>
>
>
> Regards
> Sujatha

Re: Document Level Security (SOLR-1872 ,SOLR,SOLR-1834)

Posted by Sujatha Arun <su...@gmail.com>.
Thanks, Peter.

I am not a Java programmer, and hence the code seems all Greek and Latin to
me. I do have basic knowledge, but all this Map, HashMap,
Hashlist, NamedList I don't understand.

However, I would like to implement the solution that you have mentioned, so
any pointers you have for me would be great. I will also try to dig
deeper into Java.

What is meant by in-memory? Is it RAM? If I have <n>
concurrent users, each having <n> products subscribed, what would be the
impact on memory?



Regards
Sujatha


On Tue, Jun 14, 2011 at 5:43 PM, Peter Sturge <pe...@gmail.com>wrote:

> SOLR-1872 doesn't add discrete booleans to the query; it does the filtering
> programmatically, so you shouldn't see this problem. (If you have a
> look at the code, you'll see how it filters queries.)
> I suppose you could modify SOLR-1872 to use an in-memory,
> dynamically-updated user list (+ associated filters) instead of using
> the ACL file.
> This would give you the 'changing users' and 'expiry' functionality you
> need.

Re: Document Level Security (SOLR-1872 ,SOLR,SOLR-1834)

Posted by Peter Sturge <pe...@gmail.com>.
SOLR-1872 doesn't add discrete booleans to the query; it does the filtering
programmatically, so you shouldn't see this problem. (If you have a
look at the code, you'll see how it filters queries.)
I suppose you could modify SOLR-1872 to use an in-memory,
dynamically-updated user list (+ associated filters) instead of using
the ACL file.
This would give you the 'changing users' and 'expiry' functionality you need.
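
The general idea of filtering in code rather than through Boolean clauses can
be sketched as below. This has not been checked against the SOLR-1872 patch
itself; the class and method names are hypothetical, and the hits are reduced
to plain product ids so that the example needs no Solr classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: filter hits in code instead of in the query.
class ResultFilter {

    // Keeps only the hits whose product id is in the user's allowed set.
    // One set lookup per hit replaces one Boolean clause per product id,
    // so the size of the allowed set never reaches the query parser.
    static List<Integer> filter(List<Integer> hitProductIds, Set<Integer> allowed) {
        List<Integer> kept = new ArrayList<>();
        for (Integer id : hitProductIds) {
            if (allowed.contains(id)) {
                kept.add(id);
            }
        }
        return kept;
    }
}
```

The trade-off is that the search itself runs unrestricted and the restriction
is applied afterwards, which may matter for paging and hit counts.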



On Tue, Jun 14, 2011 at 10:08 AM, Sujatha Arun <su...@gmail.com> wrote:
> Thanks, Peter, for your input.
>
> I would really like a document- and schema-agnostic solution, as in
> SOLR-1872.
>
> Am I right in assuming that SOLR-1872 is the same as the solution
> we currently have, where we add a filter query of the products to the original
> query, and hence SOLR-1872 will also run into the too many boolean clauses
> error?
>
> Regards
> Sujatha

Re: Document Level Security (SOLR-1872 ,SOLR,SOLR-1834)

Posted by Sujatha Arun <su...@gmail.com>.
Thanks, Peter, for your input.

I would really like a document- and schema-agnostic solution, as in
SOLR-1872.

Am I right in assuming that SOLR-1872 is the same as the solution
we currently have, where we add a filter query of the products to the original
query, and hence SOLR-1872 will also run into the too many boolean clauses
error?

Regards
Sujatha


On Tue, Jun 14, 2011 at 1:53 PM, Peter Sturge <pe...@gmail.com>wrote:

> Hi,
>
> SOLR-1834 is good when the original documents' ACL is accessible.
> SOLR-1872 is good where the usernames are persistent - neither of
> these really fits your use case.
> It sounds like you need more of an 'in-memory', transient access
> control mechanism. Does the access have to exist beyond the user's
> session (or the Solr VM session)?
> Your best bet is probably something like a custom SearchComponent or
> similar that keeps track of user purchases and either adjusts/limits
> the query or the results to suit.
> With your own module in the query chain, you can then decide when the
> 'expiry' is, and limit results accordingly.
>
> SearchComponents are pretty easy to write and integrate. Have a look at:
>   http://wiki.apache.org/solr/SearchComponent
> for info on SearchComponent and its usage.

Re: Document Level Security (SOLR-1872 ,SOLR,SOLR-1834)

Posted by Peter Sturge <pe...@gmail.com>.
Hi,

SOLR-1834 is good when the original documents' ACL is accessible.
SOLR-1872 is good where the usernames are persistent - neither of
these really fits your use case.
It sounds like you need more of an 'in-memory', transient access
control mechanism. Does the access have to exist beyond the user's
session (or the Solr VM session)?
Your best bet is probably something like a custom SearchComponent or
similar that keeps track of user purchases and either adjusts/limits
the query or the results to suit.
With your own module in the query chain, you can then decide when the
'expiry' is, and limit results accordingly.

SearchComponents are pretty easy to write and integrate. Have a look at:
   http://wiki.apache.org/solr/SearchComponent
for info on SearchComponent and its usage.
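
The 'expiry' decision mentioned above could look roughly like this. It is a
sketch only: in a real deployment the logic would sit inside a custom
SearchComponent, but the Solr classes are left out so the example stays
self-contained, and the Purchase and ExpiryFilter names are hypothetical.

```java
import java.time.Instant;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical record of one purchase and the moment its access ends.
class Purchase {
    final int productId;
    final Instant expiresAt;

    Purchase(int productId, Instant expiresAt) {
        this.productId = productId;
        this.expiresAt = expiresAt;
    }
}

// Hypothetical helper that decides which purchases still grant access.
class ExpiryFilter {

    // Returns the ids of purchases that have not yet expired at time 'now'.
    static Set<Integer> activeProductIds(List<Purchase> purchases, Instant now) {
        Set<Integer> active = new TreeSet<>();
        for (Purchase p : purchases) {
            if (now.isBefore(p.expiresAt)) {
                active.add(p.productId);
            }
        }
        return active;
    }
}
```

Because expiry is evaluated at query time, nothing in the index has to change
when a purchase lapses.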




On Tue, Jun 14, 2011 at 8:18 AM, Sujatha Arun <su...@gmail.com> wrote:
> Hello,
>
>
> Our use case is as follows:
>
> We run several Solr webapps in one JVM, each webapp catering to one client.
> Each client has its own users, who can purchase products from the site. Once
> they purchase a product, they have full access to it; otherwise they can only
> view its details.
>
> The products are not tied to the user at the document level, simply because
> once the purchase duration of a product expires, the user no longer has
> access to that product.
>
> So when a logged-in user searches only within the products he has access to,
> the search translates to something like the query below. The product ids are
> obtained from the DB for that particular user, and there can be any number
> (n) of them.
>
> <search term> &fq=product_id(100 10001  ......n number)
>
> However, we are currently running into the "too many Boolean clauses" error.
> We are also unable to tie users to roles, as a user is essentially anyone
> who comes to the site and purchases a product.
>
> Of the two solutions above, SOLR-1872 requires us to specify the users in an
> ACL file, and querying for allow and deny appears to translate to the same
> thing we are trying to do above.
>
> With SOLR-1834, we are required to use a crawler (Apache ManifoldCF) to
> index the permissions (and the data) into the document and then query on
> them. This will also not work in our scenario: we have n webapps with the
> same requirement, so it would be tedious to set this up for each one, and we
> also have the requirement that once a user's permission for a product is
> revoked, he should no longer be able to search on it within his subscribed
> products.
>
> Any pointers would be helpful, and sorry about the lengthy description.
>
> Regards
> Sujatha
>

Re: Document Level Security (SOLR-1872 ,SOLR,SOLR-1834)

Posted by Sujatha Arun <su...@gmail.com>.
Constantijn,

I am aware of this, and we have already increased the max boolean clauses to
<3500> from the default <1200> for all our 200+ instances.

However, the requirement is that each instance could have <n> products,
running to several thousands. Since <n> is not bounded and could be different
for each of our instances, this will not scale, and there is also the
performance impact of so many Boolean clauses.

Regards
Sujatha



On Fri, Jun 17, 2011 at 2:58 PM, Constantijn Visinescu
<ba...@gmail.com>wrote:

> Just to chip in my 2 cents:
>
> You know you can increase the max number of boolean clauses in the
> configuration files?
> Depending on your situation it might not be a permanent fix, but it
> could provide some instant relief.
>
> Constantijn
>
>
> On Fri, Jun 17, 2011 at 11:19 AM, Peter Sturge <pe...@gmail.com>
> wrote:
> > You'll need to be a bit careful using joins, as the performance hit
> > can be significant if you have lots of cross-referencing to do, which
> > I believe you would given your scenario.
> >
> > Your table could be setup to use the username as the key (for fast
> > lookup), then map these to your own data class or collection or
> > similar to hold your other information: products, expiry etc.
> > By using your own data class, it's then easy to extend it later if you
> > want to add additional parameters. (for example: HashMap<String,
> > MyDataClass>)
> >
> > When a search comes in, the user is looked up to retrieve the data
> > class, then its contents (as defined by you) is examined and the query
> > is processed/filtered appropriately.
> >
> > You'll need a bootstrap mechanism for populating the list in the first
> > place. One thing worth looking at is lazy loading - i.e. the first
> > time a user does a search (you lookup the user in the table, and it
> > isn't there), you load the data class (maybe from your DB, a file, or
> > index), then add it to the table. This is good if you have 10's of
> > thousands or millions of users, but only a handful are actually
> > searching, some perhaps very rarely.
> >
> > If you do have millions of users, and your data class has heavy
> > requirements (e.g. many thousands of products + info etc.), you might
> > want to 'time-out' in-memory table entries, if the table gets really
> > huge - it depends on the usage of your system. (you can run a
> > synchronized cleanup thread to do this if you deemed it necessary).
> >
> >
> > On Fri, Jun 17, 2011 at 6:06 AM, Sujatha Arun <su...@gmail.com>
> wrote:
> >> Alexey,
> >>
> >> Do you mean that we  have current Index as it is and have a separate
> core
> >> which  has only the user-id ,product-id relation and at while querying
> ,do a
> >> join between the two cores based on the user-id.
> >>
> >>
> >> This would involve us to Index/delete the product  as and when the user
> >> subscription for a product changes ,This would involve some amount of
> >> latency if the Indexing (we have a queue system for Indexing across the
> >> various instances) or deletion is delayed
> >>
> >> IF we want to go ahead with this solution ,We currently are using solr
> 1.3
> >> , so  is this functionality available as a patch for solr 1.3?Would it
> be
> >> possible to  do with a separate Index  instead of a core ,then I can
> create
> >> only one  Index common for all our instances and then use this instance
> to
> >> do the join.
> >>
> >> Thanks
> >> Sujatha
> >>
> >> On Thu, Jun 16, 2011 at 9:27 PM, Alexey Serba <as...@gmail.com> wrote:
> >>
> >>> > So a search for a product once the user logs in and searches for only
> the
> >>> > products that he has access to Will translate to something like this
> .
> >>> ,the
> >>> > product ids are obtained form the db  for a particular user and can
> run
> >>> > into  n  number.
> >>> >
> >>> > <search term> &fq=product_id(100 10001  ......n number)
> >>> >
> >>> > but we are currently running into too many Boolean expansion error
> .We
> >>> are
> >>> > not able to tie the user also into roles as each user is mainly any
> one
> >>> who
> >>> > comes to site and purchases a product .
> >>>
> >>> I'm wondering if new trunk Solr join functionality can help here.
> >>>
> >>> * http://wiki.apache.org/solr/Join
> >>>
> >>> In theory you can index your products (product_id, ...) and
> >>> user_id-product many-to-many relation (user_product_id, user_id) into
> >>> signle/different cores and then do join, like
> >>> f=search terms&fq={!join from=product_id
> to=user_product_id}user_id:10101
> >>>
> >>> But I haven't tried that, so I'm just speculating.
> >>>
> >>
> >
>

Re: Document Level Security (SOLR-1872 ,SOLR,SOLR-1834)

Posted by Constantijn Visinescu <ba...@gmail.com>.
Just to chip in my 2 cents:

You know you can increase the max number of boolean clauses in the
configuration files?
Depending on your situation it might not be a permanent fix, but it
could provide some instant relief.

Constantijn


On Fri, Jun 17, 2011 at 11:19 AM, Peter Sturge <pe...@gmail.com> wrote:
> You'll need to be a bit careful using joins, as the performance hit
> can be significant if you have lots of cross-referencing to do, which
> I believe you would given your scenario.
>
> Your table could be setup to use the username as the key (for fast
> lookup), then map these to your own data class or collection or
> similar to hold your other information: products, expiry etc.
> By using your own data class, it's then easy to extend it later if you
> want to add additional parameters. (for example: HashMap<String,
> MyDataClass>)
>
> When a search comes in, the user is looked up to retrieve the data
> class, then its contents (as defined by you) is examined and the query
> is processed/filtered appropriately.
>
> You'll need a bootstrap mechanism for populating the list in the first
> place. One thing worth looking at is lazy loading - i.e. the first
> time a user does a search (you lookup the user in the table, and it
> isn't there), you load the data class (maybe from your DB, a file, or
> index), then ad it to the table. This is good if you have 10's of
> thousands or millions of users, but only a handful are actually
> searching, some perhaps very rarely.
>
> If you do have millions of users, and your data class has heavy
> requirements (e.g. many thousands of products + info etc.), you might
> want to 'time-out' in-memory table entries, if the table gets really
> huge - it depends on the usage of your system. (you can run a
> synchronized cleanup thread to do this if you deemed it necessary).

Re: Document Level Security (SOLR-1872 ,SOLR,SOLR-1834)

Posted by Sujatha Arun <su...@gmail.com>.
Thanks, Peter.

This very much seems to be the solution that I should go forward with.
Thanks for your time and clear explanation.

Regards
Sujatha

On Fri, Jun 17, 2011 at 2:49 PM, Peter Sturge <pe...@gmail.com> wrote:

> You'll need to be a bit careful using joins, as the performance hit
> can be significant if you have lots of cross-referencing to do, which
> I believe you would given your scenario.
>
> Your table could be setup to use the username as the key (for fast
> lookup), then map these to your own data class or collection or
> similar to hold your other information: products, expiry etc.
> By using your own data class, it's then easy to extend it later if you
> want to add additional parameters. (for example: HashMap<String,
> MyDataClass>)
>
> When a search comes in, the user is looked up to retrieve the data
> class, then its contents (as defined by you) is examined and the query
> is processed/filtered appropriately.
>
> You'll need a bootstrap mechanism for populating the list in the first
> place. One thing worth looking at is lazy loading - i.e. the first
> time a user does a search (you lookup the user in the table, and it
> isn't there), you load the data class (maybe from your DB, a file, or
> index), then ad it to the table. This is good if you have 10's of
> thousands or millions of users, but only a handful are actually
> searching, some perhaps very rarely.
>
> If you do have millions of users, and your data class has heavy
> requirements (e.g. many thousands of products + info etc.), you might
> want to 'time-out' in-memory table entries, if the table gets really
> huge - it depends on the usage of your system. (you can run a
> synchronized cleanup thread to do this if you deemed it necessary).

Re: Document Level Security (SOLR-1872 ,SOLR,SOLR-1834)

Posted by Peter Sturge <pe...@gmail.com>.
You'll need to be a bit careful using joins, as the performance hit
can be significant if you have lots of cross-referencing to do, which
I believe you would, given your scenario.

Your table could be set up to use the username as the key (for fast
lookup), then map each key to your own data class or collection or
similar to hold your other information: products, expiry etc.
By using your own data class, it's easy to extend it later if you
want to add additional parameters (for example: HashMap<String,
MyDataClass>).

When a search comes in, the user is looked up to retrieve the data
class, then its contents (as defined by you) are examined and the query
is processed/filtered appropriately.

You'll need a bootstrap mechanism for populating the table in the first
place. One thing worth looking at is lazy loading, i.e. the first
time a user does a search (you look up the user in the table, and it
isn't there), you load the data class (maybe from your DB, a file, or
index), then add it to the table. This is good if you have tens of
thousands or millions of users, but only a handful are actually
searching, some perhaps very rarely.

If you do have millions of users, and your data class has heavy
requirements (e.g. many thousands of products + info etc.), you might
want to 'time out' in-memory table entries if the table gets really
huge; it depends on the usage of your system. (You can run a
synchronized cleanup thread to do this if you deem it necessary.)
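
The table described above could be sketched in Java roughly as follows. This is only an illustration under assumptions: the class names (UserEntitlements, EntitlementCache), the loader function standing in for the DB lookup, and the idle time-out are all hypothetical, and a ConcurrentHashMap is swapped in for the plain HashMap so that a cleanup thread can run alongside searches.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical per-user data class: the product ids the user may search. */
final class UserEntitlements {
    final Set<String> productIds;
    volatile long lastAccessMillis;

    UserEntitlements(Set<String> productIds, long now) {
        this.productIds = Collections.unmodifiableSet(new HashSet<String>(productIds));
        this.lastAccessMillis = now;
    }
}

/** Lazily populated lookup table keyed by username, with idle time-out. */
final class EntitlementCache {
    private final Map<String, UserEntitlements> table = new ConcurrentHashMap<>();
    private final Function<String, Set<String>> loader; // assumed: DB or file lookup
    private final long maxIdleMillis;

    EntitlementCache(Function<String, Set<String>> loader, long maxIdleMillis) {
        this.loader = loader;
        this.maxIdleMillis = maxIdleMillis;
    }

    /** Returns the user's entry, loading it on first use (lazy loading). */
    UserEntitlements get(String username, long now) {
        UserEntitlements entry = table.computeIfAbsent(
                username, u -> new UserEntitlements(loader.apply(u), now));
        entry.lastAccessMillis = now;
        return entry;
    }

    /** Post-filter check: may this user see this product? */
    boolean mayAccess(String username, String productId, long now) {
        return get(username, now).productIds.contains(productId);
    }

    /** Drops a user's entry, e.g. when a purchase expires or is revoked. */
    void invalidate(String username) {
        table.remove(username);
    }

    /** Removes entries idle for longer than maxIdleMillis; returns how many. */
    int evictIdle(long now) {
        int before = table.size();
        table.values().removeIf(e -> now - e.lastAccessMillis > maxIdleMillis);
        return before - table.size(); // approximate if other threads modify the table
    }

    int size() {
        return table.size();
    }

    public static void main(String[] args) {
        EntitlementCache cache = new EntitlementCache(
                user -> new HashSet<String>(Arrays.asList("100", "10001")), 60_000L);
        long now = System.currentTimeMillis();
        System.out.println(cache.mayAccess("alice", "100", now));
        System.out.println(cache.mayAccess("alice", "999", now));
    }
}
```

A scheduled task could call evictIdle periodically, and invalidate covers the case where a user's access to a product ends before the entry times out.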


On Fri, Jun 17, 2011 at 6:06 AM, Sujatha Arun <su...@gmail.com> wrote:
> Alexey,
>
> Do you mean that we  have current Index as it is and have a separate core
> which  has only the user-id ,product-id relation and at while querying ,do a
> join between the two cores based on the user-id.
>
>
> This would involve us to Index/delete the product  as and when the user
> subscription for a product changes ,This would involve some amount of
> latency if the Indexing (we have a queue system for Indexing across the
> various instances) or deletion is delayed
>
> IF we want to go ahead with this solution ,We currently are using solr 1.3
> , so  is this functionality available as a patch for solr 1.3?Would it be
> possible to  do with a separate Index  instead of a core ,then I can create
> only one  Index common for all our instances and then use this instance to
> do the join.
>
> Thanks
> Sujatha

Re: Document Level Security (SOLR-1872 ,SOLR,SOLR-1834)

Posted by Sujatha Arun <su...@gmail.com>.
Alexey,

We are not planning to upgrade our Solr version at the moment, as all is fine
with the current version so far, and hence would not be able to try this
solution.

Regards
Sujatha

On Fri, Jun 17, 2011 at 3:47 PM, Alexey Serba <as...@gmail.com> wrote:

> > Do you mean that we  have current Index as it is and have a separate core
> > which  has only the user-id ,product-id relation and at while querying ,do a
> > join between the two cores based on the user-id.
> Exactly. You can index user-id, product-id relation either to the same
> core or to different core on the same Solr instance.
>
> > This would involve us to Index/delete the product  as and when the user
> > subscription for a product changes ,This would involve some amount of
> > latency if the Indexing (we have a queue system for Indexing across the
> > various instances) or deletion is delayed
> Right, but I'm not sure if it's possible to achieve good performance
> requiring zero latency.
>
> > IF we want to go ahead with this solution ,We currently are using solr 1.3
> > , so  is this functionality available as a patch for solr 1.3?
> No. AFAIK it's in trunk only.
>
> > Would it be
> > possible to  do with a separate Index  instead of a core ,then I can create
> > only one  Index common for all our instances and then use this instance to
> > do the join.
> No, I don't think that's possible with join feature. I guess that
> would require network request per search req and number of mapped ids
> could be huge, so it could affect performance significantly.
>
> > You'll need to be a bit careful using joins, as the performance hit
> > can be significant if you have lots of cross-referencing to do, which
> > I believe you would given your scenario.
> As far as I understand join query would build bitset filter which can
> be cached in filterCache, etc. The only performance impact I can think
> of is that user-product relations table could be too big to fit into
> single instance.
>

Re: Document Level Security (SOLR-1872 ,SOLR,SOLR-1834)

Posted by Alexey Serba <as...@gmail.com>.
> Do you mean that we  have current Index as it is and have a separate core
> which  has only the user-id ,product-id relation and at while querying ,do a
> join between the two cores based on the user-id.
Exactly. You can index the user-id, product-id relation either into the same
core or into a different core on the same Solr instance.

> This would involve us to Index/delete the product  as and when the user
> subscription for a product changes ,This would involve some amount of
> latency if the Indexing (we have a queue system for Indexing across the
> various instances) or deletion is delayed
Right, but I'm not sure it's possible to achieve good performance
while requiring zero latency.

> IF we want to go ahead with this solution ,We currently are using solr 1.3
> , so  is this functionality available as a patch for solr 1.3?
No. AFAIK it's in trunk only.

> Would it be
> possible to  do with a separate Index  instead of a core ,then I can create
> only one  Index common for all our instances and then use this instance to
> do the join.
No, I don't think that's possible with the join feature. I guess that
would require a network request per search request, and the number of mapped
ids could be huge, so it could affect performance significantly.

> You'll need to be a bit careful using joins, as the performance hit
> can be significant if you have lots of cross-referencing to do, which
> I believe you would given your scenario.
As far as I understand, a join query would build a bitset filter, which can
be cached in the filterCache, etc. The only performance impact I can think
of is that the user-product relations table could be too big to fit into
a single instance.
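
As a rough illustration of the bitset idea above, the following Java sketch uses java.util.BitSet and a plain map as a stand-in for a filter cache. It is not Solr code: the class, the method names and the doc ids are hypothetical and only show how a cached filter bitset is intersected with the hits of the main query.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Sketch only: a filter as a bitset of doc ids, cached per filter string. */
final class BitsetFilterSketch {
    // Stand-in for a filter cache: filter string -> bitset of matching doc ids.
    private final Map<String, BitSet> filterCache = new HashMap<>();

    /** Builds the bitset on first use and reuses it for later searches. */
    BitSet filterFor(String fq, int[] matchingDocIds) {
        return filterCache.computeIfAbsent(fq, key -> {
            BitSet bits = new BitSet();
            for (int docId : matchingDocIds) {
                bits.set(docId);
            }
            return bits;
        });
    }

    /** Keeps only the query hits that are also set in the filter. */
    static BitSet apply(BitSet queryHits, BitSet filter) {
        BitSet result = (BitSet) queryHits.clone();
        result.and(filter);
        return result;
    }

    public static void main(String[] args) {
        BitsetFilterSketch sketch = new BitsetFilterSketch();
        BitSet allowed = sketch.filterFor("user_id:10101", new int[] {3, 4, 5});

        BitSet hits = new BitSet();
        for (int docId : new int[] {1, 3, 5, 7}) {
            hits.set(docId);
        }
        System.out.println(apply(hits, allowed)); // prints {3, 5}
    }
}
```

The intersection itself is cheap; the cost sits in building the bitset for each user the first time and in holding one bitset per active user in memory.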

Re: Document Level Security (SOLR-1872 ,SOLR,SOLR-1834)

Posted by Sujatha Arun <su...@gmail.com>.
Alexey,

Do you mean that we keep the current index as it is and have a separate core
that holds only the user-id, product-id relation, and at query time do a
join between the two cores based on the user-id?


This would require us to index/delete the product as and when the user's
subscription for a product changes. This would involve some amount of
latency if the indexing (we have a queue system for indexing across the
various instances) or deletion is delayed.

If we want to go ahead with this solution: we are currently using Solr 1.3,
so is this functionality available as a patch for Solr 1.3? Would it be
possible to do this with a separate index instead of a core? Then I could
create only one index common to all our instances and use that instance to
do the join.

Thanks
Sujatha

On Thu, Jun 16, 2011 at 9:27 PM, Alexey Serba <as...@gmail.com> wrote:

> > So a search for a product once the user logs in and searches for only the
> > products that he has access to Will translate to something like this .
> ,the
> > product ids are obtained form the db  for a particular user and can run
> > into  n  number.
> >
> > <search term> &fq=product_id(100 10001  ......n number)
> >
> > but we are currently running into too many Boolean expansion error .We
> are
> > not able to tie the user also into roles as each user is mainly any one
> who
> > comes to site and purchases a product .
>
> I'm wondering if new trunk Solr join functionality can help here.
>
> * http://wiki.apache.org/solr/Join
>
> In theory you can index your products (product_id, ...) and
> user_id-product many-to-many relation (user_product_id, user_id) into
> single/different cores and then do join, like
> q=search terms&fq={!join from=user_product_id to=product_id}user_id:10101
>
> But I haven't tried that, so I'm just speculating.
>

Re: Document Level Security (SOLR-1872 ,SOLR,SOLR-1834)

Posted by Alexey Serba <as...@gmail.com>.
> So a search for a product once the user logs in and searches for only the
> products that he has access to Will translate to something like this . ,the
> product ids are obtained form the db  for a particular user and can run
> into  n  number.
>
> <search term> &fq=product_id(100 10001  ......n number)
>
> but we are currently running into too many Boolean expansion error .We are
> not able to tie the user also into roles as each user is mainly any one who
> comes to site and purchases a product .

I'm wondering if the new join functionality in Solr trunk can help here.

* http://wiki.apache.org/solr/Join

In theory you can index your products (product_id, ...) and the
user_id-product many-to-many relation (user_product_id, user_id) into
a single core or different cores and then do a join, like
q=search terms&fq={!join from=user_product_id to=product_id}user_id:10101

But I haven't tried that, so I'm just speculating.
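
For what it's worth, a small Java sketch of how such a request could be assembled on the client side. The field names come from this message; the class and method names are hypothetical. It relies on the join parser reading `from` on the documents matched by the inner query (the relation documents) and `to` on the documents that are returned (the products), so user_product_id goes in `from`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Sketch only: builds the q and fq parameters for a join-based filter. */
final class JoinFilterSketch {

    /** Returns e.g. {!join from=user_product_id to=product_id}user_id:10101 */
    static String joinFilter(String fromField, String toField, long userId) {
        return "{!join from=" + fromField + " to=" + toField + "}user_id:" + userId;
    }

    /** URL-encodes both parameters into a query string. */
    static String queryString(String searchTerms, String filterQuery) {
        try {
            return "q=" + URLEncoder.encode(searchTerms, "UTF-8")
                    + "&fq=" + URLEncoder.encode(filterQuery, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String fq = joinFilter("user_product_id", "product_id", 10101L);
        System.out.println(fq);
        System.out.println(queryString("search terms", fq));
    }
}
```

The filter stays the same size no matter how many products the user has bought, which is the point of moving the id list out of the query and into the index.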