Posted to solr-user@lucene.apache.org by Kevin Osborn <os...@yahoo.com> on 2007/03/22 18:01:21 UTC

multiple indexes

Here is an issue that I am trying to resolve. We have a large catalog of documents, but our customers (several hundred) can only see a subset of those documents. And the subsets vary in size greatly. And some of these customers will be creating a lot of traffic. Also, there is no way to map the subsets to a query. The customer either has access to a document or they don't.

Has anybody worked on this issue before? If I use one large index and do the filtering in my application, then Solr will be serving a lot of useless documents. The counts would also be screwed up for facet queries. Is the best solution to extend Solr and do the filtering there?

The other potential solution is to have one index per customer. This would require one instance of the servlet per index, correct? It just seems like this would require a lot of hardware and complexity (configuring the memory of each servlet instance to index size and traffic).

Index partitioning looks like it could help here, but I see that is still on the task list. I don't know where that is in development, if anywhere.


Re: multiple indexes

Posted by Chris Hostetter <ho...@fucit.org>.
: Why not create a multivalued field that stores the customer perms?
: add has_access:cust1 has_access:cust2, etc to the document at index
: time, and turn this into a filter query at query time?

this can be a particularly effective solution when the permissions don't
change at all .. the ideal solution is where each doc is "owned" by one
and only one customer, but either way it's a matter of listing all of the
customers that have access to the document in a field, and filtering on
it. -- for a few hundred customers it's not a lot of work to cache those
filters, autowarming will help ensure that it's efficient.
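
A minimal sketch of the suggestion above, in Python, assuming the schema declares has_access as a multivalued field. The "id" and "title" fields, the sample values, and the helper names are hypothetical and only there for illustration. It builds the XML update message with one has_access value per customer, and a query string that carries the access restriction in fq, separate from the user's own query in q.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import escape


def build_add_xml(doc_id, title, customers):
    """Build a Solr XML <add> message with one has_access value per customer."""
    # "id" and "title" are hypothetical fields used only for this sketch.
    fields = [("id", doc_id), ("title", title)]
    # A multivalued field is sent by repeating the same field name.
    fields += [("has_access", customer) for customer in customers]
    body = "".join(
        '<field name="%s">%s</field>' % (name, escape(value))
        for name, value in fields
    )
    return "<add><doc>%s</doc></add>" % body


def build_select_query(user_query, customer_id):
    """Put the user's query in q and the access restriction in fq."""
    return urlencode({"q": user_query, "fq": "has_access:%s" % customer_id})


print(build_add_xml("doc42", "Widget manual", ["cust1", "cust2"]))
print(build_select_query("widget", "cust1"))
```

Because the restriction is applied inside Solr rather than in the application, result counts and facet counts should only reflect documents the customer can see.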

this approach doesn't scale particularly well to the tens of thousands
of "users" that might search your site, but at that point you have to
start thinking about how you model the "access" in your underlying
datamodel ... odds are you have some concept of "public" documents versus
"private" documents, and the private documents might have Access Control
lists based on "groups" and you can filter on that type of information
instead.
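
As a rough illustration of the group-based variant, here is a Python sketch that builds one filter from a user's groups instead of a per-user value. The field names "visibility" and "acl_group", the group names, and the helper names are all assumptions made for this example, not anything defined in this thread.

```python
from urllib.parse import urlencode


def build_acl_filter(group_ids):
    """Match public docs, plus private docs shared with any of the given groups."""
    # "visibility" and "acl_group" are hypothetical field names.
    if not group_ids:
        return "visibility:public"
    # Sort so the same set of groups always produces the same filter string.
    groups = " OR ".join(sorted(group_ids))
    return "visibility:public OR acl_group:(%s)" % groups


def build_group_query(user_query, group_ids):
    """Put the user's query in q and the group-based restriction in fq."""
    return urlencode({"q": user_query, "fq": build_acl_filter(group_ids)})


print(build_acl_filter(["sales", "emea"]))
print(build_group_query("widget", ["sales", "emea"]))
```

The number of distinct filters then grows with the number of group combinations in use rather than with the number of users, which is the point of modelling access this way.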



-Hoss


Re: multiple indexes

Posted by Ma...@ibsbe.be.
> Why not create a multivalued field that stores the customer perms?
> add has_access:cust1 has_access:cust2, etc to the document at index
> time, and turn this into a filter query at query time?

that is what we are doing at the moment, and i must say, it works very well and
does not slow the server down at all (because of the efficient indexes
that solr builds)


Re: multiple indexes

Posted by Mike Klaas <mi...@gmail.com>.
On 3/22/07, Kevin Osborn <os...@yahoo.com> wrote:
> Here is an issue that I am trying to resolve. We have a large catalog of documents, but our customers (several hundred) can only see a subset of those documents. And the subsets vary in size greatly. And some of these customers will be creating a lot of traffic. Also, there is no way to map the subsets to a query. The customer either has access to a document or they don't.
>
> Has anybody worked on this issue before? If I use one large index and do the filtering in my application, then Solr will be serving a lot of useless documents. The counts would also be screwed up for facet queries. Is the best solution to extend Solr and do the filtering there?
>
> The other potential solution is to have one index per customer. This would require one instance of the servlet per index, correct? It just seems like this would require a lot of hardware and complexity (configuring the memory of each servlet instance to index size and traffic).

Why not create a multivalued field that stores the customer perms?
add has_access:cust1 has_access:cust2, etc to the document at index
time, and turn this into a filter query at query time?

-Mike