Posted to solr-user@lucene.apache.org by kchellappa <ka...@gmail.com> on 2013/11/19 23:27:10 UTC

Indexing different customer customized field values

In our application, we index educational resources and allow searching for
them.
We allow our customers to change some of the non-textual metadata associated
with a resource (such as booklevel, interestlevel, etc.) to serve their users
better.
So, in theory, each resource could have a different set of metadata
values for each customer, but in reality maybe 10-25% of our customers
customize a small portion of the resources.

Our current solution uses SQL Server to manage the customizations (the
database is sharded for other reasons as well) and also uses SQL Server's
Full Text index for search.
We are replacing this with Solr.

There are a few approaches we have thought about, but none of them seems ideal:

a) Duplicate the entries in Solr.  Each resource would be replicated for
each customer, so there would be one index entry per resource per customer.
The number of index entries is a big concern, even though the text field
values are the same.
(We have about 300K resources and about 50K customers, and both will grow.)

b) Use a dedicated Solr core for each customer.  This wouldn't use
resources efficiently, and we would be duplicating textual components
that don't change from customer to customer.

c) Use a global index that has the resources with default values, and then
use a separate index for each customer that contains the resources that
customer has customized.
This requires managing a lot of small cores/indexes.  It would also require
merging results from multiple cores, so we don't think this will work.

d) Use Solr to do the text search and do post-processing to filter based on
metadata externally.  As you can imagine, this has all the
challenges associated with post-processing (pagination support, etc.).

e) Use Solr's advanced/post-filtering support.  Even if we can figure out a
reasonable way to cache the lookup of metadata values for each customer,
we are not sure this would be efficient.
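
For a rough sense of scale, the entry counts for (a) and for a
"defaults plus overrides" layout like (c) can be sketched as below.  The
300K and 50K figures and the 10-25% range come from the numbers above; the
1% "small portion" is purely a hypothetical placeholder, not a measured value.

```python
# Back-of-the-envelope index sizing. Only RESOURCES, CUSTOMERS and the
# 10-25% range come from the post; CUSTOMIZED_PORTION_PCT is an assumption.
RESOURCES = 300_000
CUSTOMERS = 50_000
CUSTOMIZED_PORTION_PCT = 1  # hypothetical: each customizing customer changes 1%


def duplicated_entries(resources: int, customers: int) -> int:
    """Option (a): one index entry per resource per customer."""
    return resources * customers


def override_entries(resources: int, customers: int,
                     customizing_pct: int, portion_pct: int) -> int:
    """Global entries plus one extra entry per customized resource."""
    customizing_customers = (customers * customizing_pct) // 100
    overrides = (customizing_customers * resources * portion_pct) // 100
    return resources + overrides


full = duplicated_entries(RESOURCES, CUSTOMERS)
low = override_entries(RESOURCES, CUSTOMERS, 10, CUSTOMIZED_PORTION_PCT)
high = override_entries(RESOURCES, CUSTOMERS, 25, CUSTOMIZED_PORTION_PCT)
print(full, low, high)
```

Under that assumed 1%, overrides add tens of millions of entries rather than
the 15 billion of full duplication; the real ratio depends on how much
customers actually customize.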

Any other recommendations on solutions?



--
View this message in context: http://lucene.472066.n3.nabble.com/Indexing-different-customer-customized-field-values-tp4102000.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Indexing different customer customized field values

Posted by kchellappa <ka...@gmail.com>.
Thanks Otis

We also thought about having multiple fields, but thought that having too
many fields would be an issue.  I see threads saying that too many fields are
an issue for sorting (we don't expect to sort on these), but I will look
through the archives.





--
View this message in context: http://lucene.472066.n3.nabble.com/Indexing-different-customer-customized-field-values-tp4102000p4102204.html

Re: Indexing different customer customized field values

Posted by Otis Gospodnetic <ot...@gmail.com>.
Very fuzzy idea here, and maybe there are better approaches I'm not
thinking of right now, but would working with dynamic fields whose names
include the customer ID work for you here?

e.g.
global field: booklevel=valueX
customer-specific field for customer 007: booklevel_007=valueY

Your query could then include both fields or maybe you can play with
function queries like http://wiki.apache.org/solr/FunctionQuery#exists to
make queries behave the way you want them to behave in situations like the
one above.
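
A minimal sketch of what such a query could look like, built client-side in
Python.  It assumes booklevel is a numeric field and that a dynamic field
pattern such as booklevel_* is declared in the schema; the helper names and
the 3-5 range are hypothetical.  It uses the exists() and if() function
queries with the frange query parser, so the customer's value is used when
present and the global value otherwise.

```python
# Sketch only: builds Solr request parameters, does not contact a server.
# Assumes numeric fields and a schema with a booklevel_* dynamic field.
from urllib.parse import urlencode


def customer_field(base: str, customer_id: str) -> str:
    """Name of the customer-specific field, e.g. booklevel_007."""
    return f"{base}_{customer_id}"


def effective_value_fn(base: str, customer_id: str) -> str:
    """Function query: customer override if it exists, else the global field."""
    override = customer_field(base, customer_id)
    return f"if(exists({override}),{override},{base})"


def range_filter(base: str, customer_id: str, low: int, high: int) -> str:
    """frange filter over the effective (override-or-global) value."""
    return "{!frange l=%s u=%s}%s" % (low, high,
                                      effective_value_fn(base, customer_id))


def build_params(text: str, customer_id: str, low: int, high: int) -> dict:
    """Text search in q, metadata restriction in fq."""
    return {"q": text, "fq": range_filter("booklevel", customer_id, low, high)}


params = build_params("dinosaurs", "007", 3, 5)
print(params["fq"])
print(urlencode(params))  # URL-encoded form for the request query string
```

Whether this performs acceptably with many customer-specific fields is
something to measure on your own index.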

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/

