Posted to solr-user@lucene.apache.org by Alejandro Calbazana <ac...@gmail.com> on 2013/10/30 02:37:31 UTC

Many Dynamic Fields + Indexing Strategy

Hi,

I have an application that has a fair number of dynamic fields in addition
to static fields.  The use case is that a customer can create any number of
dynamic fields and associate them with domain objects that we then pull
into an indexed document.  I have no way to know these fields in advance
and the expectation is that these fields are searchable using a field/value
query.  It is a multi-tenant environment and it is possible that there
could be a high volume of dynamic fields created.

My question is whether there is a reasonable indexing strategy that can
accommodate such a use case.  My concern is that I could end up with a
large number of dynamic fields, which would slow querying and full indexing
to a crawl.  In testing, I created unique dynamic fields and got into the
50K - 100K range before my JVM began to behave poorly and went OOM.  I
understand why this happens, but I'm interested in how to protect against
it.

My only thought at the moment is to split my single index into multiple
cores - one per tenant.  Has anyone else had this requirement?  How did you
handle it?

My schema is pretty much what I've described.  A handful of static fields
with the stock dynamic field pattern definitions.  I am using Solr 4.2.1.

Thanks,

Al

Re: Many Dynamic Fields + Indexing Strategy

Posted by Jack Krupansky <ja...@basetechnology.com>.
Every multitenant situation is going to be different, but at the extreme, a
single core per tenant is the cleanest approach: it provides the best
separation and optimal performance, and it supports full tf-idf relevancy of
document fields for each tenant.
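
As a rough illustration of the one-core-per-tenant layout, here is a minimal
Python sketch that builds a field/value query URL against a tenant's own
core.  The base URL, the tenant_<id> core naming convention, and the helper
names are assumptions made for the example, not anything prescribed by Solr;
the color_s field assumes the stock *_s dynamic field pattern.

```python
from urllib.parse import urlencode

# Assumed base URL of the Solr instance; adjust for your deployment.
SOLR_BASE = "http://localhost:8983/solr"


def tenant_core(tenant_id: str) -> str:
    """Map a tenant to its core name (hypothetical naming convention)."""
    if not tenant_id.isalnum():
        # Refuse anything that could alter the URL path.
        raise ValueError(f"unexpected tenant id: {tenant_id!r}")
    return f"tenant_{tenant_id}"


def quote_phrase(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def select_url(tenant_id: str, field: str, value: str, rows: int = 10) -> str:
    """Build a /select URL for a field:value query on the tenant's core."""
    params = {"q": f"{field}:{quote_phrase(value)}", "wt": "json", "rows": rows}
    return f"{SOLR_BASE}/{tenant_core(tenant_id)}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(select_url("acme", "color_s", "dark red"))
```

Because each query only ever touches one tenant's core, the number of
distinct dynamic fields any single index has to carry is limited to what
that tenant created.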

You can also do a hybrid, where you have separate cores for the bulk data 
for each tenant, but have a single common collection with a subset of tenant 
data which your admin application can use to do searches across tenants for 
common metadata.
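
One way to picture the hybrid is as a routing step at index time: the full
document goes to the tenant's core, and a trimmed copy goes to the common
collection.  The Python sketch below is only an illustration; the collection
name, the tenant_<id> core naming, and the list of common fields are all
hypothetical.

```python
# Hypothetical name of the shared, cross-tenant collection.
COMMON_COLLECTION = "common"

# Hypothetical static fields worth searching across tenants.
COMMON_FIELDS = ("id", "tenant_id", "title", "created")


def route(doc: dict) -> list:
    """Return (target, document) pairs for one incoming document.

    The tenant core receives every field, including dynamic ones;
    the common collection receives only the static subset.
    """
    tenant_target = f"tenant_{doc['tenant_id']}"
    subset = {name: doc[name] for name in COMMON_FIELDS if name in doc}
    return [(tenant_target, dict(doc)), (COMMON_COLLECTION, subset)]


if __name__ == "__main__":
    sample = {"id": "1", "tenant_id": "acme", "title": "Widget", "color_s": "red"}
    for target, payload in route(sample):
        print(target, payload)
```

Since customer-defined dynamic fields never reach the common collection in
this sketch, its field count stays fixed no matter how many fields tenants
create.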

-- Jack Krupansky
