Posted to solr-user@lucene.apache.org by "Mike L." <ja...@yahoo.com> on 2014/02/05 04:00:52 UTC

Max Limit to Schema Fields - Solr 4.X

 
solr user group -
 
    I'm afraid I have a scenario where I may need to define a few thousand fields in Solr. The context here is that this type of data is extremely granular and unfortunately cannot be rolled up into logical groupings or aggregate fields, because there is a need to know which granular field contains the data, and those fields need to be searchable.
 
 With that said, I expect each <doc> to contain no more than 100 populated fields at any given time. It's just not clear, of the few thousand fields created, which ones will hold the data pertaining to a given doc.
 
I'm just wondering if there is any defined limit to how many fields can be created within a schema. I'm sure the configuration maintenance of a schema like this would be a nightmare, but I would like to know if it's at all possible in the first place before it is attempted.
 
Thanks in advance -
Mike

Re: Max Limit to Schema Fields - Solr 4.X

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
You could probably manage the schema by using dynamic fields. Also,
enable lazy loading to avoid loading the values of the fields you do
not care about.
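
For example, something like this (untested sketch; the attr_* name and
the types are just placeholders):

    <!-- schema.xml: one pattern covers thousands of concrete field names -->
    <dynamicField name="attr_*" type="string" indexed="true" stored="true"/>

    <!-- solrconfig.xml: only materialize stored fields that are actually read -->
    <enableLazyFieldLoading>true</enableLazyFieldLoading>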

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Wed, Feb 5, 2014 at 12:10 PM, Mike L. <ja...@yahoo.com> wrote:
>
> Hey Jack -
>
> Two types of queries:
>
> A) Return all docs that have a match for a particular value in a particular field (fq=fieldname:value). Because of this I feel I'm tied to defining all the fields. No particular field matters more than another - it depends on the search context, so it's hard to predict common searches.
>
> B) Return all docs that have a particular value in one or more fields
> (a small subset of the 3000).
>
> I've been a bit spoiled by Solr, being used to response times under 50ms, but in this case search does not have to be fast. Also, the total index size would be less than 1GB, with fewer than 1M total docs.
>
> -Mike
>
> Sent from my iPhone
>
>> On Feb 4, 2014, at 10:38 PM, "Jack Krupansky" <ja...@basetechnology.com> wrote:
>>
>> What will your queries be like? Will it be okay if they are relatively slow? I mean, how many of those 100 fields will you need to use in a typical (95th percentile) query?
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Mike L.
>> Sent: Tuesday, February 4, 2014 10:00 PM
>> To: solr-user@lucene.apache.org
>> Subject: Max Limit to Schema Fields - Solr 4.X
>>
>>
>> solr user group -
>>
>>   I'm afraid I have a scenario where I may need to define a few thousand fields in Solr. The context here is that this type of data is extremely granular and unfortunately cannot be rolled up into logical groupings or aggregate fields, because there is a need to know which granular field contains the data, and those fields need to be searchable.
>>
>> With that said, I expect each <doc> to contain no more than 100 populated fields at any given time. It's just not clear, of the few thousand fields created, which ones will hold the data pertaining to a given doc.
>>
>> I'm just wondering if there is any defined limit to how many fields can be created within a schema. I'm sure the configuration maintenance of a schema like this would be a nightmare, but I would like to know if it's at all possible in the first place before it is attempted.
>>
>> Thanks in advance -
>> Mike

Re: Max Limit to Schema Fields - Solr 4.X

Posted by "Mike L." <ja...@yahoo.com>.
Hey Jack - 

Two types of queries:

A) Return all docs that have a match for a particular value in a particular field (fq=fieldname:value). Because of this I feel I'm tied to defining all the fields. No particular field matters more than another - it depends on the search context, so it's hard to predict common searches.

B) Return all docs that have a particular value in one or more fields
(a small subset of the 3000); examples of both are sketched below.
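
With made-up field names:

    A) fq=field0042:somevalue
    B) fq=field0042:somevalue OR field0967:somevalue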

I've been a bit spoiled by Solr, being used to response times under 50ms, but in this case search does not have to be fast. Also, the total index size would be less than 1GB, with fewer than 1M total docs.
 
-Mike

Sent from my iPhone

> On Feb 4, 2014, at 10:38 PM, "Jack Krupansky" <ja...@basetechnology.com> wrote:
> 
> What will your queries be like? Will it be okay if they are relatively slow? I mean, how many of those 100 fields will you need to use in a typical (95th percentile) query?
> 
> -- Jack Krupansky
> 
> -----Original Message----- From: Mike L.
> Sent: Tuesday, February 4, 2014 10:00 PM
> To: solr-user@lucene.apache.org
> Subject: Max Limit to Schema Fields - Solr 4.X
> 
> 
> solr user group -
> 
>   I'm afraid I have a scenario where I may need to define a few thousand fields in Solr. The context here is that this type of data is extremely granular and unfortunately cannot be rolled up into logical groupings or aggregate fields, because there is a need to know which granular field contains the data, and those fields need to be searchable.
> 
> With that said, I expect each <doc> to contain no more than 100 populated fields at any given time. It's just not clear, of the few thousand fields created, which ones will hold the data pertaining to a given doc.
> 
> I'm just wondering if there is any defined limit to how many fields can be created within a schema. I'm sure the configuration maintenance of a schema like this would be a nightmare, but I would like to know if it's at all possible in the first place before it is attempted.
> 
> Thanks in advance -
> Mike 

Re: Max Limit to Schema Fields - Solr 4.X

Posted by Jack Krupansky <ja...@basetechnology.com>.
What will your queries be like? Will it be okay if they are relatively slow? 
I mean, how many of those 100 fields will you need to use in a typical (95th 
percentile) query?

-- Jack Krupansky

-----Original Message----- 
From: Mike L.
Sent: Tuesday, February 4, 2014 10:00 PM
To: solr-user@lucene.apache.org
Subject: Max Limit to Schema Fields - Solr 4.X


solr user group -

    I'm afraid I have a scenario where I may need to define a few
thousand fields in Solr. The context here is that this type of data is
extremely granular and unfortunately cannot be rolled up into logical
groupings or aggregate fields, because there is a need to know which
granular field contains the data, and those fields need to be searchable.

With that said, I expect each <doc> to contain no more than 100 populated
fields at any given time. It's just not clear, of the few thousand fields
created, which ones will hold the data pertaining to a given doc.

I'm just wondering if there is any defined limit to how many fields can be
created within a schema. I'm sure the configuration maintenance of a schema
like this would be a nightmare, but I would like to know if it's at all
possible in the first place before it is attempted.

Thanks in advance -
Mike 


Re: Max Limit to Schema Fields - Solr 4.X

Posted by "Mike L." <ja...@yahoo.com>.
Appreciate all the support and I'll give it a whirl. Cheers!

Sent from my iPhone

> On Feb 8, 2014, at 4:25 PM, Shawn Heisey <so...@elyograg.org> wrote:
> 
>> On 2/8/2014 12:12 PM, Mike L. wrote:
>> I'm going to try loading all 3000 fields in the schema and see how that goes. My only concern is doing boolean searches and whether or not I'll run into URL length issues, but I guess I'll find out soon.
> 
> It will likely work without a problem.  As already mentioned, you may
> need to increase maxBooleanClauses in solrconfig.xml beyond the default
> of 1024.
> 
> The max URL size is configurable with any decent servlet container,
> including the jetty that comes with the Solr example.  In the part of
> the jetty config that adds the connector, this increases the max HTTP
> header size to 32K, and the size for the entire HTTP buffer to 64K.
> These may not be big enough with 3000 fields, but it gives you the
> general idea:
> 
>            <Set name="requestHeaderSize">32768</Set>
>            <Set name="requestBufferSize">65536</Set>
> 
> Another option is to use a POST request instead of a GET request with
> the parameters in the posted body.  The default POST buffer size in
> Jetty is 200K.  In newer versions of Solr, the limit is actually set by
> Solr, not the servlet container, and defaults to 2MB.  I believe that if
> you are using SolrJ, it uses POST requests by default.
> 
> Thanks,
> Shawn
> 

Re: Max Limit to Schema Fields - Solr 4.X

Posted by Shawn Heisey <so...@elyograg.org>.
On 2/8/2014 12:12 PM, Mike L. wrote:
> I'm going to try loading all 3000 fields in the schema and see how that goes. My only concern is doing boolean searches and whether or not I'll run into URL length issues, but I guess I'll find out soon.

It will likely work without a problem.  As already mentioned, you may
need to increase maxBooleanClauses in solrconfig.xml beyond the default
of 1024.
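
That setting lives in the <query> section of solrconfig.xml, e.g. (the
value here is just an example; size it to your worst-case query):

    <maxBooleanClauses>4096</maxBooleanClauses>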

The max URL size is configurable with any decent servlet container,
including the jetty that comes with the Solr example.  In the part of
the jetty config that adds the connector, this increases the max HTTP
header size to 32K, and the size for the entire HTTP buffer to 64K.
These may not be big enough with 3000 fields, but it gives you the
general idea:

            <Set name="requestHeaderSize">32768</Set>
            <Set name="requestBufferSize">65536</Set>

Another option is to use a POST request instead of a GET request with
the parameters in the posted body.  The default POST buffer size in
Jetty is 200K.  In newer versions of Solr, the limit is actually set by
Solr, not the servlet container, and defaults to 2MB.  I believe that if
you are using SolrJ, it uses POST requests by default.
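
On those newer versions, that 2MB default corresponds to the
formdataUploadLimitInKB attribute in solrconfig.xml; roughly like this
(sketch from memory, check the stock config for your exact version):

    <requestDispatcher handleSelect="false">
      <requestParsers enableRemoteStreaming="true"
                      multipartUploadLimitInKB="2048000"
                      formdataUploadLimitInKB="2048"/>
    </requestDispatcher>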

Thanks,
Shawn


Re: Max Limit to Schema Fields - Solr 4.X

Posted by Jack Krupansky <ja...@basetechnology.com>.
It's safe to say that you are on thin ice - try it, and if it works well
for your specific data and requirements, great. But if it eventually fails,
you can't say we didn't warn you!

Generally, I would say that "a few hundred" fields is the recommended
limit - not that more will definitely cause problems, but because you will
be beyond common usage and increasingly sensitive to the amount of data
and to Java/JVM performance capabilities.

-- Jack Krupansky

-----Original Message----- 
From: Mike L.
Sent: Saturday, February 8, 2014 2:12 PM
To: solr-user@lucene.apache.org
Cc: solr-user@lucene.apache.org
Subject: Re: Max Limit to Schema Fields - Solr 4.X

That was the original plan.

However, it's important to preserve the originating field that loaded the
value. The data is very fine-grained, and each field stores a particular
value. When searching the data in Solr, it is important to know which docs
contain that particular data in that particular field (fielda:value,
fieldb:value), whereas searching field:value would hide which originating
field loaded the value.

I'm going to try loading all 3000 fields in the schema and see how that
goes. My only concern is doing boolean searches and whether or not I'll run
into URL length issues, but I guess I'll find out soon.

Thanks again!

Sent from my iPhone

> On Feb 6, 2014, at 1:02 PM, Erick Erickson <er...@gmail.com> 
> wrote:
>
> Sometimes you can spoof the many-fields problem by using prefixes on the
> data. Rather than fielda, fieldb..., index values like fielda_value,
> fieldb_value into a single field, then do the right thing when searching.
> Watch tokenization, though.
>
> Best
> Erick
>> On Feb 5, 2014 4:59 AM, "Mike L." <ja...@yahoo.com> wrote:
>>
>>
>> Thanks Shawn. This is good to know.
>>
>>
>> Sent from my iPhone
>>
>>>> On Feb 5, 2014, at 12:53 AM, Shawn Heisey <so...@elyograg.org> wrote:
>>>>
>>>> On 2/4/2014 8:00 PM, Mike L. wrote:
>>>> I'm just wondering if there is any defined limit to how many
>> fields can be created within a schema. I'm sure the configuration
>> maintenance of a schema like this would be a nightmare, but I would like
>> to know if it's at all possible in the first place before it is
>> attempted.
>>>
>>> There are no hard limits on the number of fields, whether they are
>>> dynamically defined or not. Several thousand fields should be no
>>> problem.  If you have enough system resources and you don't run into an
>>> unlikely bug, there's no reason it won't work.  As you've already been
>>> told, there are potential performance concerns.  Depending on the exact
>>> nature of your queries, you might need to increase maxBooleanClauses.
>>>
>>> The only hard limitation that Lucene really has (and by extension, Solr
>>> also has that limitation) is that a single index cannot have more than
>>> about two billion documents in it - the inherent limitation on a Java
>>> "int" type.  Solr can use indexes larger than this through sharding.
>>>
>>> See the very end of this page:
>> https://lucene.apache.org/core/4_6_0/core/org/apache/lucene/codecs/lucene46/package-summary.html#Limitations
>>>
>>> Thanks,
>>> Shawn
>> 


Re: Max Limit to Schema Fields - Solr 4.X

Posted by "Mike L." <ja...@yahoo.com>.
That was the original plan. 

However, it's important to preserve the originating field that loaded the value. The data is very fine-grained, and each field stores a particular value. When searching the data in Solr, it is important to know which docs contain that particular data in that particular field (fielda:value, fieldb:value), whereas searching field:value would hide which originating field loaded the value.

I'm going to try loading all 3000 fields in the schema and see how that goes. My only concern is doing boolean searches and whether or not I'll run into URL length issues, but I guess I'll find out soon.

Thanks again!

Sent from my iPhone

> On Feb 6, 2014, at 1:02 PM, Erick Erickson <er...@gmail.com> wrote:
> 
> Sometimes you can spoof the many-fields problem by using prefixes on the
> data. Rather than fielda, fieldb..., index values like fielda_value,
> fieldb_value into a single field, then do the right thing when searching.
> Watch tokenization, though.
> 
> Best
> Erick
>> On Feb 5, 2014 4:59 AM, "Mike L." <ja...@yahoo.com> wrote:
>> 
>> 
>> Thanks Shawn. This is good to know.
>> 
>> 
>> Sent from my iPhone
>> 
>>>> On Feb 5, 2014, at 12:53 AM, Shawn Heisey <so...@elyograg.org> wrote:
>>>> 
>>>> On 2/4/2014 8:00 PM, Mike L. wrote:
>>>> I'm just wondering if there is any defined limit to how many
>> fields can be created within a schema. I'm sure the configuration
>> maintenance of a schema like this would be a nightmare, but I would like
>> to know if it's at all possible in the first place before it is attempted.
>>> 
>>> There are no hard limits on the number of fields, whether they are
>>> dynamically defined or not. Several thousand fields should be no
>>> problem.  If you have enough system resources and you don't run into an
>>> unlikely bug, there's no reason it won't work.  As you've already been
>>> told, there are potential performance concerns.  Depending on the exact
>>> nature of your queries, you might need to increase maxBooleanClauses.
>>> 
>>> The only hard limitation that Lucene really has (and by extension, Solr
>>> also has that limitation) is that a single index cannot have more than
>>> about two billion documents in it - the inherent limitation on a Java
>>> "int" type.  Solr can use indexes larger than this through sharding.
>>> 
>>> See the very end of this page:
>> https://lucene.apache.org/core/4_6_0/core/org/apache/lucene/codecs/lucene46/package-summary.html#Limitations
>>> 
>>> Thanks,
>>> Shawn
>> 

Re: Max Limit to Schema Fields - Solr 4.X

Posted by Erick Erickson <er...@gmail.com>.
Sometimes you can spoof the many-fields problem by using prefixes on the
data. Rather than fielda, fieldb..., index values like fielda_value,
fieldb_value into a single field, then do the right thing when searching.
Watch tokenization, though.
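
A rough sketch in Solr update XML (field and value names made up):

    <add>
      <doc>
        <field name="id">doc1</field>
        <!-- one catch-all field (declared multiValued) holds the prefixed values -->
        <field name="all_attrs">fielda_red</field>
        <field name="all_attrs">fieldb_blue</field>
      </doc>
    </add>

Then fq=all_attrs:fielda_red only matches docs where "red" came from
fielda. Use a fieldType that keeps the prefix intact (string, or
KeywordTokenizer) so analysis doesn't split it apart.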

Best
Erick
On Feb 5, 2014 4:59 AM, "Mike L." <ja...@yahoo.com> wrote:

>
> Thanks Shawn. This is good to know.
>
>
> Sent from my iPhone
>
> > On Feb 5, 2014, at 12:53 AM, Shawn Heisey <so...@elyograg.org> wrote:
> >
> >> On 2/4/2014 8:00 PM, Mike L. wrote:
> >> I'm just wondering if there is any defined limit to how many
> fields can be created within a schema. I'm sure the configuration
> maintenance of a schema like this would be a nightmare, but I would like
> to know if it's at all possible in the first place before it is attempted.
> >
> > There are no hard limits on the number of fields, whether they are
> > dynamically defined or not. Several thousand fields should be no
> > problem.  If you have enough system resources and you don't run into an
> > unlikely bug, there's no reason it won't work.  As you've already been
> > told, there are potential performance concerns.  Depending on the exact
> > nature of your queries, you might need to increase maxBooleanClauses.
> >
> > The only hard limitation that Lucene really has (and by extension, Solr
> > also has that limitation) is that a single index cannot have more than
> > about two billion documents in it - the inherent limitation on a Java
> > "int" type.  Solr can use indexes larger than this through sharding.
> >
> > See the very end of this page:
> >
> >
> https://lucene.apache.org/core/4_6_0/core/org/apache/lucene/codecs/lucene46/package-summary.html#Limitations
> >
> > Thanks,
> > Shawn
> >
>

Re: Max Limit to Schema Fields - Solr 4.X

Posted by "Mike L." <ja...@yahoo.com>.
Thanks Shawn. This is good to know. 


Sent from my iPhone

> On Feb 5, 2014, at 12:53 AM, Shawn Heisey <so...@elyograg.org> wrote:
> 
>> On 2/4/2014 8:00 PM, Mike L. wrote:
>> I'm just wondering if there is any defined limit to how many fields can be created within a schema. I'm sure the configuration maintenance of a schema like this would be a nightmare, but I would like to know if it's at all possible in the first place before it is attempted.
> 
> There are no hard limits on the number of fields, whether they are
> dynamically defined or not. Several thousand fields should be no
> problem.  If you have enough system resources and you don't run into an
> unlikely bug, there's no reason it won't work.  As you've already been
> told, there are potential performance concerns.  Depending on the exact
> nature of your queries, you might need to increase maxBooleanClauses.
> 
> The only hard limitation that Lucene really has (and by extension, Solr
> also has that limitation) is that a single index cannot have more than
> about two billion documents in it - the inherent limitation on a Java
> "int" type.  Solr can use indexes larger than this through sharding.
> 
> See the very end of this page:
> 
> https://lucene.apache.org/core/4_6_0/core/org/apache/lucene/codecs/lucene46/package-summary.html#Limitations
> 
> Thanks,
> Shawn
> 

Re: Max Limit to Schema Fields - Solr 4.X

Posted by Shawn Heisey <so...@elyograg.org>.
On 2/4/2014 8:00 PM, Mike L. wrote:
> I'm just wondering if there is any defined limit to how many fields can be created within a schema. I'm sure the configuration maintenance of a schema like this would be a nightmare, but I would like to know if it's at all possible in the first place before it is attempted.

There are no hard limits on the number of fields, whether they are
dynamically defined or not. Several thousand fields should be no
problem.  If you have enough system resources and you don't run into an
unlikely bug, there's no reason it won't work.  As you've already been
told, there are potential performance concerns.  Depending on the exact
nature of your queries, you might need to increase maxBooleanClauses.

The only hard limitation that Lucene really has (and by extension, Solr
also has that limitation) is that a single index cannot have more than
about two billion documents in it - the inherent limitation on a Java
"int" type.  Solr can use indexes larger than this through sharding.

See the very end of this page:

https://lucene.apache.org/core/4_6_0/core/org/apache/lucene/codecs/lucene46/package-summary.html#Limitations

Thanks,
Shawn