
Posted to solr-user@lucene.apache.org by "A. Steven Anderson" <st...@asanderson.com> on 2009/12/29 20:19:46 UTC

performance question

Greetings!

Is there any significant negative performance impact of using a
dynamicField?

Likewise for multivalued fields?

The reason why I ask is that our system basically aggregates data from many
disparate data sources (structured, unstructured, and semi-structured), and
the management of the schema.xml has become unwieldy; i.e. we currently have
dozens of fields, and the number grows every time we add a new data source.

I was considering redefining the domain model outside of Solr which would be
used to generate the fields for the indexing process and the metadata (e.g.
display names) for the search process.
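
A minimal sketch of that idea. It assumes a Java domain model and a suffix convention of *_s (string), *_t (text) and *_i (int). The suffixes and the names DomainModelSketch, Attribute and Kind are illustrative assumptions, not anything defined in this thread. Each attribute is defined once and yields both its dynamic field name for indexing and its display name for searching, so schema.xml would only need one <dynamicField/> pattern per suffix:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical domain model kept outside Solr. The Solr field name is derived
 * from an assumed suffix convention (_s = string, _t = text, _i = int) that
 * would need matching <dynamicField/> declarations in schema.xml.
 */
final class DomainModelSketch {

    enum Kind {
        STRING("_s"), TEXT("_t"), INT("_i");

        final String suffix;

        Kind(String suffix) {
            this.suffix = suffix;
        }
    }

    static final class Attribute {
        final String name;
        final Kind kind;
        final String displayName;

        Attribute(String name, Kind kind, String displayName) {
            this.name = name;
            this.kind = kind;
            this.displayName = displayName;
        }

        /** Field name used at index time, e.g. "author" + "_s". */
        String solrField() {
            return name + kind.suffix;
        }
    }

    /** Maps Solr field names back to display names for rendering search results. */
    static Map<String, String> displayNames(List<Attribute> model) {
        Map<String, String> labels = new LinkedHashMap<>();
        for (Attribute a : model) {
            labels.put(a.solrField(), a.displayName);
        }
        return labels;
    }

    public static void main(String[] args) {
        List<Attribute> model = List.of(
                new Attribute("author", Kind.STRING, "Author"),
                new Attribute("body", Kind.TEXT, "Full Text"),
                new Attribute("pages", Kind.INT, "Page Count"));
        // prints {author_s=Author, body_t=Full Text, pages_i=Page Count}
        System.out.println(displayNames(model));
    }
}
```

Adding a new data source then means adding Attribute entries rather than editing schema.xml, as long as every attribute fits one of the existing suffixes.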

Thoughts?
-- 
A. Steven Anderson
Independent Consultant
steve@asanderson.com

Re: performance question

Posted by "A. Steven Anderson" <a....@gmail.com>.

> You don't lose copyField capability with dynamic fields.  You can copy
> dynamic fields into a fixed field name like *_s => text or dynamic fields
> into another dynamic field like  *_s => *_t


Ahhh...I missed that little detail.  Nice!

Ok, so there are no negatives to using dynamic fields then. ;-)

Thanks for all the info!

Re: performance question

Posted by Erik Hatcher <er...@gmail.com>.

You don't lose copyField capability with dynamic fields.  You can copy  
dynamic fields into a fixed field name like *_s => text or dynamic  
fields into another dynamic field like  *_s => *_t
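
In schema.xml these two rules would be written as <copyField source="*_s" dest="text"/> and <copyField source="*_s" dest="*_t"/>. The Java below is a hypothetical model (CopyFieldSketch is not a Solr class) of how such a rule maps a concrete field name to its destination. It assumes patterns with a single leading or trailing '*':

```java
import java.util.List;

/**
 * Hypothetical model of copyField rules whose source is a dynamic-field
 * pattern (not Solr's implementation). Patterns are assumed to contain a
 * single '*' at the start or the end, e.g. "*_s" or "attr_*".
 */
final class CopyFieldSketch {

    /** Returns the text covered by the '*' when fieldName matches pattern, else null. */
    static String wildcardPart(String pattern, String fieldName) {
        if (pattern.startsWith("*")) {
            String suffix = pattern.substring(1);
            return fieldName.endsWith(suffix)
                    ? fieldName.substring(0, fieldName.length() - suffix.length())
                    : null;
        }
        if (pattern.endsWith("*")) {
            String prefix = pattern.substring(0, pattern.length() - 1);
            return fieldName.startsWith(prefix) ? fieldName.substring(prefix.length()) : null;
        }
        throw new IllegalArgumentException("not a dynamic field pattern: " + pattern);
    }

    /**
     * Destination for fieldName under the rule source => dest, or null if the
     * rule does not apply. A dest containing '*' is a pattern: the text matched
     * by the source '*' is substituted into it (*_s => *_t). Otherwise dest is a
     * fixed field name (*_s => text).
     */
    static String destinationFor(String source, String dest, String fieldName) {
        String matched = wildcardPart(source, fieldName);
        if (matched == null) {
            return null;
        }
        return dest.contains("*") ? dest.replace("*", matched) : dest;
    }

    public static void main(String[] args) {
        for (String field : List.of("title_s", "author_s", "price_i")) {
            // prints e.g. "title_s -> text, title_t"; price_i matches neither rule
            System.out.println(field + " -> " + destinationFor("*_s", "text", field)
                    + ", " + destinationFor("*_s", "*_t", field));
        }
    }
}
```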

	Erik

On Jan 6, 2010, at 9:35 AM, A. Steven Anderson wrote:

>> Strictly speaking there are some insignificant distinctions in performance
>> related to how a field name is resolved -- Grant alluded to this
>> earlier in this thread -- but it only comes into play when you actually
>> refer to that field by name and Solr has to "look them up" in the
>> metadata.  So for example if your request referred to 100 different field
>> names in the q, fq, and facet.field params there would be a small overhead
>> for any of those 100 fields that existed because of <dynamicField/>
>> declarations, that would not exist for any of those fields that were
>> declared using <field/> -- but there would be no added overhead to that
>> query if there were 9999999 other fields that existed in your index
>> because of that same <dynamicField/> declaration.
>>
>> But frankly: we're talking about seriously ridiculous
>> "pico-optimizing" at this point ... if you find yourself with performance
>> concerns there are probably 500 other things worth worrying about before
>> this should ever cross your mind.
>>
>
> Thanks for the follow up.
>
> I've converted our schema to required fields only, with every other field
> being a dynamic field.
>
> The only negative that I've found so far is that you lose the copyField
> capability, so it makes my ingest a little bigger, since I have to manually
> copy the values myself.

Re: performance question

Posted by "A. Steven Anderson" <a....@gmail.com>.

> Strictly speaking there are some insignificant distinctions in performance
> related to how a field name is resolved -- Grant alluded to this
> earlier in this thread -- but it only comes into play when you actually
> refer to that field by name and Solr has to "look them up" in the
> metadata.  So for example if your request referred to 100 different field
> names in the q, fq, and facet.field params there would be a small overhead
> for any of those 100 fields that existed because of <dynamicField/>
> declarations, that would not exist for any of those fields that were
> declared using <field/> -- but there would be no added overhead to that
> query if there were 9999999 other fields that existed in your index
> because of that same <dynamicField/> declaration.
>
> But frankly: we're talking about seriously ridiculous
> "pico-optimizing" at this point ... if you find yourself with performance
> concerns there are probably 500 other things worth worrying about before
> this should ever cross your mind.
>

Thanks for the follow up.

I've converted our schema to required fields only, with every other field
being a dynamic field.

The only negative that I've found so far is that you lose the copyField
capability, so it makes my ingest a little bigger, since I have to manually
copy the values myself.

Re: performance question

Posted by Chris Hostetter <ho...@fucit.org>.

: > So, in general, there is no *significant* performance difference with using
: > dynamic fields. Correct?
: 
: Correct.  There's not even really an "insignificant" performance difference.
: A dynamic field is the same as a regular field in practically every way on the
: search side of things.

Strictly speaking there are some insignificant distinctions in performance
related to how a field name is resolved -- Grant alluded to this
earlier in this thread -- but it only comes into play when you actually
refer to that field by name and Solr has to "look them up" in the
metadata.  So for example if your request referred to 100 different field
names in the q, fq, and facet.field params there would be a small overhead
for any of those 100 fields that existed because of <dynamicField/>
declarations, that would not exist for any of those fields that were
declared using <field/> -- but there would be no added overhead to that
query if there were 9999999 other fields that existed in your index
because of that same <dynamicField/> declaration.
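
A simplified model of that resolution cost. It assumes, as the comments in Solr's example schema.xml state, that dynamic field patterns start or end with '*' and that longer patterns are tried first. FieldResolverSketch and its patternChecks counter are hypothetical and are not Solr's IndexSchema. A <field/> name costs one hash lookup. A name that exists only through <dynamicField/> also costs a few pattern tests. Nothing at all is done for fields the request never names:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of field-name resolution (not Solr's code). Work happens
 * once per name referenced in a request, never per field present in the index.
 */
final class FieldResolverSketch {

    private final Map<String, String> explicitFields = new HashMap<>(); // name -> type
    private final List<String[]> dynamicPatterns = new ArrayList<>();   // {pattern, type}
    int patternChecks = 0;                                               // instrumentation only

    void addField(String name, String type) {
        explicitFields.put(name, type);
    }

    void addDynamicField(String pattern, String type) {
        dynamicPatterns.add(new String[] {pattern, type});
        // Longer patterns first; the stable sort keeps schema order for equal lengths.
        dynamicPatterns.sort(Comparator.comparingInt((String[] p) -> p[0].length()).reversed());
    }

    /** Returns the field type for a referenced name, or null if the name is undefined. */
    String resolve(String name) {
        String type = explicitFields.get(name);
        if (type != null) {
            return type;                  // <field/>: one hash lookup, no pattern tests
        }
        for (String[] p : dynamicPatterns) {
            patternChecks++;
            String pattern = p[0];        // assumed to start or end with '*'
            boolean match = pattern.startsWith("*")
                    ? name.endsWith(pattern.substring(1))
                    : name.startsWith(pattern.substring(0, pattern.length() - 1));
            if (match) {
                return p[1];              // <dynamicField/>: the small extra cost
            }
        }
        return null;
    }

    public static void main(String[] args) {
        FieldResolverSketch schema = new FieldResolverSketch();
        schema.addField("id", "string");
        schema.addDynamicField("*_s", "string");
        schema.addDynamicField("*_t", "text");
        System.out.println(schema.resolve("id") + " " + schema.patternChecks);      // string 0
        System.out.println(schema.resolve("title_t") + " " + schema.patternChecks); // text 2
    }
}
```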

But frankly: we're talking about seriously ridiculous
"pico-optimizing" at this point ... if you find yourself with performance
concerns there are probably 500 other things worth worrying about before
this should ever cross your mind.

-Hoss

Re: performance question

Posted by Erik Hatcher <er...@gmail.com>.

On Jan 4, 2010, at 12:04 AM, A. Steven Anderson wrote:

>>
>> dynamic fields don't make it worse ... the number of actual field names
>> you sort on makes it worse.
>>
>> If you sort on 100 fields, the cost is the same regardless of whether all
>> 100 of those fields exist because of a single <dynamicField/> declaration,
>> or 100 distinct <field/> declarations.
>>
>
> Ahh...thanks for the clarification.
>
> So, in general, there is no *significant* performance difference with using
> dynamic fields. Correct?

Correct.  There's not even really an "insignificant" performance  
difference.  A dynamic field is the same as a regular field in  
practically every way on the search side of things.

	Erik

Re: performance question

Posted by "A. Steven Anderson" <a....@gmail.com>.

>
> dynamic fields don't make it worse ... the number of actual field names
> you sort on makes it worse.
>
> If you sort on 100 fields, the cost is the same regardless of whether all
> 100 of those fields exist because of a single <dynamicField/> declaration,
> or 100 distinct <field/> declarations.
>

Ahh...thanks for the clarification.

So, in general, there is no *significant* performance difference with using
dynamic fields. Correct?

Re: performance question

Posted by Chris Hostetter <ho...@fucit.org>.

: > If you sort on many of your dynamic fields your memory use will
: > explode, and the same with index norms and disk space.

: Thanks for the info.  In general, I knew sorting was expensive, but I didn't
: realize that dynamic fields made it worse.

dynamic fields don't make it worse ... the number of actual field names
you sort on makes it worse.

If you sort on 100 fields, the cost is the same regardless of whether all
100 of those fields exist because of a single <dynamicField/> declaration,
or 100 distinct <field/> declarations.


-Hoss

Re: performance question

Posted by "A. Steven Anderson" <a....@gmail.com>.

> Sorting and index norms have space penalties.
> Sorting on a field creates an array of Java ints, one for every
> document in the index. Index norms (used for boosting documents and
> other things) create an array of bytes in the Lucene index files, one
> for every document in the index.
> If you sort on many of your dynamic fields your memory use will
> explode, and the same with index norms and disk space.


Thanks for the info.  In general, I knew sorting was expensive, but I didn't
realize that dynamic fields made it worse.

Re: performance question

Posted by Lance Norskog <go...@gmail.com>.

Sorting and index norms have space penalties.

Sorting on a field creates an array of Java ints, one for every
document in the index. Index norms (used for boosting documents and
other things) create an array of bytes in the Lucene index files, one
for every document in the index.

If you sort on many of your dynamic fields your memory use will
explode, and the same with index norms and disk space.
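
A back-of-envelope estimate following the model in this post. It assumes sorting allocates one Java int (4 bytes) per document per sorted field, and that norms cost one byte per document for each field that has them. SortCostSketch and the example sizes (10 million documents, 100 fields) are hypothetical. Real usage is likely higher, since the sort values themselves also have to be held, so read the results as lower bounds:

```java
/**
 * Lower-bound estimate under the model above: 4 bytes per document per sorted
 * field, 1 byte per document per field with norms. Whether a field came from
 * <field/> or <dynamicField/> does not appear anywhere in the formulas.
 */
final class SortCostSketch {

    /** Bytes for the per-document int arrays created by sorting. */
    static long sortBytes(long maxDoc, int sortedFields) {
        return maxDoc * sortedFields * Integer.BYTES;
    }

    /** Bytes for norms: one byte per document per field that has norms. */
    static long normBytes(long maxDoc, int fieldsWithNorms) {
        return maxDoc * fieldsWithNorms;
    }

    public static void main(String[] args) {
        long maxDoc = 10_000_000L;   // hypothetical index size
        int fields = 100;            // hypothetical number of distinct field names
        System.out.printf("sort arrays: %,d bytes%n", sortBytes(maxDoc, fields));
        System.out.printf("norms:       %,d bytes%n", normBytes(maxDoc, fields));
    }
}
```

With these numbers the sort arrays come to 4,000,000,000 bytes (roughly 3.7 GiB) and the norms to 1,000,000,000 bytes. Both grow linearly with the number of distinct field names used, which is why sorting on many dynamic fields adds up so quickly.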

On Wed, Dec 30, 2009 at 6:54 AM, A. Steven Anderson
<a....@gmail.com> wrote:
>> There can be an impact if you are searching against a lot of fields or if
>> you are indexing a lot of fields on every document, but for the most part in
>> most applications it is negligible.
>>
>
> We index a lot of fields at one time, but we can tolerate the performance
> impact at index time.
>
>> It probably can't hurt to be more streamlined, but without knowing more
>> about your model, it's hard to say.  I've built apps that were totally
>> dynamic field based and they worked just fine, but these were more for
>> discovery than just pure search.  In other words, the user was interacting
>> with the system in a reflective model that selected which fields to search
>> on.
>>
>
> Our application is as much about discovery as search, so this is good to
> know.
>
> Thanks for the feedback. It was very helpful.

-- 
Lance Norskog
goksron@gmail.com

Re: performance question

Posted by "A. Steven Anderson" <a....@gmail.com>.

> There can be an impact if you are searching against a lot of fields or if
> you are indexing a lot of fields on every document, but for the most part in
> most applications it is negligible.
>

We index a lot of fields at one time, but we can tolerate the performance
impact at index time.

> It probably can't hurt to be more streamlined, but without knowing more
> about your model, it's hard to say.  I've built apps that were totally
> dynamic field based and they worked just fine, but these were more for
> discovery than just pure search.  In other words, the user was interacting
> with the system in a reflective model that selected which fields to search
> on.
>

Our application is as much about discovery as search, so this is good to
know.

Thanks for the feedback. It was very helpful.

Re: performance question

Posted by Grant Ingersoll <gs...@apache.org>.

On Dec 29, 2009, at 2:19 PM, A. Steven Anderson wrote:

> Greetings!
> 
> Is there any significant negative performance impact of using a
> dynamicField?

There can be an impact if you are searching against a lot of fields or if you are indexing a lot of fields on every document, but for the most part in most applications it is negligible. 

> 
> Likewise for multivalued fields?

No.  Multivalued fields are just concatenated together with a large position gap under the hood.
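
A simplified sketch of what "concatenated together with a large position gap" means. It assumes whitespace tokenization and a gap of 100, the positionIncrementGap value used by the text field types in Solr's example schema. PositionGapSketch is hypothetical and does not reproduce Lucene's exact position bookkeeping. It only shows that tokens from different values end up far apart, so a phrase query cannot match across the boundary between two values:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified model: token positions keep counting across the values of one
 * multivalued field, with an extra gap added before each additional value.
 */
final class PositionGapSketch {

    /** Returns "token@position" entries for all values of one multivalued field. */
    static List<String> positions(List<String> values, int gap) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        for (int v = 0; v < values.size(); v++) {
            if (v > 0) {
                pos += gap;                               // jump before each additional value
            }
            for (String token : values.get(v).split("\\s+")) {
                out.add(token + "@" + pos);
                pos++;                                    // adjacent tokens within one value
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [red@0, fox@1, quick@102, brown@103]: "fox" and "quick" are
        // 101 positions apart, so the phrase "fox quick" cannot match.
        System.out.println(positions(List.of("red fox", "quick brown"), 100));
    }
}
```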

> 
> The reason why I ask is that our system basically aggregates data from many
> disparate data sources (structured, unstructured, and semi-structured), and
> the management of the schema.xml has become unwieldy; i.e. we currently have
> dozens of fields, and the number grows every time we add a new data source.
> 
> I was considering redefining the domain model outside of Solr which would be
> used to generate the fields for the indexing process and the metadata (e.g.
> display names) for the search process.
> 
> Thoughts?

It probably can't hurt to be more streamlined, but without knowing more about your model, it's hard to say.  I've built apps that were totally dynamic field based and they worked just fine, but these were more for discovery than just pure search.  In other words, the user was interacting with the system in a reflective model that selected which fields to search on.
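
In a reflective model like this, the UI might assemble the query from whichever fields the user selected. Below is a minimal sketch. It assumes standard Lucene query syntax (field:(terms)) and assumes the user's terms are already escaped. FieldSelectionQuerySketch and the field names are hypothetical. Only the field names that end up in the query are resolved against the schema, which matches the cost model in Hoss's reply in this thread:

```java
import java.util.List;
import java.util.StringJoiner;

/**
 * Hypothetical helper: builds "field:(terms) OR field:(terms) ..." from the
 * fields a user ticked in the UI. Terms are assumed to be escaped already;
 * real code must escape Lucene query-syntax characters.
 */
final class FieldSelectionQuerySketch {

    static String buildQuery(List<String> selectedFields, String terms) {
        StringJoiner q = new StringJoiner(" OR ");
        for (String field : selectedFields) {
            q.add(field + ":(" + terms + ")");   // one field-qualified clause per selection
        }
        return q.toString();
    }

    public static void main(String[] args) {
        // prints title_t:(dynamic fields) OR abstract_t:(dynamic fields)
        System.out.println(buildQuery(List.of("title_t", "abstract_t"), "dynamic fields"));
    }
}
```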

-Grant

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/