Posted to solr-user@lucene.apache.org by Will Milspec <wi...@gmail.com> on 2011/08/18 23:22:35 UTC

overhead of empty, unused fields

hi all,

What is the cost of unused field types?

Our application supports multiple languages. We envision separate
Lucene/Solr fields (and field types) per language (content_en, content_fr,
content_zh_CN, etc.).

We thought of a few options:
a) auto-generating the 'multilingual' portion of the schema based on the
application's languages,
b) including fields-and-types for all languages


In A, if an implementation only used French and Chinese, the schema would
only have content_fr and content_zh_CN fields-and-types.

In B, the implementation would have all field types, but a given document
would only have two fields.

A seems "more efficient", but less work.  The downside: if a user wants to
add a language, they would need to regenerate the schema (i.e. add
fields-and-types for "ja")
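
To make option A concrete, here is a minimal Python sketch of deriving the
per-language field names from the application's language list. The helper
name language_fields and the "content" prefix are assumptions for
illustration only; writing the actual schema entries is not shown.

```python
def language_fields(languages, prefix="content"):
    """Return one field name per language code, e.g. 'content_fr'.

    Hypothetical helper: it only builds names, it does not touch Solr.
    """
    return [f"{prefix}_{lang}" for lang in languages]


# An implementation that only uses French and Chinese.
fields = language_fields(["fr", "zh_CN"])

# Adding Japanese later means regenerating the list (and the schema).
fields_with_ja = language_fields(["fr", "zh_CN", "ja"])
```

The second call shows the downside: a new language changes the generated
output, so the schema has to be produced again.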


How much do empty field types and fields cost? Do a dozen-or-so unused field
types hurt scalability of indexing or search?

thanks,

will

Re: overhead of empty, unused fields

Posted by Markus Jelsma <ma...@openindex.io>.
No problem. A document without a value for some field simply doesn't have an 
entry in the inverted index. 
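
As a rough illustration, a toy inverted index in Python shows the same
behaviour. This is a simplified model, not Lucene's actual data structures;
build_index and the sample documents are made up for the example.

```python
from collections import defaultdict


def build_index(docs):
    """Map field -> term -> set of ids of documents containing that term."""
    index = defaultdict(lambda: defaultdict(set))
    for doc_id, doc in docs.items():
        # Only fields that are actually present on a document are visited.
        for field, text in doc.items():
            for term in text.lower().split():
                index[field][term].add(doc_id)
    return index


# No document has a value for content_en, although the (imagined)
# schema would define it.
docs = {
    1: {"content_fr": "bonjour le monde"},
    2: {"content_fr": "bonjour", "content_zh_CN": "你好 世界"},
}
index = build_index(docs)
indexed_fields = sorted(index)
```

In this model indexed_fields contains only the two fields that received
values; content_en never gets an entry because nothing was indexed for it.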
