Posted to solr-user@lucene.apache.org by ni...@arisem.com on 2008/02/29 11:56:09 UTC

Proposition of a new feature: Dynamic Field Types

Dynamic field types are field types that act as proxies to other field
types. The field type to use is chosen on a per-document basis and
depends on the values of the document's fields.

The use case that led us to this feature is the indexing of documents in
different languages. We use a specific analyzer for each language but want
to index semantic information that is not specific to the language.

For example, we would add to the index the semantic tag {co:Paris} for the
expressions "Paris", "capital city of France", "the city of lights" in
English and "Paris", "capitale de la France", "la ville lumière" in French.
This allows us to provide advanced features such as semantic and
cross-lingual search.

To do so in SOLR, we chose to index texts written in different languages in
the same field, while analyzing them with different analyzers. Hence the
proposal of a new feature that responds to this need: Dynamic Field Types.

The idea of this new field type is to act as a proxy to other field types.
Depending on the values of some fields of the document being indexed, it
chooses the correct field type to use. In our situation, we use it to choose
the correct language-dependent field type based on the value of the field
named "language". It is configured with a configuration similar to the
following:

	<fieldtype name="french_ft" ...>
	...
	</fieldtype>

	<fieldtype name="english_ft" ...>
	...
	</fieldtype>

	<dynamicFieldType name="multilanguage">
		<fieldtypes>
			<fieldtype condition="language:fr" name="french_ft"/>
			<fieldtype condition="language:en" name="english_ft"/>
			<fieldtype condition="*:*" name="english_ft"/>
		</fieldtypes>
	</dynamicFieldType>

The last condition is used as a catch-all if preceding conditions are not
met.
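
As a rough illustration of the per-document choice, assuming the proposed
dynamicFieldType element existed (it is only a proposal here, not an
existing Solr feature), a field could be declared with the "multilanguage"
type next to the "language" field. The field name "content" and the
"string" type are assumptions made for this sketch:

	<!-- schema.xml (sketch): "content" is a hypothetical field name -->
	<field name="language" type="string" indexed="true" stored="true"/>
	<field name="content" type="multilanguage" indexed="true" stored="true"/>

	<!-- update message: each document carries its own language value -->
	<add>
		<doc>
			<field name="language">fr</field>
			<field name="content">la ville lumière</field>
		</doc>
		<doc>
			<field name="language">en</field>
			<field name="content">the city of lights</field>
		</doc>
	</add>

With the conditions listed in the configuration, the first document would
be analyzed with french_ft and the second with english_ft; a document with
any other language value would fall through to the "*:*" entry.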

What do you think of this feature?

Best regards,
Nicolas Dessaigne

Re: Proposition of a new feature: Dynamic Field Types

Posted by Grant Ingersoll <gs...@apache.org>.
Why can't you choose the proper field in your application and keep
separate fields per language?  Putting them all in the same field,
regardless of language, is not a good idea in my opinion because it is
more than likely going to skew your statistics and lower your relevance.
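
A minimal sketch of the separate-fields approach, reusing the french_ft and
english_ft types from the original message. The field names text_fr and
text_en are hypothetical, and the application is assumed to pick the field
when it builds each document:

	<!-- schema.xml (sketch): one field per language, names are assumptions -->
	<field name="text_fr" type="french_ft" indexed="true" stored="true"/>
	<field name="text_en" type="english_ft" indexed="true" stored="true"/>

	<!-- update message: the application chooses text_fr for a French document -->
	<add>
		<doc>
			<field name="text_fr">la ville lumière</field>
		</doc>
	</add>

With this layout each language's terms are counted in their own field
rather than mixed into one.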

That being said, the dynamic field type is still an interesting idea.

-Grant

On Feb 29, 2008, at 5:56 AM, nicolas.dessaigne@arisem.com wrote:

> [...]