Posted to dev@lucene.apache.org by "Marco Remy (JIRA)" <ji...@apache.org> on 2019/03/29 14:43:00 UTC

[jira] [Updated] (SOLR-13356) Language detection per value

     [ https://issues.apache.org/jira/browse/SOLR-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marco Remy updated SOLR-13356:
------------------------------
    Description: 
Hello,

We are using the _LangDetect_ language detection processor with individual field mapping.
{code:xml}
<processor class="solr.LangDetectLanguageIdentifierUpdateProcessorFactory">
  ...
  <bool name="langid.map">true</bool>
  <bool name="langid.map.individual">true</bool>
</processor>
{code}
If a (simply structured) document containing different languages in a +multivalued field+ is indexed, only one language is detected for the whole field.

For example:
{code:xml}
<doc>
  <field>This is any text</field>
  <field>Das ist irgendein Text</field>
</doc>
{code}
The result will be either {{field_en}} or {{field_de}}, and both values are mapped into that one localized field. As a result, some values are not analyzed according to their actual language.

As an enhancement, detection should be available per value on multivalued fields, so that each value can be mapped individually.
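
To make the request concrete, here is a minimal Java sketch that contrasts the two behaviours on the example document above. It is not Solr's implementation and does not use the LangDetect library: the class {{PerValueLangMapping}}, the methods {{detect}}, {{mapWholeField}} and {{mapPerValue}}, and the toy stopword lists are all hypothetical names introduced for illustration. The whole-field variant also assumes that the values are concatenated before a single detection run.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of whole-field vs. per-value mapping; not Solr code. */
class PerValueLangMapping {

    // Toy stopword lists standing in for a real language detector (assumption).
    private static final Set<String> EN = Set.of("this", "is", "any", "the", "a");
    private static final Set<String> DE = Set.of("das", "ist", "irgendein", "der", "ein");

    /** Returns "en" or "de" by counting stopword hits; ties go to "en". */
    static String detect(String text) {
        int en = 0;
        int de = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (EN.contains(token)) en++;
            if (DE.contains(token)) de++;
        }
        return en >= de ? "en" : "de";
    }

    /** Behaviour described in this issue: one detection for the whole field. */
    static Map<String, List<String>> mapWholeField(String field, List<String> values) {
        String lang = detect(String.join(" ", values));
        Map<String, List<String>> mapped = new LinkedHashMap<>();
        mapped.put(field + "_" + lang, new ArrayList<>(values));
        return mapped;
    }

    /** Requested behaviour: detect each value and map it on its own. */
    static Map<String, List<String>> mapPerValue(String field, List<String> values) {
        Map<String, List<String>> mapped = new LinkedHashMap<>();
        for (String value : values) {
            String target = field + "_" + detect(value);
            mapped.computeIfAbsent(target, k -> new ArrayList<>()).add(value);
        }
        return mapped;
    }

    public static void main(String[] args) {
        List<String> values = List.of("This is any text", "Das ist irgendein Text");
        // prints {field_en=[This is any text, Das ist irgendein Text]}
        System.out.println(mapWholeField("field", values));
        // prints {field_en=[This is any text], field_de=[Das ist irgendein Text]}
        System.out.println(mapPerValue("field", values));
    }
}
```

With per-value mapping, each value ends up in the localized field whose analyzer matches its language. A real implementation would presumably need a fallback for very short values, where detection tends to be unreliable.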

  was:
Hello,

We are using the _LangDetect_ language detection processor with individual field mapping.
{code:xml}
<processor class="solr.LangDetectLanguageIdentifierUpdateProcessorFactory">
  ...
  <bool name="langid.map">true</bool>
  <bool name="langid.map.individual">true</bool>
</processor>
{code}
If a (simple structured) document is indexed containing different languages in a +multivalued field+, only one language will be predicted.

eg:
{code:xml}
<doc>
  <field>This is any text</field>
  <field>Das ist irgendein Text</field>
</doc>
{code}
The result will be either {{field_en}} or {{field_de}} and both values are mapped into that localized field. In effect some values won't be analyzed properly according to their actual language.

As enhancement, the detection should be available per value on multivalued fields. So their values of can be mapped individually.


> Language detection per value
> ----------------------------
>
>                 Key: SOLR-13356
>                 URL: https://issues.apache.org/jira/browse/SOLR-13356
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public (Default Security Level. Issues are Public)
>          Components: contrib - LangId
>            Reporter: Marco Remy
>            Priority: Minor
>              Labels: UpdateProcessor, detection, language


