Posted to dev@lucene.apache.org by "Jan Høydahl (JIRA)" <ji...@apache.org> on 2018/03/15 08:51:00 UTC

[jira] [Commented] (SOLR-11774) langid.map.individual won't work with langid.map.keepOrig

    [ https://issues.apache.org/jira/browse/SOLR-11774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16400096#comment-16400096 ] 

Jan Høydahl commented on SOLR-11774:
------------------------------------

This was broken by SOLR-3381, which was introduced in Solr 5.0. In that change the method {{detectLanguage(String text)}} was replaced with {{detectLanguage(SolrInputDocument doc)}}. However, the one place where detection per individual field happened was changed from detecting on the text of a single field to detecting on the whole document ([https://github.com/apache/lucene-solr/blob/03095ce4d20060a1c63570d8a5214e9858693080/solr/contrib/langid/src/java/org/apache/solr/update/processor/LanguageIdentifierUpdateProcessor.java#L243]), which means that all fields get the same treatment.
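
To illustrate the difference, here is a minimal, self-contained sketch. It is not the Solr code: the class {{PerFieldDetectionSketch}}, its methods, the stopword-counting detector, the sample field values and the {{field_lang}} naming are all assumptions made up for this example. It only contrasts detecting once on the whole document with detecting on each field's own text.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Solr.
final class PerFieldDetectionSketch {
    private static final Set<String> GERMAN = Set.of("der", "die", "das", "und", "ist");
    private static final Set<String> ENGLISH = Set.of("the", "and", "is", "of", "a");

    // Toy stand-in for a language detector: counts stopword hits,
    // returns "de" only if German hits outnumber English hits.
    static String detect(String text) {
        int de = 0;
        int en = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (GERMAN.contains(token)) de++;
            if (ENGLISH.contains(token)) en++;
        }
        return de > en ? "de" : "en";
    }

    // Behaviour described above as the regression: one detection on the
    // concatenated document, and every field is mapped to that language.
    static Map<String, String> mapWholeDocument(Map<String, String> doc) {
        String lang = detect(String.join(" ", doc.values()));
        Map<String, String> mapped = new LinkedHashMap<>();
        for (String field : doc.keySet()) {
            mapped.put(field, field + "_" + lang); // assumed field_lang naming
        }
        return mapped;
    }

    // Intended behaviour for individual mapping: detect on each field's text.
    static Map<String, String> mapIndividually(Map<String, String> doc) {
        Map<String, String> mapped = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : doc.entrySet()) {
            mapped.put(e.getKey(), e.getKey() + "_" + detect(e.getValue()));
        }
        return mapped;
    }

    public static void main(String[] args) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", "Der Hund und die Katze");          // made-up German text
        doc.put("author", "The life and work of a writer");  // made-up English text

        System.out.println(mapWholeDocument(doc)); // {title=title_en, author=author_en}
        System.out.println(mapIndividually(doc));  // {title=title_de, author=author_en}
    }
}
```

With whole-document detection the German title is mapped the same way as the English field, which matches the symptom in the report; with per-field detection each field is mapped according to its own text.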

> langid.map.individual won't work with langid.map.keepOrig
> ---------------------------------------------------------
>
>                 Key: SOLR-11774
>                 URL: https://issues.apache.org/jira/browse/SOLR-11774
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: contrib - LangId
>    Affects Versions: 6.5
>            Reporter: Marco Remy
>            Priority: Minor
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Tried to get language detection to work.
> *Setting:*
> {code:xml}
> <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
>       <str name="langid.fl">title,author</str>
>       <str name="langid.langsField">detected_languages</str>
>       <str name="langid.whitelist">de,en</str>
>       <str name="langid.fallback">txt</str>
>       <bool name="langid.map">true</bool>
>       <bool name="langid.map.individual">true</bool>
>       <bool name="langid.map.keepOrig">true</bool>
>     </processor>
> {code}
> Main purpose
> * Map fields individually
> * Keep the original field
> But the fields are not mapped individually. They are all mapped to a single detected language. After some hours of investigation I finally found the reason: *The option langid.map.keepOrig breaks the individual mapping function.* Only when it is disabled are the fields mapped as expected.
> - Regards



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org