Posted to dev@lucene.apache.org by "Jan Høydahl (JIRA)" <ji...@apache.org> on 2018/03/15 09:11:00 UTC

[jira] [Comment Edited] (SOLR-11774) langid.map.individual won't work with langid.map.keepOrig

    [ https://issues.apache.org/jira/browse/SOLR-11774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16400111#comment-16400111 ] 

Jan Høydahl edited comment on SOLR-11774 at 3/15/18 9:10 AM:
-------------------------------------------------------------

See [PR 336|https://github.com/apache/lucene-solr/pull/336] for failing test.

My plan for fixing this is:
 * Change
 {{protected abstract List<DetectedLanguage> detectLanguage(SolrInputDocument content);}}
 to
 {{protected abstract List<DetectedLanguage> detectLanguage(Reader content);}}
 * Add a new method to {{LanguageIdentifierUpdateProcessor}}:
 {{protected Reader solrDocReader(SolrInputDocument doc, String[] fields)}}
 This will replace {{concatFields()}} and read only as much field data as the consumer of the reader asks for
 * To detect the language of a single field, request a reader over that field alone:
 {{detectLanguage(solrDocReader(doc, fieldName))}}
 * The implementations become simpler, and the default {{LangDetectLanguageIdentifierUpdateProcessor}} can call the {{public void append(Reader reader)}} method directly
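
A rough sketch of how such a lazy reader could behave; this is an illustration, not the actual patch. To stay runnable without Solr on the classpath it uses a plain {{Map<String, Object>}} as a stand-in for {{SolrInputDocument}}, and the class name {{SolrDocReaderSketch}}, the {{readAll()}} helper, the single-space separator and the skipping of non-String values are all assumptions made for the example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a Map stands in for SolrInputDocument.
final class SolrDocReaderSketch {

    /** Returns a Reader over the String values of the given fields, in order. */
    public static Reader solrDocReader(Map<String, Object> doc, String[] fields) {
        final List<String> values = new ArrayList<>();
        for (String field : fields) {
            Object value = doc.get(field);
            if (value instanceof String) {   // assumption: skip missing and non-String values
                values.add((String) value);
            }
        }
        return new Reader() {
            private int index = 0;           // next field value to open
            private Reader current = null;   // reader over the value being consumed

            @Override
            public int read(char[] cbuf, int off, int len) throws IOException {
                if (len == 0) {
                    return 0;
                }
                while (true) {
                    if (current == null) {
                        if (index >= values.size()) {
                            return -1;       // all fields consumed
                        }
                        // A value is only opened once the caller asks for more data.
                        String separator = index > 0 ? " " : "";
                        current = new StringReader(separator + values.get(index++));
                    }
                    int n = current.read(cbuf, off, len);
                    if (n != -1) {
                        return n;
                    }
                    current = null;          // value exhausted, move on to the next one
                }
            }

            @Override
            public void close() {
                // nothing to release in this sketch
            }
        };
    }

    /** Helper for the demo only: drains a reader into a String. */
    public static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8];
        int n;
        while ((n = reader.read(buf, 0, buf.length)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", "Der Prozess");
        doc.put("author", "Franz Kafka");
        // Whole-document detection would read all configured fields ...
        System.out.println(readAll(solrDocReader(doc, new String[] {"title", "author"})));
        // ... while per-field detection reads one field at a time.
        System.out.println(readAll(solrDocReader(doc, new String[] {"title"})));
    }
}
```

With a reader like this, a detector that stops reading after a fixed number of characters never touches the remaining fields, and per-field detection is just a call with a one-element field array.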

This is a breaking API change, but since the class is still tagged as {{@lucene.experimental}}, we are allowed to do that, right?



> langid.map.individual won't work with langid.map.keepOrig
> ---------------------------------------------------------
>
>                 Key: SOLR-11774
>                 URL: https://issues.apache.org/jira/browse/SOLR-11774
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: contrib - LangId
>    Affects Versions: 5.0
>            Reporter: Marco Remy
>            Assignee: Jan Høydahl
>            Priority: Minor
>             Fix For: 6.6.4, 7.4, master (8.0)
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Tried to get language detection to work.
> *Setting:*
> {code:xml}
> <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
>   <str name="langid.fl">title,author</str>
>   <str name="langid.langsField">detected_languages</str>
>   <str name="langid.whitelist">de,en</str>
>   <str name="langid.fallback">txt</str>
>   <bool name="langid.map">true</bool>
>   <bool name="langid.map.individual">true</bool>
>   <bool name="langid.map.keepOrig">true</bool>
> </processor>
> {code}
> Main purpose:
> * Map fields individually
> * Keep the original field
> But the fields are not mapped individually; they are all mapped to a single detected language. After some hours of investigation I finally found the reason: *The option langid.map.keepOrig breaks the individual mapping function.* The fields are mapped as expected only when it is disabled.
> - Regards


