
Posted to dev@lucene.apache.org by "Jack Krupansky (JIRA)" <ji...@apache.org> on 2013/07/02 17:20:20 UTC

[jira] [Comment Edited] (SOLR-4412) LanguageIdentifier lcmap for language field

    [ https://issues.apache.org/jira/browse/SOLR-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13697873#comment-13697873 ] 

Jack Krupansky edited comment on SOLR-4412 at 7/2/13 3:19 PM:
--------------------------------------------------------------

From the original generic description, I got the impression that this issue would cover BOTH language identifier processors, but the final patch covers only one of them - it doesn't add the feature uniformly to the Tika Language Identifier update processor.

Was this intentional or simply an oversight?

If intentional, what is the reasoning?

And the wiki update does not mention that the new feature covers only one of the two implementations, even though the wiki in general covers both implementations.
                
      was (Author: jkrupan):
    From the original generic description, I got the impression that this issue would cover BOTH language identifier processors, but the final patch covers only one of them - it doesn't add the feature uniformly to the Tika Language Identifier update processor.

Was this intentional or simply an oversight?

If intentional, what is the reasoning?

                  
> LanguageIdentifier lcmap for language field
> -------------------------------------------
>
>                 Key: SOLR-4412
>                 URL: https://issues.apache.org/jira/browse/SOLR-4412
>             Project: Solr
>          Issue Type: Bug
>          Components: contrib - LangId
>    Affects Versions: 4.1
>            Reporter: Jan Høydahl
>            Assignee: Jan Høydahl
>             Fix For: 5.0, 4.4
>
>         Attachments: SOLR-4412.patch
>
>
> For some languages, the detector will detect sub-languages, such as LangDetect detecting zh-tw or zh-cn for Chinese. Tika detector only detects zh. Today you can use {{lcmap}} to map these two into one code, e.g. {{langid.map.lcmap=zh-cn:zh zh-tw:zh}}. But the {{langField}} output is not changed.
> We need an option for {{langField}} as well.
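
To make the request in the description concrete, here is a minimal sketch of the kind of mapping it describes: parse a spec such as "zh-cn:zh zh-tw:zh" and apply it to a detected language code, so that the same normalized code could be used for the value written to the language field. The class and method names (LcMapSketch, parse, normalize) are hypothetical and are not taken from the Solr langid code or from the attached patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is NOT the Solr langid implementation.
final class LcMapSketch {

    // Parse a whitespace-separated list of "from:to" pairs,
    // e.g. "zh-cn:zh zh-tw:zh", into a lookup map.
    static Map<String, String> parse(String spec) {
        Map<String, String> map = new HashMap<>();
        for (String pair : spec.trim().split("\\s+")) {
            if (pair.isEmpty()) {
                continue; // an empty spec yields an empty map
            }
            int colon = pair.indexOf(':');
            if (colon <= 0 || colon == pair.length() - 1) {
                throw new IllegalArgumentException("Expected from:to, got: " + pair);
            }
            map.put(pair.substring(0, colon), pair.substring(colon + 1));
        }
        return map;
    }

    // Return the mapped code if one is configured, otherwise the detected code unchanged.
    static String normalize(String detected, Map<String, String> lcmap) {
        return lcmap.getOrDefault(detected, detected);
    }

    public static void main(String[] args) {
        Map<String, String> lcmap = parse("zh-cn:zh zh-tw:zh");
        System.out.println(normalize("zh-tw", lcmap)); // prints "zh"
        System.out.println(normalize("en", lcmap));    // prints "en"
    }
}
```

Under these assumptions, a detector that reports zh-tw or zh-cn and one that reports only zh would end up with the same value, which is the behavior the description asks for in the language field.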
