Posted to dev@nutch.apache.org by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2010/07/14 19:40:51 UTC

[jira] Resolved: (NUTCH-86) LanguageIdentifier API enhancements

     [ https://issues.apache.org/jira/browse/NUTCH-86?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann resolved NUTCH-86.
------------------------------------

         Assignee: Chris A. Mattmann  (was: Jerome Charron)
    Fix Version/s: 1.2
                   2.0
       Resolution: Won't Fix

I've reported TIKA-465 to take care of this, if it's still relevant. I need to do some more research on what was being proposed, but either way, we can do language analysis in Tika now, since it has an explicit language identifier component and since that was one of the library's original goals. In addition, there has been no activity on this issue in 5+ years...

> LanguageIdentifier API enhancements
> -----------------------------------
>
>                 Key: NUTCH-86
>                 URL: https://issues.apache.org/jira/browse/NUTCH-86
>             Project: Nutch
>          Issue Type: Improvement
>          Components: indexer
>    Affects Versions: 0.6, 0.7, 0.8
>            Reporter: Jerome Charron
>            Assignee: Chris A. Mattmann
>            Priority: Minor
>             Fix For: 1.2, 2.0
>
>
> More information can be found in the following thread on the Nutch-Dev mailing list:
> http://www.mail-archive.com/nutch-dev%40lucene.apache.org/msg00569.html
> Summary:
> 1. LanguageIdentifier API changes. The similarity methods should return an ordered array of language-code/score pairs instead of a simple String containing the language-code.
> 2. Ensure consistency between LanguageIdentifier scoring and NGramProfile.getSimilarity().
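A minimal Java sketch of what item 1 could look like. It shows one way to return an ordered array of language-code/score pairs instead of a single language-code String. All names here (RankedLanguages, LanguageScore, rank, best) are hypothetical. They are not the Nutch LanguageIdentifier API and not whatever TIKA-465 ended up doing. The sketch also assumes that a higher similarity score means a closer match. If the underlying measure is a distance, the sort order would need to be flipped.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: a similarity result returned as an ordered array of
 * language-code/score pairs. Names are illustrative, not from Nutch or Tika.
 */
public class RankedLanguages {

    /** Hypothetical immutable language-code/score pair. */
    public static final class LanguageScore {
        private final String languageCode;
        private final double score;

        public LanguageScore(String languageCode, double score) {
            this.languageCode = languageCode;
            this.score = score;
        }

        public String languageCode() { return languageCode; }
        public double score() { return score; }
    }

    /**
     * Orders raw per-language similarity scores from best to worst match.
     * Assumes higher score = closer match; ties are broken by language code
     * so the order is deterministic.
     */
    public static LanguageScore[] rank(Map<String, Double> similarities) {
        List<LanguageScore> pairs = new ArrayList<>();
        for (Map.Entry<String, Double> e : similarities.entrySet()) {
            pairs.add(new LanguageScore(e.getKey(), e.getValue()));
        }
        pairs.sort(Comparator.comparingDouble(LanguageScore::score).reversed()
                .thenComparing(LanguageScore::languageCode));
        return pairs.toArray(new LanguageScore[0]);
    }

    /**
     * The old single-String style answer, derived from the ranked array so
     * the two results cannot disagree. Returns null if nothing was scored.
     */
    public static String best(LanguageScore[] ranked) {
        return ranked.length == 0 ? null : ranked[0].languageCode();
    }

    public static void main(String[] args) {
        // Made-up scores, for illustration only.
        LanguageScore[] ranked = rank(Map.of("fr", 0.42, "en", 0.91, "de", 0.13));
        for (LanguageScore s : ranked) {
            System.out.println(s.languageCode() + " " + s.score());
        }
        System.out.println("best: " + best(ranked));
    }
}
```

In this sketch the single-answer method is derived from the ranked array. That is one possible way to get the consistency asked for in item 2: callers that only want the top language and callers that want the full ranking see the same ordering. It does not address how NGramProfile.getSimilarity() itself computes its scores.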

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.