Posted to dev@tika.apache.org by "Ken Krugler (JIRA)" <ji...@apache.org> on 2010/07/14 23:26:53 UTC
[jira] Assigned: (TIKA-465) LanguageIdentifier API enhancements
[ https://issues.apache.org/jira/browse/TIKA-465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ken Krugler reassigned TIKA-465:
--------------------------------
Assignee: Ken Krugler (was: Chris A. Mattmann)
> LanguageIdentifier API enhancements
> -----------------------------------
>
> Key: TIKA-465
> URL: https://issues.apache.org/jira/browse/TIKA-465
> Project: Tika
> Issue Type: Improvement
> Components: languageidentifier
> Reporter: Chris A. Mattmann
> Assignee: Ken Krugler
> Priority: Minor
>
> In NUTCH-86, Jerome Charron originally identified a set of improvements to the LanguageIdentifier that we should consider in Tika:
> {quote}
> More information can be found in the following thread on the Nutch-Dev mailing list:
> http://www.mail-archive.com/nutch-dev%40lucene.apache.org/msg00569.html
> Summary:
> 1. LanguageIdentifier API changes. The similarity methods should return an ordered array of language-code/score pairs instead of a simple String containing the language-code.
> 2. Ensure consistency between LanguageIdentifier scoring and NGramProfile.getSimilarity().
> {quote}
> I just wanted to capture the issue here in Tika, since I'm about to close it out in Nutch: language identification is something that can happen in Tika-ville.
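To make item 1 of the summary concrete, here is a minimal sketch of what "an ordered array of language-code/score pairs" could look like. Every name in it (RankedLanguages, LanguageScore, rank) is hypothetical and does not come from Tika or Nutch. It also assumes that a higher score means a closer match; if the real scoring is a distance, the ordering would have to be flipped.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names exist in Tika or Nutch.
final class RankedLanguages {

    /** One language-code/score pair (illustrative type only). */
    static final class LanguageScore {
        final String code;
        final double score;

        LanguageScore(String code, double score) {
            this.code = code;
            this.score = score;
        }

        @Override
        public String toString() {
            return code + "=" + score;
        }
    }

    /**
     * Turns per-language scores into an array ordered best match first.
     * Assumption: a higher score means a closer match.
     */
    static LanguageScore[] rank(Map<String, Double> scoresByLanguage) {
        List<LanguageScore> ranked = new ArrayList<>();
        for (Map.Entry<String, Double> e : scoresByLanguage.entrySet()) {
            ranked.add(new LanguageScore(e.getKey(), e.getValue()));
        }
        // Sort descending by score; List.sort is stable, so ties keep input order.
        ranked.sort(Comparator.comparingDouble((LanguageScore s) -> s.score).reversed());
        return ranked.toArray(new LanguageScore[0]);
    }

    public static void main(String[] args) {
        // Made-up scores, purely for illustration.
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("en", 0.42);
        scores.put("fr", 0.87);
        scores.put("de", 0.55);
        for (LanguageScore s : rank(scores)) {
            System.out.println(s);
        }
    }
}
```

A caller that only wants the old behaviour (a single language code) could take the first element of the array, while callers that care about confidence can compare the top scores.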