Posted to dev@tika.apache.org by "Ken Krugler (JIRA)" <ji...@apache.org> on 2011/07/05 18:55:16 UTC

[jira] [Commented] (TIKA-681) eight new n-gram language profiles

    [ https://issues.apache.org/jira/browse/TIKA-681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060000#comment-13060000 ] 

Ken Krugler commented on TIKA-681:
----------------------------------

Hi Michael,

Thanks for contributing these profiles.

It would be great to have some unit tests that validate the profiles, along with an update to the existing unit test (see LanguageIdentifierTest) confirming that the new profiles correctly identify their languages.

Thanks again,

-- Ken

> eight new n-gram language profiles
> ----------------------------------
>
>                 Key: TIKA-681
>                 URL: https://issues.apache.org/jira/browse/TIKA-681
>             Project: Tika
>          Issue Type: Improvement
>          Components: languageidentifier
>    Affects Versions: 1.0
>            Reporter: Michael Bryant
>            Priority: Minor
>         Attachments: TIKA-xxx.bryant.20110705.patch.txt
>
>
> Eight new n-gram language profiles added: Belarusian, Catalan, Esperanto, Galician, Romanian, Slovak, Slovenian, and Ukrainian. 
