Posted to dev@tika.apache.org by "Michael Bryant (JIRA)" <ji...@apache.org> on 2011/07/05 15:12:16 UTC
[jira] [Created] (TIKA-681) eight new n-gram language profiles
eight new n-gram language profiles
----------------------------------
Key: TIKA-681
URL: https://issues.apache.org/jira/browse/TIKA-681
Project: Tika
Issue Type: Improvement
Components: languageidentifier
Affects Versions: 1.0
Reporter: Michael Bryant
Priority: Minor
Attachments: TIKA-xxx.bryant.20110705.patch.txt
Eight new n-gram language profiles added: Belarusian, Catalan, Esperanto, Galician, Romanian, Slovak, Slovenian, and Ukrainian.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (TIKA-681) eight new n-gram language profiles
Posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jukka Zitting resolved TIKA-681.
--------------------------------
Resolution: Fixed
Fix Version/s: 1.0
Assignee: Jukka Zitting
Test cases would be nice, but I guess we're OK without them for now. I committed the patch in revision 1181278. Thanks!
> eight new n-gram language profiles
> ----------------------------------
>
> Key: TIKA-681
> URL: https://issues.apache.org/jira/browse/TIKA-681
> Project: Tika
> Issue Type: Improvement
> Components: languageidentifier
> Affects Versions: 0.10
> Reporter: Michael Bryant
> Assignee: Jukka Zitting
> Priority: Minor
> Fix For: 1.0
>
> Attachments: TIKA-xxx.bryant.20110705.patch.txt
>
>
> Eight new n-gram language profiles added: Belarusian, Catalan, Esperanto, Galician, Romanian, Slovak, Slovenian, and Ukrainian.
[jira] [Updated] (TIKA-681) eight new n-gram language profiles
Posted by "Michael Bryant (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Bryant updated TIKA-681:
--------------------------------
Attachment: TIKA-xxx.bryant.20110705.patch.txt
[jira] [Commented] (TIKA-681) eight new n-gram language profiles
Posted by "Ken Krugler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060000#comment-13060000 ]
Ken Krugler commented on TIKA-681:
----------------------------------
Hi Michael,
Thanks for contributing these profiles.
It would be great to have some unit tests that validate the profiles, along with an update to the existing unit test to confirm that the new profiles correctly identify their languages (see LanguageIdentifierTest).
Thanks again,
-- Ken
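The kind of check requested above could be sketched roughly as follows. This is a self-contained toy and does not use Tika's LanguageIdentifier or LanguageIdentifierTest: the class name NgramProfileSketch, the two sample sentences, the trigram size, and the overlap score are all assumptions made for illustration, and real profiles are built from far larger corpora.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy n-gram profile matcher (hypothetical; not Tika's implementation). */
public class NgramProfileSketch {

    // Language code -> set of character trigrams seen in its training text.
    private final Map<String, Set<String>> profiles = new LinkedHashMap<>();

    /** Collects every distinct 3-character window of the lower-cased text. */
    public static Set<String> trigrams(String text) {
        String s = text.toLowerCase(Locale.ROOT);
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            grams.add(s.substring(i, i + 3));
        }
        return grams;
    }

    public void addProfile(String language, String trainingText) {
        profiles.put(language, trigrams(trainingText));
    }

    /** Fraction of the input's trigrams that also occur in the profile. */
    public static double score(Set<String> input, Set<String> profile) {
        if (input.isEmpty()) {
            return 0.0;
        }
        int shared = 0;
        for (String gram : input) {
            if (profile.contains(gram)) {
                shared++;
            }
        }
        return (double) shared / input.size();
    }

    /** Returns the best-scoring language code, or null if nothing matches. */
    public String identify(String text) {
        Set<String> input = trigrams(text);
        String best = null;
        double bestScore = 0.0;
        for (Map.Entry<String, Set<String>> entry : profiles.entrySet()) {
            double current = score(input, entry.getValue());
            if (current > bestScore) {
                bestScore = current;
                best = entry.getKey();
            }
        }
        return best;
    }

    /** Two tiny sample profiles (assumed sentences, for illustration only). */
    public static NgramProfileSketch withSampleProfiles() {
        NgramProfileSketch sketch = new NgramProfileSketch();
        sketch.addProfile("eo", "La hundo estas granda kaj la kato estas malgranda.");
        sketch.addProfile("ca", "El gos \u00e9s gran i el gat \u00e9s petit.");
        return sketch;
    }

    public static void main(String[] args) {
        NgramProfileSketch sketch = withSampleProfiles();
        // One assertion-style check per profile, mirroring a per-language unit test.
        if (!"eo".equals(sketch.identify("la kato estas granda"))) {
            throw new AssertionError("Esperanto sample was not identified as eo");
        }
        if (!"ca".equals(sketch.identify("el gat \u00e9s gran"))) {
            throw new AssertionError("Catalan sample was not identified as ca");
        }
    }
}
```

A real test for the eight new profiles would follow the same pattern, with one known sample text per language and an assertion that the identifier returns the expected code, but written against the existing identifier and test class rather than this sketch.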