Posted to dev@tika.apache.org by "Michael Bryant (JIRA)" <ji...@apache.org> on 2011/07/05 15:12:16 UTC

[jira] [Created] (TIKA-681) eight new n-gram language profiles

eight new n-gram language profiles
----------------------------------

                 Key: TIKA-681
                 URL: https://issues.apache.org/jira/browse/TIKA-681
             Project: Tika
          Issue Type: Improvement
          Components: languageidentifier
    Affects Versions: 1.0
            Reporter: Michael Bryant
            Priority: Minor
         Attachments: TIKA-xxx.bryant.20110705.patch.txt

Eight new n-gram language profiles added: Belarusian, Catalan, Esperanto, Galician, Romanian, Slovak, Slovenian, and Ukrainian. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (TIKA-681) eight new n-gram language profiles

Posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-681.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 1.0
         Assignee: Jukka Zitting

Test cases would be nice, but I guess we're OK without them for now. I committed the patch in revision 1181278. Thanks!
                
> eight new n-gram language profiles
> ----------------------------------
>
>                 Key: TIKA-681
>                 URL: https://issues.apache.org/jira/browse/TIKA-681
>             Project: Tika
>          Issue Type: Improvement
>          Components: languageidentifier
>    Affects Versions: 0.10
>            Reporter: Michael Bryant
>            Assignee: Jukka Zitting
>            Priority: Minor
>             Fix For: 1.0
>
>         Attachments: TIKA-xxx.bryant.20110705.patch.txt
>
>
> Eight new n-gram language profiles added: Belarusian, Catalan, Esperanto, Galician, Romanian, Slovak, Slovenian, and Ukrainian. 


[jira] [Updated] (TIKA-681) eight new n-gram language profiles

Posted by "Michael Bryant (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Bryant updated TIKA-681:
--------------------------------

    Attachment: TIKA-xxx.bryant.20110705.patch.txt


[jira] [Commented] (TIKA-681) eight new n-gram language profiles

Posted by "Ken Krugler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13060000#comment-13060000 ] 

Ken Krugler commented on TIKA-681:
----------------------------------

Hi Michael,

Thanks for contributing these profiles.

It would be great if there were some unit tests that validate the profiles.

It would also help to update the existing unit test (see LanguageIdentifierTest) to confirm that the new profiles correctly identify their respective languages.
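The thread does not show what such a validation test would look like. The following is a rough, self-contained Java sketch of the underlying check: build character-trigram profiles, then confirm that each profile is the closest match for a sample in its own language. Everything in it is an assumption for illustration only. That includes the hypothetical class name NGramProfileSketch, the '_' word-boundary padding, Euclidean distance over relative frequencies, and the one-sentence Catalan and Esperanto training texts. It does not call Tika's LanguageIdentifier or load the profile files from the patch, and Tika's own scoring may differ.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, NOT Tika's LanguageIdentifier: a profile is the relative
// frequency of each character trigram ('_' marks word boundaries), and
// profiles are compared by Euclidean distance.
class NGramProfileSketch {

    // Build a trigram -> relative-frequency profile from raw text.
    static Map<String, Double> profile(String text) {
        Map<String, Integer> counts = new HashMap<>();
        int total = 0;
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (word.isEmpty()) {
                continue; // split() yields a leading empty token if text starts with a non-letter
            }
            String padded = "_" + word + "_";
            for (int i = 0; i + 3 <= padded.length(); i++) {
                counts.merge(padded.substring(i, i + 3), 1, Integer::sum);
                total++;
            }
        }
        Map<String, Double> freqs = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            freqs.put(e.getKey(), e.getValue() / (double) total);
        }
        return freqs;
    }

    // Euclidean distance over the union of both profiles' trigrams (smaller = closer).
    static double distance(Map<String, Double> a, Map<String, Double> b) {
        Set<String> keys = new HashSet<>(a.keySet());
        keys.addAll(b.keySet());
        double sum = 0.0;
        for (String k : keys) {
            double d = a.getOrDefault(k, 0.0) - b.getOrDefault(k, 0.0);
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Return the language code whose profile is closest to the given text.
    static String identify(String text, Map<String, Map<String, Double>> profiles) {
        Map<String, Double> sample = profile(text);
        String best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Map<String, Double>> e : profiles.entrySet()) {
            double d = distance(sample, e.getValue());
            if (d < bestDistance) {
                bestDistance = d;
                best = e.getKey();
            }
        }
        return best;
    }

    // Toy profiles trained on one sentence each; real profiles need large corpora.
    static Map<String, Map<String, Double>> toyProfiles() {
        Map<String, Map<String, Double>> profiles = new HashMap<>();
        profiles.put("ca", profile("el gat menja peix a la cuina i el gos dorm a la casa"));
        profiles.put("eo", profile("la hundo kuras en la parko kaj la kato dormas sur la lito"));
        return profiles;
    }

    // Check that each toy profile wins on a sample of its own language.
    public static void main(String[] args) {
        Map<String, Map<String, Double>> profiles = toyProfiles();
        System.out.println(identify("el gos menja peix", profiles));         // Catalan sample
        System.out.println(identify("la kato kuras en la parko", profiles)); // Esperanto sample
    }
}
```

Run with `java NGramProfileSketch.java` (JDK 11+ single-file launch). It should print `ca` and then `eo`. In a real patch, the equivalent assertions would use Tika's own identifier against the committed profiles, with one sample sentence per new language.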

Thanks again,

-- Ken
