Posted to dev@tika.apache.org by GitBox <gi...@apache.org> on 2021/03/30 19:00:08 UTC

[GitHub] [tika] arky opened a new pull request #421: [TIKA-3340] LanguageProfile for Myanmar

arky opened a new pull request #421:
URL: https://github.com/apache/tika/pull/421


   Adds Myanmar LanguageProfile for Apache Tika https://issues.apache.org/jira/browse/TIKA-3340
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tika] kkrugler commented on pull request #421: [TIKA-3340] LanguageProfile for Myanmar

Posted by GitBox <gi...@apache.org>.
kkrugler commented on pull request #421:
URL: https://github.com/apache/tika/pull/421#issuecomment-811120110


   Hi @arky you also need to edit the `LanguageIdentifierTest.java` file, to add `my` to the list of languages, like this:
   
   ``` java
       private static final String[] languages = new String[] {
           // TODO - currently Estonian and Greek fail these tests.
           // Enable when language detection works better.
           "da", "de", /* "et", "el", */ "en", "es", "fi", "fr", "it",
           "lt", "my", "nl", "pt", "sv"
       };
   ```
   
   And then run `mvn clean test` from the `tika/tika-core` directory.
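
As a rough illustration of why both the list entry and the test file are needed, the sketch below pairs each language code with a `<code>.test` resource and reports any code that has no matching file. It does not reproduce the actual `LanguageIdentifierTest`; the class `LanguageTestResources`, its methods, and the in-memory set of available files are hypothetical and exist only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Tika: shows how a language list and
// the per-language ".test" resources have to stay in sync.
public class LanguageTestResources {

    // Resource directory taken from the thread; file naming follows my.test.
    static final String RESOURCE_DIR = "org/apache/tika/language/";

    // Build the classpath resource name for one language code.
    public static String resourceFor(String language) {
        return RESOURCE_DIR + language + ".test";
    }

    // Return the language codes whose test resource is not in `available`.
    public static List<String> missing(String[] languages, Set<String> available) {
        List<String> result = new ArrayList<>();
        for (String language : languages) {
            if (!available.contains(resourceFor(language))) {
                result.add(language);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] languages = {"lt", "my", "nl"};
        // Assumed state for the example: my.test has not been added yet.
        Set<String> available = Set.of(resourceFor("lt"), resourceFor("nl"));
        System.out.println(missing(languages, available)); // prints [my]
    }
}
```

Adding `my` to the list without also adding `my.test` would leave the new language with no sample text to test against, which is the situation the sketch flags.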





[GitHub] [tika] arky commented on pull request #421: [TIKA-3340] LanguageProfile for Myanmar

Posted by GitBox <gi...@apache.org>.
arky commented on pull request #421:
URL: https://github.com/apache/tika/pull/421#issuecomment-810887814


   @kkrugler I'll be happy to contribute test cases for Myanmar. Can you please tell me more about how to do this?
   
   Is just adding a 'lang_code.test' file with 100 lines of Myanmar text enough? https://github.com/apache/tika/tree/main/tika-core/src/test/resources/org/apache/tika/language
   
   How do I verify this test case? Just 'mvn run tests...'
   
   





[GitHub] [tika] kkrugler commented on pull request #421: [TIKA-3340] LanguageProfile for Myanmar

Posted by GitBox <gi...@apache.org>.
kkrugler commented on pull request #421:
URL: https://github.com/apache/tika/pull/421#issuecomment-811185369


   @arky - re using UDHR text: that's fine, but per the **Permissions** section on https://www.ohchr.org/EN/UDHR/Pages/Introduction.aspx, you would need to add attribution to the end of the Tika top-level `LICENSE.txt` file (see the other test-data examples in that file).





[GitHub] [tika] lewismc commented on pull request #421: [TIKA-3340] LanguageProfile for Myanmar

Posted by GitBox <gi...@apache.org>.
lewismc commented on pull request #421:
URL: https://github.com/apache/tika/pull/421#issuecomment-841330976


   @arky can you please update this PR so we can review and attempt to merge into main? Thank you





[GitHub] [tika] arky commented on pull request #421: [TIKA-3340] LanguageProfile for Myanmar

Posted by GitBox <gi...@apache.org>.
arky commented on pull request #421:
URL: https://github.com/apache/tika/pull/421#issuecomment-811151917


   @kkrugler Thanks for that information. I'll open a pull request that adds appropriate test cases for Myanmar and a few other languages that were introduced.
   
   Any technical objections to using the Burmese translation of the UDHR as the test case?
   
   https://www.ohchr.org/EN/UDHR/Pages/Language.aspx?LangID=bms





[GitHub] [tika] kkrugler commented on pull request #421: [TIKA-3340] LanguageProfile for Myanmar

Posted by GitBox <gi...@apache.org>.
kkrugler commented on pull request #421:
URL: https://github.com/apache/tika/pull/421#issuecomment-810625585


   Hi @arky - thanks for the PR! Would it be possible to add `my` to the list of languages being tested in `LanguageIdentifierTest`? You'd have to add a `tika-core/src/test/resources/org/apache/tika/language/my.test` file with Burmese as well.
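
Before committing a `my.test` file, it may be worth a quick sanity check that its contents really are in Myanmar script. The sketch below is one possible way to do that using only the JDK's `Character.UnicodeScript`; the class `MyanmarScriptCheck` and its method are hypothetical and are not part of Tika or of this PR.

```java
// Hypothetical helper, not part of Tika: measures how much of a text sample
// is written in Myanmar script, using only standard JDK classes.
public class MyanmarScriptCheck {

    // Fraction of letter code points in `text` whose Unicode script is MYANMAR.
    // Combining marks, digits, punctuation and whitespace are not letters and
    // are ignored. Returns 0.0 when the text contains no letters at all.
    public static double myanmarLetterRatio(String text) {
        long letters = text.codePoints()
                .filter(Character::isLetter)
                .count();
        if (letters == 0) {
            return 0.0;
        }
        long myanmar = text.codePoints()
                .filter(Character::isLetter)
                .filter(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.MYANMAR)
                .count();
        return (double) myanmar / letters;
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only.
        String burmese = "\u1019\u103C\u1014\u103A\u1019\u102C";
        System.out.println(myanmarLetterRatio(burmese));
        System.out.println(myanmarLetterRatio("hello"));
    }
}
```

A file intended as Burmese test data should give a ratio close to 1.0; where exactly to draw the line is a judgment call, since real text can contain Latin names or abbreviations.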





[GitHub] [tika] lewismc commented on pull request #421: [TIKA-3340] LanguageProfile for Myanmar

Posted by GitBox <gi...@apache.org>.
lewismc commented on pull request #421:
URL: https://github.com/apache/tika/pull/421#issuecomment-1030736202


   @arky can you please rebase?

