Posted to dev@tika.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/05/01 19:08:00 UTC

[jira] [Commented] (TIKA-3329) RTG Translator with many-to-eng translation

    [ https://issues.apache.org/jira/browse/TIKA-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17337876#comment-17337876 ] 

ASF GitHub Bot commented on TIKA-3329:
--------------------------------------

chrismattmann commented on pull request #419:
URL: https://github.com/apache/tika/pull/419#issuecomment-830679775


   Going to test this today in Tika. If everything passes, I'll get it committed and then work to integrate it directly into tika-python as the default translation package. Thanks @thammegowda!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> RTG Translator with many-to-eng translation
> -------------------------------------------
>
>                 Key: TIKA-3329
>                 URL: https://issues.apache.org/jira/browse/TIKA-3329
>             Project: Tika
>          Issue Type: Improvement
>          Components: translation
>            Reporter: Thamme Gowda
>            Assignee: Chris Mattmann
>            Priority: Major
>
> The existing translation services in tika-translate are either commercial/paid engines (e.g., Google, Microsoft) or not state of the art (e.g., Joshua, Moses).
> Reader Translator Generator (RTG) is a neural machine translation toolkit [https://isi-nlp.github.io/rtg/]
>  that implements the Transformer NMT model (the current state of the art).
> It also provides a massively multilingual pretrained NMT model (many-to-English translation direction) [https://hub.docker.com/repository/docker/tgowda/rtg-model]
> in which about 500 source languages are represented, of which at least ~300 have good enough quality (for comparison, Google Translate has ~106 languages and Microsoft has about 80).
> This issue is for integrating RTG Translator into tika-translate.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)