Posted to dev@tika.apache.org by "Lewis John McGibbney (JIRA)" <ji...@apache.org> on 2016/10/26 02:38:58 UTC

[jira] [Resolved] (TIKA-1343) Create a Tika Translator implementation that uses JoshuaDecoder

     [ https://issues.apache.org/jira/browse/TIKA-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney resolved TIKA-1343.
----------------------------------------
    Resolution: Fixed
      Assignee: Lewis John McGibbney  (was: Chris A. Mattmann)

> Create a Tika Translator implementation that uses JoshuaDecoder
> ---------------------------------------------------------------
>
>                 Key: TIKA-1343
>                 URL: https://issues.apache.org/jira/browse/TIKA-1343
>             Project: Tika
>          Issue Type: New Feature
>          Components: translation
>            Reporter: Chris A. Mattmann
>            Assignee: Lewis John McGibbney
>             Fix For: 1.15
>
>
> The Joshua Decoder toolkit is a BSD-licensed, Java-based statistical machine translation system hosted at GitHub:
> http://joshua-decoder.org/
> Joshua takes in corpora and trains models that can then be used for language translation. It currently supports, for example, Spanish->English, Indian dialects->English, Chinese->English, and a few others.
> https://github.com/joshua-decoder/joshua/
> It would be nice to build a Tika Translator on top of Joshua. There are of course several issues with this:
> * the models are huge, so we'll need a separate package or Maven module (maybe tika-translate-joshua or something) to release the models, and we'll need to build them. I just went through the process of building the Spanish->English one; it took over a day, and it still needs to be rebuilt because I did it wrong
> * Joshua has its own configuration, so we need some way of passing that config into the Translator. Not sure of the best way to do this.
> * Joshua isn't in the Central repository. I've started a discussion on the Joshua lists about this: https://groups.google.com/forum/#!topic/joshua_support/9Y04miboUj0
> Anyhoo, I've got a working patch right now with hard-coded values, and a manual install into my Maven repo, for brave souls out there who want to try it.
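
One possible way to approach the configuration question above is to read per-language-pair settings from a properties file and have the translator report itself unavailable when a pair is not configured. The sketch below is only an illustration under assumptions: the class names, the "translator.joshua.*" property keys, and the injected decoder function are all hypothetical, and it calls neither the actual Tika Translator interface nor any Joshua API.

```java
import java.util.Properties;
import java.util.function.UnaryOperator;

/** Hypothetical holder for per-language-pair settings (property keys are invented). */
final class JoshuaSettings {
    final String modelDir;   // directory holding the trained model
    final String configFile; // path to the decoder configuration file

    private JoshuaSettings(String modelDir, String configFile) {
        this.modelDir = modelDir;
        this.configFile = configFile;
    }

    /** Looks up keys such as "translator.joshua.es-en.modelDir". */
    static JoshuaSettings fromProperties(Properties props, String source, String target) {
        String prefix = "translator.joshua." + source + "-" + target + ".";
        return new JoshuaSettings(
                props.getProperty(prefix + "modelDir"),
                props.getProperty(prefix + "config"));
    }

    /** True only when both the model directory and the config file are set. */
    boolean isComplete() {
        return modelDir != null && configFile != null;
    }
}

/** Hypothetical translator sketch; the decoder is injected, not a real Joshua call. */
final class ConfiguredTranslatorSketch {
    private final Properties props;
    private final UnaryOperator<String> decoder;

    ConfiguredTranslatorSketch(Properties props, UnaryOperator<String> decoder) {
        this.props = props;
        this.decoder = decoder;
    }

    /** A language pair counts as available only if its settings are complete. */
    boolean isAvailable(String source, String target) {
        return JoshuaSettings.fromProperties(props, source, target).isComplete();
    }

    /** Returns the input unchanged when no model is configured for the pair. */
    String translate(String text, String source, String target) {
        if (!isAvailable(source, target)) {
            return text;
        }
        return decoder.apply(text);
    }
}
```

Keeping the settings in a properties file would mean the model location is not hard-coded, and the large models could be shipped separately from the code that reads the settings.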



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)