Posted to dev@joshua.apache.org by "Tommaso Teofili (JIRA)" <ji...@apache.org> on 2019/03/15 07:33:00 UTC
[jira] [Created] (JOSHUA-341) Integrated Transliteration
Tommaso Teofili created JOSHUA-341:
--------------------------------------
Summary: Integrated Transliteration
Key: JOSHUA-341
URL: https://issues.apache.org/jira/browse/JOSHUA-341
Project: Joshua
Issue Type: Task
Components: core, language packs
Reporter: Tommaso Teofili
Many of the released language packs translate from languages with non-Latin scripts. Words that cannot be translated are therefore passed through to the output unchanged, where they cannot even be read by someone who doesn't know that script. At the same time, many untranslatable words are simply transliterations. For example, an Arabic word might be an English word (like a name or technical term) that has simply been written in Arabic script. Such words can be transliterated back into Latin script. It would be good to add built-in transliteration models that can be applied to all out-of-vocabulary words and enabled for certain languages. Transliteration models can be built from the same bitext using techniques like that of Sajjad, Fraser, and Schmid (2012) [1].
[1] : http://www.anthology.aclweb.org/P/P12/P12-1049.pdf
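To make the proposal concrete, here is a minimal sketch of the fallback described above: tokens found in the vocabulary are left for the decoder, and out-of-vocabulary tokens are transliterated when the feature is enabled for the language. Everything in it is an assumption for illustration only. The class name, the applyToOov method, the enabled flag, and the toy character table are hypothetical and are not part of Joshua. A real model would be learned from the bitext as in [1] rather than hard-coded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not Joshua's API.
class OovTransliterationSketch {

    // Toy stand-in for a learned transliteration model (assumption):
    // a few Arabic letters mapped to Latin letters, short vowels ignored.
    static final Map<Character, String> TOY_TABLE = Map.of(
        '\u0644', "l",   // lam
        '\u0646', "n",   // noon
        '\u062F', "d",   // dal
        '\u0628', "b",   // beh
        '\u0645', "m"    // meem
    );

    // Transliterate one token character by character; characters missing
    // from the table are copied through unchanged.
    static String transliterate(String token) {
        StringBuilder sb = new StringBuilder();
        for (char c : token.toCharArray()) {
            sb.append(TOY_TABLE.getOrDefault(c, String.valueOf(c)));
        }
        return sb.toString();
    }

    // In-vocabulary tokens are returned as-is (the decoder would translate
    // them). OOV tokens are transliterated only if the feature is enabled.
    static List<String> applyToOov(List<String> tokens,
                                   Set<String> vocabulary,
                                   boolean enabled) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            boolean isOov = !vocabulary.contains(token);
            out.add(enabled && isOov ? transliterate(token) : token);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("known", "\u0644\u0646\u062F\u0646");
        System.out.println(applyToOov(tokens, Set.of("known"), true));
    }
}
```

Keeping the switch as a per-call flag mirrors the request that transliteration be enabled only for certain languages. A language pack could set it from its own configuration.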
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)