Posted to dev@tika.apache.org by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2016/10/19 16:23:03 UTC

[jira] [Updated] (TIKA-1328) Translate Metadata and Content

     [ https://issues.apache.org/jira/browse/TIKA-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-1328:
------------------------------------
    Fix Version/s:     (was: 1.14)
                   1.15

> Translate Metadata and Content
> ------------------------------
>
>                 Key: TIKA-1328
>                 URL: https://issues.apache.org/jira/browse/TIKA-1328
>             Project: Tika
>          Issue Type: New Feature
>          Components: translation
>            Reporter: Tyler Palsulich
>             Fix For: 1.15
>
>
> Right now, translation is only done on Strings. Ideally, users would be able to "turn on" translation while parsing. I can think of a couple of options:
> - Make a TranslateAutoDetectParser. Automatically detect the file type, parse it, then translate the content.
> - Make a Context switch. When true, translate the content regardless of the parser used. I'm not sure of the best way to go about this, but I prefer it over adding another Parser.
> Regardless, we need a black list or a white list for translation. I think a black list would be the way to go -- it would name the fields that should not be translated (dates, versions, ...). Any ideas?
> Also, somewhat unrelated, does anyone know of any other open source translation libraries? If we were really lucky, it wouldn't depend on an online service.
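
The black list idea above could be sketched roughly as follows. This is only an illustration in plain Java, not Tika code: the class MetadataTranslationSketch, its translate method, and the field names are all hypothetical, and the translator is passed in as a plain function so that any translation backend could be plugged in.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical helper, not part of Tika: translates every metadata value
// except those whose field name is on a do-not-translate list.
class MetadataTranslationSketch {

    static Map<String, String> translate(Map<String, String> metadata,
                                         Set<String> doNotTranslate,
                                         UnaryOperator<String> translator) {
        // LinkedHashMap keeps the fields in their original order.
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            String value = entry.getValue();
            if (!doNotTranslate.contains(entry.getKey())) {
                value = translator.apply(value);
            }
            result.put(entry.getKey(), value);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("title", "Quarterly report");
        metadata.put("created", "2016-10-19");
        metadata.put("version", "1.15");

        // Fields such as dates and versions are left untouched.
        Set<String> doNotTranslate = new HashSet<>(Arrays.asList("created", "version"));

        // Stand-in translator that only tags the text; a real one would call
        // a translation library or service.
        UnaryOperator<String> fakeTranslator = text -> "[translated] " + text;

        System.out.println(translate(metadata, doNotTranslate, fakeTranslator));
    }
}
```

A white list would only invert the contains check. Either way the list is matched against field names, so it would need to follow whatever metadata keys the parsers actually emit.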



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)