Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2016/03/06 14:22:40 UTC

[jira] [Commented] (TIKA-1887) Add new mimetype for file extensions .po

    [ https://issues.apache.org/jira/browse/TIKA-1887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15182154#comment-15182154 ] 

Nick Burch commented on TIKA-1887:
----------------------------------

http://www.icanlocalize.com/site/tutorials/how-to-translate-with-gettext-po-and-pot-files/ seems to be a good introduction to these formats for anyone new to them.

{{text/x-gettext-translation}} and {{text/x-po}} already seem to be moderately widely used for these files, so it would be better to use the former as the canonical type and register the latter as an alias, rather than inventing our own. (We also shouldn't use {{text/po}}: it isn't officially assigned, so it would need an x- prefix to indicate that.)
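
As a rough sketch of what that could look like in Tika's mime-type definitions, the entry below declares the canonical type with the alias and a glob for the .po extension. The comment text and the text/plain parent are assumptions made for illustration, not something agreed on this issue, and no magic-byte patterns are shown because none have been specified here.

```xml
<mime-info>
  <!-- Canonical type, as proposed in the comment above -->
  <mime-type type="text/x-gettext-translation">
    <!-- The other type already in moderately wide use, registered as an alias -->
    <alias type="text/x-po"/>
    <!-- Assumed description text, for illustration only -->
    <_comment>GNU Gettext translation (PO) file</_comment>
    <glob pattern="*.po"/>
    <!-- Assumption: PO files are plain text, so they inherit from text/plain -->
    <sub-class-of type="text/plain"/>
  </mime-type>
</mime-info>
```

With an entry along these lines, a file would resolve to {{text/x-gettext-translation}} whether a caller asks for that type or for {{text/x-po}}.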

> Add new mimetype for file extensions .po 
> -----------------------------------------
>
>                 Key: TIKA-1887
>                 URL: https://issues.apache.org/jira/browse/TIKA-1887
>             Project: Tika
>          Issue Type: Improvement
>          Components: core, mime
>            Reporter: Manali Shah
>              Labels: mimetypes
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Hi, 
> While analyzing the TREC DD polar data, we came across files that were classified as octet-stream. 
> Using content-based algorithms such as BFA, BFCC and FHT, we were able to determine more magic bytes for certain files.
> The GNU gettext toolset is used by programmers and translators for producing, updating and using translation files, mainly PO files, which are textual, editable files.
> We suggest adding a new mimetype, text/po, to Tika's existing mime repository.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)