Posted to dev@tika.apache.org by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org> on 2012/01/16 18:11:40 UTC

[jira] [Resolved] (TIKA-86) Support magic(5) files

     [ https://issues.apache.org/jira/browse/TIKA-86?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-86.
-------------------------------

    Resolution: Won't Fix

Agreed with the points above, so resolving as Won't Fix. Let's follow up in separate issues on more actionable tasks.

I looked at magic file parsing on a few occasions, but as noted, most of the magic files out there are targeted at human-readable output and don't contain very comprehensive or accurate media type information. Matching such input to the needs of Tika seems more trouble than it's worth.

That said, some of the more complicated detection rules (like the regexp patterns mentioned above) could well be useful for Tika. I'd love to see contributions in that area! That would allow us to mine some of the larger magic files for specific complex patterns for reuse in our type database.
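
As a rough illustration of the kind of regexp-based detection rule meant here, the sketch below matches a few patterns against the first bytes of a document. It is not Tika code and not a magic(5) parser: the RULES table, the detect() helper, and the patterns themselves are hypothetical and chosen only to show why a regexp can express things a fixed byte-offset match cannot (optional prologs, variable whitespace, alternative interpreters).

```python
import re

# Hypothetical rule table: (media type, pattern applied to the file prefix).
# The patterns are illustrative assumptions, not rules taken from Tika
# or from any existing magic file. Order matters: first match wins.
RULES = [
    # XHTML: optional XML prolog and DOCTYPE, then an <html> tag with xmlns=
    ("application/xhtml+xml",
     re.compile(rb"\A\s*(?:<\?xml[^>]*\?>\s*)?(?:<!DOCTYPE[^>]*>\s*)?<html[^>]*xmlns=",
                re.IGNORECASE)),
    # Plain HTML: optional DOCTYPE, then an <html> tag
    ("text/html",
     re.compile(rb"\A\s*(?:<!DOCTYPE\s+html[^>]*>\s*)?<html[\s>]",
                re.IGNORECASE)),
    # Shell script: shebang line naming sh or bash, directly or via env
    ("application/x-sh",
     re.compile(rb"\A#!\s*/(?:usr/)?bin/(?:env\s+)?(?:ba)?sh\b")),
]


def detect(prefix: bytes, default: str = "application/octet-stream") -> str:
    """Return the media type of the first rule whose pattern matches prefix."""
    for media_type, pattern in RULES:
        if pattern.search(prefix):
            return media_type
    return default
```

A caller would read only the first few kilobytes of a stream and pass them to detect(); anything unmatched falls back to the default type.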
                
> Support magic(5) files
> ----------------------
>
>                 Key: TIKA-86
>                 URL: https://issues.apache.org/jira/browse/TIKA-86
>             Project: Tika
>          Issue Type: New Feature
>          Components: general
>            Reporter: Jukka Zitting
>
> Tika should have a parser for the magic(5) file format used by the file(1) command. Then we could use existing magic rules from places like http://svn.apache.org/repos/asf/httpd/httpd/trunk/docs/conf/magic.
