Posted to dev@tika.apache.org by "Nick Burch (Commented) (JIRA)" <ji...@apache.org> on 2011/10/06 22:33:29 UTC

[jira] [Commented] (TIKA-746) Support custom mime types

    [ https://issues.apache.org/jira/browse/TIKA-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122257#comment-13122257 ] 

Nick Burch commented on TIKA-746:
---------------------------------

First pass at solving this committed in r1179829.

MimeTypesFactory can now optionally accept multiple URLs or InputStreams. If given file paths instead, the first is treated as the main resource, as before. If a second path is given, every resource matching it that can be found is loaded as well.

By default, when asked for the default MimeTypes, Tika now loads tika-mimetypes.xml as before, and also loads any custom-mimetypes.xml files it finds.
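
To illustrate the "main file first, then every extension file that can be found" lookup, here is a minimal sketch using only the JDK. It is not the code committed in r1179829: the class name MimeResourceLocator and the method locate are hypothetical, and it assumes the files are looked up as classpath resources.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, for illustration only; not part of Tika.
final class MimeResourceLocator {

    /**
     * Returns the main resource first (if it exists), followed by every
     * resource on the classpath that matches the extension path.
     */
    static List<URL> locate(ClassLoader loader, String corePath, String extensionPath) {
        List<URL> urls = new ArrayList<>();

        // The main file: at most one copy is used.
        URL core = loader.getResource(corePath);
        if (core != null) {
            urls.add(core);
        }

        // The extension files: every copy visible to the class loader,
        // e.g. one per jar that ships its own custom-mimetypes.xml.
        try {
            urls.addAll(Collections.list(loader.getResources(extensionPath)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return urls;
    }
}
```

The returned list could then be handed to whatever accepts multiple URLs, with the main file's definitions read before any custom ones.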

People with application-specific internal mimetypes can add a custom-mimetypes.xml file alongside their parser, defining their custom mimetypes.
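
As a rough illustration, a custom-mimetypes.xml might look like the fragment below. It follows the same mime-info layout as tika-mimetypes.xml. The type name, glob pattern and magic value are made up for the example, and the exact location where the file must sit so that it is found is not covered here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<mime-info>
  <!-- Hypothetical internal format, for illustration only -->
  <mime-type type="application/x-example-internal">
    <_comment>Example internal format</_comment>
    <!-- Match by file extension -->
    <glob pattern="*.exint"/>
    <!-- Match by the first bytes of the file -->
    <magic priority="50">
      <match value="EXINT" type="string" offset="0"/>
    </magic>
  </mime-type>
</mime-info>
```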

(Leaving open for now, pending a review by other committers)
                
> Support custom mime types
> -------------------------
>
>                 Key: TIKA-746
>                 URL: https://issues.apache.org/jira/browse/TIKA-746
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>    Affects Versions: 0.10
>            Reporter: Nick Burch
>            Assignee: Nick Burch
>             Fix For: 1.0
>
>
> As discussed over the summer <http://lucene.472066.n3.nabble.com/Appending-Mime-Types-td3266434.html>, there are legitimate cases for wanting to load extra, custom mimetypes (and their matching rules) into Tika.
> The discussion seems to conclude that the built-in tika-mimetypes file should be used for common and public mimetypes, and that people wanting to support additional public formats should open an issue to have them added to the main file. People who want only a very restricted set of mimetypes can use a custom Tika config with a limited, smaller mimetypes file.
> For people who want to load one or two extra, likely custom, mimetypes, we should provide a service loading system to pull in the extra mimetypes. This allows the regular mimetypes file to be used for most files, with the extra custom ones merged in as needed. It also allows a custom parser to provide the mimetype detection for the specific custom formats it handles.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira