Posted to dev@tika.apache.org by "Nick Burch (Created) (JIRA)" <ji...@apache.org> on 2011/10/06 22:29:30 UTC

[jira] [Created] (TIKA-746) Support custom mime types

Support custom mime types
-------------------------

                 Key: TIKA-746
                 URL: https://issues.apache.org/jira/browse/TIKA-746
             Project: Tika
          Issue Type: Improvement
          Components: mime
    Affects Versions: 0.10
            Reporter: Nick Burch
            Assignee: Nick Burch
             Fix For: 1.0


As discussed over the summer <http://lucene.472066.n3.nabble.com/Appending-Mime-Types-td3266434.html>, there are legitimate cases for wanting to load extra, custom mimetypes (and their matching rules) into Tika.

Discussions seem to conclude that the built-in tika-mimetypes file should be used for common and public mimetypes, and that people wanting support for additional public formats should open an issue to have them added to the main file. People who want only a very restricted set of mimetypes can use a custom Tika config with a limited, smaller mimetypes file.

For people who want to load one or two extra, likely custom, mimetypes, we should provide a service loading system to pull in the extra mimetypes. This allows the regular mimetypes file to be used for most files, with the extra custom ones merged in as needed. It also allows a custom parser to provide the mimetype detection for the specific custom formats it handles.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (TIKA-746) Support custom mime types

Posted by "Chris A. Mattmann (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann updated TIKA-746:
-----------------------------------

    Fix Version/s:     (was: 1.0)
                   1.1

- Pushing this out to 1.1 as part of prep for 1.0.
                


        

[jira] [Commented] (TIKA-746) Support custom mime types

Posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122257#comment-13122257 ] 

Nick Burch commented on TIKA-746:
---------------------------------

First pass at solving this committed in r1179829.

MimeTypesFactory now optionally accepts multiple URLs or InputStreams. If given file paths, the first is treated as the main resource, as before. If a second path is given, every resource matching it that can be found is loaded as well.
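
To illustrate the idea of one main resource plus any number of extras, here is a minimal sketch using only the Java standard library. It does not use or reproduce MimeTypesFactory: the class name MimeDefinitionMerger, its load and xml methods, and the choice to let a later definition replace an earlier one with the same type name are all assumptions made for illustration. It also assumes definitions shaped like mime-info / mime-type / glob elements.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; this is not Tika's MimeTypesFactory.
class MimeDefinitionMerger {

    // Reads the main definitions first, then merges each extra stream on top.
    // Returns a map from mime type name to its first glob pattern ("" if none).
    static Map<String, String> load(InputStream mainResource, InputStream... extras) {
        Map<String, String> types = new LinkedHashMap<>();
        readInto(types, mainResource);
        for (InputStream extra : extras) {
            readInto(types, extra);
        }
        return types;
    }

    private static void readInto(Map<String, String> types, InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
            NodeList mimeTypes = doc.getElementsByTagName("mime-type");
            for (int i = 0; i < mimeTypes.getLength(); i++) {
                Element type = (Element) mimeTypes.item(i);
                NodeList globs = type.getElementsByTagName("glob");
                String pattern = globs.getLength() > 0
                        ? ((Element) globs.item(0)).getAttribute("pattern")
                        : "";
                // Assumption of this sketch: a later definition replaces an earlier one.
                types.put(type.getAttribute("type"), pattern);
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not read mime type definitions", e);
        }
    }

    // Convenience for the example: wraps an XML string as an InputStream.
    static InputStream xml(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        InputStream core = xml("<mime-info><mime-type type=\"text/plain\">"
                + "<glob pattern=\"*.txt\"/></mime-type></mime-info>");
        InputStream custom = xml("<mime-info><mime-type type=\"application/x-example\">"
                + "<glob pattern=\"*.example\"/></mime-type></mime-info>");
        System.out.println(load(core, custom));
    }
}
```

The varargs parameter keeps the single-resource call working unchanged at the source level, while callers with custom definitions simply pass more streams.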

Now, when asked for the default MimeTypes, Tika loads tika-mimetypes.xml as before, but also loads any custom-mimetypes.xml files it finds.
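
How the extra files are found is not spelled out in this comment. Assuming they are looked up as classpath resources, the standard ClassLoader.getResources call is one way to collect every copy of a same-named resource rather than only the first. The sketch below shows just that lookup; the class name CustomResourceLocator and the resource path passed in main are hypothetical, and this is not Tika's own loading code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; shows plain classpath discovery, not Tika's own loading code.
class CustomResourceLocator {

    // Returns every classpath resource with the given name,
    // in the order the class loader reports them.
    static List<URL> findAll(ClassLoader loader, String resourceName) {
        try {
            return Collections.list(loader.getResources(resourceName));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The resource path used here is an assumption made for this example.
        List<URL> found = findAll(CustomResourceLocator.class.getClassLoader(),
                "custom-mimetypes.xml");
        System.out.println("Found " + found.size() + " custom definition file(s)");
        for (URL url : found) {
            System.out.println("  " + url);
        }
    }
}
```

Because getResources returns all matches, several jars could each contribute their own file without knowing about one another.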

People with application-specific internal mimetypes can ship a custom-mimetypes.xml file alongside their parser to define those mimetypes.

(Leaving open for now, pending a review by other committers.)
                


        

[jira] [Resolved] (TIKA-746) Support custom mime types

Posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-746.
--------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 1.1)
                   1.0

There was a backwards compatibility issue with existing
client binaries that used the previous create() methods in
MimeTypesFactory. I restored those methods as wrappers around
the new ones in revision 1189126.
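
For readers unfamiliar with this kind of breakage, one way it can arise in Java (an assumption here, since the exact cause is not stated) is replacing a single-argument method with a varargs one: source code still compiles, but already-compiled callers look for the old method descriptor and fail to link. A thin wrapper overload restores it. The names RegistryFactory and create(String) below are hypothetical stand-ins, not the actual MimeTypesFactory signatures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in used to show the wrapper pattern only.
class RegistryFactory {

    // Newer entry point: accepts any number of definition sources.
    static List<String> create(String... sources) {
        return new ArrayList<>(Arrays.asList(sources));
    }

    // Restored older entry point, kept so previously compiled callers still link.
    // Passing an explicit array selects the varargs overload, so this does not recurse.
    static List<String> create(String source) {
        return create(new String[] { source });
    }

    public static void main(String[] args) {
        System.out.println(create("tika-mimetypes.xml"));
        System.out.println(create("tika-mimetypes.xml", "custom-mimetypes.xml"));
    }
}
```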

Overall the solution looks good to me, so resolving as fixed for 1.0.

                
