Posted to dev@tika.apache.org by "Nick Burch (Created) (JIRA)" <ji...@apache.org> on 2011/10/06 22:29:30 UTC
[jira] [Created] (TIKA-746) Support custom mime types
Support custom mime types
-------------------------
Key: TIKA-746
URL: https://issues.apache.org/jira/browse/TIKA-746
Project: Tika
Issue Type: Improvement
Components: mime
Affects Versions: 0.10
Reporter: Nick Burch
Assignee: Nick Burch
Fix For: 1.0
As discussed over the summer <http://lucene.472066.n3.nabble.com/Appending-Mime-Types-td3266434.html>, there are legitimate cases for wanting to load extra, custom mimetypes (and their matching rules) into Tika.
The discussion seemed to conclude that the built-in tika-mimetypes file should be used for common and public mimetypes, and that people wanting support for additional public formats should open an issue to have them added to the main file. People who want only a very restricted set of mimetypes can use a custom Tika config with a smaller, limited mimetypes file.
For people who want to load one or two extra, likely custom, mimetypes, we should provide a service loading system to pull in the extra mimetypes. This allows the regular mimetypes file to be used for most files, with the extra custom ones merged in as needed. It also allows a custom parser to provide the mimetype detection for the specific custom formats it handles.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (TIKA-746) Support custom mime types
Posted by "Chris A. Mattmann (Updated) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann updated TIKA-746:
-----------------------------------
Fix Version/s: 1.1 (was: 1.0)
- push out to 1.1: prep for 1.0.
[jira] [Commented] (TIKA-746) Support custom mime types
Posted by "Nick Burch (Commented) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122257#comment-13122257 ]
Nick Burch commented on TIKA-746:
---------------------------------
First pass at solving this committed in r1179829.
MimeTypesFactory has been changed to optionally accept multiple URLs or InputStreams. If given file paths, the first is treated as the main resource, as before. If a second path is given, every resource matching it that can be found is loaded as well.
By default now, when asking for the default MimeTypes, Tika will load tika-mimetypes.xml as before, but will also load any custom-mimetypes.xml files it finds.
People with application-specific internal mimetypes can ship a custom-mimetypes.xml file with their parser, which defines their custom mimetypes.
(Leaving open for now, pending a review by other committers)
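The lookup described in this comment (one main resource, plus every extra resource with a given name that can be found) can be sketched with the plain JDK class loader API. This is only an illustration of the idea, not the code committed in r1179829: the class name MimeResourceLocator, the method locate, and the failure behaviour when the main resource is missing are all assumptions made for the example.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Tika: collects the URLs that a
// "core file plus any custom files" loader would read, in load order.
public class MimeResourceLocator {

    // Returns the main resource first, followed by every resource named
    // extensionPath that the class loader can see (possibly none).
    static List<URL> locate(ClassLoader loader, String corePath, String extensionPath)
            throws IOException {
        List<URL> urls = new ArrayList<>();

        // The main resource is looked up once; only the first match is used.
        URL core = loader.getResource(corePath);
        if (core == null) {
            // Assumption for this sketch: a missing main resource is an error.
            throw new FileNotFoundException(corePath);
        }
        urls.add(core);

        // getResources returns all matches, e.g. one per jar on the classpath.
        urls.addAll(Collections.list(loader.getResources(extensionPath)));
        return urls;
    }

    public static void main(String[] args) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        try {
            for (URL url : locate(loader, "tika-mimetypes.xml", "custom-mimetypes.xml")) {
                System.out.println(url);
            }
        } catch (IOException e) {
            System.out.println("Main resource not found: " + e.getMessage());
        }
    }
}
```

Because getResources returns every match rather than just the first, each jar that ships its own custom-mimetypes.xml would contribute one entry to the list.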
[jira] [Resolved] (TIKA-746) Support custom mime types
Posted by "Jukka Zitting (Resolved) (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jukka Zitting resolved TIKA-746.
--------------------------------
Resolution: Fixed
Fix Version/s: 1.0 (was: 1.1)
There was a backwards compatibility issue with existing
client binaries that used the previous create() methods in
MimeTypesFactory. I restored those methods as wrappers around
the new ones in revision 1189126.
Overall the solution looks good to me, so resolving as fixed for 1.0.
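For readers wondering why wrappers were needed, here is a minimal sketch of the general pattern. It assumes the new methods take a variable number of arguments, which the thread does not state explicitly. In Java, a method taking a single argument and one taking varargs have different signatures in compiled bytecode, so binaries compiled against the old single-argument form fail to link unless that form still exists. The class FactoryCompat and its String-based methods are hypothetical stand-ins, not the MimeTypesFactory API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a factory whose create() grew a varargs form.
public class FactoryCompat {

    // New-style entry point: accepts any number of sources.
    static List<String> create(String... sources) {
        return Arrays.asList(sources);
    }

    // Restored old-style entry point, kept so that callers compiled against
    // the single-argument signature still link. It only delegates.
    static List<String> create(String source) {
        return create(new String[] { source });
    }

    public static void main(String[] args) {
        // Resolves to the single-argument overload.
        System.out.println(create("core.xml").size());
        // Resolves to the varargs overload.
        System.out.println(create("core.xml", "custom.xml").size());
    }
}
```

Passing an explicit String[] inside the wrapper makes the call resolve to the varargs method, so the wrapper does not call itself.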