Posted to dev@tika.apache.org by "Ryan McKinley (JIRA)" <ji...@apache.org> on 2012/10/30 01:26:12 UTC
[jira] [Created] (TIKA-1012) Add additional fields to MimeType reader
Ryan McKinley created TIKA-1012:
-----------------------------------
Summary: Add additional fields to MimeType reader
Key: TIKA-1012
URL: https://issues.apache.org/jira/browse/TIKA-1012
Project: Tika
Issue Type: New Feature
Components: mime
Reporter: Ryan McKinley
Priority: Minor
Currently the MimeType class exposes a description (_comment). It would be nice to also expose:
* Acronym (this is already in tika-mimetypes.xml, see <acronym>BMP</acronym>)
* Links to helper documentation for some formats
* UTI, http://en.wikipedia.org/wiki/Uniform_Type_Identifier
A sample entry would look like this:
{code:xml}
<mime-type type="image/x-ms-bmp">
  <alias type="image/bmp"/>
  <acronym>BMP</acronym>
  <_comment>Windows bitmap</_comment>
  <_link>http://en.wikipedia.org/wiki/BMP_file_format</_link>
  <_uti>com.microsoft.bmp</_uti>
  <magic priority="50">
  ...
{code}
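To make the proposal concrete, here is a rough sketch of how the extra fields in an entry like the one above could be read with the JDK's DOM parser. This is not Tika's actual reader and not the attached patch: the class name MimeEntrySketch, the readFields method, and the closed-off copy of the sample entry (with the elided <magic> section left out) are all assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only; not part of Tika.
class MimeEntrySketch {

    // Assumed, well-formed copy of the sample entry. The <magic> section
    // is elided in the original, so it is simply omitted here.
    static final String SAMPLE =
          "<mime-type type=\"image/x-ms-bmp\">"
        + "<alias type=\"image/bmp\"/>"
        + "<acronym>BMP</acronym>"
        + "<_comment>Windows bitmap</_comment>"
        + "<_link>http://en.wikipedia.org/wiki/BMP_file_format</_link>"
        + "<_uti>com.microsoft.bmp</_uti>"
        + "</mime-type>";

    // Collects the type attribute plus the text of the four child elements
    // of interest. If an element repeats, the last value wins.
    static Map<String, String> readFields(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();

        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("type", root.getAttribute("type"));

        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace and other non-element nodes
            }
            switch (child.getNodeName()) {
                case "acronym":
                case "_comment":
                case "_link":
                case "_uti":
                    fields.put(child.getNodeName(), child.getTextContent());
                    break;
                default:
                    break; // alias, magic, etc. are ignored in this sketch
            }
        }
        return fields;
    }

    public static void main(String[] args) throws Exception {
        readFields(SAMPLE).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

A real reader would probably keep links in a list, since a format may have more than one; the single-valued map is only there to keep the sketch short.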
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TIKA-1012) Add additional fields to MimeType reader
Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492428#comment-13492428 ]
Jukka Zitting commented on TIKA-1012:
-------------------------------------
Looks good, though it would be better if such custom fields were namespaced as the shared mime-info database spec says: "Applications may also define their own elements, provided they are namespaced to prevent collisions."
[jira] [Updated] (TIKA-1012) Add additional fields to MimeType reader
Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryan McKinley updated TIKA-1012:
--------------------------------
Attachment: TIKA-1012-MimeMeta.patch
This updates the patch to use the tika namespace for the custom elements:
{code:xml}
<mime-type type="image/x-ms-bmp">
  <alias type="image/bmp"/>
  <acronym>BMP</acronym>
  <tika:description>Windows bitmap</tika:description>
  <tika:link>http://en.wikipedia.org/wiki/BMP_file_format</tika:link>
  <tika:uti>com.microsoft.bmp</tika:uti>
  <magic priority="50">
  ...
{code}
I think we should replace the use of <_comment> with <tika:description> since the value ends up in a 'description' field, not a 'comment' field.
ryan
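For anyone wondering what the namespaced form means for a reader, the following is a minimal sketch using a namespace-aware JDK DOM parser. It is not the code in the patch. The namespace URI urn:example:tika-sketch is a placeholder (the thread does not say which URI the tika prefix is bound to), and the class name NamespacedMimeEntrySketch and the readTikaField method are likewise made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only; not part of Tika.
class NamespacedMimeEntrySketch {

    // Placeholder URI: an assumption, not the URI Tika actually uses.
    static final String TIKA_NS = "urn:example:tika-sketch";

    // Assumed, well-formed copy of the namespaced sample entry. The prefix is
    // declared on the entry itself so the snippet parses on its own, and the
    // elided <magic> section is omitted.
    static final String SAMPLE =
          "<mime-type xmlns:tika=\"" + TIKA_NS + "\" type=\"image/x-ms-bmp\">"
        + "<alias type=\"image/bmp\"/>"
        + "<acronym>BMP</acronym>"
        + "<tika:description>Windows bitmap</tika:description>"
        + "<tika:link>http://en.wikipedia.org/wiki/BMP_file_format</tika:link>"
        + "<tika:uti>com.microsoft.bmp</tika:uti>"
        + "</mime-type>";

    // Returns the text of the first element with the given local name in the
    // placeholder namespace, or null if there is none.
    static String readTikaField(String xml, String localName) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for lookups by namespace
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList matches = doc.getElementsByTagNameNS(TIKA_NS, localName);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
    }

    public static void main(String[] args) throws Exception {
        for (String name : new String[] {"description", "link", "uti"}) {
            System.out.println(name + " = " + readTikaField(SAMPLE, name));
        }
    }
}
```

Because the lookup is by namespace URI and local name rather than by prefix, an un-namespaced element such as <acronym> is not matched, which is the collision protection the shared mime-info spec is after.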
[jira] [Updated] (TIKA-1012) Add additional fields to MimeType reader
Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ryan McKinley updated TIKA-1012:
--------------------------------
Attachment: TIKA-1012-MimeMeta.patch
Here is a patch adding acronym, link, and UTI.
If people like this, I'll update tika-mimetypes.xml with more data.
Thanks