Posted to dev@tika.apache.org by "Ryan McKinley (JIRA)" <ji...@apache.org> on 2012/10/30 01:26:12 UTC

[jira] [Created] (TIKA-1012) Add additional fields to MimeType reader

Ryan McKinley created TIKA-1012:
-----------------------------------

             Summary: Add additional fields to MimeType reader
                 Key: TIKA-1012
                 URL: https://issues.apache.org/jira/browse/TIKA-1012
             Project: Tika
          Issue Type: New Feature
          Components: mime
            Reporter: Ryan McKinley
            Priority: Minor


Currently the MimeType class exposes a description (_comment).  It would be nice to also expose:
 * Acronym (this is already in tika-mimetypes.xml, see <acronym>BMP</acronym>)
 * Links to helpful documentation for some formats
 * UTI, http://en.wikipedia.org/wiki/Uniform_Type_Identifier

A sample entry would look like this:
{code:xml}
 <mime-type type="image/x-ms-bmp">
    <alias type="image/bmp"/>
    <acronym>BMP</acronym>
    <_comment>Windows bitmap</_comment>
    <_link>http://en.wikipedia.org/wiki/BMP_file_format</_link>
    <_uti>com.microsoft.bmp</_uti>
    <magic priority="50">
      ...
{code}


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (TIKA-1012) Add additional fields to MimeType reader

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492428#comment-13492428 ] 

Jukka Zitting commented on TIKA-1012:
-------------------------------------

Looks good, though it would be better if such custom fields were namespaced as the shared mime-info database spec says: "Applications may also define their own elements, provided they are namespaced to prevent collisions."
                


[jira] [Updated] (TIKA-1012) Add additional fields to MimeType reader

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated TIKA-1012:
--------------------------------

    Attachment: TIKA-1012-MimeMeta.patch

This updates the patch to use the tika namespace for the custom elements:
{code:xml}
<mime-type type="image/x-ms-bmp">
    <alias type="image/bmp"/>
    <acronym>BMP</acronym>
    <tika:description>Windows bitmap</tika:description>
    <tika:link>http://en.wikipedia.org/wiki/BMP_file_format</tika:link>
    <tika:uti>com.microsoft.bmp</tika:uti>
    <magic priority="50">
...
{code}

I think we should replace the use of <_comment> with <tika:description> since the value ends up in a 'description' field, not a 'comment' field.

ryan
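
For readers following along, here is a rough sketch of how the namespaced elements in the sample above could be told apart from the standard ones with a namespace-aware DOM parse. It is not the code from the attached patch. The class name MimeMetaSketch, the helper methods, and the namespace URI are all made up for illustration, and the sample is wrapped in a mime-info root element only so that the tika prefix can be declared somewhere.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MimeMetaSketch {
    // Hypothetical namespace URI, used only for this illustration.
    static final String TIKA_NS = "http://example.org/tika";

    // The namespaced sample entry from this thread, wrapped in a root element
    // that declares the (assumed) tika prefix; the <magic> part is left out.
    static final String SAMPLE =
        "<mime-info xmlns:tika=\"" + TIKA_NS + "\">"
        + "<mime-type type=\"image/x-ms-bmp\">"
        + "<alias type=\"image/bmp\"/>"
        + "<acronym>BMP</acronym>"
        + "<tika:description>Windows bitmap</tika:description>"
        + "<tika:link>http://en.wikipedia.org/wiki/BMP_file_format</tika:link>"
        + "<tika:uti>com.microsoft.bmp</tika:uti>"
        + "</mime-type>"
        + "</mime-info>";

    // Returns the trimmed text of the first node in the list, or null if the list is empty.
    static String firstText(NodeList nodes) {
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // Collects type, acronym, description, link and uti of the first <mime-type>.
    static Map<String, String> readFields(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the namespace lookups below
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element mimeType = (Element) doc.getElementsByTagName("mime-type").item(0);

            Map<String, String> fields = new LinkedHashMap<>();
            fields.put("type", mimeType.getAttribute("type"));
            // <acronym> is not namespaced, so it is looked up by plain tag name.
            fields.put("acronym", firstText(mimeType.getElementsByTagName("acronym")));
            // The custom elements are looked up by namespace URI and local name.
            for (String name : new String[] {"description", "link", "uti"}) {
                fields.put(name, firstText(mimeType.getElementsByTagNameNS(TIKA_NS, name)));
            }
            return fields;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse mime-type entry", e);
        }
    }

    public static void main(String[] args) {
        readFields(SAMPLE).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Looking the custom elements up by namespace URI rather than by prefix means an element that merely shares a local name with one of them (for example a plain link element from some other source) would not be picked up by mistake, which is the collision the spec quote is concerned with.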


                


[jira] [Updated] (TIKA-1012) Add additional fields to MimeType reader

Posted by "Ryan McKinley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley updated TIKA-1012:
--------------------------------

    Attachment: TIKA-1012-MimeMeta.patch

Here is a patch adding acronym, link, and UTI.

If people like this, I'll update tika-mimetypes.xml with more data.

thanks

                
