Posted to dev@tika.apache.org by "Keith R. Bennett (JIRA)" <ji...@apache.org> on 2007/09/12 23:32:32 UTC

[jira] Created: (TIKA-14) MimeTypeUtils.getMimeType() returns the default mime type for .odt (Open Office) files.

MimeTypeUtils.getMimeType() returns the default mime type for .odt (Open Office) files.
---------------------------------------------------------------------------------------

                 Key: TIKA-14
                 URL: https://issues.apache.org/jira/browse/TIKA-14
             Project: Tika
          Issue Type: New Feature
          Components: general
    Affects Versions: 0.1-incubator
            Reporter: Keith R. Bennett
             Fix For: 0.1-incubator


MimeTypeUtils.getMimeType() returns the default mime type for .odt (Open Office) files.

Because of this, it is not possible to parse OpenOffice files at this time.  I did some brief research, and could not find a mime type for Open Office files.  There was a comment that the mime type associated with these files is application/zip, since Open Office document files are zipped files.  That, of course, will not help us, since it would not be reasonable for us to assume that all zip files have Open Office content.

It is possible that there is now a mime type for Open Office documents, and I just could not find it.  (I hope so.)


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (TIKA-14) MimeTypeUtils.getMimeType() returns the default mime type for .odt (Open Office) files.

Posted by "Keith R. Bennett (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-14?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith R. Bennett updated TIKA-14:
---------------------------------

    Attachment: tika-14.patch

Adds the Open Office file extension to the list of supported file types, with:

+        } else if (name.endsWith(".odt")) {
+            return "application/vnd.oasis.opendocument.text";

Also adds more thorough testing in the test class.
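
For readers without the patch at hand, the fragment above can be placed in context with a minimal, self-contained sketch. Only the ".odt" branch and its return value come from the patch. The class name MimeTypeSketch, the default type application/octet-stream, the ".pdf" branch, the null check, and the case-insensitive comparison are all assumptions made for illustration, not the actual MimeTypeUtils code.

```java
import java.util.Locale;

// Hypothetical stand-in for MimeTypeUtils; not the actual Tika class.
public class MimeTypeSketch {

    // Assumed default; the thread does not say which default type Tika returns.
    static final String DEFAULT_MIME_TYPE = "application/octet-stream";

    // Maps a file name to a MIME type by looking at its extension only.
    static String getMimeType(String name) {
        if (name == null) {
            return DEFAULT_MIME_TYPE;
        }
        // Assumption: compare case-insensitively so "REPORT.ODT" also matches.
        String lower = name.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".pdf")) {
            // Illustrative pre-existing branch, not taken from the patch.
            return "application/pdf";
        } else if (lower.endsWith(".odt")) {
            // The branch shown in tika-14.patch.
            return "application/vnd.oasis.opendocument.text";
        }
        return DEFAULT_MIME_TYPE;
    }

    public static void main(String[] args) {
        System.out.println(getMimeType("report.odt"));
        System.out.println(getMimeType("archive.zip"));
    }
}
```

In this sketch a plain .zip file still falls through to the default type, which matches the concern in the issue description that zip files in general should not be treated as Open Office documents.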




[jira] Resolved: (TIKA-14) MimeTypeUtils.getMimeType() returns the default mime type for .odt (Open Office) files.

Posted by "Bertrand Delacretaz (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-14?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bertrand Delacretaz resolved TIKA-14.
-------------------------------------

    Resolution: Fixed

Applied this last patch in revision 575896, and expanded MimeTypesUtilsTest as suggested.

Thanks for your contributions!



[jira] Commented: (TIKA-14) MimeTypeUtils.getMimeType() returns the default mime type for .odt (Open Office) files.

Posted by "Thilo Goetz (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12526993 ] 

Thilo Goetz commented on TIKA-14:
---------------------------------

I'd say application/vnd.oasis.opendocument.text for .odt files, but I'm not an expert.  Here are some links that Google turned up:

http://framework.openoffice.org/documentation/mimetypes/mimetypes.html
http://books.evc-cit.info/ch01.php#mimetype-table

Here's the IANA page listing the opendocument mime types (and everything else under the sun):

http://www.iana.org/assignments/media-types/application/

There is also quite a bit of discussion around this on the OOo forums, but some of it is quite old and predates the .odt days.

--Thilo







[jira] Commented: (TIKA-14) MimeTypeUtils.getMimeType() returns the default mime type for .odt (Open Office) files.

Posted by "Keith R. Bennett (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527536 ] 

Keith R. Bennett commented on TIKA-14:
--------------------------------------

I haven't researched the links Thilo provided, but I noticed that when I uploaded the sample .odt file to Jira (see TIKA-16), Jira itself assigned it a mime type of application/vnd.oasis.opendocument.text (as Thilo also suggested).  The folks at Atlassian (makers of Jira and Confluence) seem to really know what they're doing, and I'd consider their choice of mime type pretty reliable.

How about we use that for now unless and until we come up with a better one?

- Keith


