Posted to dev@tika.apache.org by "Antoni Mylka (JIRA)" <ji...@apache.org> on 2010/12/01 00:06:15 UTC

[jira] Created: (TIKA-562) In tika-mimetypes.xml OpenXML types should have x-tika-ooxml as their parent

In tika-mimetypes.xml OpenXML types should have x-tika-ooxml as their parent
----------------------------------------------------------------------------

                 Key: TIKA-562
                 URL: https://issues.apache.org/jira/browse/TIKA-562
             Project: Tika
          Issue Type: Bug
            Reporter: Antoni Mylka


A couple of file types have application/x-tika-msoffice as their parent when they should have application/x-tika-ooxml. The error shows up when you try to identify those files using both the name and the data: detection by data yields x-tika-ooxml and detection by name yields the correct type, but because that type is not a subtype of x-tika-ooxml, it is not returned.
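
A minimal sketch of why the declared parent matters. It assumes a simplified model of combined detection: start from the data-based type and switch to the more specific name-based type only when it is a specialisation of the data-based one. The class `ComboDetectSketch`, its parent maps and `detect()` are hypothetical illustrations, not Tika's actual `MimeTypes` code.

```java
import java.util.Map;

// Hypothetical, simplified model of combined name + data detection.
// Not Tika's MimeTypes implementation; the parent maps stand in for the
// sub-class-of declarations in tika-mimetypes.xml.
final class ComboDetectSketch {

    static final String OCTET = "application/octet-stream";
    static final String OOXML = "application/x-tika-ooxml";
    static final String MSOFFICE = "application/x-tika-msoffice";
    static final String PPTM = "application/vnd.ms-powerpoint.presentation.macroenabled.12";

    // True if 'type' is 'ancestor' or inherits from it by following 'parents'.
    // Every type is treated as a specialisation of the octet-stream root.
    static boolean isA(String type, String ancestor, Map<String, String> parents) {
        if (OCTET.equals(ancestor)) {
            return true;
        }
        for (String t = type; t != null; t = parents.get(t)) {
            if (t.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    // Start from the data-based type and only switch to the (more specific)
    // name-based type when it specialises what the bytes say.
    static String detect(String byName, String byData, Map<String, String> parents) {
        String type = (byData != null) ? byData : OCTET;
        if (byName != null && isA(byName, type, parents)) {
            type = byName;
        }
        return type;
    }

    public static void main(String[] args) {
        Map<String, String> before = Map.of(PPTM, MSOFFICE); // parent before the patch
        Map<String, String> after = Map.of(PPTM, OOXML);     // parent after the patch

        // The file name says .pptm, the bytes say "generic OOXML package".
        System.out.println(detect(PPTM, OOXML, before)); // application/x-tika-ooxml
        System.out.println(detect(PPTM, OOXML, after));  // application/vnd.ms-powerpoint.presentation.macroenabled.12
    }
}
```

With the old parent, the name-based type is discarded and only the generic container type comes back; with x-tika-ooxml as the parent, the specific type survives.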

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (TIKA-562) In tika-mimetypes.xml OpenXML types should have x-tika-ooxml as their parent

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-562.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.9
         Assignee: Jukka Zitting

Thanks! Patch committed in revision 1040902.

For background, I recall getting the impression from somewhere that the macro-enabled formats were still based on the old OLE2 format, which is why the MIME data was originally set up that way. Now that I've actually looked at the files, it's obvious that they're based on OOXML.
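
To illustrate the OLE2 versus OOXML distinction, a minimal sketch that classifies a file by its leading bytes, assuming only the two well-known container signatures: OLE2 compound documents start with `D0 CF 11 E0 A1 B1 1A E1`, and OOXML packages are ZIP files starting with `PK\x03\x04`. The class and method names are hypothetical; Tika's own detection is driven by the magic rules in tika-mimetypes.xml and does more than this.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical sketch: tell an OLE2 compound document from a ZIP-based
// (OOXML) package by its leading bytes only.
final class ContainerSniffSketch {

    // ZIP local file header signature "PK\3\4".
    static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // OLE2 compound document signature.
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    static boolean startsWith(byte[] data, byte[] magic) {
        return data.length >= magic.length
            && Arrays.equals(Arrays.copyOf(data, magic.length), magic);
    }

    static String container(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) {
            return "application/x-tika-msoffice";
        }
        if (startsWith(head, ZIP_MAGIC)) {
            // Simplification: any ZIP is reported as OOXML here; a real detector
            // would also look inside the package (e.g. for [Content_Types].xml).
            return "application/x-tika-ooxml";
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) throws IOException {
        for (String path : args) {
            try (InputStream in = Files.newInputStream(Path.of(path))) {
                byte[] head = new byte[8];
                int n = in.readNBytes(head, 0, head.length);
                System.out.println(path + " -> " + container(Arrays.copyOf(head, n)));
            }
        }
    }
}
```

Pass file paths as arguments; macro-enabled files such as .pptm or .xlsb should report the ZIP-based container rather than OLE2.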



[jira] Commented: (TIKA-562) In tika-mimetypes.xml OpenXML types should have x-tika-ooxml as their parent

Posted by "Nick Burch (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965501#action_12965501 ] 

Nick Burch commented on TIKA-562:
---------------------------------

Do you have some example files for these?

(If they're included in one of the other Tika issues you've filed that we've yet to apply, apologies, just point us at them!)



[jira] Updated: (TIKA-562) In tika-mimetypes.xml OpenXML types should have x-tika-ooxml as their parent

Posted by "Antoni Mylka (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Antoni Mylka updated TIKA-562:
------------------------------

    Attachment: ooxml-children.patch



[jira] Commented: (TIKA-562) In tika-mimetypes.xml OpenXML types should have x-tika-ooxml as their parent

Posted by "Antoni Mylka (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12965590#action_12965590 ] 

Antoni Mylka commented on TIKA-562:
-----------------------------------

Your unit tests cover identification by name and by data separately. This problem shows up when you try to identify a file using both name and data (quite a common case). The patch modifies five MIME type definitions. Four of them already have example files in test-documents: testEXCEL.xlsb, testPPT.pptm, testPPT.potm, testPPT.pptm. Only ppam is missing. Will see what I can do.
