Posted to dev@tika.apache.org by "Antoni Mylka (Created) (JIRA)" <ji...@apache.org> on 2011/12/20 16:51:30 UTC

[jira] [Created] (TIKA-821) Support detecting old Microsoft Works Word Processor formats

Support detecting old Microsoft Works Word Processor formats
------------------------------------------------------------

                 Key: TIKA-821
                 URL: https://issues.apache.org/jira/browse/TIKA-821
             Project: Tika
          Issue Type: Improvement
          Components: mime
    Affects Versions: 1.1
            Reporter: Antoni Mylka
            Assignee: Antoni Mylka


An issue similar to TIKA-812. This time it's about old Works Word Processor formats. They use an OLE2 structure, but the top-level entry is called "MatOST", and they are not supported by the OfficeParser. I would like to:

 # Add a magic to tika-mimetypes.xml to mark the file as ms-works if "MatOST" is found. (After TIKA-806 we officially like those).
 # Add an 'if' to POIFSContainerDetector to look for MatOST.
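
The extra 'if' from the second step could look roughly like the sketch below. It is only an illustration, not the committed code: the class and method names are hypothetical, the set of top-level entry names stands in for whatever POIFSContainerDetector reads from the OLE2 root directory, and the two media type strings (application/vnd.ms-works and the generic OLE2 fallback application/x-tika-msoffice) are assumptions.

```java
import java.util.Set;

// Hypothetical stand-in for the check described above; not Tika's actual class.
final class WorksDetectionSketch {

    // Assumed media type names; verify against tika-mimetypes.xml.
    static final String GENERIC_OLE2 = "application/x-tika-msoffice";
    static final String MS_WORKS = "application/vnd.ms-works";

    // Decide the media type from the names of the top-level OLE2 entries.
    static String detect(Set<String> topLevelNames) {
        if (topLevelNames == null) {
            return GENERIC_OLE2;
        }
        // Old Works Word Processor files carry a top-level entry named "MatOST".
        if (topLevelNames.contains("MatOST")) {
            return MS_WORKS;
        }
        // Anything else stays a generic OLE2 document in this sketch.
        return GENERIC_OLE2;
    }

    public static void main(String[] args) {
        System.out.println(detect(Set.of("MatOST", "CompObj")));  // prints application/vnd.ms-works
        System.out.println(detect(Set.of("WordDocument")));       // prints application/x-tika-msoffice
    }
}
```

The check maps to the existing ms-works type instead of a new one, in line with the reasoning in the next paragraph.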

I'm not creating a separate media type for this (like I did in TIKA-812) because no parser supports it anyway. In TIKA-812 it was necessary, because ExcelParser can't work with all vnd.ms-works files but can work with 7.0 spreadsheets. In this case there is no gain in a separate mime type.



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (TIKA-821) Support detecting old Microsoft Works Word Processor formats

Posted by "Antoni Mylka (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173267#comment-13173267 ] 

Antoni Mylka commented on TIKA-821:
-----------------------------------

Committed in r1221323
                
