Posted to dev@tika.apache.org by "Martijn van Groningen (JIRA)" <ji...@apache.org> on 2010/07/04 21:02:49 UTC

[jira] Updated: (TIKA-402) Support for iWork documents

     [ https://issues.apache.org/jira/browse/TIKA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated TIKA-402:
---------------------------------------

    Attachment: iwork.patch

Jukka, I did some refactoring in the newly attached patch in order to get rid of the IWorkRootElementDetectContentHandler class. The IWorkParser now parses only the relevant iWork XML files (I registered those XML documents with the parser by their root XML element). I also created an IWorkPackageParser class that deals with the container file format (*.keynote|pages|numbers). This way, an iWork document that is uncompressed, or that has somehow been placed in a different archive file, can still be parsed.
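
To illustrate the split described above, here is a rough sketch and not the code from the patch. It assumes the container is an ordinary zip archive and that the main document lives in an entry ending in .apxl or .xml; the class name IWorkPackageSketch and the methods rootElementName, rootElementOfPackage and zipOf are hypothetical names made up for this example. One method only looks at an XML stream and reports its root element, the other only deals with the container and hands each candidate entry to the first.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration only; none of these names come from the patch.
final class IWorkPackageSketch {

    // XML side: return the local name of the root element, or null if the
    // stream cannot be read as XML. Works on any stream, zipped or not.
    static String rootElementName(InputStream xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
            XMLStreamReader reader = factory.createXMLStreamReader(xml);
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getLocalName();
                }
            }
            return null;
        } catch (XMLStreamException e) {
            return null;
        }
    }

    // Container side: walk the zip entries and delegate each candidate
    // entry to the XML side. The entry-name suffixes are an assumption.
    static String rootElementOfPackage(InputStream container) {
        try (ZipInputStream zip = new ZipInputStream(container)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.endsWith(".apxl") || name.endsWith(".xml")) {
                    String root = rootElementName(zip);
                    if (root != null) {
                        return root;
                    }
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: build a one-entry zip archive in memory.
    static InputStream zipOf(String entryName, String content) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write(content.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new ByteArrayInputStream(buffer.toByteArray());
    }
}
```

Because rootElementName never touches the archive, the same check applies to an uncompressed document or to one found inside some other archive; only rootElementOfPackage knows about the zip container.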

> Support for iWork documents
> ---------------------------
>
>                 Key: TIKA-402
>                 URL: https://issues.apache.org/jira/browse/TIKA-402
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Jukka Zitting
>         Attachments: iwork.patch, iwork.patch, iwork.patch, iwork.patch, iwork.patch, iwork.patch, testKeynote.key, testKeynote.key, testNumbers.numbers, testPages.pages
>
>
> It would be nice to have support for documents created by Apple's Keynote and Pages applications. Both file formats are described in http://developer.apple.com/mac/library/documentation/AppleApplications/Conceptual/iWork2-0_XML/Chapter01/Introduction.html. I'm not sure if there already are open source parser libraries for these formats or if we'd need to directly process the XML content.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.