Posted to dev@tika.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2010/07/06 15:19:49 UTC

[jira] Reopened: (TIKA-402) Support for iWork documents

     [ https://issues.apache.org/jira/browse/TIKA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting reopened TIKA-402:
--------------------------------


Reopening for a minor test failure on Java 5; see revision 960892. In some cases the parser appears to lose whitespace between words. This is probably related to how the XML parser in the underlying Java version behaves, perhaps a difference in which whitespace it reports through characters() and which through ignorableWhitespace() calls.
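
To illustrate the suspected cause, here is a minimal sketch, not the actual iWork parser code. It assumes the lost whitespace is being delivered through ignorableWhitespace(), which DefaultHandler discards by default. The class name WhitespaceAwareTextHandler and the helper extractText() are hypothetical and exist only for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, for illustration only: collects text content and
// treats "ignorable" whitespace the same as ordinary character data.
public class WhitespaceAwareTextHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    // DefaultHandler.ignorableWhitespace() does nothing, so whitespace that a
    // parser chooses to report here would otherwise be dropped silently.
    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        characters(ch, start, length);
    }

    public String getText() {
        return text.toString();
    }

    // Hypothetical convenience method: parse an XML string with the JDK's
    // default SAX parser and return the collected text.
    public static String extractText(String xml) {
        try {
            WhitespaceAwareTextHandler handler = new WhitespaceAwareTextHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.getText();
        } catch (Exception e) {
            throw new IllegalStateException("Failed to parse XML", e);
        }
    }
}
```

With a handler along these lines, the extracted text should not depend on which of the two callbacks a given parser uses for inter-element whitespace. Whether this is the actual source of the Java 5 failure still needs to be confirmed.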

> Support for iWork documents
> ---------------------------
>
>                 Key: TIKA-402
>                 URL: https://issues.apache.org/jira/browse/TIKA-402
>             Project: Tika
>          Issue Type: New Feature
>          Components: parser
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>             Fix For: 0.8
>
>         Attachments: iwork.patch, iwork.patch, iwork.patch, iwork.patch, iwork.patch, iwork.patch, testKeynote.key, testKeynote.key, testNumbers.numbers, testPages.pages
>
>
> It would be nice to have support for documents created by Apple's Keynote and Pages applications. Both file formats are described in http://developer.apple.com/mac/library/documentation/AppleApplications/Conceptual/iWork2-0_XML/Chapter01/Introduction.html. I'm not sure if there already are open source parser libraries for these formats or if we'd need to directly process the XML content.
