Posted to dev@tika.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2010/06/07 23:35:44 UTC
[jira] Commented: (TIKA-402) Support for iWork documents
[ https://issues.apache.org/jira/browse/TIKA-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876431#action_12876431 ]
Jukka Zitting commented on TIKA-402:
------------------------------------
> XML root element detection
See the org.apache.tika.detect.XmlRootExtractor class and the <root-XML/> entries in the tika-mimetypes.xml configuration file.
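
To illustrate the idea behind root element detection, here is a minimal, self-contained sketch that reads only as far as the first start tag and reports its namespace and local name. It uses the JDK's StAX API as a stand-in and is not Tika's actual XmlRootExtractor code; the class name RootElementSketch, the method rootElement and the urn:example:keynote namespace are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Tika: finds the root element of an XML document.
final class RootElementSketch {

    // Returns the qualified name of the root element, or null if the input
    // cannot be parsed as XML up to its first start tag.
    static QName rootElement(byte[] xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Do not process DTDs or resolve external entities while sniffing.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        try {
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new ByteArrayInputStream(xml));
            try {
                while (reader.hasNext()) {
                    // The first START_ELEMENT event is the root element.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getName();
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Not well-formed XML (at least not up to the root element).
            return null;
        }
        return null;
    }

    public static void main(String[] args) {
        // The namespace below is a placeholder, not a real iWork namespace.
        String sample = "<?xml version=\"1.0\"?>"
                + "<key:presentation xmlns:key=\"urn:example:keynote\"><key:slide/></key:presentation>";
        QName root = rootElement(sample.getBytes(StandardCharsets.UTF_8));
        System.out.println(root.getNamespaceURI() + " " + root.getLocalPart());
    }
}
```

A detector built this way would then compare the namespace and local name against the configured <root-XML/> entries to choose a media type.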
> directory
My idea is that if you point a file system crawler to uncompressed iWork directories, we should still be able to produce reasonable output when the crawler feeds the XML file to Tika.
> Support for iWork documents
> ---------------------------
>
> Key: TIKA-402
> URL: https://issues.apache.org/jira/browse/TIKA-402
> Project: Tika
> Issue Type: New Feature
> Components: parser
> Reporter: Jukka Zitting
> Attachments: iwork.patch, iwork.patch, iwork.patch, iwork.patch, iwork.patch, testKeynote.key, testKeynote.key, testNumbers.numbers, testPages.pages
>
>
> It would be nice to have support for documents created by Apple's Keynote and Pages applications. Both file formats are described in http://developer.apple.com/mac/library/documentation/AppleApplications/Conceptual/iWork2-0_XML/Chapter01/Introduction.html. I'm not sure if there already are open source parser libraries for these formats or if we'd need to directly process the XML content.