Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2014/06/26 10:39:24 UTC

[jira] [Commented] (TIKA-1358) Add support for newer iWork file formats

    [ https://issues.apache.org/jira/browse/TIKA-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044468#comment-14044468 ] 

Nick Burch commented on TIKA-1358:
----------------------------------

The first thing we'd probably want is to re-create the current test documents in the new format.

Then we need to identify the container / wrapper format that holds the text.

Finally, we can look at adding a parser, either written ourselves or based on a suitably licensed existing Java library for the formats.
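
For the second step, a quick way to see what a new-style file actually contains is to list its entries and count them by extension. The following is only a rough sketch: it assumes the new files are still plain ZIP archives (not confirmed here), and the class name IWorkEntryLister and the method countByExtension are made up for illustration, not part of Tika.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of Tika. Assumes the input is a ZIP archive.
public class IWorkEntryLister {

    /** Counts non-directory ZIP entries per lower-cased extension ("" if none). */
    public static Map<String, Integer> countByExtension(InputStream in) throws IOException {
        Map<String, Integer> counts = new TreeMap<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue;
                }
                String name = entry.getName();
                int slash = name.lastIndexOf('/');
                int dot = name.lastIndexOf('.');
                // Only treat the dot as an extension separator if it is in the last path segment.
                String ext = dot > slash
                        ? name.substring(dot + 1).toLowerCase(Locale.ROOT)
                        : "";
                counts.merge(ext, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java IWorkEntryLister <file>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            countByExtension(in).forEach((ext, n) -> System.out.println(ext + "\t" + n));
        }
    }
}
```

Running this against one of the re-created test documents should show whether .iwa entries sit directly in the archive. If the file turns out not to be a ZIP at all, ZipInputStream will typically just report no entries, which is itself a hint that the container needs a closer look.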

> Add support for newer iWork file formats
> ----------------------------------------
>
>                 Key: TIKA-1358
>                 URL: https://issues.apache.org/jira/browse/TIKA-1358
>             Project: Tika
>          Issue Type: Wish
>          Components: parser
>    Affects Versions: 1.5
>            Reporter: Jelle Kastelein
>              Labels: newbie
>
> iWork 2013 uses a revised file format that replaces the XML files holding the content with .iwa files (a binary format). This file format is becoming increasingly relevant as more and more people use Apple products. However, it does not appear to work with the current IWorkPackageParser (tested with several of the example .pages files one can get from iCloud).



--
This message was sent by Atlassian JIRA
(v6.2#6252)