Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/06/10 20:21:21 UTC

[jira] [Comment Edited] (TIKA-1358) Add support for newer iWork file formats

    [ https://issues.apache.org/jira/browse/TIKA-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325219#comment-15325219 ] 

Tim Allison edited comment on TIKA-1358 at 6/10/16 8:20 PM:
------------------------------------------------------------

I _think_ I figured out how to build the protobuf classes from [obriensp's dump|https://github.com/obriensp/iWorkFileFormat/tree/master/iWorkFileInspector/iWorkFileInspector/Messages/Proto].  I _think_ if we add those to [Evernote's iwana|https://github.com/evernote/iwana], we might actually have a working parser.  If both of the above are true, and if we don't have any other parsers available, how should we proceed?

1) Do we want to ask Evernote if they're interested in ongoing maintenance and pushing a jar to Maven?
2) Should I fork Evernote's iwana, add in the built protobuf classes, and then push to Maven?
3) Should we incorporate all of this directly into Tika?

Finally, might there be licensing issues with using obriensp's dump of the protobuf definitions from Apple's applications to build the Java classes?


> Add support for newer iWork file formats
> ----------------------------------------
>
>                 Key: TIKA-1358
>                 URL: https://issues.apache.org/jira/browse/TIKA-1358
>             Project: Tika
>          Issue Type: Wish
>          Components: parser
>    Affects Versions: 1.5
>            Reporter: Jelle Kastelein
>              Labels: new-parser, newbie
>         Attachments: iwork13-testdocs-zips.zip, iwork13-testfiles-2014-11.zip
>
>
> iWork 2013 uses a revised file format that replaces the XML files holding the content with .iwa files (a binary format). This file format is becoming increasingly relevant as more and more people use Apple products. However, it does not appear to work with the current IWorkPackageParser (tested with several of the example .pages files one can get from iCloud).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)