Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2016/05/04 12:06:12 UTC

[jira] [Commented] (TIKA-1966) Issue in parsing iWorksDocument with Apache Tika

    [ https://issues.apache.org/jira/browse/TIKA-1966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15270524#comment-15270524 ] 

Tim Allison commented on TIKA-1966:
-----------------------------------

[~sachin086], thank you for opening this issue and sharing test docs.  Our current iWorks parsers expect uncompressed XML.  These files appear to contain .iwa files instead, which, I think, are compressed with Snappy.

With trunk, we're identifying these only as .zip files, so they aren't even being routed through our IWorksParsers.
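
As a rough illustration of the distinction above (a sketch only, not what Tika does today), one could scan the zip entries for names ending in .iwa, like the Index/Document.iwa entry in the reported output. The class name IwaSniffer and the method containsIwaEntry are hypothetical and not part of Tika; the sketch uses only java.util.zip and makes no attempt to decompress or parse the .iwa payloads.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, NOT part of Tika: reports whether a zip-based
// iWorks file contains .iwa entries such as Index/Document.iwa.
final class IwaSniffer {

    private IwaSniffer() {
    }

    // Returns true if any non-directory entry name ends in ".iwa".
    // Note: the supplied stream is closed when this method returns.
    static boolean containsIwaEntry(InputStream in) throws IOException {
        // ZipInputStream walks the entries in order, so no temp file is needed.
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName().toLowerCase(Locale.ROOT);
                if (!entry.isDirectory() && name.endsWith(".iwa")) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

A caller could pass in a FileInputStream for something like budget.numbers and treat a true result as a hint that the file is in the newer .iwa-based layout rather than the uncompressed XML layout.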

This won't be a quick fix (I don't think it will make it into 1.13), but this is a really important find.  Thank you!

> Issue in parsing iWorksDocument with Apache Tika
> ------------------------------------------------
>
>                 Key: TIKA-1966
>                 URL: https://issues.apache.org/jira/browse/TIKA-1966
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.12
>         Environment: Ubuntu 15
>            Reporter: Sachin Shaju
>         Attachments: budget.numbers, connors_20040127.key, pages.pages, sample code
>
>
> I was trying to parse an iWorks document with Apache Tika, but instead of the parsed content I am getting some other output from the content handler. The code snippet that I used is attached.
> Output :-
> Contents of the file :
> Index/Document.iwa
> Index/ViewState.iwa
> Index/CalculationEngine.iwa
> Index/Tables/HeaderStorageBucket-2.iwa
> Index/Tables/Tile.iwa
> Index/Metadata.iwa
> Metadata/Properties.plist
> I'm able to detect the file type correctly using the Detector API, but I'm not getting the useful content out of the document.
> I'm attaching the iWorks docs that I tested with (made with the latest version of iOS). It worked when I tested with older versions. Thanks


