Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2022/01/10 14:23:00 UTC
[jira] [Commented] (TIKA-3634) Failed to Parser Apple related files
[ https://issues.apache.org/jira/browse/TIKA-3634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17472053#comment-17472053 ]
Tim Allison commented on TIKA-3634:
-----------------------------------
Thank you for submitting the bug and sharing the triggering files.
A couple of items unrelated to the problem:
* AppleSingleFileParser does not handle iWork files. It is for a completely unrelated file format: [https://en.wikipedia.org/wiki/AppleSingle_and_AppleDouble_formats]
* You shouldn't need to add tika-parser-zip-commons or tika-parser-apple-module. Both should be included in tika-parsers-standard-package. If they're not, that's a serious problem; please open a separate ticket.
I regret I'm still not clear on what we need to fix.
With Tika 1.28, I get {{application/vnd.apple.unknown.13}} for the *.numbers and *.pages files, and {{application/vnd.apple.keynote.13}} for the *.key file. No attachments or text are extracted from any of them.
With Tika 2.2.1, I get {{application/vnd.apple.unknown.13}} for all three files (*.pages, *.key, *.numbers), but the PackageParser then parses all of the embedded files that Tika supports.
What is the desired behavior?
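To make the question concrete, here is a minimal sketch for seeing which embedded files are in one of these documents. It uses only the JDK, not Tika, and it assumes the attached files are zip containers (the PackageParser behavior above suggests a package format, but I have not confirmed zip specifically). The class name {{ListPackageEntries}} and the method {{listEntries}} are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of Tika: lists the entry names of a zip-based package.
public class ListPackageEntries {

    // Returns the names of all entries in the order they appear in the zip central directory.
    // Assumption: the file is a valid zip archive; otherwise ZipFile throws an IOException.
    static List<String> listEntries(Path file) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(file.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ListPackageEntries <file>");
            return;
        }
        // Print one entry name per line.
        for (String name : listEntries(Paths.get(args[0]))) {
            System.out.println(name);
        }
    }
}
```

Running it against, e.g., brochure.pages would show the embedded entries; knowing which of those you expect text or attachments from would help pin down the desired behavior.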
> Failed to Parser Apple related files
> ------------------------------------
>
> Key: TIKA-3634
> URL: https://issues.apache.org/jira/browse/TIKA-3634
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 2.2.1
> Reporter: Tika User
> Assignee: Tim Allison
> Priority: Blocker
> Attachments: brochure.pages, keynotecreated.key, mortgagecalculator.numbers
>
>
> Unable to parse '.numbers', '.key', and '.pages' files using the class below in the XML config file (org.apache.tika.parser.apple.AppleSingleFileParser)
> Getting unknown mimetype: application/vnd.apple.unknown.13
> Using all these modules :
> tika-core,tika-parsers-standard-package,tika-parser-microsoft-module,tika-parser-sqlite3-package,tika-parser-scientific-module,tika-parser-zip-commons,tika-parser-apple-module
--
This message was sent by Atlassian Jira
(v8.20.1#820001)