Posted to dev@tika.apache.org by "Chris A. Mattmann (JIRA)" <ji...@apache.org> on 2014/10/25 06:49:36 UTC
[jira] [Updated] (TIKA-1072) AIOOBE when handling embedded document in .doc file
[ https://issues.apache.org/jira/browse/TIKA-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris A. Mattmann updated TIKA-1072:
------------------------------------
Fix Version/s: 1.8 (was: 1.7)
- push to 1.8
> AIOOBE when handling embedded document in .doc file
> ---------------------------------------------------
>
> Key: TIKA-1072
> URL: https://issues.apache.org/jira/browse/TIKA-1072
> Project: Tika
> Issue Type: Bug
> Components: parser
> Reporter: Michael McCandless
> Fix For: 1.8
>
> Attachments: 20-Force-on-a-current-S00.doc, Ole10NativeEntry.bin
>
>
> I have a Word (.doc) document that hits an exception when I run:
> {noformat}
> java -jar tika-app/target/tika-app-1.4-SNAPSHOT.jar /x/tmp/20-Force-on-a-current-S00.doc
> {noformat}
> Here's the exception:
> {noformat}
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 40
> at org.apache.poi.util.LittleEndian.getShort(LittleEndian.java:225)
> at org.apache.poi.poifs.filesystem.Ole10Native.<init>(Ole10Native.java:139)
> at org.apache.poi.poifs.filesystem.Ole10Native.createFromEmbeddedOleObject(Ole10Native.java:89)
> at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedOfficeDoc(AbstractPOIFSExtractor.java:149)
> at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:135)
> at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:186)
> at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:161)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
> {noformat}
> The failure occurs while parsing an OLE10 embedded object. The code
> that does this parsing catches and ignores Ole10NativeException and
> skips the entry, so I'm wondering whether we should also catch AIOOBE
> and skip the entry. I.e., maybe this entry really is not OLE10, and the
> Ole10Native code is failing to throw Ole10NativeException for it?
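
A minimal sketch of the "also catch AIOOBE and skip the entry" idea from the description above. It is self-contained and does not use the real POI or Tika classes: SketchOle10NativeException, readShortAt, parseEntry, and parseAll are hypothetical stand-ins, and the offset 40 is only illustrative. It is not the actual AbstractPOIFSExtractor code or the eventual fix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for POI's Ole10NativeException (assumption, not the real class).
class SketchOle10NativeException extends Exception {
    SketchOle10NativeException(String message) {
        super(message);
    }
}

class EmbeddedSkipSketch {

    // Reads a little-endian unsigned short without bounds checking, so a short
    // buffer raises ArrayIndexOutOfBoundsException, as in the reported trace.
    static int readShortAt(byte[] data, int offset) {
        return (data[offset] & 0xFF) | ((data[offset + 1] & 0xFF) << 8);
    }

    // Hypothetical parser: rejects obviously bad input with the checked
    // exception, but can still overrun the buffer on other malformed input.
    static int parseEntry(byte[] data) throws SketchOle10NativeException {
        if (data.length < 4) {
            throw new SketchOle10NativeException("entry too short");
        }
        return readShortAt(data, 40); // illustrative offset only
    }

    // Parses every entry, skipping any that fail with either exception type.
    static List<Integer> parseAll(List<byte[]> entries) {
        List<Integer> parsed = new ArrayList<>();
        for (byte[] entry : entries) {
            try {
                parsed.add(parseEntry(entry));
            } catch (SketchOle10NativeException e) {
                // Case that is already skipped: not a valid OLE10 entry.
            } catch (ArrayIndexOutOfBoundsException e) {
                // Proposed addition: treat a buffer overrun as "not OLE10" too.
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        List<byte[]> entries = new ArrayList<>();
        entries.add(new byte[2]);   // triggers SketchOle10NativeException
        entries.add(new byte[40]);  // triggers ArrayIndexOutOfBoundsException
        byte[] ok = new byte[42];
        ok[40] = 1;
        entries.add(ok);            // parses successfully
        System.out.println(parseAll(entries)); // prints [1]
    }
}
```

The trade-off is that catching AIOOBE at the call site hides a missing bounds check in the parser; having the parser itself throw the checked exception for malformed input would address the cause rather than the symptom.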