Posted to dev@tika.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2013/02/03 14:40:12 UTC
[jira] [Created] (TIKA-1072) AIOOBE when handling embedded document in .doc file
Michael McCandless created TIKA-1072:
----------------------------------------
Summary: AIOOBE when handling embedded document in .doc file
Key: TIKA-1072
URL: https://issues.apache.org/jira/browse/TIKA-1072
Project: Tika
Issue Type: Bug
Reporter: Michael McCandless
Fix For: 1.4
Attachments: 20-Force-on-a-current-S00.doc
I have a Word (.doc) document that hits an exception when I run:
{noformat}
java -jar tika-app/target/tika-app-1.4-SNAPSHOT.jar /x/tmp/20-Force-on-a-current-S00.doc
{noformat}
Here's the exception:
{noformat}
Caused by: java.lang.ArrayIndexOutOfBoundsException: 40
at org.apache.poi.util.LittleEndian.getShort(LittleEndian.java:225)
at org.apache.poi.poifs.filesystem.Ole10Native.<init>(Ole10Native.java:139)
at org.apache.poi.poifs.filesystem.Ole10Native.createFromEmbeddedOleObject(Ole10Native.java:89)
at org.apache.tika.parser.microsoft.AbstractPOIFSExtractor.handleEmbeddedOfficeDoc(AbstractPOIFSExtractor.java:149)
at org.apache.tika.parser.microsoft.WordExtractor.parse(WordExtractor.java:135)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:186)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:161)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
{noformat}
This happens when we try to parse an OLE10 embedded object. The code
that does this parsing catches and ignores Ole10NativeException and
skips the entry, so I'm wondering whether we should also catch
ArrayIndexOutOfBoundsException (AIOOBE) and skip the entry. That is,
maybe this entry is not really OLE10, and the Ole10Native code is
failing to throw Ole10NativeException for it?
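To make the proposal concrete, here is a minimal, self-contained sketch of the "catch both and skip the entry" idea. It does not use the real POI or Tika classes: NotOle10Exception, readFlags and parseAll are hypothetical stand-ins for Ole10NativeException, the unchecked little-endian read, and the extractor loop, so treat it as an illustration of the pattern rather than the actual fix.

```java
import java.util.ArrayList;
import java.util.List;

final class EmbeddedSkipSketch {

    /** Hypothetical stand-in for POI's Ole10NativeException. */
    static class NotOle10Exception extends Exception {
        NotOle10Exception(String message) {
            super(message);
        }
    }

    /**
     * Hypothetical parser step: validates a minimum length, then reads a
     * little-endian 16-bit value at a fixed offset without a bounds check.
     * A short entry therefore fails with ArrayIndexOutOfBoundsException
     * instead of the checked exception the caller expects.
     */
    static int readFlags(byte[] data, int offset) throws NotOle10Exception {
        if (data.length < 4) {
            throw new NotOle10Exception("entry too short to be OLE10");
        }
        return (data[offset] & 0xFF) | ((data[offset + 1] & 0xFF) << 8);
    }

    /** Parses every entry, skipping those that fail in either way. */
    static List<Integer> parseAll(List<byte[]> entries, int offset) {
        List<Integer> parsed = new ArrayList<>();
        for (byte[] entry : entries) {
            try {
                parsed.add(readFlags(entry, offset));
            } catch (NotOle10Exception | ArrayIndexOutOfBoundsException e) {
                // Assumption: an entry that cannot be read is simply not
                // OLE10, so it is skipped rather than failing the document.
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        List<byte[]> entries = new ArrayList<>();
        entries.add(new byte[2]);   // rejected with NotOle10Exception
        entries.add(new byte[40]);  // index 40 is out of bounds
        byte[] valid = new byte[64];
        valid[40] = 1;
        valid[41] = 2;
        entries.add(valid);         // parses successfully
        System.out.println(parseAll(entries, 40));
    }
}
```

Catching the runtime exception at the call site keeps the rest of the document parseable, but it also hides the underlying problem; a bounds check inside the parser that throws the checked exception would be the cleaner long-term option.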