Posted to dev@tika.apache.org by "Xiaohong Yang (Jira)" <ji...@apache.org> on 2020/06/04 13:42:00 UTC
[jira] [Created] (TIKA-3107) AutoDetectParser.parse failed with
error "Initialisation of record 0x85(BoundSheetRecord) left 28 bytes
remaining still to be read"
Xiaohong Yang created TIKA-3107:
-----------------------------------
Summary: AutoDetectParser.parse failed with error "Initialisation of record 0x85(BoundSheetRecord) left 28 bytes remaining still to be read"
Key: TIKA-3107
URL: https://issues.apache.org/jira/browse/TIKA-3107
Project: Tika
Issue Type: Bug
Components: metadata, parser
Affects Versions: 1.24
Reporter: Xiaohong Yang
Attachments: SOJ.NW.00092712.xls
When I try to get the metadata of the attached sample Excel file by calling the AutoDetectParser.parse method with the following Java code, I get the error "Initialisation of record 0x85(BoundSheetRecord) left 28 bytes remaining still to be read".
InputStream input = new FileInputStream(localFilePath);
BodyContentHandler handler = new BodyContentHandler(-1);
Metadata metadata = new Metadata();
TikaConfig config = TikaConfigFactory.getTikaConfig();
Parser autoDetectParser = new AutoDetectParser(config);
ParseContext context = new ParseContext();
context.set(TikaConfig.class, config);
autoDetectParser.parse(input, handler, metadata, context);
Here is the stack trace:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@2caa5ec
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
…
at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
at java.util.concurrent.FutureTask.run(FutureTask.java)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.poi.hssf.record.RecordInputStream$LeftoverDataException: Initialisation of record 0x85(BoundSheetRecord) left 28 bytes remaining still to be read.
at org.apache.poi.hssf.record.RecordInputStream.hasNextRecord(RecordInputStream.java:188)
at org.apache.poi.hssf.extractor.OldExcelExtractor.getText(OldExcelExtractor.java:233)
at org.apache.tika.parser.microsoft.OldExcelParser.parse(OldExcelParser.java:57)
at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:158)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:183)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
... 15 more
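The trace above shows the TikaException wrapping an underlying POI exception as its cause. A caller that wants to log or inspect that underlying error can walk the cause chain. The following is a minimal sketch using only the JDK; the class name CauseChain, the method rootCause, and the stand-in exceptions in main are hypothetical and are not part of Tika or POI.

```java
final class CauseChain {
    private CauseChain() {}

    /**
     * Returns the deepest cause of the given throwable, or the throwable
     * itself if it has no cause. Assumes the cause chain is acyclic.
     */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-ins for the wrapped exceptions seen in the trace (assumption:
        // real code would catch the exception thrown by the parse call instead).
        Exception inner = new IllegalStateException(
                "Initialisation of record 0x85(BoundSheetRecord) left 28 bytes remaining still to be read.");
        Exception outer = new Exception("Unexpected RuntimeException from parser", inner);

        Throwable root = rootCause(outer);
        System.out.println(root.getClass().getName() + ": " + root.getMessage());
    }
}
```

In real code the argument to rootCause would be the exception caught around the parse call, which makes it possible to report the underlying message without the wrapper.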