Posted to dev@tika.apache.org by "Xiaohong Yang (Jira)" <ji...@apache.org> on 2020/06/04 13:42:00 UTC

[jira] [Created] (TIKA-3107) AutoDetectParser.parse failed with error "Initialisation of record 0x85(BoundSheetRecord) left 28 bytes remaining still to be read"

Xiaohong Yang created TIKA-3107:
-----------------------------------

             Summary: AutoDetectParser.parse failed with error "Initialisation of record 0x85(BoundSheetRecord) left 28 bytes remaining still to be read"
                 Key: TIKA-3107
                 URL: https://issues.apache.org/jira/browse/TIKA-3107
             Project: Tika
          Issue Type: Bug
          Components: metadata, parser
    Affects Versions: 1.24
            Reporter: Xiaohong Yang
         Attachments: SOJ.NW.00092712.xls

When I try to extract the metadata of the attached sample Excel file by calling AutoDetectParser.parse with the following Java code, I get the error "Initialisation of record 0x85(BoundSheetRecord) left 28 bytes remaining still to be read".

 

InputStream input = new FileInputStream(localFilePath);
// A write limit of -1 disables the character limit on extracted text
BodyContentHandler handler = new BodyContentHandler(-1);
Metadata metadata = new Metadata();
TikaConfig config = TikaConfigFactory.getTikaConfig();
Parser autoDetectParser = new AutoDetectParser(config);
ParseContext context = new ParseContext();
context.set(TikaConfig.class, config);
autoDetectParser.parse(input, handler, metadata, context);

 

Here is the stack trace:

 

org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@2caa5ec
       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
       at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
       …
       at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
       at java.util.concurrent.FutureTask.run(FutureTask.java)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.poi.hssf.record.RecordInputStream$LeftoverDataException: Initialisation of record 0x85(BoundSheetRecord) left 28 bytes remaining still to be read.
       at org.apache.poi.hssf.record.RecordInputStream.hasNextRecord(RecordInputStream.java:188)
       at org.apache.poi.hssf.extractor.OldExcelExtractor.getText(OldExcelExtractor.java:233)
       at org.apache.tika.parser.microsoft.OldExcelParser.parse(OldExcelParser.java:57)
       at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:158)
       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:183)
       at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
       at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
       ... 15 more
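
For anyone handling this failure in calling code: the trace shows the POI exception wrapped inside a TikaException, so the underlying message is only reachable through the cause chain. The following is a minimal sketch, using only the JDK, of walking that chain to the innermost exception. The class name RootCauseDemo and the helper rootCause are hypothetical, and plain JDK exceptions stand in for the Tika and POI types so the example runs without either library on the classpath.

```java
// Hypothetical helper, not part of Tika or POI.
final class RootCauseDemo {

    // Follow getCause() links until an exception with no cause is reached.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in for the POI LeftoverDataException seen in the trace.
        RuntimeException inner = new IllegalStateException(
                "Initialisation of record 0x85(BoundSheetRecord) left 28 bytes remaining still to be read.");
        // Stand-in for the TikaException that wraps it.
        Exception outer = new Exception("Unexpected RuntimeException from OfficeParser", inner);

        Throwable root = rootCause(outer);
        System.out.println(root.getClass().getName() + ": " + root.getMessage());
    }
}
```

In real code the same helper could be applied inside a catch block for TikaException, so that the POI message is logged rather than only the generic wrapper text.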


