Posted to dev@tika.apache.org by "Nick Burch (Jira)" <ji...@apache.org> on 2020/06/05 02:59:00 UTC

[jira] [Commented] (TIKA-3107) AutoDetectParser.parse failed with error "Initialisation of record 0x85(BoundSheetRecord) left 28 bytes remaining still to be read"

    [ https://issues.apache.org/jira/browse/TIKA-3107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126324#comment-17126324 ] 

Nick Burch commented on TIKA-3107:
----------------------------------

This is a bug in Apache POI, one of the libraries that Tika depends on. Any chance you could report it there? [https://bz.apache.org/bugzilla/enter_bug.cgi?product=POI]

It'd also be helpful to know where the file came from (what software generated it), whether Excel gives any warnings when it opens it, and whether the problem goes away if you do a Save-As from Excel.

> AutoDetectParser.parse failed with error "Initialisation of record 0x85(BoundSheetRecord) left 28 bytes remaining still to be read"
> -----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: TIKA-3107
>                 URL: https://issues.apache.org/jira/browse/TIKA-3107
>             Project: Tika
>          Issue Type: Bug
>          Components: metadata, parser
>    Affects Versions: 1.24
>            Reporter: Xiaohong Yang
>            Priority: Critical
>         Attachments: SOJ.NW.00092712.xls
>
>
> When I try to get the metadata of the sample Excel file by calling the AutoDetectParser.parse method with the following Java code, I get the error "Initialisation of record 0x85(BoundSheetRecord) left 28 bytes remaining still to be read".
>  
> InputStream input = new FileInputStream(localFilePath);
> BodyContentHandler handler = new BodyContentHandler(-1);
> Metadata metadata = new Metadata();
> TikaConfig config = TikaConfigFactory.getTikaConfig();
> Parser autoDetectParser = new AutoDetectParser(config);
> ParseContext context = new ParseContext();
> context.set(TikaConfig.class, config);
> autoDetectParser.parse(input, handler, metadata, context);
>  
> Here is the stack trace:
>  
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@2caa5ec
>        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>        …
>        at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266)
>        at java.util.concurrent.FutureTask.run(FutureTask.java)
>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>        at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.poi.hssf.record.RecordInputStream$LeftoverDataException: Initialisation of record 0x85(BoundSheetRecord) left 28 bytes remaining still to be read.
>        at org.apache.poi.hssf.record.RecordInputStream.hasNextRecord(RecordInputStream.java:188)
>        at org.apache.poi.hssf.extractor.OldExcelExtractor.getText(OldExcelExtractor.java:233)
>        at org.apache.tika.parser.microsoft.OldExcelParser.parse(OldExcelParser.java:57)
>        at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:158)
>        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:183)
>        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
>        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>        ... 15 more
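
The trace above shows the POI exception reaching the caller wrapped inside a TikaException. For triage or logging, a caller may want the innermost cause rather than the wrapper. Below is a minimal JDK-only sketch of walking the cause chain. The class name CauseChain and the helper rootCause are hypothetical (they are not part of Tika or POI), and the two exceptions in main are plain stand-ins for TikaException and POI's LeftoverDataException.

```java
public class CauseChain {

    /** Returns the deepest cause in the chain, or the throwable itself if it has no cause. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        // Stop when there is no further cause; also guard against a self-referential cause.
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in for the POI exception seen under "Caused by" in the trace (assumption: plain RuntimeException).
        RuntimeException inner = new RuntimeException(
                "Initialisation of record 0x85(BoundSheetRecord) left 28 bytes remaining still to be read.");
        // Stand-in for the TikaException wrapper thrown by CompositeParser.
        Exception outer = new Exception("Unexpected RuntimeException from OfficeParser", inner);

        // Report the underlying library error rather than the wrapper.
        System.out.println(rootCause(outer).getMessage());
    }
}
```

In real code the same helper could be called in a catch block around autoDetectParser.parse(...), so that the POI message is what gets logged or attached to a bug report.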


