Posted to dev@tika.apache.org by "Javier (Jira)" <ji...@apache.org> on 2020/03/24 17:12:00 UTC

[jira] [Updated] (TIKA-3076) Tika Parse Errors for application/vnd.ms-excel

     [ https://issues.apache.org/jira/browse/TIKA-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Javier updated TIKA-3076:
-------------------------
    Affects Version/s:     (was: 1.23)
                       1.24

> Tika Parse Errors for application/vnd.ms-excel
> ----------------------------------------------
>
>                 Key: TIKA-3076
>                 URL: https://issues.apache.org/jira/browse/TIKA-3076
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.24
>            Reporter: Javier
>            Priority: Major
>         Attachments: test_dropbox_NO_selected.xls, test_dropbox_selected.xls
>
>
> We are trying to extract text from an old Excel file using Tika and have found this bug. If the Excel file has a dropbox WITH an element selected, we get the following exception, but if we unselect the item and save the file, Tika extracts the content without any problem:
> Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@37ddb69a
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>     at testTika.ExtractContent(testTika.java:183)
>     at testTika.main(testTika.java:170)
> Caused by: org.apache.poi.util.RecordFormatException: Leftover 7 bytes in subrecord data [15, 00, 12, 00, 12, 00, 01, 00, 11, 20, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00, 0C, 00, 14, 00, 00, 00, 00, 00, 00, 00, 00, 00, 01, 00, 01, 00, 06, 00, 00, 00, 10, 00, 01, 00, 13, 00, EE, 1F, 10, 00, 09, 00, 00, 00, 00, 00, 25, 04, 00, 0A, 00, 05, 00, 05, 00, 05, 07, 00, 00, 00, 18, 00, 00, 00, 00, 00, 00, 01, 00, 00, 00]
>     at org.apache.poi.hssf.record.ObjRecord.<init>(ObjRecord.java:112)
>     at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> I've attached two documents to test with.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)