Posted to dev@tika.apache.org by "spadezhang (Jira)" <ji...@apache.org> on 2020/04/07 17:01:00 UTC

[jira] [Commented] (TIKA-3072) Seeing org.apache.tika.exception.TikaException: Unexpected RuntimeException for an XLS file

    [ https://issues.apache.org/jira/browse/TIKA-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17077416#comment-17077416 ] 

spadezhang commented on TIKA-3072:
----------------------------------

I tried this file with tika-1.24, and parsing failed with the following error: {code}
Apache Tika was unable to parse the document at D:\download\0000431.xls.
The full exception stack trace is included below:
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:3308)
    at java.util.BitSet.ensureCapacity(BitSet.java:337)
    at java.util.BitSet.expandTo(BitSet.java:352)
    at java.util.BitSet.set(BitSet.java:447)
    at de.l3s.boilerpipe.sax.BoilerpipeHTMLContentHandler.characters(BoilerpipeHTMLContentHandler.java:267)
    at org.apache.tika.parser.html.BoilerpipeContentHandler.characters(BoilerpipeContentHandler.java:155)
    at org.apache.tika.sax.TeeContentHandler.characters(TeeContentHandler.java:102)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
    at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:47)
    at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:83)
    at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:141)
    at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:288)
    at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:284)
    at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:311)
    at org.apache.tika.parser.microsoft.TextCell.render(TextCell.java:34)
    at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processSheet(ExcelExtractor.java:646)
    at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.internalProcessRecord(ExcelExtractor.java:416)
    at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processRecord(ExcelExtractor.java:367)
    at org.apache.poi.hssf.eventusermodel.FormatTrackingHSSFListener.processRecord(FormatTrackingHSSFListener.java:92)
    at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener$TikaFormatTrackingHSSFListener.processRecord(ExcelExtractor.java:689)
    at org.apache.poi.hssf.eventusermodel.HSSFRequest.processRecord(HSSFRequest.java:106)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.genericProcessEvents(HSSFEventFactory.java:172)
    at org.apache.poi.hssf.eventusermodel.HSSFEventFactory.processEvents(HSSFEventFactory.java:129)
    at org.apache.tika.parser.microsoft.ExcelExtractor$TikaHSSFListener.processFile(ExcelExtractor.java:343)
    at org.apache.tika.parser.microsoft.ExcelExtractor.parse(ExcelExtractor.java:172)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:183)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
{code}

> Seeing org.apache.tika.exception.TikaException: Unexpected RuntimeException for an XLS file
> -------------------------------------------------------------------------------------------
>
>                 Key: TIKA-3072
>                 URL: https://issues.apache.org/jira/browse/TIKA-3072
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Muhammad Yasir Khan
>            Priority: Major
>         Attachments: 0000431.xls
>
>
> [^0000431.xls]
> {code:java}
> org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@5d216317
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:282)
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:159)
> {code}


