Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2020/08/10 17:53:00 UTC
[jira] [Commented] (TIKA-3154) Exception while extracting msg files
[ https://issues.apache.org/jira/browse/TIKA-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174961#comment-17174961 ]
Tim Allison commented on TIKA-3154:
-----------------------------------
Opened: https://bz.apache.org/bugzilla/show_bug.cgi?id=64659
> Exception while extracting msg files
> ------------------------------------
>
> Key: TIKA-3154
> URL: https://issues.apache.org/jira/browse/TIKA-3154
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.24.1
> Reporter: Akash
> Priority: Major
>
> While parsing an MSG file that contains some HTML text, we get an exception from Tika.
> Command: java -jar tika-app-1.24.1.jar html_code.msg
> Exception:
> {code:java}
> Aug 07, 2020 10:59:00 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
> WARNING: org.xerial's sqlite-jdbc is not loaded.
> Please provide the jar on your classpath to parse sqlite files.
> See tika-parsers/pom.xml for the correct version.
> Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@7fcf2fc1
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
> at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
> at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
> Caused by: org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 1326748, but 1000000 is the maximum for this record type.
> If the file is not corrupt, please open an issue on bugzilla to request
> increasing the maximum allowable size for this record type.
> As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()
> at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:630)
> at org.apache.poi.util.IOUtils.checkLength(IOUtils.java:208)
> at org.apache.poi.util.IOUtils.safelyAllocateCheck(IOUtils.java:610)
> at org.apache.poi.util.IOUtils.safelyAllocate(IOUtils.java:596)
> at org.apache.poi.hmef.attribute.MAPIRtfAttribute.<init>(MAPIRtfAttribute.java:49)
> at org.apache.tika.parser.microsoft.OutlookExtractor.handleBodyChunks(OutlookExtractor.java:328)
> at org.apache.tika.parser.microsoft.OutlookExtractor.parse(OutlookExtractor.java:247)
> at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:199)
> at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> {code}
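The POI message describes a size guard. A per-record-type cap (1000000 here) rejects the 1326748-byte RTF body allocation, unless a global override is set through IOUtils.setByteArrayMaxOverride(). The sketch below is a hypothetical, stdlib-only model of that behavior, written only to illustrate it. The class name AllocationGuard, its methods, and the override value 2,000,000 are assumptions for illustration. This is not Apache POI's source code.

```java
// Hypothetical, simplified model of the allocation guard described in the
// POI error message; this is NOT Apache POI's implementation.
public class AllocationGuard {
    // Per-record-type cap quoted in the report.
    public static final int RECORD_MAX = 1_000_000;

    // Global override; a value <= 0 means "no override, use the per-record cap".
    private static int maxOverride = -1;

    // Hypothetical stand-in for POI's IOUtils.setByteArrayMaxOverride().
    public static void setByteArrayMaxOverride(int override) {
        maxOverride = override;
    }

    // Allocates the array, or throws if the length exceeds the active limit.
    // IllegalStateException stands in for POI's RecordFormatException.
    public static byte[] safelyAllocate(long length, int recordMax) {
        int limit = maxOverride > 0 ? maxOverride : recordMax;
        if (length > limit) {
            throw new IllegalStateException("Tried to allocate an array of length "
                    + length + ", but " + limit + " is the maximum for this record type.");
        }
        return new byte[(int) length];
    }

    public static void main(String[] args) {
        long requested = 1_326_748L; // length reported in the stack trace
        try {
            safelyAllocate(requested, RECORD_MAX);
            System.out.println("allocated without override");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        setByteArrayMaxOverride(2_000_000); // hypothetical higher cap
        byte[] buf = safelyAllocate(requested, RECORD_MAX);
        System.out.println("allocated " + buf.length + " bytes with override");
    }
}
```

In an application that embeds Tika, the workaround named in the error message would be a call to POI's IOUtils.setByteArrayMaxOverride() before parsing. Because it is a static setting, it presumably loosens the guard for every record type in the JVM, not just this one. That is why the message frames it as a temporary workaround and asks for a Bugzilla issue to raise the specific limit.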
--
This message was sent by Atlassian Jira
(v8.3.4#803005)