Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2020/08/10 17:53:00 UTC

[jira] [Commented] (TIKA-3154) Exception while extracting msg files

    [ https://issues.apache.org/jira/browse/TIKA-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174961#comment-17174961 ] 

Tim Allison commented on TIKA-3154:
-----------------------------------

Opened: https://bz.apache.org/bugzilla/show_bug.cgi?id=64659

> Exception while extracting msg files
> ------------------------------------
>
>                 Key: TIKA-3154
>                 URL: https://issues.apache.org/jira/browse/TIKA-3154
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.24.1
>            Reporter: Akash
>            Priority: Major
>
> While parsing an msg file that contains some HTML text, we get an exception from Tika.
> Command: java -jar tika-app-1.24.1.jar html_code.msg
> Exception:
> {code:java}
> Aug 07, 2020 10:59:00 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
> WARNING: org.xerial's sqlite-jdbc is not loaded.
> Please provide the jar on your classpath to parse sqlite files.
> See tika-parsers/pom.xml for the correct version.
> Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@7fcf2fc1
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> 	at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
> 	at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
> 	at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
> Caused by: org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 1326748, but 1000000 is the maximum for this record type.
> If the file is not corrupt, please open an issue on bugzilla to request
> increasing the maximum allowable size for this record type.
> As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()
> 	at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:630)
> 	at org.apache.poi.util.IOUtils.checkLength(IOUtils.java:208)
> 	at org.apache.poi.util.IOUtils.safelyAllocateCheck(IOUtils.java:610)
> 	at org.apache.poi.util.IOUtils.safelyAllocate(IOUtils.java:596)
> 	at org.apache.poi.hmef.attribute.MAPIRtfAttribute.<init>(MAPIRtfAttribute.java:49)
> 	at org.apache.tika.parser.microsoft.OutlookExtractor.handleBodyChunks(OutlookExtractor.java:328)
> 	at org.apache.tika.parser.microsoft.OutlookExtractor.parse(OutlookExtractor.java:247)
> 	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:199)
> 	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> {code}
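
For readers unfamiliar with the failure mode, the exception message above describes a guarded allocation: the requested length (1326748) exceeds the per-record maximum (1000000), and a global override can raise that limit. The following is only a rough, self-contained sketch of that idea, not POI's actual code. The class AllocationGuardSketch, its methods, and the override value 2000000 are hypothetical names and numbers chosen for illustration; only the two lengths come from the stack trace.

```java
/** Hypothetical stand-in for a guarded allocator; NOT Apache POI's implementation. */
final class AllocationGuardSketch {

    /** Global override; values <= 0 mean "use the per-record maximum" (an assumption of this sketch). */
    private static int maxOverride = -1;

    /** Sets the global override that replaces the per-record maximum. */
    static void setMaxOverride(int value) {
        maxOverride = value;
    }

    /** Returns a new array of the requested length, or throws if it exceeds the effective limit. */
    static byte[] safelyAllocate(long length, int recordMax) {
        if (length < 0) {
            throw new IllegalStateException("Cannot allocate an array of negative length " + length);
        }
        int limit = maxOverride > 0 ? maxOverride : recordMax;
        if (length > limit) {
            throw new IllegalStateException("Tried to allocate an array of length " + length
                    + ", but " + limit + " is the maximum for this record type.");
        }
        // Safe cast: length <= limit <= Integer.MAX_VALUE at this point.
        return new byte[(int) length];
    }

    public static void main(String[] args) {
        long requested = 1_326_748L; // length reported in the stack trace
        int recordMax = 1_000_000;   // maximum reported in the stack trace

        try {
            safelyAllocate(requested, recordMax);
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }

        setMaxOverride(2_000_000);   // illustrative value, not a recommendation
        byte[] body = safelyAllocate(requested, recordMax);
        System.out.println("Allocated " + body.length + " bytes");
    }
}
```

In a real application the corresponding step would presumably be a call to the method named in the exception message, IOUtils.setByteArrayMaxOverride(), made before handing the file to Tika. A process-wide override loosens a safety check meant to protect against corrupt files, so it is best treated as a stopgap until the record limit itself is raised.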



--
This message was sent by Atlassian Jira
(v8.3.4#803005)