Posted to dev@tika.apache.org by "Akash (Jira)" <ji...@apache.org> on 2020/08/07 17:28:00 UTC

[jira] [Updated] (TIKA-3154) Exception while extracting msg files

     [ https://issues.apache.org/jira/browse/TIKA-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akash updated TIKA-3154:
------------------------
    Description: 
While parsing an MSG file that contains HTML text, Tika throws an exception.

Command: java -jar tika-app-1.24.1.jar html_code.msg

Exception:

{code:java}
See tika-parsers/pom.xml for the correct version.
See tika-parsers/pom.xml for the correct version.
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@7fcf2fc1
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
	at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
	at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
	at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
Caused by: org.apache.poi.util.RecordFormatException: Tried to allocate an array of length 1326748, but 1000000 is the maximum for this record type.
If the file is not corrupt, please open an issue on bugzilla to request increasing the maximum allowable size for this record type.
As a temporary workaround, consider setting a higher override value with IOUtils.setByteArrayMaxOverride()
	at org.apache.poi.util.IOUtils.throwRFE(IOUtils.java:630)
	at org.apache.poi.util.IOUtils.checkLength(IOUtils.java:208)
	at org.apache.poi.util.IOUtils.safelyAllocateCheck(IOUtils.java:610)
	at org.apache.poi.util.IOUtils.safelyAllocate(IOUtils.java:596)
	at org.apache.poi.hmef.attribute.MAPIRtfAttribute.<init>(MAPIRtfAttribute.java:49)
	at org.apache.tika.parser.microsoft.OutlookExtractor.handleBodyChunks(OutlookExtractor.java:328)
	at org.apache.tika.parser.microsoft.OutlookExtractor.parse(OutlookExtractor.java:247)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:199)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:131)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	... 5 more
{code}
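The exception message points at the mechanism involved: POI caps how many bytes a single record may allocate (1000000 for this record type) unless a global override, set with IOUtils.setByteArrayMaxOverride(), replaces that cap. The following is a simplified, hypothetical model of that guard using only the JDK. It is not POI's code, and the class and method names are invented for illustration; it only shows why the 1326748-byte RTF body in html_code.msg is rejected and how a higher override lets the same allocation succeed.

```java
/**
 * Hypothetical, simplified model of the length guard behind the
 * RecordFormatException above. NOT Apache POI's implementation; it only
 * mirrors the behaviour described in the exception message.
 */
public class AllocationGuardSketch {

    // Hypothetical stand-in for POI's static, JVM-wide override; -1 means "not set".
    private static int maxOverride = -1;

    // Models the override: when set (> 0), it replaces the per-record maximum.
    public static void setByteArrayMaxOverride(int override) {
        maxOverride = override;
    }

    // Allocates a buffer only if the requested length fits the active limit.
    public static byte[] safelyAllocate(long length, int recordMax) {
        int limit = (maxOverride > 0) ? maxOverride : recordMax;
        if (length < 0 || length > limit) {
            throw new IllegalStateException(
                "Tried to allocate an array of length " + length
                + ", but " + limit + " is the maximum for this record type.");
        }
        return new byte[(int) length];
    }

    public static void main(String[] args) {
        long rtfLength = 1_326_748L; // length reported in TIKA-3154
        int recordMax = 1_000_000;   // per-record cap reported in TIKA-3154

        try {
            safelyAllocate(rtfLength, recordMax);
        } catch (IllegalStateException e) {
            System.out.println("Without override: " + e.getMessage());
        }

        setByteArrayMaxOverride(2_000_000); // hypothetical higher limit
        byte[] buf = safelyAllocate(rtfLength, recordMax);
        System.out.println("With override: allocated " + buf.length + " bytes");
    }
}
```

In a real program, the workaround the message suggests would presumably be a single call to org.apache.poi.util.IOUtils.setByteArrayMaxOverride(int) before handing the file to Tika. Because that setting is static, it would also loosen the guard for every other file parsed in the same JVM, so the value is a trade-off between accepting large bodies and protecting against corrupt files.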



> Exception while extracting msg files
> ------------------------------------
>
>                 Key: TIKA-3154
>                 URL: https://issues.apache.org/jira/browse/TIKA-3154
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.24.1
>            Reporter: Akash
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)