Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2015/12/10 19:48:10 UTC
[jira] [Updated] (TIKA-1799) Upgrade to POI 3.14-Beta1 when available
[ https://issues.apache.org/jira/browse/TIKA-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison updated TIKA-1799:
------------------------------
Attachments: 349008.ppt.json, 349008.ppt
I might have found a multithreading issue that I can't reproduce within JUnit.
When I ran the integration tests with rc1 of POI 3.14-beta1, there were ~460 new PPT exceptions with the stack trace below. I reran tika-app in batch mode and got the same exception for at least the attached file.
However, I can't trigger the exception when I run tika-app directly (outside batch mode), or when I parse the file hundreds of times in multiple threads within JUnit.
{noformat}
org.apache.poi.hslf.exceptions.HSLFException: Master styles not initialized
at org.apache.poi.hslf.usermodel.HSLFSlideMaster.setSlideShow(HSLFSlideMaster.java:144)
at org.apache.poi.hslf.usermodel.HSLFSlideShow.buildSlidesAndNotes(HSLFSlideShow.java:362)
at org.apache.poi.hslf.usermodel.HSLFSlideShow.<init>(HSLFSlideShow.java:152)
at org.apache.poi.hslf.usermodel.HSLFSlideShow.<init>(HSLFSlideShow.java:185)
at org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:61)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:149)
at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:177)
at org.apache.tika.parser.DigestingParser.parse(DigestingParser.java:74)
at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:158)
at org.apache.tika.batch.FileResourceConsumer.parse(FileResourceConsumer.java:407)
at org.apache.tika.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
at org.apache.tika.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
at org.apache.tika.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
at org.apache.tika.batch.FileResourceConsumer.call(FileResourceConsumer.java:50)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{noformat}
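The JUnit attempt described above (parsing one file hundreds of times across several threads) can be sketched with plain java.util.concurrent. This is a minimal illustration, not the test actually used: ConcurrentParseHarness and ParseTask are hypothetical names, not part of Tika or POI. It submits the same task repeatedly to a fixed thread pool, loosely mirroring how the batch trace shows FileResourceConsumer tasks running on a ThreadPoolExecutor, and it collects every exception rather than stopping at the first one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical stress harness (not a Tika or POI class). */
final class ConcurrentParseHarness {

    /** One parse attempt that may throw, e.g. a lambda wrapping a Tika parse call. */
    @FunctionalInterface
    interface ParseTask {
        void parse() throws Exception;
    }

    /**
     * Runs {@code task} {@code attempts} times on a pool of {@code threads}
     * workers and returns every exception thrown (empty if all succeeded).
     */
    static List<Throwable> run(ParseTask task, int threads, int attempts)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        // The latch holds the first wave of workers until every attempt has
        // been queued, so they start parsing at roughly the same moment.
        CountDownLatch start = new CountDownLatch(1);
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < attempts; i++) {
            futures.add(pool.submit(() -> {
                start.await();
                task.parse();
                return null;
            }));
        }
        start.countDown();

        List<Throwable> failures = new ArrayList<>();
        for (Future<?> f : futures) {
            try {
                f.get();
            } catch (ExecutionException e) {
                // Record the exception thrown inside the task itself.
                failures.add(e.getCause());
            }
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return failures;
    }
}
```

To point it at the attached 349008.ppt, the task could open a fresh InputStream per attempt and call AutoDetectParser.parse(stream, new BodyContentHandler(-1), new Metadata(), new ParseContext()), using either one shared parser instance or one per attempt to compare the two setups. Whether this reproduces the batch-mode failure is untested; per the comment above, the JUnit version did not.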
> Upgrade to POI 3.14-Beta1 when available
> ----------------------------------------
>
> Key: TIKA-1799
> URL: https://issues.apache.org/jira/browse/TIKA-1799
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Minor
> Attachments: 349008.ppt, 349008.ppt.json
>
>
> Should be out in the next week or two.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)