Posted to dev@tika.apache.org by "Tim Allison (JIRA)" <ji...@apache.org> on 2015/12/10 19:48:10 UTC

[jira] [Updated] (TIKA-1799) Upgrade to POI 3.14-Beta1 when available

     [ https://issues.apache.org/jira/browse/TIKA-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated TIKA-1799:
------------------------------
    Attachment: 349008.ppt.json
                349008.ppt

I might have found a multithreading issue that I can't reproduce within JUnit.

When I ran the integration with rc1 of POI 3.14-beta1, there were ~460 new ppt exceptions with the stacktrace below. I reran tika-app in batch mode and found the same exception for at least the attached file.

However, I can't trigger the exception when I run tika-app directly (outside batch mode) or when I parse the file hundreds of times in multiple threads within JUnit.

{noformat}
org.apache.poi.hslf.exceptions.HSLFException: Master styles not initialized
	at org.apache.poi.hslf.usermodel.HSLFSlideMaster.setSlideShow(HSLFSlideMaster.java:144)
	at org.apache.poi.hslf.usermodel.HSLFSlideShow.buildSlidesAndNotes(HSLFSlideShow.java:362)
	at org.apache.poi.hslf.usermodel.HSLFSlideShow.<init>(HSLFSlideShow.java:152)
	at org.apache.poi.hslf.usermodel.HSLFSlideShow.<init>(HSLFSlideShow.java:185)
	at org.apache.tika.parser.microsoft.HSLFExtractor.parse(HSLFExtractor.java:61)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:149)
	at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:117)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
	at org.apache.tika.parser.ParserDecorator.parse(ParserDecorator.java:177)
	at org.apache.tika.parser.DigestingParser.parse(DigestingParser.java:74)
	at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:158)
	at org.apache.tika.batch.FileResourceConsumer.parse(FileResourceConsumer.java:407)
	at org.apache.tika.batch.fs.RecursiveParserWrapperFSConsumer.processFileResource(RecursiveParserWrapperFSConsumer.java:104)
	at org.apache.tika.batch.FileResourceConsumer._processFileResource(FileResourceConsumer.java:182)
	at org.apache.tika.batch.FileResourceConsumer.call(FileResourceConsumer.java:115)
	at org.apache.tika.batch.FileResourceConsumer.call(FileResourceConsumer.java:50)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{noformat}
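
For anyone who wants to try reproducing this, here is a rough sketch of the kind of multi-threaded harness described above. It is not the actual test that was run: the class name ConcurrentRepro and the method runConcurrently are made up for illustration, and it uses only java.util.concurrent. All workers are released at the same moment by a latch, and every Throwable is collected rather than rethrown, so an intermittent HSLFException would show up in the returned list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper (not part of Tika or POI): runs one task repeatedly
// from several threads and gathers every failure it sees.
public class ConcurrentRepro {

    public static List<Throwable> runConcurrently(int threads, int iterations, Callable<?> task)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        // Workers block on this latch so that they all start together.
        CountDownLatch startGate = new CountDownLatch(1);
        List<Future<List<Throwable>>> futures = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            futures.add(pool.submit(() -> {
                List<Throwable> failures = new ArrayList<>();
                startGate.await();
                for (int i = 0; i < iterations; i++) {
                    try {
                        task.call();
                    } catch (Throwable e) {
                        // Record the failure and keep going.
                        failures.add(e);
                    }
                }
                return failures;
            }));
        }
        startGate.countDown();
        List<Throwable> all = new ArrayList<>();
        try {
            for (Future<List<Throwable>> f : futures) {
                try {
                    all.addAll(f.get());
                } catch (ExecutionException e) {
                    all.add(e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
        return all;
    }
}
```

In a JUnit test, the task would open a fresh stream on 349008.ppt and pass it to the parser on each call, and the test would assert that the returned list is empty. The stacktrace shows that batch mode goes through DigestingParser and RecursiveParserWrapper, so a harness that calls AutoDetectParser directly may not exercise the same path; that is a guess, not something confirmed here.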

> Upgrade to POI 3.14-Beta1 when available
> ----------------------------------------
>
>                 Key: TIKA-1799
>                 URL: https://issues.apache.org/jira/browse/TIKA-1799
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Minor
>         Attachments: 349008.ppt, 349008.ppt.json
>
>
> Should be out in the next week or two.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)