Posted to dev@tika.apache.org by "David Weekly (JIRA)" <ji...@apache.org> on 2009/06/26 23:19:47 UTC
[jira] Created: (TIKA-255) Embedded Visio Content Crashes PPT Parser
Embedded Visio Content Crashes PPT Parser
-----------------------------------------
Key: TIKA-255
URL: https://issues.apache.org/jira/browse/TIKA-255
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 0.4
Environment: Debian 5.0.1
Reporter: David Weekly
The attached PPT is a valid file but crashes Tika. It contains embedded Visio data, which may be the cause of the issue.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (TIKA-255) Embedded Visio Content Crashes PPT Parser
Posted by "David Weekly (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724728#action_12724728 ]
David Weekly commented on TIKA-255:
-----------------------------------
Note that the following patch to trunk resolves this issue. Please commit!
--- tika-parsers/pom.xml~ 2009-06-26 20:40:53.352092861 +0000
+++ tika-parsers/pom.xml 2009-06-26 21:34:41.380840576 +0000
@@ -38,7 +38,7 @@
<url>http://lucene.apache.org/tika/</url>
<properties>
- <poi.version>3.5-beta5</poi.version>
+ <poi.version>3.5-beta6</poi.version>
</properties>
[jira] Updated: (TIKA-255) Embedded Visio Content Crashes PPT Parser
Posted by "David Weekly (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Weekly updated TIKA-255:
------------------------------
Attachment: extract-tika.ppt
This PPT file is valid but crashes Tika 0.4 nightly:
@sfx22001:~/tika-reactor# java -jar tika-app/target/tika-app-0.4-SNAPSHOT.jar /home/dew/extract-tika.ppt
Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@61c80b01
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:85)
    at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:116)
    at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:57)
Caused by: java.lang.NullPointerException
    at org.apache.poi.hslf.model.SimpleShape.getClientRecords(SimpleShape.java:322)
    at org.apache.poi.hslf.model.SimpleShape.getClientDataRecord(SimpleShape.java:307)
    at org.apache.poi.hslf.model.TextShape.getPlaceholderAtom(TextShape.java:547)
    at org.apache.poi.hslf.model.Sheet.getPlaceholder(Sheet.java:408)
    at org.apache.poi.hslf.model.HeadersFooters.isVisible(HeadersFooters.java:244)
    at org.apache.poi.hslf.model.HeadersFooters.isHeaderVisible(HeadersFooters.java:148)
    at org.apache.poi.hslf.extractor.PowerPointExtractor.getText(PowerPointExtractor.java:173)
    at org.apache.poi.hslf.extractor.PowerPointExtractor.getText(PowerPointExtractor.java:162)
    at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:88)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
    ... 3 more
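When reading a trace like this one, the useful line is the innermost cause. The TikaException is only a wrapper: CompositeParser.parse (line 119) calls OfficeParser.parse, and when that throws a RuntimeException, CompositeParser (line 121) rethrows it as "Unexpected RuntimeException from ...". The actual failure is the NullPointerException raised inside POI's SimpleShape.getClientRecords. The following is a minimal, stdlib-only Java sketch of that wrap-then-unwrap pattern. WrappedParseException, parse() and the null value are hypothetical stand-ins written for illustration; they are not Tika's TikaException or CompositeParser, and they do not reproduce POI's internals.

```java
// Minimal stdlib-only sketch of the wrap-then-unwrap pattern seen in the trace.
// WrappedParseException and parse() are hypothetical stand-ins, NOT Tika classes.
public class RootCauseDemo {

    // Stand-in for a checked "parse failed" exception such as TikaException.
    static class WrappedParseException extends Exception {
        WrappedParseException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Runs a delegate "parser" and wraps any RuntimeException it throws,
    // keeping the original exception as the cause.
    static void parse(Runnable delegate) throws WrappedParseException {
        try {
            delegate.run();
        } catch (RuntimeException e) {
            throw new WrappedParseException("Unexpected RuntimeException from " + delegate, e);
        }
    }

    // Follows getCause() until it reaches the innermost throwable.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        try {
            parse(() -> {
                String clientRecord = null; // null stands in for the missing data
                clientRecord.length();      // throws NullPointerException
            });
        } catch (WrappedParseException e) {
            System.out.println("Root cause: " + rootCause(e).getClass().getName());
        }
    }
}
```

Running main prints "Root cause: java.lang.NullPointerException". The same getCause() walk works on any exception chain, including a real TikaException, which is why the "Caused by:" section, not the top-level message, points at POI rather than Tika.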
[jira] Commented: (TIKA-255) Embedded Visio Content Crashes PPT Parser
Posted by "David Weekly (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12724723#action_12724723 ]
David Weekly commented on TIKA-255:
-----------------------------------
I found a matching POI report, https://issues.apache.org/bugzilla/show_bug.cgi?id=47068, which crashes in the same place. It is claimed that POI r746238 (committed Feb 20, 2009) fixes this issue: http://svn.apache.org/viewvc?view=rev&revision=746238
When will this show up in Tika?
[jira] Resolved: (TIKA-255) Embedded Visio Content Crashes PPT Parser
Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jukka Zitting resolved TIKA-255.
--------------------------------
Resolution: Fixed
Fix Version/s: 0.4
Assignee: Jukka Zitting
Thanks for the report and the suggested fix! POI dependency upgraded in revision 789089.