Posted to dev@tika.apache.org by "Jonathan LI (JIRA)" <ji...@apache.org> on 2011/07/20 22:43:58 UTC
[jira] [Resolved] (TIKA-684) Partial/Incomplete text extraction for certain Powerpoint files
[ https://issues.apache.org/jira/browse/TIKA-684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan LI resolved TIKA-684.
------------------------------
Resolution: Not A Problem
False Issue
> Partial/Incomplete text extraction for certain Powerpoint files
> ---------------------------------------------------------------
>
> Key: TIKA-684
> URL: https://issues.apache.org/jira/browse/TIKA-684
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 0.9
> Reporter: Jonathan LI
> Attachments: 2eebe3db1196aa8ea58c9be83965f0eb.ppt
>
>
> Example file with issue attached.
> Tika throws an exception during text extraction of certain PowerPoint files. In this example file, the extracted text only goes up to slide 37; the text from slides 38-40 is missing.
> Tested via both the Tika library and the Tika GUI. Apache POI (3.8 beta 3 and 3.7) has no issues extracting text from this file.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira