Posted to dev@tika.apache.org by "Michael McCandless (Commented) (JIRA)" <ji...@apache.org> on 2011/10/15 17:59:11 UTC

[jira] [Commented] (TIKA-712) Master slide text isn't extracted

    [ https://issues.apache.org/jira/browse/TIKA-712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128227#comment-13128227 ] 

Michael McCandless commented on TIKA-712:
-----------------------------------------

I tested the current XSLFPowerPointExtractor on POI's trunk and it works great: for my PPTX test case it preserves the footer text and emits no placeholder text.

But for PPT files (using PowerPointExtractor) we still pull in the boilerplate text.  That's expected, right?  (I.e., we haven't fixed that case yet.)
                
> Master slide text isn't extracted
> ---------------------------------
>
>                 Key: TIKA-712
>                 URL: https://issues.apache.org/jira/browse/TIKA-712
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>            Reporter: Michael McCandless
>         Attachments: TIKA-712-master-slide.xml, TIKA-712.patch, testPPT_masterFooter.ppt, testPPT_masterFooter.pptx, testPPT_masterFooter2.ppt, testPPT_masterFooter2.pptx
>
>
> It looks like we are not getting text from the master slide for PPT
> and PPTX.
