Posted to dev@tika.apache.org by "Nick Burch (Commented) (JIRA)" <ji...@apache.org> on 2011/12/20 02:25:31 UTC

[jira] [Commented] (TIKA-819) Make Option to Exclude Embedded Files' Text for Text Content

    [ https://issues.apache.org/jira/browse/TIKA-819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172825#comment-13172825 ] 

Nick Burch commented on TIKA-819:
---------------------------------

Embedded files are only parsed if you explicitly ask for that, by supplying a Parser in the ParseContext object.

If you don't want recursion, don't supply the parser!
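
As a rough sketch of what that looks like in code: the class name EmbeddedTextExample, the helper extractText and its includeEmbedded flag are made up for illustration, and the try-with-resources / java.nio.file calls assume Java 7 or later. It reflects the behaviour described in this comment for the Tika release under discussion; other releases may treat the default differently, so check against the version you run.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical example class, not part of Tika.
public class EmbeddedTextExample {

    // Hypothetical helper: returns the text of the document at `path`.
    // When includeEmbedded is false, no Parser is registered in the
    // ParseContext, so (per the comment above) embedded files are skipped.
    static String extractText(String path, boolean includeEmbedded) throws Exception {
        Parser parser = new AutoDetectParser();
        ParseContext context = new ParseContext();

        if (includeEmbedded) {
            // Supplying a Parser here is what asks for embedded files
            // to be parsed recursively.
            context.set(Parser.class, parser);
        }

        // A write limit of -1 disables the default character limit.
        BodyContentHandler handler = new BodyContentHandler(-1);
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            parser.parse(in, handler, new Metadata(), context);
        }
        return handler.toString();
    }

    public static void main(String[] args) throws Exception {
        String path = args[0]; // e.g. a DOCX with an embedded PPTX

        System.out.println("--- container only ---");
        System.out.println(extractText(path, false));

        System.out.println("--- container plus embedded files ---");
        System.out.println(extractText(path, true));
    }
}
```

Run against a DOCX with an embedded PPTX, the first call should give only the DOCX text and the second the DOCX+PPTX text, which is the switch the issue asks for.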
                
> Make Option to Exclude Embedded Files' Text for Text Content
> ------------------------------------------------------------
>
>                 Key: TIKA-819
>                 URL: https://issues.apache.org/jira/browse/TIKA-819
>             Project: Tika
>          Issue Type: New Feature
>          Components: general
>    Affects Versions: 1.0
>         Environment: Windows-7 + JDK 1.6 u26
>            Reporter: Albert L.
>             Fix For: 1.1
>
>
> It would be nice to be able to disable text content from embedded files.
> For example, if I have a DOCX with an embedded PPTX, then I would like the option to disable text from the PPTX from showing up when asking for the text content from DOCX.  In other words, it would be nice to have the option to get text content *only* from the DOCX instead of the DOCX+PPTX.
