Posted to dev@tika.apache.org by "Nick Burch (Commented) (JIRA)" <ji...@apache.org> on 2012/03/09 22:12:57 UTC

[jira] [Commented] (TIKA-873) Tika --extract fails for DOC

    [ https://issues.apache.org/jira/browse/TIKA-873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13226461#comment-13226461 ] 

Nick Burch commented on TIKA-873:
---------------------------------

Tika has a number of unit tests for the extraction of embedded resources from Word documents, in POIContainerExtractionTest.

Are you having this problem for only some files, or all? Do you get some, all or none of the embedded resources out?
                
> Tika --extract fails for DOC
> ----------------------------
>
>                 Key: TIKA-873
>                 URL: https://issues.apache.org/jira/browse/TIKA-873
>             Project: Tika
>          Issue Type: Bug
>          Components: general
>    Affects Versions: 1.0
>         Environment: Windows 7 + Java v1.6
>            Reporter: Albert L.
>             Fix For: 1.2
>
>         Attachments: embedded.doc
>
>
> A file that is embedded in a DOC file doesn't get extracted to disk.
> To "embed" a file into a DOC, simply drag and drop it into the document when using MS Word 2010.  Word will then create an EMF preview of the embedded file.
> See attached file "embedded.doc" for an example input file that fails with Tika v1.0.
