Posted to dev@tika.apache.org by "Tyler Palsulich (JIRA)" <ji...@apache.org> on 2015/03/04 01:43:07 UTC

[jira] [Commented] (TIKA-1057) document content property "Status" is not extracted for *.doc files

    [ https://issues.apache.org/jira/browse/TIKA-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346120#comment-14346120 ] 

Tyler Palsulich commented on TIKA-1057:
---------------------------------------

Can someone provide a .doc file with a Status metadata field? Or do they all have it?

> document content property "Status" is not extracted for *.doc files
> -------------------------------------------------------------------
>
>                 Key: TIKA-1057
>                 URL: https://issues.apache.org/jira/browse/TIKA-1057
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>         Environment: java 1.5/1.6 / Windows 7
>            Reporter: Thomas Stroeter
>            Priority: Minor
>
> I would like to use Tika to extract the document property "Status" from a Word 97-2003 *.doc file.
>
> Tika correctly dumps the document status property from XML-based *.docx files as "Content-Status" and "cp:contentStatus", but I cannot extract the same metadata from *.doc Word documents using Tika.
> Nevertheless, Word 2010 has no problem setting and reading that document metadata in a *.doc file.
> Is there a way to extract this information from *.doc files with Tika, too?
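
For anyone reproducing the *.docx side of this report, here is a rough sketch of where the "cp:contentStatus" value comes from. It does not use Tika at all and is not how Tika implements it; it only uses the JDK (Java 9+ for readAllBytes). The class and method names are made up for illustration, and it assumes the core properties part sits at the conventional path docProps/core.xml. A binary *.doc file is an OLE2 container rather than a zip, so this approach does not carry over to the case in the issue.

```java
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Tika: reads cp:contentStatus from a .docx.
public class DocxContentStatus {
    // Namespace bound to the "cp" prefix in the OOXML core properties part.
    static final String CP_NS =
        "http://schemas.openxmlformats.org/package/2006/metadata/core-properties";

    /** Returns the cp:contentStatus text, or null if the part or element is absent. */
    public static String readContentStatus(InputStream docx) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                // Assumption: core properties live at the conventional path.
                if (!"docProps/core.xml".equals(entry.getName())) {
                    continue;
                }
                // Copy the entry first so the XML parser cannot close the zip stream.
                byte[] xml = zip.readAllBytes();
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                Document dom = factory.newDocumentBuilder()
                                      .parse(new ByteArrayInputStream(xml));
                NodeList nodes = dom.getElementsByTagNameNS(CP_NS, "contentStatus");
                return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java DocxContentStatus some-file.docx
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println("cp:contentStatus = " + readContentStatus(in));
        }
    }
}
```

Running this against a *.docx saved with a Status value should print the same string that Tika reports under "cp:contentStatus", which gives a baseline to compare against once a sample *.doc is attached.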



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)