Posted to dev@tika.apache.org by "Tyler Palsulich (JIRA)" <ji...@apache.org> on 2015/03/14 01:54:38 UTC

[jira] [Commented] (TIKA-1082) Incorrect date in Doc metadata

    [ https://issues.apache.org/jira/browse/TIKA-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361429#comment-14361429 ] 

Tyler Palsulich commented on TIKA-1082:
---------------------------------------

I'm getting the following metadata with Tika 1.8-SNAPSHOT:
{code}
<meta name="Revision-Number" content="0"/>
<meta name="Last-Printed" content="1601-01-01T00:00:00Z"/>
<meta name="cp:revision" content="0"/>
<meta name="meta:print-date" content="1601-01-01T00:00:00Z"/>
<meta name="meta:creation-date" content="2011-10-05T11:32:21Z"/>
<meta name="dcterms:modified" content="1601-01-01T00:00:00Z"/>
<meta name="meta:save-date" content="1601-01-01T00:00:00Z"/>
<meta name="Content-Length" content="9216"/>
<meta name="Last-Modified" content="1601-01-01T00:00:00Z"/>
<meta name="dcterms:created" content="2011-10-05T11:32:21Z"/>
<meta name="date" content="1601-01-01T00:00:00Z"/>
<meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser"/>
<meta name="X-Parsed-By" content="org.apache.tika.parser.microsoft.OfficeParser"/>
<meta name="modified" content="1601-01-01T00:00:00Z"/>
<meta name="Creation-Date" content="2011-10-05T11:32:21Z"/>
<meta name="Content-Type" content="application/msword"/>
<meta name="resourceName" content="test.doc"/>
<meta name="Last-Save-Date" content="1601-01-01T00:00:00Z"/>
{code}

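There are several date-related fields, and they carry two different values: the creation fields report 2011-10-05T11:32:21Z, while the date, modified, save, and print fields report 1601-01-01T00:00:00Z. Can someone with experience with the Doc parser decide whether this issue needs more attention? Should we have all of these metadata fields, and should they have two different values?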
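
For context, 1601-01-01T00:00:00Z is the instant that a Windows FILETIME value of 0 decodes to, since FILETIME counts 100-nanosecond intervals from that date. One possible reading, not confirmed here, is that the affected properties are simply unset in the document and the zero is being converted to a date. The sketch below only illustrates that arithmetic; the class and method names are hypothetical and are not part of Tika or POI.

```java
import java.time.Instant;

public class FiletimeEpoch {
    // Seconds between 1601-01-01T00:00:00Z (FILETIME epoch)
    // and 1970-01-01T00:00:00Z (Unix epoch).
    private static final long EPOCH_DIFF_SECONDS = 11_644_473_600L;

    // FILETIME ticks are 100 ns each, so there are 10,000,000 per second.
    private static final long TICKS_PER_SECOND = 10_000_000L;

    // Hypothetical helper: decode a raw FILETIME value into an Instant.
    public static Instant fromFiletime(long filetime) {
        long seconds = Math.floorDiv(filetime, TICKS_PER_SECOND) - EPOCH_DIFF_SECONDS;
        long nanos = Math.floorMod(filetime, TICKS_PER_SECOND) * 100L;
        return Instant.ofEpochSecond(seconds, nanos);
    }

    // Hypothetical helper: true if an ISO-8601 date string equals FILETIME 0,
    // which this sketch assumes means "no date was stored".
    public static boolean isUnsetDate(String iso) {
        return fromFiletime(0L).equals(Instant.parse(iso));
    }

    public static void main(String[] args) {
        System.out.println(fromFiletime(0L));                    // 1601-01-01T00:00:00Z
        System.out.println(isUnsetDate("1601-01-01T00:00:00Z")); // true
        System.out.println(isUnsetDate("2011-10-05T11:32:21Z")); // false
    }
}
```

If that reading is right, a check like `isUnsetDate` could be used to omit such fields rather than report them, but whether that is the desired behavior is exactly the question above.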

> Incorrect date in Doc metadata
> ------------------------------
>
>                 Key: TIKA-1082
>                 URL: https://issues.apache.org/jira/browse/TIKA-1082
>             Project: Tika
>          Issue Type: Bug
>          Components: metadata
>    Affects Versions: 1.3
>            Reporter: Bernhard Berger
>            Priority: Minor
>         Attachments: EnglishDoc.doc
>
>
> With Tika 1.3, I get the incorrect date "1601-01-01T00:00:00Z" in the metadata of an MS Word document.
> The same document gives the correct date "2011-10-05T11:32:21Z" with Tika 1.2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)