Posted to dev@tika.apache.org by Kai-Uwe Schmidt <ku...@bel-it.de> on 2014/07/10 12:28:20 UTC

Metadata at e.g. textfiles

Hi all,

Is there a way to get e.g. the creator and/or creation date into the metadata dictionary?

When I extract from a file I get the following:

Content-Encoding: ISO-8859-1
Content-Length: 9
Content-Type: text/plain; charset=ISO-8859-1
resourceName: TikaTestText.txt

I am missing creator and creation time.

Any hints?

Kind regards

Kai-Uwe Schmidt
Managing Director

BEL-IT GmbH
Bürgermeister-Georg-Hiltmair-Str. 5, 85630 Grasbrunn, Register court: München, HRB 176809
Managing Director: Kai-Uwe Schmidt
Phone: + 49 (0) 89 / 207042-320
Fax: + 49 (0) 89 / 207042-323
Mobile: + 49 (0) 173 / 3900611
Kai-Uwe.Schmidt@bel-it.de


Re: Metadata at e.g. textfiles

Posted by Kai-Uwe Schmidt <ku...@bel-it.de>.
I was asking about metadata without having thought through the technical side. I understand that, from a Tika developer's point of view, it's not possible to offer metadata that may be kept in proprietary or system-dependent stores. Information such as the last write time, the creator, or even the creation time is mostly stored by the file system, not in the document.

Support for Java 7's BasicFileAttributes would be a very nice improvement for Tika!
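
For illustration, here is a minimal sketch of how the file-system metadata mentioned above can be read with plain Java 7 NIO, outside of Tika. The class name FileSystemMetadata and the "fs:" key names are hypothetical, chosen only for this example; they are not Tika property names, and this is not how Tika itself populates its Metadata object.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.LinkedHashMap;
import java.util.Map;

public class FileSystemMetadata {

    // Collects "external" metadata kept by the file system, not by the document.
    // The "fs:" keys are made up for this sketch.
    public static Map<String, String> read(Path file) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
        Map<String, String> meta = new LinkedHashMap<>();
        // On file systems that do not record a creation time, the value is
        // implementation specific (often the last-modified time).
        meta.put("fs:created", attrs.creationTime().toString());
        meta.put("fs:modified", attrs.lastModifiedTime().toString());
        meta.put("fs:size", Long.toString(attrs.size()));
        try {
            // The owner is the closest file-system equivalent of a "creator".
            meta.put("fs:owner", Files.getOwner(file).getName());
        } catch (UnsupportedOperationException e) {
            // This file system has no owner attribute view; leave the key out.
        }
        return meta;
    }

    public static void main(String[] args) throws IOException {
        // Uses the given path, or a fresh temporary file if none is given.
        Path file = args.length > 0
                ? Paths.get(args[0])
                : Files.createTempFile("TikaTestText", ".txt");
        for (Map.Entry<String, String> e : read(file).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

A caller could copy these entries into the metadata it gets back from Tika, keeping in mind that the values describe the file on disk and can change when the file is copied or moved.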

Kai-Uwe

-----Original Message-----
From: Allison, Timothy B. [mailto:tallison@mitre.org]
Sent: Thursday, July 10, 2014 12:58
To: dev@tika.apache.org
Subject: RE: Metadata at e.g. textfiles

Ditto what Nick said on "internal" metadata.  Are you referring to "external" metadata that we could get in Java 7 via BasicFileAttributes on OSes that support those?




RE: Metadata at e.g. textfiles

Posted by "Allison, Timothy B." <ta...@mitre.org>.
Ditto what Nick said on "internal" metadata.  Are you referring to "external" metadata that we could get in Java 7 via BasicFileAttributes on OSes that support those?
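
As a rough sketch of the "OSes that support those" caveat: Java 7 NIO lets you ask the default file system which attribute views it provides before relying on them. The class name SupportedViews is hypothetical and used only for this example. The "basic" view is required on every file system, while "owner", "posix" and "dos" depend on the platform.

```java
import java.nio.file.FileSystems;
import java.util.Set;

public class SupportedViews {

    // Names of the attribute views offered by the default file system.
    public static Set<String> supported() {
        return FileSystems.getDefault().supportedFileAttributeViews();
    }

    public static void main(String[] args) {
        Set<String> views = supported();
        // "basic" (BasicFileAttributes) is always present; the rest vary by OS.
        for (String name : new String[] {"basic", "owner", "posix", "dos"}) {
            System.out.println(name + ": " + views.contains(name));
        }
    }
}
```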



-----Original Message-----
From: Nick Burch [mailto:apache@gagravarr.org] 
Sent: Thursday, July 10, 2014 6:52 AM
To: dev@tika.apache.org
Subject: Re: Metadata at e.g. textfiles

On Thu, 10 Jul 2014, Kai-Uwe Schmidt wrote:
> is there a way to get e.g. creator and/or creation date into the
> metadata dictionary?

Only for file formats which store this information.

> When I extract from a file I get the following:
>
> Content-Encoding: ISO-8859-1
> Content-Length: 9
> Content-Type: text/plain; charset=ISO-8859-1
> resourceName: TikaTestText.txt

Plain text files don't include creator or creation dates in them. If you
try with a format that does, e.g. PDF or Word, you should see them reported.

Nick

Re: Metadata at e.g. textfiles

Posted by Nick Burch <ap...@gagravarr.org>.
On Thu, 10 Jul 2014, Kai-Uwe Schmidt wrote:
> is there a way to get e.g. creator and/or creation date into the
> metadata dictionary?

Only for file formats which store this information.

> When I extract from a file I get the following:
>
> Content-Encoding: ISO-8859-1
> Content-Length: 9
> Content-Type: text/plain; charset=ISO-8859-1
> resourceName: TikaTestText.txt

Plain text files don't include creator or creation dates in them. If you
try with a format that does, e.g. PDF or Word, you should see them reported.

Nick