Posted to dev@tika.apache.org by Kai-Uwe Schmidt <ku...@bel-it.de> on 2014/07/10 12:28:20 UTC
Metadata at e.g. textfiles
Hi all,
is there a way to get, e.g., the creator and/or the creation date into the metadata dictionary?
When I extract from a file I get the following:
Content-Encoding: ISO-8859-1
Content-Length: 9
Content-Type: text/plain; charset=ISO-8859-1
resourceName: TikaTestText.txt
I am missing creator and creation time.
Any hints?
Best regards
Kai-Uwe Schmidt
Managing Director
BEL-IT GmbH
Bürgermeister-Georg-Hiltmair-Str. 5, 85630 Grasbrunn, Register Court Munich, HRB 176809
Managing Director: Kai-Uwe Schmidt
Phone: + 49 (0) 89 / 207042-320
Fax: + 49 (0) 89 / 207042-323
Mobile: + 49 (0) 173 / 3900611
Kai-Uwe.Schmidt@bel-it.de
Re: Metadata at e.g. textfiles
Posted by Kai-Uwe Schmidt <ku...@bel-it.de>.
I asked about metadata without considering the technical side. I understand that, from a Tika developer's point of view, it is not possible to offer metadata that may be stored in proprietary or system-dependent stores. Information such as the last write time, the creator, or the creation time is mostly stored by the file system, not in the document.
Support for Java 7's BasicFileAttributes would be a very nice improvement for Tika!
Kai-Uwe
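For anyone reading along, here is a minimal sketch of what reading such "external" metadata with Java 7's BasicFileAttributes could look like. It uses only the JDK and is not Tika code. The class name FileSystemMetadata and the "fs:" key names are made up for illustration and are not Tika metadata keys. Note that some file systems do not record a real creation time or an owner, so the values depend on the platform.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.LinkedHashMap;
import java.util.Map;

public class FileSystemMetadata {

    /**
     * Collects file-system level attributes into a simple name -> value map.
     * The "fs:" keys are hypothetical names chosen for this sketch.
     */
    public static Map<String, String> read(Path file) throws IOException {
        BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
        Map<String, String> meta = new LinkedHashMap<String, String>();
        // On file systems without a creation time, the JDK returns an
        // implementation-specific fallback such as the last-modified time.
        meta.put("fs:created", attrs.creationTime().toString());
        meta.put("fs:modified", attrs.lastModifiedTime().toString());
        meta.put("fs:size", Long.toString(attrs.size()));
        try {
            // The owner is the closest file-system equivalent of a "creator".
            meta.put("fs:owner", Files.getOwner(file).getName());
        } catch (UnsupportedOperationException e) {
            // This file system has no notion of a file owner; skip the entry.
        }
        return meta;
    }

    public static void main(String[] args) throws IOException {
        // Create a small throwaway text file so the example runs anywhere.
        Path file = Files.createTempFile("TikaTestText", ".txt");
        try {
            Files.write(file, "some text".getBytes(StandardCharsets.ISO_8859_1));
            for (Map.Entry<String, String> entry : read(file).entrySet()) {
                System.out.println(entry.getKey() + ": " + entry.getValue());
            }
        } finally {
            Files.delete(file);
        }
    }
}
```

The resulting entries could then be merged by the caller into whatever metadata a parser returns for the same file.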
-----Original Message-----
From: Allison, Timothy B. [mailto:tallison@mitre.org]
Sent: Thursday, July 10, 2014 12:58
To: dev@tika.apache.org
Subject: RE: Metadata at e.g. textfiles
Ditto what Nick said on "internal" metadata. Are you referring to "external" metadata that we could get in Java 7 via BasicFileAttributes on operating systems that support it?
-----Original Message-----
From: Nick Burch [mailto:apache@gagravarr.org]
Sent: Thursday, July 10, 2014 6:52 AM
To: dev@tika.apache.org
Subject: Re: Metadata at e.g. textfiles
On Thu, 10 Jul 2014, Kai-Uwe Schmidt wrote:
> is there a way to get, e.g., the creator and/or the creation date into the
> metadata dictionary?
Only for file formats that store this information.
> When I extract from a file I get the following:
>
> Content-Encoding: ISO-8859-1
> Content-Length: 9
> Content-Type: text/plain; charset=ISO-8859-1
> resourceName: TikaTestText.txt
Plain text files don't include creator or creation dates in them. If you try with a format that does, e.g. PDF or Word, you should see them reported.
Nick
RE: Metadata at e.g. textfiles
Posted by "Allison, Timothy B." <ta...@mitre.org>.
Ditto what Nick said on "internal" metadata. Are you referring to "external" metadata that we could get in Java 7 via BasicFileAttributes on operating systems that support it?
Re: Metadata at e.g. textfiles
Posted by Nick Burch <ap...@gagravarr.org>.
On Thu, 10 Jul 2014, Kai-Uwe Schmidt wrote:
> is there a way to get, e.g., the creator and/or the creation date into the
> metadata dictionary?
Only for file formats that store this information.
> When I extract from a file I get the following:
>
> Content-Encoding: ISO-8859-1
> Content-Length: 9
> Content-Type: text/plain; charset=ISO-8859-1
> resourceName: TikaTestText.txt
Plain text files don't include creator or creation dates in them. If you
try with a format that does, e.g. PDF or Word, you should see them reported.
Nick