Posted to user@tika.apache.org by Peter Kronenberg <pe...@torch.ai> on 2020/12/28 20:00:53 UTC
Metadata
For the metadata that comes back from a parse (example below), clearly, the fields are dependent on the file type and information available. Are there any 'standard' fields that come back for all/any files? Such as Author, date, x-parsed-by, etc. Is there a list of these somewhere?
[Attached image: image001.png (example metadata output)]
Re: Metadata
Posted by Nick Burch <ap...@gagravarr.org>.
On Mon, 28 Dec 2020, Peter Kronenberg wrote:
> For the metadata that comes back from a parse (example below), clearly,
> the fields are dependent on the file type and information available.
> Are there any 'standard' fields that come back for all/any files? Such
> as Author, date, x-parsed-by, etc. Is there a list of these somewhere?
Main ones are taken from Dublin Core, see:
http://tika.apache.org/1.25/api/org/apache/tika/metadata/DublinCore.html
Other commonly used ones come from:
http://tika.apache.org/1.25/api/org/apache/tika/metadata/TikaMetadataKeys.html
http://tika.apache.org/1.25/api/org/apache/tika/metadata/HttpHeaders.html
The full set of properties is defined in the interfaces at:
http://tika.apache.org/1.25/api/org/apache/tika/metadata/package-summary.html
Nick
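
As a rough illustration of the answer above, here is a minimal sketch that picks the commonly seen keys out of a parsed metadata map. It uses only the JDK so it runs without Tika on the classpath. The class name CommonMetadataKeys, the method pickCommon, and the sample values are hypothetical. The key strings are assumed to match the names behind the Tika 1.x constants in DublinCore, HttpHeaders and TikaMetadataKeys, so check them against the Javadoc linked above before relying on them.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CommonMetadataKeys {

    // Assumed key names (not taken from the thread); verify against the
    // Tika Javadoc for the version you run.
    static final List<String> COMMON_KEYS = Arrays.asList(
            "Content-Type",     // assumed value of HttpHeaders.CONTENT_TYPE
            "Content-Length",   // assumed value of HttpHeaders.CONTENT_LENGTH
            "resourceName",     // assumed value of TikaMetadataKeys.RESOURCE_NAME_KEY
            "X-Parsed-By",      // key mentioned in the question
            "dc:title",         // assumed name of DublinCore.TITLE
            "dc:creator",       // assumed name of DublinCore.CREATOR
            "dcterms:created",  // assumed name of DublinCore.CREATED
            "dcterms:modified"  // assumed name of DublinCore.MODIFIED
    );

    // Returns only the entries whose key is in COMMON_KEYS, in list order.
    // Keys that a given file type does not provide are simply absent.
    static Map<String, String> pickCommon(Map<String, String> parsed) {
        Map<String, String> found = new LinkedHashMap<>();
        for (String key : COMMON_KEYS) {
            String value = parsed.get(key);
            if (value != null) {
                found.put(key, value);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical parse result for a PDF; the values are made up.
        Map<String, String> parsed = new LinkedHashMap<>();
        parsed.put("Content-Type", "application/pdf");
        parsed.put("dc:title", "Quarterly report");
        parsed.put("pdf:PDFVersion", "1.4"); // format-specific, filtered out

        pickCommon(parsed).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

With Tika itself available, the same lookup would presumably go through Metadata.names() and Metadata.get(String), using the constants from the interfaces listed above instead of hard-coded strings.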