Posted to user@tika.apache.org by Peter Kronenberg <pe...@torch.ai> on 2020/12/28 20:00:53 UTC

Metadata

For the metadata that comes back from a parse (example below), clearly, the fields are dependent on the file type and information available.  Are there any 'standard' fields that come back for all/any files?  Such as Author, date, x-parsed-by, etc.  Is there a list of these somewhere?

[Inline image image001.png (the metadata example referred to above) is not preserved in this archive]

Re: Metadata

Posted by Nick Burch <ap...@gagravarr.org>.
On Mon, 28 Dec 2020, Peter Kronenberg wrote:
> For the metadata that comes back from a parse (example below), clearly, 
> the fields are dependent on the file type and information available. 
> Are there any 'standard' fields that come back for all/any files?  Such 
> as Author, date, x-parsed-by, etc.  Is there a list of these somewhere?

Main ones are taken from Dublin Core, see:
http://tika.apache.org/1.25/api/org/apache/tika/metadata/DublinCore.html

Other ones that a fair number use come from:
http://tika.apache.org/1.25/api/org/apache/tika/metadata/TikaMetadataKeys.html
http://tika.apache.org/1.25/api/org/apache/tika/metadata/HttpHeaders.html

The full set of properties is defined in the interfaces at:
http://tika.apache.org/1.25/api/org/apache/tika/metadata/package-summary.html

Nick
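
To make the pointers above concrete, here is a minimal sketch of reading a few of those commonly populated keys after a parse. It assumes Tika 1.25 with tika-core and tika-parsers on the classpath. The class name MetadataDemo and the helper extract() are made up for illustration. Which keys actually come back still depends on the file type, so any lookup may return null.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import org.apache.tika.metadata.DublinCore;
import org.apache.tika.metadata.HttpHeaders;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;

// Hypothetical demo class, not part of Tika.
class MetadataDemo {

    // Parses the given bytes with the auto-detecting parser and
    // returns whatever metadata the chosen parser filled in.
    static Metadata extract(byte[] content) throws Exception {
        Metadata metadata = new Metadata();
        try (InputStream in = new ByteArrayInputStream(content)) {
            new AutoDetectParser().parse(
                    in, new BodyContentHandler(), metadata, new ParseContext());
        }
        return metadata;
    }

    public static void main(String[] args) throws Exception {
        Metadata metadata =
                extract("Hello, Tika".getBytes(StandardCharsets.UTF_8));

        // Key defined in the HttpHeaders interface; set by the detector/parser.
        System.out.println("Content-Type: "
                + metadata.get(HttpHeaders.CONTENT_TYPE));

        // Dublin Core properties; null when the document carries no such value
        // (plain text, as here, normally has no title or creator).
        System.out.println("dc:title:   " + metadata.get(DublinCore.TITLE));
        System.out.println("dc:creator: " + metadata.get(DublinCore.CREATOR));

        // Dump everything that came back, since the set varies by file type.
        for (String name : metadata.names()) {
            System.out.println(name + " = "
                    + String.join(", ", metadata.getValues(name)));
        }
    }
}
```

Looking keys up through the interface constants rather than string literals keeps the code tied to the names listed in the Javadoc linked above. Looping over metadata.names() is the simplest way to see what a given file type really provides.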