Posted to user@tika.apache.org by Sudheshna Iyer <iy...@gmail.com> on 2014/02/25 22:43:46 UTC

Extract metadata

Hello,

1. I have a few questions about metadata extraction, so I would like to join 
the Tika user mailing list. Could you please provide the email address for 
it? 

2. How do I extract metadata from a file? For example, I need the author 
information, but for different files it comes from different fields, 
such as: 
Author, meta:author, citation_author

Which one should I take? I also need to extract about 15 predefined metadata 
fields, such as publication year and DOI, from the Metadata object. 
What is the best way to extract these fields? 
Metadata.names() contains elements like "citation_doi". 
Should I iterate through the metadata names and, for each name, do something like

if (name.contains("doi")) DOI_CONST = metadata.get(name);


Is there a better way to extract the metadata?
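
One alternative to substring matching would be a fixed priority list of candidate keys per field, taking the first key that has a non-blank value. The following is only a minimal sketch under stated assumptions: a plain Map stands in for the Metadata object (with Tika, metadata.get(name) would take the place of the map lookup), and the class name, method name, and key ordering are hypothetical choices for illustration, not a recommendation from the Tika project.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper: resolves one logical field from several candidate keys.
final class MetadataFields {
    // Assumed priority order; reorder to match whichever source you trust most.
    static final List<String> AUTHOR_KEYS =
            Arrays.asList("meta:author", "Author", "citation_author");
    static final List<String> DOI_KEYS =
            Arrays.asList("citation_doi");

    // Returns the trimmed value of the first candidate key with a non-blank
    // value, or null when none of the candidates is present.
    static String firstPresent(Map<String, String> metadata, List<String> candidateKeys) {
        for (String key : candidateKeys) {
            String value = metadata.get(key);
            if (value != null && !value.trim().isEmpty()) {
                return value.trim();
            }
        }
        return null;
    }
}
```

Compared with name.contains("doi"), an explicit list avoids accidental matches on unrelated keys that merely contain the substring, at the cost of having to maintain one list for each of the ~15 fields.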