Posted to user@nutch.apache.org by KRIS MUSSHORN <mu...@comcast.net> on 2016/09/08 15:25:11 UTC

Tika and metadata/properties

How would I set up Nutch 1.12 to extract the author, title, keywords, description, etc. (ideally any metadata/properties) of doc/docx/pdf files (any non-webpage content) into discrete Solr 5.4.1 fields?
I have already set up the metadata plugin for getting fields into Solr, but I need to be able to do the same for this other content.

Thanks, 
Kris