Posted to user@nutch.apache.org by KRIS MUSSHORN <mu...@comcast.net> on 2016/09/08 15:25:11 UTC
Tika and metadata/properties
How would I configure Nutch 1.12 to extract the author, title, keywords, description, etc. (ideally any metadata/properties) of doc/docx/pdf files (any non-webpage content) into discrete Solr 5.4.1 fields?
I have already set up the metadata plugin to get fields into Solr, but I need to be able to do the same for other content.
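For context, a metadata-plugin setup like the one mentioned above usually lives in nutch-site.xml. The following is a minimal sketch that assumes the stock parse-metatags and index-metadata plugins. The poster's actual configuration is not shown, so the plugin list and field values here are illustrative assumptions. The property names match those in Nutch 1.x nutch-default.xml.

```xml
<?xml version="1.0"?>
<!-- nutch-site.xml: illustrative sketch, not the poster's actual config -->
<configuration>
  <!-- Stock 1.x plugin list plus parse-metatags and index-metadata.
       Adjust to whatever other plugins the crawl actually needs. -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-http|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
  <!-- Meta tags that parse-metatags extracts; each is stored in the
       parse metadata under the key "metatag.<name>". -->
  <property>
    <name>metatags.names</name>
    <value>description,keywords</value>
  </property>
  <!-- Parse-metadata keys that index-metadata turns into index fields. -->
  <property>
    <name>index.parse.md</name>
    <value>metatag.description,metatag.keywords</value>
  </property>
</configuration>
```

For this to work, the Solr schema also has to declare fields named metatag.description and metatag.keywords, or a matching dynamic field. This covers HTML meta tags. Whether the same route exposes document properties from doc/docx/pdf files parsed by parse-tika is the open question in this post.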
Thanks,
Kris