Posted to user@nutch.apache.org by mabi <ma...@protonmail.ch> on 2017/11/19 21:06:46 UTC
Parsing/indexing Open Graph meta tags from HTML
Hi,
I am currently testing Nutch 2.3.1 and need to be able to parse and index Open Graph meta tags in HTML such as this one:
<meta property="og:title" content="The Rock" />
Unfortunately, the parse-metatags and index-metadata plugins only extract meta tags identified by their name attribute, not by their property attribute.
Does anyone know of a workaround that would still let me use Nutch to parse and index Open Graph meta tags from HTML?
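To make clear what I am trying to extract, here is a rough standalone Java sketch of the behaviour I am after. It is not Nutch plugin code and does not use any Nutch API; the class name OgMetaExtractor and the method extract are hypothetical names I made up for illustration. It assumes simple, quoted attributes and uses regular expressions, so it would miss unquoted attribute values or content containing a ">" character.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: collects <meta property="og:..." content="..."> pairs.
class OgMetaExtractor {
    // Matches one whole <meta ...> tag (assumes no '>' inside attribute values).
    private static final Pattern META_TAG =
        Pattern.compile("<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // Matches attr="value" or attr='value' inside a tag.
    private static final Pattern ATTR =
        Pattern.compile("([a-zA-Z:-]+)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

    static Map<String, String> extract(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher tag = META_TAG.matcher(html);
        while (tag.find()) {
            String property = null;
            String content = null;
            Matcher attr = ATTR.matcher(tag.group());
            while (attr.find()) {
                String name = attr.group(1).toLowerCase();
                // Group 3 holds a double-quoted value, group 4 a single-quoted one.
                String value = attr.group(3) != null ? attr.group(3) : attr.group(4);
                if (name.equals("property")) {
                    property = value;
                } else if (name.equals("content")) {
                    content = value;
                }
            }
            // Keep only Open Graph properties; the first occurrence of a property wins.
            if (property != null && content != null && property.startsWith("og:")) {
                result.putIfAbsent(property, content);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<html><head><meta property=\"og:title\" content=\"The Rock\" /></head></html>";
        System.out.println(extract(html)); // prints {og:title=The Rock}
    }
}
```

Ideally the extracted og:title value would then end up as a field in the index, the same way index-metadata handles name-based meta tags today.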
Thanks for your help.
Best regards,
Mabi