Posted to user@nutch.apache.org by mabi <ma...@protonmail.ch> on 2017/11/19 21:06:46 UTC

Parsing/indexing Open Graph meta tags from HTML

Hi,

I am currently testing Nutch 2.3.1 and need to be able to parse and index Open Graph meta tags in HTML such as this one:

<meta property="og:title" content="The Rock" />

Unfortunately, the parse-metatags and index-metadata plugins only extract meta tags that are identified by their name attribute, not those identified by a property attribute.
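
To make the distinction concrete, here is a minimal standalone sketch of the kind of extraction I am after. It is plain Java using a regular expression, not Nutch plugin code: the class name OpenGraphSketch and the method extract are made up for illustration, and a real solution would presumably work on the parsed DOM inside a parse plugin rather than on raw HTML. It also assumes double-quoted attribute values and no ">" inside them.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: collects <meta property="og:..."> tags.
public class OpenGraphSketch {

    // Matches one whole <meta ...> tag (assumes no '>' inside attribute values).
    private static final Pattern META_TAG =
            Pattern.compile("<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // Matches one attribute of the form key="value" (double quotes only).
    private static final Pattern ATTRIBUTE =
            Pattern.compile("([a-zA-Z:-]+)\\s*=\\s*\"([^\"]*)\"");

    // Returns the og:* properties found in the HTML, in document order.
    public static Map<String, String> extract(String html) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher tag = META_TAG.matcher(html);
        while (tag.find()) {
            String property = null;
            String content = null;
            Matcher attr = ATTRIBUTE.matcher(tag.group());
            while (attr.find()) {
                String key = attr.group(1).toLowerCase(Locale.ROOT);
                if (key.equals("property")) {
                    property = attr.group(2);
                } else if (key.equals("content")) {
                    content = attr.group(2);
                }
            }
            // Tags that only carry a name attribute are skipped here.
            if (property != null && property.startsWith("og:") && content != null) {
                result.put(property, content);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<meta name=\"description\" content=\"A film\" />"
                + "<meta property=\"og:title\" content=\"The Rock\" />";
        System.out.println(extract(html)); // prints {og:title=The Rock}
    }
}
```

The tag with name="description" is what the existing plugins would pick up; the tag with property="og:title" is the one I would like to get into the index as well.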

Does anyone know of a workaround that would still let me use Nutch to parse and index Open Graph meta tags from HTML?

Thanks for your help.

Best regards,
Mabi