Posted to user@nutch.apache.org by Markus Thielen <mt...@thiguten.de> on 2016/12/21 08:00:41 UTC

Parsing Open Graph tags with Nutch

I have a running Nutch 2.3.1/HBase installation that parses and indexes web pages just fine. Now I need to parse Open Graph tags (namely og:image and og:description). From several fragments found on the web, I learned that Tika basically supports parsing Open Graph tags, but I am lost trying to figure out how to integrate this into Nutch.

Can someone point me in the right direction? Maybe with an example?

Thanks

Markus