Posted to user@nutch.apache.org by Felix von Zadow <Fe...@mgm-tp.com> on 2017/03/23 17:17:54 UTC

Headings plugin for 2.3.1?

Hi!

I found the headings plugin for Nutch 1.x, which extracts content from <h1>, <h2>, ... in HTML pages. Is there a similar plugin for Nutch 2.3.1? Or is there another recommended way to extract content from specific HTML tags?

Thanks!
Felix