Posted to user@nutch.apache.org by Michael Piccuirro <mi...@gmail.com> on 2008/07/09 17:20:51 UTC

HTML meta tags in index

I'm using Nutch to crawl my site.  I've successfully gone through the
tutorial and can search the index it creates.  Now I want to include the
meta tags from those pages in the indexed documents.  I would like the
standard "description" and "keywords" tags, as well as a couple of custom
ones like "thumbnail", to be available on my search results page.

So I've been doing a lot of RTFM'ing, and the closest thing I can find is the
plugin example, which demonstrates how to get a "recommended" meta tag and
increase the boost.  So currently I'm prepared to write a plugin that reads
all the meta tags I need and adds them to the index.
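
To make the idea concrete, here is a rough sketch of just the extraction
step, in plain Java.  It does not use the Nutch plugin API at all: the class
name MetaTagSketch, the method extractMeta, and the regex-based parsing are
assumptions made up for illustration, and a real plugin would presumably
work from the already-parsed page rather than raw HTML.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: collects selected <meta> tags
// from an HTML string into a name -> content map.
class MetaTagSketch {
    // Matches a whole <meta ...> tag; group 1 holds its attribute text.
    private static final Pattern META_TAG =
        Pattern.compile("<meta\\b([^>]*)>", Pattern.CASE_INSENSITIVE);
    // Matches quoted attribute pairs such as name="x" or content='y'.
    private static final Pattern ATTR =
        Pattern.compile("([a-zA-Z-]+)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

    static Map<String, String> extractMeta(String html, Set<String> wanted) {
        Map<String, String> found = new LinkedHashMap<>();
        Matcher tag = META_TAG.matcher(html);
        while (tag.find()) {
            String name = null;
            String content = null;
            Matcher attr = ATTR.matcher(tag.group(1));
            while (attr.find()) {
                String key = attr.group(1).toLowerCase(Locale.ROOT);
                String value = attr.group(3) != null ? attr.group(3) : attr.group(4);
                if (key.equals("name")) {
                    name = value.toLowerCase(Locale.ROOT);
                } else if (key.equals("content")) {
                    content = value;
                }
            }
            // Keep only the tags asked for; the first occurrence wins.
            if (name != null && content != null && wanted.contains(name)) {
                found.putIfAbsent(name, content);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<head><meta name=\"description\" content=\"My page\">"
            + "<meta name=\"thumbnail\" content=\"/img/thumb.png\"></head>";
        Set<String> wanted =
            new HashSet<>(Arrays.asList("description", "keywords", "thumbnail"));
        System.out.println(extractMeta(html, wanted));
    }
}
```

The regex approach is only good enough for a sketch; it ignores unquoted
attribute values and would trip on a ">" inside a content value.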

My question is, am I on the right track by building the plugin?  Or is there
an easier out-of-the-box way to include the meta tag information?

Thanks a lot in advance for any help.