Posted to user@nutch.apache.org by Michael Piccuirro <mi...@gmail.com> on 2008/07/09 17:20:51 UTC
HTML meta tags in index
I'm using Nutch to crawl my site. I've successfully gone through the
tutorial and can search the index it creates. Now I want to include the
meta tags from those pages in the indexed documents. I would like the
standard "description" and "keyword" tags, as well as a couple of custom
ones like "thumbnail", to be available in my search results page.
So I've been doing a lot of RTFM'ing, and the closest thing I can find is the
plugin example, which demonstrates how to read a "recommended" meta tag and
increase the boost. So currently I'm prepared to write a plugin that reads
all the meta tags I need and adds them to the index.
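To make the plan concrete, here is a minimal sketch of just the extraction step in plain Java. It does not use any Nutch API: the class name MetaTagSketch, the extract method, the regular expression, and the sample HTML are all placeholders for illustration, and the regex only handles tags written with name before content. A real plugin would presumably read the tags from the parsed page and hand them to the indexer instead of working on a raw string.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: pulls selected <meta> tags out of raw HTML.
public class MetaTagSketch {

    // Assumption: the name attribute comes before content, e.g.
    // <meta name="description" content="...">. Group 1 is the tag name,
    // group 2 the quote character around content, group 3 the content value.
    private static final Pattern META = Pattern.compile(
        "<meta\\s+name\\s*=\\s*[\"']([^\"']+)[\"']\\s+content\\s*=\\s*([\"'])(.*?)\\2[^>]*>",
        Pattern.CASE_INSENSITIVE);

    // Returns a field-name -> value map for the wanted tag names (lowercase).
    // If a tag appears more than once, the first occurrence wins.
    public static Map<String, String> extract(String html, Set<String> wanted) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = META.matcher(html);
        while (m.find()) {
            String name = m.group(1).toLowerCase(Locale.ROOT);
            if (wanted.contains(name)) {
                fields.putIfAbsent(name, m.group(3));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample page; "thumbnail" stands in for one of the custom tags.
        String html = "<html><head>"
            + "<meta name=\"description\" content=\"A test page\">"
            + "<meta name=\"thumbnail\" content=\"/img/thumb.png\">"
            + "</head><body></body></html>";
        Map<String, String> fields =
            extract(html, Set.of("description", "keywords", "thumbnail"));
        // Each entry here is what I would want stored as a field on the indexed document.
        fields.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

The idea is that each entry in the returned map becomes a stored field on the indexed document, so the search results page can display it.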
My question is: am I on the right track by building the plugin? Or is there
an easier, out-of-the-box way to include the meta tag information?
Thanks a lot in advance for any help.