Posted to dev@nutch.apache.org by Sundaramoorthy Kannan <ka...@cognizant.com> on 2005/06/01 07:07:25 UTC

How to exclude content other than Script & Style from indexing

Hi,
If I need to exclude some parts of a web page from being indexed, how
can I do it? As I understand it, the DOMContentUtils class of the HTML
parser plugin currently ignores only SCRIPT, STYLE and comment text.
Can I configure it to exclude other tags too?
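
To make the question concrete, here is a rough sketch of the kind of
filtering I mean. It uses only the JDK's DOM API and is not Nutch's
actual code: the class name TagFilterSketch, its method names, and the
extra excluded tag (form) are all hypothetical, and the sample page is
well-formed XHTML so that the stock XML parser can read it.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical sketch, not part of Nutch: collects text from a DOM tree
// while skipping comments and a caller-supplied set of tag names.
public class TagFilterSketch {

    // Depth-first walk that appends the text of every node not under an excluded tag.
    static void collectText(Node node, Set<String> excluded, StringBuilder out) {
        short type = node.getNodeType();
        if (type == Node.COMMENT_NODE) {
            return; // comment text is never collected
        }
        if (type == Node.ELEMENT_NODE
                && excluded.contains(node.getNodeName().toLowerCase(Locale.ROOT))) {
            return; // skip the excluded element and its whole subtree
        }
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(text);
            }
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectText(child, excluded, out);
        }
    }

    // Parses well-formed XHTML and returns the text outside the excluded tags.
    static String extract(String xhtml, Set<String> excluded) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        StringBuilder out = new StringBuilder();
        collectText(doc.getDocumentElement(), excluded, out);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><p>Keep me</p><script>var x = 1;</script>"
                + "<!-- note --><form>Skip me</form></body></html>";
        // "script" and "style" mirror what is ignored today; "form" is an assumed extra.
        Set<String> excluded = new HashSet<>(Arrays.asList("script", "style", "form"));
        System.out.println(extract(page, excluded)); // prints "Keep me"
    }
}
```

What I am hoping for is a way to supply a set like "excluded" above
through configuration, rather than by changing the plugin source.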

Thanks,
Kannan