Posted to user@nutch.apache.org by Elwin <ma...@gmail.com> on 2006/02/10 10:38:29 UTC

How to control contents to be indexed?

During crawling and indexing, some pages serve only as "temporary links"
to the pages I actually want to index. How can I keep those pages from
being indexed? Or which part of Nutch should I extend?