Posted to user@nutch.apache.org by qu...@webmail.co.za on 2005/06/03 22:44:56 UTC

Deleting urls/Recurring urls

Hi there

I'm seeing a recurring URL; for this example, let's call the
site xyz.com.

I've added a regex filter to exclude it from any crawls,
added it to the banned-hosts file, and regularly pruned the
segments of any reference to the domain. However, with each
and every fetch I still see the URL reappear a few times.
This is one of those sites that puts a "nocache tag"
(xyz.com/adasdasd.asp?nc=329084723 etc.) in the URL, which
creates thousands of pages to crawl for a 6-page site.
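
To illustrate the situation, here is a minimal sketch in Python.
It is not my actual Nutch configuration: the pattern, the
is_excluded helper, and the sample URLs (including the http://
scheme and the www variant) are assumptions made up for this
example. It shows that every new "nc" value produces a distinct
URL string, while a host-level exclusion pattern of this kind
would be expected to match all of them.

```python
import re

# Hypothetical exclusion pattern for the placeholder domain xyz.com.
# This is an assumption for illustration, not the rule from a real filter file.
EXCLUDE = re.compile(r"^https?://([a-z0-9-]+\.)*xyz\.com(/|$)", re.IGNORECASE)


def is_excluded(url: str) -> bool:
    """Return True if the URL belongs to the excluded host or a subdomain."""
    return EXCLUDE.search(url) is not None


# The same page requested with different "nc" values: the content is
# the same, but each URL string is different, so each looks new.
urls = [
    "http://xyz.com/adasdasd.asp?nc=329084723",
    "http://xyz.com/adasdasd.asp?nc=329084724",
    "http://www.xyz.com/adasdasd.asp?nc=1",
]

distinct = set(urls)

if __name__ == "__main__":
    for url in urls:
        print(url, "excluded" if is_excluded(url) else "allowed")
    print(len(distinct), "distinct URLs")
```

If a pattern like this matches every variant and the URL still
comes back, the filter itself is probably not the part that is
failing.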

Any ideas?

Thanks