Posted to user@nutch.apache.org by SravanS <sr...@gmail.com> on 2010/06/29 06:19:49 UTC

Crawls more urls than specified

Hey guys,

So I previously crawled/indexed (nutched?!) two URLs together at the same
time. Then I got rid of the crawl file and tried to re-crawl with just one
URL. However, it still seems to crawl both URLs.

I changed my urls file, as well as my crawl-urlfilter.txt, to limit the
domain to that one URL.
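
For illustration only, a single-domain filter in conf/crawl-urlfilter.txt
usually follows the pattern sketched below. The domain example.com is a
placeholder (the actual URL is not given in this message), and the rest of
the stock file is left out.

```
# conf/crawl-urlfilter.txt (sketch; example.com is a placeholder domain)

# accept URLs on the one domain that should be crawled
+^http://([a-z0-9]*\.)*example.com/

# skip everything else
-.
```

Rules are checked top to bottom and the first matching pattern decides, so
the final "-." line is what rejects URLs on any other domain.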

I tried re-downloading Nutch, resetting all my settings, and using only
that one URL, but it still crawls both URLs regardless.

I know this isn't much information to go on, so I'll just give the specs
of what I'm running.

I've used Nutch 0.9 and Nutch 1.0 on CentOS 5.2. I run the Nutch web server
in Tomcat 6.0. Same results every time.

Sincerely,
Sravan Suryadevara