Posted to dev@nutch.apache.org by Michael Ji <fj...@yahoo.com> on 2005/09/21 04:14:15 UTC

crawl-urlfilter.txt VS regex-urlfilter.txt

Hi,

I found that I can use crawl-urlfilter.txt to restrict
the crawl to a domain with:
"
# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
"
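
To show which URLs a rule like that accepts, here is a minimal
Python sketch. It is only an approximation: Nutch evaluates these
rules with Java regular expressions, and I am assuming Python's re
module behaves the same way for this simple pattern. The host
example.com and the helper names domain_rule and accepted are
made up for illustration; they stand in for MY.DOMAIN.NAME and are
not part of Nutch.

```python
import re


def domain_rule(host: str) -> "re.Pattern[str]":
    """Build the accept pattern from the rule above for a given host.

    The leading "+" (the accept marker in the filter file) is not part
    of the regular expression, so it is left out here. The host is
    escaped so that its dots match literal dots only.
    """
    return re.compile(r"^http://([a-z0-9]*\.)*" + re.escape(host) + "/")


def accepted(url: str, rule: "re.Pattern[str]") -> bool:
    """Return True if the URL matches the accept pattern."""
    return rule.search(url) is not None


# example.com is a hypothetical stand-in for MY.DOMAIN.NAME.
RULE = domain_rule("example.com")

if __name__ == "__main__":
    for url in (
        "http://example.com/",
        "http://www.example.com/index.html",
        "http://other.org/",
    ):
        print(url, accepted(url, RULE))
```

The host itself and any subdomain of it match, while other hosts
do not.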

However, when I don't run bin/nutch crawl...,
crawl-urlfilter.txt has no effect, and URLs from
domains I don't want are not filtered out.

Can I use regex-urlfilter.txt to define the domain
restriction the same way crawl-urlfilter.txt does?

thanks,

Michael Ji


		