Posted to dev@nutch.apache.org by Michael Ji <fj...@yahoo.com> on 2005/09/21 04:14:15 UTC
crawl-urlfilter.txt vs. regex-urlfilter.txt
Hi,
I found that I can use crawl-urlfilter.txt to restrict
the crawl to one domain with a rule like:
"
# accept hosts in MY.DOMAIN.NAME
+^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
"
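To make clear what this rule accepts, here is a minimal Python sketch. It assumes the filter simply searches the URL string for the regular expression; it is not Nutch's own code, and the function name is_accepted and the sample URLs are hypothetical.

```python
import re

# Pattern copied from the rule above, with the leading "+" (accept) removed.
# The dots in MY.DOMAIN.NAME are unescaped, so each one matches any character.
ACCEPT_RULE = r"^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/"


def is_accepted(url: str) -> bool:
    """Return True if the URL matches the accept pattern (hypothetical helper)."""
    return re.search(ACCEPT_RULE, url) is not None


# Hypothetical URLs, for illustration only.
samples = [
    "http://MY.DOMAIN.NAME/index.html",  # the bare domain
    "http://www.MY.DOMAIN.NAME/a/b",     # a subdomain
    "http://other.example.com/",         # a different host
    "https://MY.DOMAIN.NAME/",           # a different scheme
]

for url in samples:
    print(url, is_accepted(url))
```

The bare domain and its subdomains match, while other hosts and the https scheme do not. Escaping the dots (MY\.DOMAIN\.NAME) would make the host match stricter.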
However, when I don't run bin/nutch crawl...,
crawl-urlfilter.txt does not filter out the
domains I don't want.
Can I use regex-urlfilter.txt to define the domain
restriction the same way crawl-urlfilter.txt does?
Thanks,
Michael Ji