Posted to user@nutch.apache.org by ajaxtrend <te...@yahoo.com> on 2007/12/04 03:12:49 UTC
Nutch URL filter help
Hello Group,
I am trying to fetch all URLs from http://xyz.org whose structure follows the pattern http://xyz.org/2007/12/23.
The urls/my.txt file contains: http://xyz.org
conf/crawl-urlfilter.txt has this filter: +^http://indianeconomy.org/[0-9]{4}/[0-9]{2}/[0-9]{2}/\\w*
But Nutch still fetches other URLs from http://xyz.org too, such as http://abc.com.
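One way to check the pattern outside Nutch is a small Java sketch like the one below. It only exercises the regular expression with java.util.regex; it does not reproduce how Nutch loads or orders its filter rules. The class name, the helper method, and the sample URLs are made up for illustration, and it assumes the doubled backslash shown above is literally what is in the file. If so, the expression requires a literal backslash character after the date, which a normal URL will not contain.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Nutch.
class UrlFilterCheck {
    // Rule as posted, minus the leading '+' include marker.
    // In the filter file "\\w*" is the regex "literal backslash, then zero or more w".
    static final Pattern DOUBLED =
        Pattern.compile("^http://indianeconomy.org/[0-9]{4}/[0-9]{2}/[0-9]{2}/\\\\w*");

    // Same rule with a single backslash in the file, i.e. the regex \w* (word characters).
    static final Pattern SINGLE =
        Pattern.compile("^http://indianeconomy.org/[0-9]{4}/[0-9]{2}/[0-9]{2}/\\w*");

    // True if the pattern matches somewhere in the URL (it is anchored by '^').
    static boolean accepts(Pattern rule, String url) {
        return rule.matcher(url).find();
    }

    public static void main(String[] args) {
        // Sample URLs are illustrative only.
        String[] urls = {
            "http://indianeconomy.org/2007/12/23/some_post",
            "http://xyz.org/2007/12/23/some_post",
            "http://abc.com/"
        };
        for (String url : urls) {
            System.out.println(url
                + "  doubled=" + accepts(DOUBLED, url)
                + "  single=" + accepts(SINGLE, url));
        }
    }
}
```

In this sketch only the single-backslash form accepts the dated URL, and only on the host named in the pattern, so the host in the rule also has to match the host in the seed list.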
I am not sure whether I am doing something wrong, and I would appreciate your help with this.
Regards,
Ranjan