Posted to user@nutch.apache.org by Harmesh <ha...@in.v2solutions.com> on 2007/03/07 08:05:22 UTC

How to configure crawl-urlfilters.txt

How can I exclude certain pages of a site in crawl-urlfilters.txt? I tried
the configuration below, but it is not working:

-^http://*profile.php*
-^http://*posting.php*
+^http://forums.pressconnects.com/*

If anyone can suggest a better approach, please do.
-- 
View this message in context: http://www.nabble.com/How-to-configured-crawl-urlfilters.txt-tf3360454.html#a9347736
Sent from the Nutch - User mailing list archive at Nabble.com.


Re: How to configure crawl-urlfilters.txt

Posted by Gal Nitzan <ga...@gmail.com>.
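# Disallow all paths starting with 1200 on all machines in the domain.
# (The comment is on its own line because the filter file treats everything
# after the leading + or - as part of the regular expression.)
-^http://([a-z0-9]*?\.)*site.com/1200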
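
The rules in the question probably fail because each line is a regular expression, not a shell-style wildcard: a bare * repeats the preceding character, so -^http://*profile.php* only matches URLs that start with http://profile. The filter also applies the first matching rule in the file, so the order of - and + lines matters. The sketch below is a rough Python stand-in for that behaviour, not Nutch's own code (Nutch uses Java regular expressions). The helper names load_rules and accepts, the REVISED rule set, and the sample URL are all assumptions made for illustration.

```python
import re


def load_rules(text):
    """Parse '+regex' / '-regex' lines; skip blank lines and '#' comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"invalid first character: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """The first rule whose regex is found in the URL decides.

    A URL that matches no rule is rejected.
    """
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False


# Rules exactly as posted in the question.
ORIGINAL = r"""
-^http://*profile.php*
-^http://*posting.php*
+^http://forums.pressconnects.com/*
"""

# Hypothetical rewrite: '.*' stands for "any characters", dots are escaped.
REVISED = r"""
-^http://.*profile\.php
-^http://.*posting\.php
+^http://forums\.pressconnects\.com/
"""

# Hypothetical URL of the kind the question wants to exclude.
url = "http://forums.pressconnects.com/profile.php?mode=viewprofile"

print("original rules accept it:", accepts(load_rules(ORIGINAL), url))
print("revised rules accept it: ", accepts(load_rules(REVISED), url))
```

With the posted rules, neither exclusion line matches the sample URL, so it falls through to the + line and is accepted. With the revised rules, the first - line matches and the URL is rejected, while other forum pages still reach the + line.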

HTH,

Gal
