Posted to user@nutch.apache.org by Martin Gutbrod <gu...@ifalt.de> on 2006/02/23 11:04:36 UTC

(Re) About regex in the crawl-urlfilter.txt config file

nutch-user@lucene.apache.org wrote:
> # accept hosts in MY.DOMAIN.NAME
> +^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
> 
> Will this pattern accept a URL like this:
> http://MY.DOMAIN.NAME/([a-z0-9]*\.)*/?

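Yes.
The regex in crawl-urlfilter.txt has a start anchor (^) but no
end anchor ($), so only the beginning (left part) of the URL
is compared. Whatever follows the trailing slash of the host is
accepted. The group ([a-z0-9]*\.)* may also match zero times, so a
URL without any subdomain prefix passes as well.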
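
To see the effect of the missing end anchor, here is a rough sketch that
tries the pattern quoted above against a few URLs. It uses Python's re
module as a stand-in for the regex engine Nutch uses, and it assumes the
filter looks for the pattern in the URL rather than requiring the
whole URL to match. The function names and the sample URLs are made up for
illustration.

```python
import re

# Pattern from the crawl-urlfilter.txt line quoted above, without the
# leading "+" (which only marks the rule as an "accept" rule).
PREFIX_ONLY = re.compile(r"^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/")

# Hypothetical variant with an end anchor added, for comparison only.
WITH_END = re.compile(r"^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/$")


def accepts(url: str) -> bool:
    """True if the start-anchored pattern is found in the URL."""
    # search() plus "^" means: the pattern must match at the start,
    # but the rest of the URL is not examined.
    return PREFIX_ONLY.search(url) is not None


def accepts_with_end_anchor(url: str) -> bool:
    """True only if the URL ends right after the host's trailing slash."""
    return WITH_END.search(url) is not None


if __name__ == "__main__":
    samples = [
        "http://MY.DOMAIN.NAME/",
        "http://MY.DOMAIN.NAME/some/path/page.html",  # made-up path
        "http://www.MY.DOMAIN.NAME/index.html",
        "http://other.example.org/",
    ]
    for url in samples:
        print(url, accepts(url), accepts_with_end_anchor(url))
```

With the start-anchored pattern, any path after the host is let through;
with the hypothetical "$" variant, only the bare host URL would pass.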