Posted to user@nutch.apache.org by Martin Gutbrod <gu...@ifalt.de> on 2006/02/23 11:04:36 UTC
Re: About regex in the crawl-urlfilter.txt config file
nutch-user@lucene.apache.org wrote:
> # accept hosts in MY.DOMAIN.NAME
> +^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/
>
> Will this pattern accept a URL like this:
> http://MY.DOMAIN.NAME/([a-z0-9]*\.)*/?
Yes.
The regex in crawl-urlfilter.txt has only a start anchor (^) but no
end anchor ($), so only the beginning (left part) of the URL is
compared. Any URL that starts with http://MY.DOMAIN.NAME/ is accepted,
whatever follows the slash.
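
To try this out, here is a minimal sketch in Python. It assumes that
Python's re.search behaves like the filter's partial match for this
pattern; the helper name accepted() and the sample URLs are made up
for illustration and are not part of Nutch. The leading "+" from the
config line is left out because it marks the rule as "accept" and is
not part of the regex.

```python
import re

# Pattern from the quoted crawl-urlfilter.txt rule, without the leading "+".
# MY.DOMAIN.NAME is the placeholder from the config file, kept as-is.
PATTERN = re.compile(r"^http://([a-z0-9]*\.)*MY.DOMAIN.NAME/")


def accepted(url: str) -> bool:
    """Return True if the pattern matches at the start of the URL.

    Hypothetical helper: re.search stands in for a match that is
    anchored at the start (^) but not at the end (no $).
    """
    return PATTERN.search(url) is not None


if __name__ == "__main__":
    sample_urls = [
        "http://MY.DOMAIN.NAME/",
        "http://MY.DOMAIN.NAME/some/path/page.html",
        "http://www.MY.DOMAIN.NAME/index.html",
        "http://other.example.org/MY.DOMAIN.NAME/",
    ]
    for url in sample_urls:
        print(url, accepted(url))
```

The first three URLs are accepted because the subdomain group may
repeat zero or more times and nothing constrains the text after the
slash. The last one is rejected because the host part does not match.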