Posted to user@nutch.apache.org by Kevin MacDonald <ke...@hautesecure.com> on 2008/09/12 00:39:59 UTC
Allowing http and https crawling
What would be the regex that allows both http and https? The regex I am
currently using in crawl-urlfilter.txt is
# accept all hosts
+^http://([a-z0-9]*\.)*\S*
Re: Allowing http and https crawling
Posted by Kevin MacDonald <ke...@hautesecure.com>.
Oh, duh. There we go: making the "s" optional with "?" matches both schemes.
+^https?://([a-z0-9]*\.)*\S*
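
For anyone who wants to sanity-check a rule before editing crawl-urlfilter.txt, here is a minimal sketch in Python. It assumes the leading "+" marks an accept rule and that the rest of the line is an ordinary regex searched against the URL. The accepts() helper and the sample URLs are hypothetical, and Python's re module stands in for Java's regex engine, which should behave the same for a pattern this simple.

```python
import re

# The two rules from this thread. The leading '+' is the filter's
# "accept" marker, not part of the regular expression itself.
OLD_RULE = r"+^http://([a-z0-9]*\.)*\S*"
NEW_RULE = r"+^https?://([a-z0-9]*\.)*\S*"


def accepts(rule, url):
    """Hypothetical helper: True if an accept ('+') rule matches the URL."""
    sign, pattern = rule[0], rule[1:]
    if sign != "+":
        raise ValueError("this sketch only handles '+' (accept) rules")
    # Assumption: the pattern is searched for in the URL; with the '^'
    # anchor this is the same as matching from the start of the string.
    return re.search(pattern, url) is not None


if __name__ == "__main__":
    # Made-up sample URLs, for illustration only.
    for url in ("http://example.com/", "https://example.com/", "ftp://example.com/"):
        print(url, "old:", accepts(OLD_RULE, url), "new:", accepts(NEW_RULE, url))
```

The only difference between the two rules is the "s?", which lets the scheme be either "http" or "https"; any other scheme is still not accepted by this rule.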
On Thu, Sep 11, 2008 at 3:39 PM, Kevin MacDonald <ke...@hautesecure.com> wrote:
> What would be the regex that allows both http and https? The regex I am
> currently using in crawl-urlfilter.txt is
>
> # accept all hosts
> +^http://([a-z0-9]*\.)*\S*