Posted to user@nutch.apache.org by Kevin MacDonald <ke...@hautesecure.com> on 2008/09/12 00:39:59 UTC

Allowing http and https crawling

What would be the regex that allows both http and https? The regex I am
currently using in crawl-urlfilter.txt is

# accept all hosts
+^http://([a-z0-9]*\.)*\S*

Re: Allowing http and https crawling

Posted by Kevin MacDonald <ke...@hautesecure.com>.
Oh, duh. There we go. Adding "s?" makes the "s" optional, so one rule covers both schemes:
+^https?://([a-z0-9]*\.)*\S*
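
A quick way to sanity-check the rule outside of a crawl is to run the pattern against a few sample URLs. The sketch below uses Python's re module purely for illustration; Nutch itself evaluates the filter with Java regular expressions, but this pattern uses only syntax that both engines share. The leading "+" is dropped because it is the accept marker in crawl-urlfilter.txt, not part of the regex. The helper name accepted and the example.com URLs are made up for the example.

```python
import re

# Pattern from the rule above, minus the leading "+" (the accept marker
# used by crawl-urlfilter.txt, not regex syntax).
ACCEPT = re.compile(r"^https?://([a-z0-9]*\.)*\S*")


def accepted(url: str) -> bool:
    """Return True if the URL matches the accept pattern from its start."""
    return ACCEPT.match(url) is not None


# Hypothetical sample URLs, for illustration only.
for url in (
    "http://example.com/",
    "https://example.com/login",
    "ftp://example.com/file.txt",
):
    print(url, accepted(url))
```

Both the http and https URLs should be accepted, while the ftp URL is not, since "s?" only makes the trailing "s" of the scheme optional.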


On Thu, Sep 11, 2008 at 3:39 PM, Kevin MacDonald <ke...@hautesecure.com> wrote:

> What would be the regex that allows both http and https? The regex I am
> currently using in crawl-urlfilter.txt is
>
> # accept all hosts
> +^http://([a-z0-9]*\.)*\S*
>
>