Posted to user@nutch.apache.org by Lyndon Maydwell <ma...@gmail.com> on 2007/09/06 06:55:05 UTC

Re: fetch errors?

That did fix the problem, thank you.

On 7/13/07, Karol Rybak <ka...@gmail.com> wrote:
> Make sure that you configured the proper file. If you are using the crawl
> tool, crawl-urlfilter is used. If you use fetch or fetch2, regex-urlfilter
> is used.
>
> On 7/13/07, Lyndon Maydwell <ma...@gmail.com> wrote:
> >
> > Hi list,
> >
> > I'm running a crawl over a site, but it seems to be fetching pages
> > outside of the regex domain.
> >
> > +^http://([a-z0-9]*\.)*curtin.edu.au/
> >
> > ie.
> >
> > fetching http://www.environment.sa.gov.au/epa/used_packaging.html
> > fetching http://abc.net.au/triplej/hottest100/ringtones/default.htm
> > fetching http://dmoz.org/News/Newspapers/
> >
> > This seems wrong to me. Is there some way to make sure I haven't made any
> > stupid mistakes?
> >
>
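
For anyone hitting the same symptom: one quick way to rule out the pattern
itself is to test it outside Nutch. The sketch below is only an approximation,
not Nutch's own code. It uses Python's re module rather than the Java regex
engine, and it assumes first-match-wins rule handling with "+" meaning include
and unmatched URLs being rejected. The accepts() helper and the
http://www.curtin.edu.au/ test URL are made up for illustration; the rule and
the other three URLs are the ones from the thread.

```python
import re

# Rule copied from the thread. "+" marks an include rule; a "-" would mark
# an exclude rule. This list is a hypothetical stand-in for the filter file.
RULES = [
    ("+", re.compile(r"^http://([a-z0-9]*\.)*curtin.edu.au/")),
]


def accepts(url, rules=RULES):
    """Return True if the first rule that matches the URL is an include rule."""
    for sign, pattern in rules:
        if pattern.search(url):
            return sign == "+"
    # Assumption: a URL that matches no rule is rejected.
    return False


URLS = [
    "http://www.curtin.edu.au/",  # hypothetical in-domain URL
    "http://www.environment.sa.gov.au/epa/used_packaging.html",
    "http://abc.net.au/triplej/hottest100/ringtones/default.htm",
    "http://dmoz.org/News/Newspapers/",
]

for url in URLS:
    print("accept" if accepts(url) else "reject", url)
```

If the pattern rejects the off-site URLs here but they are still fetched, the
rule is probably sitting in a filter file that the tool being run does not
read, which is what turned out to be the case in this thread.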