Posted to user@nutch.apache.org by Edward Quick <ed...@hotmail.com> on 2008/09/18 16:35:11 UTC

RegexURLNormalizer warnings

Does anyone know what I can do about these warnings? 
I have set db.max.outlinks.per.page to -1 in nutch-site.xml.

WARN  regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default
WARN  regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
WARN  regex.RegexURLNormalizer - can't find rules for scope 'crawldb', using default
WARN  regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default

Thanks,

Ed.


Re: RegexURLNormalizer warnings

Posted by Doğacan Güney <do...@gmail.com>.
On Thu, Sep 18, 2008 at 5:35 PM, Edward Quick <ed...@hotmail.com> wrote:
>
> Does anyone know what I can do about these warnings?
> I have set db.max.outlinks.per.page to -1 in nutch-site.xml.
>
> WARN  regex.RegexURLNormalizer - can't find rules for scope 'outlink', using default
> WARN  regex.RegexURLNormalizer - can't find rules for scope 'partition', using default
> WARN  regex.RegexURLNormalizer - can't find rules for scope 'crawldb', using default
> WARN  regex.RegexURLNormalizer - can't find rules for scope 'generate_host_count', using default
>

These warnings are harmless. A URLNormalizer plugin can define different
rules for different 'scopes', such as generation, parsing, etc. The
warning just reports that RegexURLNormalizer has no separate rules
defined for those scopes and is falling back to the "default" rules.
So, it is OK :)
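
To make the fallback concrete, here is a minimal Python sketch of the idea. It is not Nutch's actual code: the class name ScopedNormalizer, its methods, and the sessionid rule are all hypothetical, made up for illustration. It only shows how a per-scope lookup that finds nothing can log a warning and carry on with the default rules.

```python
import logging
import re

log = logging.getLogger("regex.RegexURLNormalizer")


class ScopedNormalizer:
    """Hypothetical illustration of per-scope rules with a default fallback.

    This is NOT the Nutch implementation; names and structure are assumed.
    """

    def __init__(self, default_rules, scoped_rules=None):
        self.default_rules = default_rules      # list of (compiled pattern, substitution)
        self.scoped_rules = scoped_rules or {}  # scope name -> list of rules
        self._warned = set()                    # scopes we have already warned about

    def rules_for(self, scope):
        rules = self.scoped_rules.get(scope)
        if rules is None:
            # No rules registered for this scope: warn once, then use the default set.
            if scope not in self._warned:
                log.warning("can't find rules for scope '%s', using default", scope)
                self._warned.add(scope)
            return self.default_rules
        return rules

    def normalize(self, url, scope="default"):
        # Apply each rule of the selected rule set in order.
        for pattern, substitution in self.rules_for(scope):
            url = pattern.sub(substitution, url)
        return url


# Assumed example rule: strip a session id query parameter.
default_rules = [(re.compile(r"[?&]sessionid=[^&]*"), "")]
normalizer = ScopedNormalizer(default_rules)

# The 'outlink' scope has no rules of its own, so the default rules are applied.
print(normalizer.normalize("http://example.com/page?sessionid=abc123", scope="outlink"))
```

In this sketch the URL is still normalized even though a warning is logged for the scope, which matches the point above: the message is informational and nothing is skipped.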

> Thanks,
>
> Ed.
>



-- 
Doğacan Güney