Posted to user@nutch.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2011/12/13 20:08:42 UTC

Bug in o.a.n.n.URLNormalizerChecker?

Hi,

While working on the URLFilterChecker class I came across a related
problem with the above class. Please see my log output below; if someone
can confirm this, then there is a bug.


lewis@lewis-01:~/ASF/trunk/runtime/local$ bin/nutch org.apache.nutch.net.URLNormalizerChecker
Checking combination of all URLNormalizers available

N.B. The command above should print the usage message
(System.err.println(usage)) but instead logs that checkAll() has been
called and then simply hangs.
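One possible reading of the "hang", offered as an assumption rather than something confirmed here, is that the check-all mode reads URLs line by line from standard input, so it is simply blocked in readLine() waiting for input or EOF (Ctrl-D). The sketch below shows that pattern next to the no-argument usage guard expected above. The class name, the method names and the first line of the usage text are hypothetical; only the -normalizer and -scope options come from this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not Nutch's actual URLNormalizerChecker.
 * Assumes the "check all" mode consumes one URL per line from an input stream.
 */
public class NoArgsUsageSketch {

  // Hypothetical usage text; the option names are the ones discussed in the thread.
  static final String USAGE =
      "Usage: URLNormalizerChecker [-normalizer <normalizerName>] [-scope <scope>]";

  // True when the tool was started without any options at all.
  static boolean shouldPrintUsage(String[] args) {
    return args.length == 0;
  }

  // Stand-in for a "check all" loop: reads non-blank lines until EOF.
  // On an interactive terminal, readLine() blocks until a line or EOF
  // arrives, which looks like a hang when nothing is typed.
  static List<String> readUrls(BufferedReader in) {
    List<String> urls = new ArrayList<>();
    try {
      String line;
      while ((line = in.readLine()) != null) {
        String url = line.trim();
        if (!url.isEmpty()) {
          urls.add(url);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return urls;
  }

  public static void main(String[] args) {
    if (shouldPrintUsage(args)) {
      // Print usage and stop instead of falling through to the blocking read.
      System.err.println(USAGE);
      System.exit(-1);
    }
    BufferedReader stdin = new BufferedReader(new InputStreamReader(System.in));
    for (String url : readUrls(stdin)) {
      System.out.println("checked: " + url);
    }
  }
}
```

If the real tool does read from standard input, piping a URL in (for example with echo) would distinguish "waiting for input" from a genuine hang.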

lewis@lewis-01:~/ASF/trunk/runtime/local$ bin/nutch org.apache.nutch.net.URLNormalizerChecker -normalizer
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
	at org.apache.nutch.net.URLNormalizerChecker.main(URLNormalizerChecker.java:97)

lewis@lewis-01:~/ASF/trunk/runtime/local$ bin/nutch org.apache.nutch.net.URLNormalizerChecker -scope
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
	at org.apache.nutch.net.URLNormalizerChecker.main(URLNormalizerChecker.java:99)

Both of the above result in ArrayIndexOutOfBoundsExceptions!
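The index in both exceptions is 1: when -normalizer or -scope is the only argument, args has length 1, so reading the option's value at args[1] runs past the end of the array. That points, as an assumption since the source is not quoted in this thread, to main() reading the next argument without a bounds check. A minimal sketch of bounds-checked parsing follows; the class and method names are hypothetical, and only the two option names come from the thread.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of bounds-checked option parsing; not the Nutch source.
 * Option names (-normalizer, -scope) are the ones used in this thread.
 */
public class OptionParsingSketch {

  // Parses "-normalizer <name>" and "-scope <scope>" into a map keyed by
  // option name without the leading dash. Reports bad input with an
  // IllegalArgumentException instead of indexing past the end of args.
  static Map<String, String> parse(String[] args) {
    Map<String, String> opts = new HashMap<>();
    for (int i = 0; i < args.length; i++) {
      String flag = args[i];
      if (!flag.equals("-normalizer") && !flag.equals("-scope")) {
        throw new IllegalArgumentException("Unknown option: " + flag);
      }
      // This check is what avoids ArrayIndexOutOfBoundsException: 1 when
      // the flag is the last (or only) argument.
      if (i + 1 >= args.length) {
        throw new IllegalArgumentException("Missing value for " + flag);
      }
      opts.put(flag.substring(1), args[++i]);
    }
    return opts;
  }

  public static void main(String[] args) {
    try {
      System.out.println(parse(args));
    } catch (IllegalArgumentException e) {
      System.err.println(e.getMessage());
      System.err.println("Usage: URLNormalizerChecker [-normalizer <normalizerName>] [-scope <scope>]");
      System.exit(-1);
    }
  }
}
```

With this check, parse(new String[]{"-scope"}) fails with "Missing value for -scope" and the usage text is printed, rather than the stack traces shown above.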

Finally, the additional usage text below (in URLNormalizerChecker)
NEVER gets printed, no matter what. This is consistent across both of
the aforementioned classes: the second half of the "Usage" output
never gets printed to stdout.

"\n\tscope can be one of: default,partition,generate_host_count,fetcher,crawldb,linkdb,inject,outlink";


-- 
Lewis

Re: Bug in o.a.n.n.URLNormalizerChecker?

Posted by Markus Jelsma <ma...@openindex.io>.
> Hi,
> 
> When working on the URLFilterChecker class I've come across a related
> problem with the above class. Please see my log output and if we can
> confirm this then there is a bug.
> 
> 
> lewis@lewis-01:~/ASF/trunk/runtime/local$ bin/nutch org.apache.nutch.net.URLNormalizerChecker
> Checking combination of all URLNormalizers available
> 
> N.B. The command above should print the System.err.println(usage) but
> instead logs that checkAll() has been called and then simply hangs.
> 
> lewis@lewis-01:~/ASF/trunk/runtime/local$ bin/nutch org.apache.nutch.net.URLNormalizerChecker -normalizer
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
> 	at org.apache.nutch.net.URLNormalizerChecker.main(URLNormalizerChecker.java:97)
> 
> lewis@lewis-01:~/ASF/trunk/runtime/local$ bin/nutch org.apache.nutch.net.URLNormalizerChecker -scope
> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
> 	at org.apache.nutch.net.URLNormalizerChecker.main(URLNormalizerChecker.java:99)
> 
> The above two result in OOB exceptions!

Ah, I've never used either with custom parameters, so I have no idea what
parameter to give other than -allCombined. You should check the sources to be
sure which names are expected.

> 
> Finally, the additional logging as below (in URLNormalizerChecker)
> NEVER gets printed no matter what. This is consistent across both of
> the aforementioned classes, where the second half of the "Usage"
> logging never gets printed to stdout.

This may be because it is not listed in your log4j.properties. I think we
added it locally but never added it to trunk.

> 
> "\n\tscope can be one of: default,partition,generate_host_count,fetcher,crawldb,linkdb,inject,outlink";