Posted to user@nutch.apache.org by Rajendra Patil <Ra...@KPITCummins.com> on 2005/09/12 08:26:59 UTC
intranet crawling with nutch-0.7
I was trying to run an intranet crawl with the nutch-0.7 code.
I made all the settings in the config files (as I did for nutch-0.6).
In spite of these configurations, it gives me the following error:
Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.nutch.db.WebDBInjector.addPage(WebDBInjector.java:437)
    at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:378)
    at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
    at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
Caused by: java.lang.RuntimeException: org.apache.nutch.net.URLFilter not found.
    at org.apache.nutch.net.URLFilters.<clinit>(URLFilters.java:44)
Has anyone faced this issue? Please let me know if you have a solution, or
whether I need to modify any configuration files other than
crawl-urlfilter.txt and nutch-site.xml.
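The top-level ExceptionInInitializerError is how the JVM wraps the RuntimeException thrown inside URLFilters.<clinit>, i.e. the static initializer of that class; the "Caused by" line is the real problem. A minimal Java sketch of that mechanism, using a hypothetical Filters class as a stand-in for URLFilters (it does not call any Nutch API and only simulates the missing URLFilter plugin):

```java
public class ClinitFailureDemo {
    // Hypothetical stand-in for URLFilters: its static initializer fails
    // when no URLFilter implementation can be found.
    static class Filters {
        static final Object[] FILTERS = load();

        static Object[] load() {
            boolean extensionFound = false; // simulates the missing urlfilter plugin
            if (!extensionFound) {
                throw new RuntimeException("org.apache.nutch.net.URLFilter not found.");
            }
            return new Object[0];
        }
    }

    static String trigger() {
        try {
            Object[] f = Filters.FILTERS; // first access runs the static initializer
            return "loaded " + f.length;
        } catch (ExceptionInInitializerError e) {
            // The original RuntimeException is available as the cause.
            return "ExceptionInInitializerError caused by " + e.getCause();
        } catch (NoClassDefFoundError e) {
            // After a failed <clinit>, later accesses to the class fail like this.
            return "NoClassDefFoundError";
        }
    }

    public static void main(String[] args) {
        System.out.println(trigger());
        System.out.println(trigger());
    }
}
```

Once a static initializer has failed, the JVM marks the class as unusable, so any later access in the same JVM throws NoClassDefFoundError instead of repeating the original error.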
Re: intranet crawling with nutch-0.7
Posted by cf-auto <cf...@folge2.de>.
Hi Rajendra
I got this when I forgot to add "urlfilter-regex" to the value of the
property "plugin.includes" in nutch-site.xml.
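In nutch-default.xml, plugin.includes is described as a regular expression naming the plugin directories to include; any plugin not matching it is excluded. Assuming that matching rule, this Java sketch checks whether a given value enables urlfilter-regex. The class name and the two example values are hypothetical, not the shipped defaults.

```java
import java.util.regex.Pattern;

public class PluginIncludesCheck {
    // Returns true if the plugin.includes regex fully matches the plugin id.
    static boolean includes(String pluginIncludes, String pluginId) {
        return Pattern.compile(pluginIncludes).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // Hypothetical values: the first lacks urlfilter-regex, the second has it.
        String withoutFilter =
            "nutch-extensionpoints|protocol-file|parse-(text|html)|index-basic|query-(basic|site|url)";
        String withFilter =
            "nutch-extensionpoints|protocol-file|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)";
        System.out.println(includes(withoutFilter, "urlfilter-regex")); // false
        System.out.println(includes(withFilter, "urlfilter-regex"));    // true
    }
}
```

If the check prints false for your own value, add urlfilter-regex as another |-separated alternative in the plugin.includes property of nutch-site.xml.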
Christoph
PS:
btw, I think "urlfilter-regex" should appear in the example value for
"plugin.includes" in "How do I index my local file system?" on
http://wiki.apache.org/nutch/FAQ
As a newbie I copied the value from there and promptly ran into this
error.
I tried to correct it in the wiki but failed to get write permission.
Maybe someone else can do it?
PPS:
And if he/she does, there is another small but time-consuming typo in
http://wiki.apache.org/nutch/GettingNutchRunningWithResin
The system property for JDK 1.4 logging is
"java.util.logging.config.file", not "java.util.logging.config.files".
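A small Java sketch for catching this typo, assuming the JVM is started with -D options as in the Resin setup: it reports whether the correctly spelled property is set, or only the misspelled one. The class name and messages are made up for illustration.

```java
public class LoggingConfigCheck {
    static final String CORRECT = "java.util.logging.config.file";
    static final String TYPO = "java.util.logging.config.files";

    static String diagnose() {
        String value = System.getProperty(CORRECT);
        if (value != null) {
            return "ok: " + value;
        }
        if (System.getProperty(TYPO) != null) {
            // The misspelled property is silently ignored by java.util.logging.
            return "typo: use -D" + CORRECT + " instead of -D" + TYPO;
        }
        // Assumes java.util.logging.config.class is not set either.
        return "unset: LogManager falls back to the JRE's logging.properties";
    }

    public static void main(String[] args) {
        System.out.println(diagnose());
    }
}
```

Running it as `java -Djava.util.logging.config.files=conf/logging.properties LoggingConfigCheck` prints `typo: use -Djava.util.logging.config.file instead of -Djava.util.logging.config.files`.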
Thanks