Posted to user@nutch.apache.org by Rajendra Patil <Ra...@KPITCummins.com> on 2005/09/12 08:26:59 UTC

Intranet crawling with nutch-0.7

I was trying to run an intranet crawl with the nutch-0.7 code.

I made all the settings in the config files (as I did for nutch-0.6).

In spite of this configuration, it gives me the following error:

Exception in thread "main" java.lang.ExceptionInInitializerError
        at org.apache.nutch.db.WebDBInjector.addPage(WebDBInjector.java:437)
        at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:378)
        at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
        at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
Caused by: java.lang.RuntimeException: org.apache.nutch.net.URLFilter not found.
        at org.apache.nutch.net.URLFilters.<clinit>(URLFilters.java:44)

Has anyone faced this issue? Please let me know if you have a solution,
or whether I need to modify any configuration file other than
crawl-urlfilter.txt and nutch-site.xml.

 


Re: Intranet crawling with nutch-0.7

Posted by cf-auto <cf...@folge2.de>.
Hi Rajendra

I had this when I forgot to add "urlfilter-regex" to the value of the
property "plugin.includes" in nutch-site.xml.
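
For reference, here is a minimal sketch of what such an entry might look like in nutch-site.xml. Only the property name "plugin.includes" and the plugin "urlfilter-regex" come from this thread; the other plugin names in the value are assumptions for illustration, so keep whatever your own configuration already lists and just make sure "urlfilter-regex" is part of it.

```xml
<!-- Sketch only: the plugin list other than urlfilter-regex is assumed.
     Keep your existing entries and add urlfilter-regex to them. -->
<property>
  <name>plugin.includes</name>
  <value>protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>
</property>
```

If no URL filter plugin is included, URLFilters presumably has no implementation to load, which would match the "org.apache.nutch.net.URLFilter not found" message in the stack trace.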

Christoph

PS:
By the way, I think "urlfilter-regex" should appear in the example for
"plugin.includes" in "How do I index my local file system?" on
http://wiki.apache.org/nutch/FAQ
As a newbie I copied the value from there and promptly ran into this
error.
I tried to correct it in the wiki but could not get write permission.
Maybe someone can do it?


PPS:
And if he/she does, there is another small but time-consuming typo in
http://wiki.apache.org/nutch/GettingNutchRunningWithResin
The system property for jdk1.4 logging is
"java.util.logging.config.file", not "java.util.logging.config.files".

Thanks



