Posted to user@nutch.apache.org by Michael Levy <Lu...@gmail.com> on 2006/04/14 15:40:29 UTC

NullPointerException due to nonexistent (mis-pointed) segments directory

Just in case this is helpful to anyone else just getting started with Nutch:

When I tried to run a Nutch search, I got an HTTP Status 500 error 
message in my browser.  The NullPointerException entry in the server 
log (copied below) did not help me understand what had happened.  
Fortunately, I happened to notice this innocuous-looking line in the 
catalina log file:

INFO: opening segment indexes in 
/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/segments

...and I knew that when I had crawled my site, the crawl had not created 
a directory named 'segments' at that location.  Renaming the directory 
so that this path exists fixed the problem.
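
A minimal sketch of a check that can be run before starting the webapp, to confirm that the path from the catalina log line really is a directory. The class name SegmentsDirCheck and the fallback path "segments" are hypothetical and not part of Nutch. The sketch relies on one documented fact about Java: File.listFiles() returns null, not an empty array, when the path is not a directory. Whether that is what happens at NutchBean.java:96 is an assumption; the trace alone does not show it.

```java
import java.io.File;

public class SegmentsDirCheck {
    // Returns a short status string for the given segments directory.
    static String check(File segmentsDir) {
        if (!segmentsDir.isDirectory()) {
            return "missing: " + segmentsDir.getPath();
        }
        // listFiles() returns null when the path cannot be listed,
        // so test for null before touching the array.
        File[] entries = segmentsDir.listFiles();
        if (entries == null) {
            return "unreadable: " + segmentsDir.getPath();
        }
        return "ok: " + entries.length + " entries in " + segmentsDir.getPath();
    }

    public static void main(String[] args) {
        // Hypothetical fallback; pass the path from the catalina log instead.
        String path = args.length > 0 ? args[0] : "segments";
        System.out.println(check(new File(path)));
    }
}
```

Passing the path printed in the "opening segment indexes in" line as the first argument reports whether that directory is missing, unreadable, or present.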

Below are the contents of the localhost.2006-04-14.log file.  I 
wasn't able to interpret them in any meaningful way.

Apr 14, 2006 8:51:00 AM org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet jsp threw exception
java.lang.NullPointerException
    at org.apache.nutch.searcher.NutchBean.init(NutchBean.java:96)
    at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:82)
    at org.apache.nutch.searcher.NutchBean.<init>(NutchBean.java:72)
    at org.apache.nutch.searcher.NutchBean.get(NutchBean.java:64)
    at org.apache.jsp.search_jsp._jspService(search_jsp.java:112)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:97)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:332)
    at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:314)
    at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:264)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:252)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
    at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:664)
    at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:527)
    at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:80)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:684)
    at java.lang.Thread.run(Thread.java:595)


Re: NullPointerException due to nonexistent (mis-pointed) segments directory

Posted by Michael Levy <Lu...@gmail.com>.
I hope someone can help me with this problem.

This works fine:
# bin/nutch crawl urls.txt
and it creates a directory named something like crawl-20060418105008, 
with a working index.

However, if I try to add any parameters beyond the root_url_file 
parameter, I get the output below.  I'm really stumped.  The following 
does not create a directory named FOO, but it does create a directory 
named something like crawl-20060418105500, so apparently it ignores the 
-dir FOO parameter.

Help, anyone?  This happens under Solaris.  This works fine on my PC 
using cygwin but I want to run this on Solaris.  TIA!

## bin/nutch crawl urls.txt -dir FOO
060418 105308 parsing file:/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/conf/nutch-default.xml
060418 105308 parsing file:/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/conf/crawl-tool.xml
060418 105308 parsing file:/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/conf/nutch-site.xml
060418 105308 No FS indicated, using default:local
060418 105308 crawl started in: crawl-20060418105308
060418 105308 rootUrlFile = urls.txt -dir FOO
060418 105308 threads = 10
060418 105308 depth = 5
060418 105310 Created webdb at LocalFS,/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/crawl-20060418105308/db
Exception in thread "main" java.io.FileNotFoundException: urls.txt -dir FOO (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:106)
        at java.io.FileReader.<init>(FileReader.java:55)
        at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:372)
        at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
        at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
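
The "rootUrlFile = urls.txt -dir FOO" line and the FileNotFoundException message above both suggest that the three words reached Java as a single argument instead of three. A minimal sketch for checking that, assuming a hypothetical helper class named ArgDump (not part of Nutch) that prints each argument in brackets so that embedded spaces are visible:

```java
public class ArgDump {
    // Builds one line per argument, bracketed so embedded spaces show up.
    static String format(String[] args) {
        StringBuilder sb = new StringBuilder();
        sb.append("argc = ").append(args.length).append('\n');
        for (int i = 0; i < args.length; i++) {
            sb.append("args[").append(i).append("] = [")
              .append(args[i]).append("]\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(format(args));
    }
}
```

If this class is launched the same way the bin/nutch script launches CrawlTool and it reports argc = 1 with "urls.txt -dir FOO" inside one pair of brackets, the arguments are being joined before they reach Java, which would point at the shell wrapper on Solaris and not at the crawl code. That diagnosis is an inference from the log output, not something the trace confirms.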