
Posted to user@nutch.apache.org by sub paul <su...@gmail.com> on 2006/04/13 16:54:19 UTC

nutch 0.7.2 webapp on resin3

I tried out the nutch webapp on resin3 and ran into issues.

The first issue was that I would get nothing but a 500 Servlet Error and the
word null when I tried to search.

I hadn't followed my own suggestions that I had posted here:
http://wiki.apache.org/nutch/GettingNutchRunningWithResin

After I changed the system properties, it was fine.

It took me a while to realize that the xml parser was causing the issues.
I tried debugging OnlineClustererFactory's getOnlineCluster, but the problem
is that execution never gets there. The failure comes up when
OnlineClustererFactory's X_POINT static member is being loaded. This meant
that search.jsp's servlet never loaded and it was always trying to compile.
Since I didn't have java 1.4 logging configured properly, I didn't see
many error messages either.

However, I was able to get it to run by adding just the following two lines
to the resin conf (they ask resin to use Xerces instead of resin's own xml
parser):

    <system-property javax.xml.parsers.DocumentBuilderFactory="org.apache.xerces.jaxp.DocumentBuilderFactoryImpl"/>
    <system-property javax.xml.parsers.SAXParserFactory="org.apache.xerces.jaxp.SAXParserFactoryImpl"/>

and this line gave me a little more information about what was going on
(the JDK property name is java.util.logging.config.file, singular):

    <system-property java.util.logging.config.file='/home/paul/java1.4logging.conf'/>
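
To confirm which parser the JVM actually picks up after a change like this, a small check along these lines may help. It is only a sketch: the class name JaxpParserCheck is made up for illustration, and it would have to run inside the same JVM as the webapp (for example from a test JSP) for the answer to reflect resin's settings.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper (not part of nutch or resin): reports which JAXP
// factory implementations the current JVM resolves.
class JaxpParserCheck {

    // Class name of the DocumentBuilderFactory that JAXP selects.
    static String documentBuilderFactoryClass() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    // Class name of the SAXParserFactory that JAXP selects.
    static String saxParserFactoryClass() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // These are the two system properties set in the resin conf above;
        // they print as null when nothing has overridden the default lookup.
        System.out.println("DocumentBuilderFactory property: "
                + System.getProperty("javax.xml.parsers.DocumentBuilderFactory"));
        System.out.println("SAXParserFactory property: "
                + System.getProperty("javax.xml.parsers.SAXParserFactory"));

        // The implementations actually in use.
        System.out.println("DocumentBuilderFactory impl: " + documentBuilderFactoryClass());
        System.out.println("SAXParserFactory impl: " + saxParserFactoryClass());
    }
}
```

If the two "impl" lines do not show the org.apache.xerces.jaxp classes, the system properties are probably not being applied.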

Another issue I ran into was that most of the language directories are
missing footer.html, while search.jsp expects footer.html to be in each
language's directory.

Towards the end of search.jsp you see:
<jsp:include page="<%= language + "/include/footer.html"%>"/>

I had to change it to
<jsp:include page="/include/footer.html"/>

which seems to be the right thing to do regardless, as footer.html only
exists in that directory and does not seem to have any language-specific
"stuff" in it.
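
If some languages do ship their own footer later on, an alternative to hard-coding the shared path would be to fall back to it only when the per-language file is missing. The following is a rough sketch of that idea and not what search.jsp does; FooterPathResolver and resolveFooter are hypothetical names, and it checks the plain filesystem rather than going through the servlet API.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of nutch): chooses which footer to include.
class FooterPathResolver {

    // Returns "/<language>/include/footer.html" when that file exists under
    // webappRoot, otherwise the shared "/include/footer.html".
    static String resolveFooter(Path webappRoot, String language) {
        String localized = "/" + language + "/include/footer.html";
        // Drop the leading slash so the path resolves relative to the root.
        Path candidate = webappRoot.resolve(localized.substring(1));
        return Files.isRegularFile(candidate) ? localized : "/include/footer.html";
    }

    public static void main(String[] args) {
        // Assumed usage: first argument is the webapp directory, second the language code.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        String language = args.length > 1 ? args[1] : "en";
        System.out.println(resolveFooter(root, language));
    }
}
```

The returned string could then be passed to jsp:include in place of the fixed path.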

FileNotFoundException on crawl

Posted by Michael Levy <Lu...@gmail.com>.

I'm running Nutch 0.7.2 under Solaris 9, java 1.5.0_06.  I followed the 
Nutch version 0.8 tutorial and am getting a FileNotFoundException as 
below.  Any ideas?  Thanks.

# bin/nutch crawl urls -dir crawl -depth 3 -topN 50
060413 150039 parsing file:/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/conf/nutch-default.xml
060413 150040 parsing file:/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/conf/crawl-tool.xml
060413 150041 parsing file:/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/conf/nutch-site.xml
060413 150041 No FS indicated, using default:local
060413 150041 crawl started in: crawl-20060413150041
060413 150041 rootUrlFile = urls -dir crawl -depth 3 -topN 50
060413 150041 threads = 10
060413 150041 depth = 5
060413 150043 Created webdb at LocalFS,/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/crawl-20060413150041/db
Exception in thread "main" java.io.FileNotFoundException: urls -dir crawl -depth 3 -topN 50 (No such file or directory)
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:106)
        at java.io.FileReader.<init>(FileReader.java:55)
        at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:372)
        at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
        at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
# Exception in thread "main" java.io.FileNotFoundException