Posted to user@nutch.apache.org by sub paul <su...@gmail.com> on 2006/04/13 16:54:19 UTC
nutch 0.7.2 webapp on resin3
I tried out the Nutch webapp on Resin 3 and ran into a few issues.
The first issue was that I would get nothing but a 500 Servlet Error and the
word "null" when I tried to search.
I hadn't followed my own suggestions that I had posted here:
http://wiki.apache.org/nutch/GettingNutchRunningWithResin
After I changed the system properties, it was fine.
It took me a while to realize that the XML parser was causing the
issues. I tried debugging OnlineClustererFactory's getOnlineCluster,
but execution never gets there. The failure occurs when
OnlineClustererFactory's X_POINT static member is being loaded. This meant
that search.jsp's servlet never loaded and it kept trying to compile.
Since I didn't have Java 1.4 logging configured properly, I didn't see
many error messages either.
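To illustrate why a breakpoint in the factory method is never hit, here is a minimal sketch of a class whose static member fails while loading. BrokenFactory and its load() helper are hypothetical stand-ins, not Nutch code; only the X_POINT name is borrowed from the description above, and the thrown exception merely simulates a parser failure.

```java
public class StaticInitDemo {

    /** Hypothetical stand-in for a factory whose static member cannot be loaded. */
    static class BrokenFactory {
        // Evaluated once, the first time the class is used.
        static final Object X_POINT = load();

        private static Object load() {
            // Simulated failure; in the post the real cause was the XML parser.
            throw new IllegalStateException("simulated XML parser failure");
        }

        static String getInstance() {
            return "never reached";
        }
    }

    /** Calls the factory and reports the kind of Throwable that escapes. */
    static String tryFactory() {
        try {
            return BrokenFactory.getInstance();
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryFactory()); // first use of the class
        System.out.println(tryFactory()); // any later use
    }
}
```

The first use fails with ExceptionInInitializerError and every later use with NoClassDefFoundError, so the body of the factory method never runs. Without working logging, the only visible symptom may be the page that depends on the class failing to load.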
However, I was able to get it to run by adding just the following two lines
to the Resin conf (they tell Resin to use Xerces instead of Resin's own XML parser):
<system-property javax.xml.parsers.DocumentBuilderFactory="org.apache.xerces.jaxp.DocumentBuilderFactoryImpl"/>
<system-property javax.xml.parsers.SAXParserFactory="org.apache.xerces.jaxp.SAXParserFactoryImpl"/>
And this line gave me a little more information about what was going on:
<system-property java.util.logging.config.file='/home/paul/java1.4logging.conf'/>
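One way to confirm which XML parser is actually in effect is to ask JAXP for its factories and print their class names. This is only a sketch: the JaxpCheck class name is made up for illustration, and the same two calls would have to run inside the webapp (for example from a test JSP) to reflect the container's settings rather than the plain JVM defaults.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;

public class JaxpCheck {

    /** Class name of the DocumentBuilderFactory that JAXP resolves right now. */
    static String documentBuilderFactoryName() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    /** Class name of the SAXParserFactory that JAXP resolves right now. */
    static String saxParserFactoryName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // The system properties are null unless they were set explicitly.
        System.out.println("DocumentBuilderFactory property: "
                + System.getProperty("javax.xml.parsers.DocumentBuilderFactory"));
        System.out.println("SAXParserFactory property:       "
                + System.getProperty("javax.xml.parsers.SAXParserFactory"));
        System.out.println("DocumentBuilderFactory in use:   " + documentBuilderFactoryName());
        System.out.println("SAXParserFactory in use:         " + saxParserFactoryName());
    }
}
```

If the two system properties above are picked up, the "in use" lines should name the Xerces implementation classes, provided Xerces is on the classpath.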
Another issue I ran into was that most of the language directories are
missing footer.html, while search.jsp expects footer.html to be in each
language directory.
Towards the end of search.jsp you see:
<jsp:include page="<%= language + "/include/footer.html"%>"/>
I had to change it to
<jsp:include page="/include/footer.html"/>
which seems to be the right thing to do regardless, as footer.html only
exists in that directory and does not seem to have anything language-specific
in it.
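If language-specific footers were ever wanted, an alternative to hard-coding the shared path would be to fall back to it only when the localized file is missing. The helper below is a hypothetical sketch, not part of Nutch; the FooterPath class, the resolve method and the webappRoot parameter are all assumptions made for illustration.

```java
import java.io.File;

public class FooterPath {

    /**
     * Returns the language-specific include path when that file exists under
     * webappRoot, otherwise the shared top-level path.
     */
    static String resolve(File webappRoot, String language, String relativePath) {
        String localized = "/" + language + relativePath;
        if (new File(webappRoot, localized).isFile()) {
            return localized;
        }
        return relativePath;
    }

    public static void main(String[] args) {
        // Assumed usage: pass the webapp root directory as the first argument.
        File root = new File(args.length > 0 ? args[0] : ".");
        System.out.println(resolve(root, "en", "/include/footer.html"));
    }
}
```

With the layout described above, where footer.html exists only in the top-level include directory, this returns /include/footer.html for every language.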
FileNotFoundException on crawl
Posted by Michael Levy <Lu...@gmail.com>.
I'm running Nutch 0.7.2 under Solaris 9 with Java 1.5.0_06. I followed the
Nutch version 0.8 tutorial and am getting the FileNotFoundException shown
below. Any ideas? Thanks.
# bin/nutch crawl urls -dir crawl -depth 3 -topN 50
060413 150039 parsing file:/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/conf/nutch-default.xml
060413 150040 parsing file:/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/conf/crawl-tool.xml
060413 150041 parsing file:/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/conf/nutch-site.xml
060413 150041 No FS indicated, using default:local
060413 150041 crawl started in: crawl-20060413150041
060413 150041 rootUrlFile = urls -dir crawl -depth 3 -topN 50
060413 150041 threads = 10
060413 150041 depth = 5
060413 150043 Created webdb at LocalFS,/export/home/www/virtual/wiki/doc_root/nutch-0.7.2/crawl-20060413150041/db
Exception in thread "main" java.io.FileNotFoundException: urls -dir crawl -depth 3 -topN 50 (No such file or directory)
    at java.io.FileInputStream.open(Native Method)
    at java.io.FileInputStream.<init>(FileInputStream.java:106)
    at java.io.FileReader.<init>(FileReader.java:55)
    at org.apache.nutch.db.WebDBInjector.injectURLFile(WebDBInjector.java:372)
    at org.apache.nutch.db.WebDBInjector.main(WebDBInjector.java:535)
    at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:134)
# Exception in thread "main" java.io.FileNotFoundException