Posted to user@nutch.apache.org by Daniel Garcia <ga...@yahoo.com> on 2008/06/11 01:50:37 UTC
No results on sites other than www.apache.org
I've followed the tutorial on the Wiki site and have successfully indexed a few pages on www.apache.org with the command:
bin/nutch crawl /etc/opt/nutch/urls -dir /var/lib/nutch-crawls/test1 -depth 3 -topN 50
A query for "apache" on my local nutch/tomcat installation gives me 52 matching pages. Next I changed
/usr/local/nutch/conf/crawl-urlfilter.txt
to allow www.circuitcity.com with the rule +^http://www.circuitcity.com/. I also added the root page to /etc/opt/nutch/urls/circuitcity. I then cleared out my test run with
rm /var/lib/nutch-crawls/test1/* -Rf
and reran my crawl:
bin/nutch crawl /etc/opt/nutch/urls -dir /var/lib/nutch-crawls/test1 -depth 3 -topN 50
It looks like it downloads plenty of pages (all from circuitcity). When I search for anything in the tomcat/nutch app, I get 0 results every time. If I switch back to apache, the index turns up results. Is there a config file I missed somewhere?
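In case it helps rule out the filter line itself, here is a rough sketch of how the accept pattern could be sanity-checked outside Nutch. It is only an approximation: grep -E stands in for the Java regular expressions Nutch uses, the url_allowed helper and the sample URLs are made up for illustration, and it tests just this one "+" rule, not any other rules in crawl-urlfilter.txt.

```shell
#!/bin/sh
# Rough check of a single accept rule from crawl-urlfilter.txt.
# Assumption: grep -E approximates the Java regex that Nutch applies;
# for a simple anchored prefix like this one the two should agree.
pattern='^http://www.circuitcity.com/'

# Hypothetical helper: exits 0 when the URL matches the pattern.
url_allowed() {
    printf '%s\n' "$1" | grep -Eq "$pattern"
}

# Sample URLs are illustrative only, not taken from the crawl.
for url in \
    'http://www.circuitcity.com/' \
    'http://www.circuitcity.com/some/page.html' \
    'http://www.apache.org/'
do
    if url_allowed "$url"; then
        echo "accept $url"
    else
        echo "reject $url"
    fi
done
```

This only shows whether the pattern matches a given URL. It says nothing about rules that appear earlier in the file, or about which crawl directory the search webapp is reading.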
Regards,
Daniel Garcia