Posted to user@nutch.apache.org by Daniel Garcia <ga...@yahoo.com> on 2008/06/11 01:50:37 UTC

No results on sites other than www.apache.org

I've followed the tutorial on the Wiki site and have successfully indexed a few pages on www.apache.org with the command

bin/nutch crawl /etc/opt/nutch/urls -dir /var/lib/nutch-crawls/test1 -depth 3 -topN 50

A query for "apache" on my local Nutch/Tomcat installation gives me 52 matching pages. Next I changed

/usr/local/nutch/conf/crawl-urlfilter.txt

to allow www.circuitcity.com with the rule +^http://www.circuitcity.com/. I also added the root page to /etc/opt/nutch/urls/circuitcity. I cleared out my test run with

rm /var/lib/nutch-crawls/test1/* -Rf

and reran my crawl

bin/nutch crawl /etc/opt/nutch/urls -dir /var/lib/nutch-crawls/test1 -depth 3 -topN 50

It looks like it downloads plenty of pages (all from circuitcity). When I try searching for anything in the Tomcat/Nutch app, I get 0 results every time. I can switch back to apache and the index turns up results. Is there a config file I missed somewhere?
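
For reference, here is a rough Python sketch of how a "+"/"-" prefixed regex rule like the one quoted above decides whether a URL is kept. It assumes the first matching rule wins and that a URL matching no rule is dropped. It is only an illustration, not Nutch's own code: the function name url_allowed, the rule list, and the trailing "-." catch-all are hypothetical.

```python
import re

# Hypothetical rule list modelled on the filter line quoted in this post.
# The trailing "-." (reject everything else) is an assumption.
RULES = [
    "+^http://www.circuitcity.com/",
    "-.",
]

def url_allowed(url, rules=RULES):
    """Return True if the first rule whose pattern matches starts with '+'."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched: treat the URL as rejected

if __name__ == "__main__":
    for candidate in (
        "http://www.circuitcity.com/index.html",
        "http://circuitcity.com/",
        "http://www.apache.org/",
    ):
        print(candidate, url_allowed(candidate))
```

Under these assumptions, only URLs that begin with exactly http://www.circuitcity.com/ pass; the same site without the "www." prefix, or any other host, is rejected.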

Regards,
Daniel Garcia