Posted to user@nutch.apache.org by Chip Calhoun <cc...@aip.org> on 2017/02/03 16:45:46 UTC

Failing to index from Nutch 1.12 to Solr 5.5.3

I'm switching to more recent versions of Nutch and Solr after years of using Nutch 1.4 and Solr 3.3.0. When I index into Solr, I get no results, and I can't tell where the process breaks down.

I use these commands:
cd /opt/apache-nutch-1.12/runtime/local
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.121.x86_64
export NUTCH_CONF_DIR=/opt/apache-nutch-1.12/runtime/local/conf/phfaws
bin/crawl urls/phfaws crawl/phfaws 1
bin/nutch solrindex http://localhost:8983/solr/phfaws/ crawl/phfaws/crawldb -linkdb crawl/phfaws/linkdb crawl/phfaws/segments/*

I believe that Nutch is crawling properly, although the crawl folders end up only about 25% as large as those I produced with Nutch 1.4. I suspect that the problem is with the Nutch/Solr integration. My Solr core didn't create a schema.xml; it has a managed schema instead. I've copied the schema.xml from my Nutch local conf into Solr, but I haven't found any indication that I'm supposed to do anything more with it.
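
A minimal sketch of how the two sides could be checked separately, assuming the same working directory, crawl folder, and core URL as in the commands above. The helper name solr_count_url is hypothetical and only builds the query string; the guards skip each step when bin/nutch or curl is not available. It has not been run against this setup.

```shell
#!/bin/sh
# Diagnostic sketch only. CRAWL_DIR and SOLR_CORE_URL mirror the commands
# in this post; adjust them if the layout differs.
CRAWL_DIR="crawl/phfaws"
SOLR_CORE_URL="http://localhost:8983/solr/phfaws"

# Hypothetical helper: build a Solr select URL that matches every document
# but returns zero rows, so the response carries only the numFound count.
solr_count_url() {
    printf '%s/select?q=*:*&rows=0&wt=json\n' "$1"
}

# 1. Crawl side: print crawldb statistics (URL counts by status).
#    Skipped when not run from the Nutch runtime/local directory.
if [ -x bin/nutch ]; then
    bin/nutch readdb "$CRAWL_DIR/crawldb" -stats
fi

# 2. Solr side: ask the core how many documents it currently holds.
if command -v curl >/dev/null 2>&1; then
    curl -s --max-time 5 "$(solr_count_url "$SOLR_CORE_URL")" \
        || echo "Solr not reachable at $SOLR_CORE_URL"
fi
```

If the crawldb statistics show fetched pages while the Solr response reports no documents, that would point at the indexing step rather than the crawl.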


Chip Calhoun
Digital Archivist
Niels Bohr Library & Archives
American Institute of Physics
One Physics Ellipse
College Park, MD  20740
301-209-3180
https://www.aip.org/history-programs/niels-bohr-library