Posted to user@nutch.apache.org by Daniel Holmes <no...@gmail.com> on 2015/09/27 16:36:41 UTC

Difference between nutch fetch list and number of indexed documents

Hi,
I am using Apache Nutch 1.7 to crawl and Apache Solr 4.7.2 for indexing. In
my tests there is a gap between the number of pages Nutch fetches and the
number of documents indexed in Solr. For example, one crawl fetched 23343
pages and 1146 images successfully, while Solr indexed only 19250 docs, 500
of which are image URLs.

My question is: which kinds of pages get indexed in Solr, and why?
Does Solr also index pages with other statuses?
Which kinds of images does Solr index?

Thanks.