Posted to user@nutch.apache.org by h b <hb...@gmail.com> on 2013/07/03 17:46:09 UTC

Stepwise nutch execution order

In most documents and mailing-list posts, I have seen the crawl order for
nutch-solr given as

inject
loop
  generate
  fetch
  updatedb
  parse
end loop
solr

When I follow this order, Solr always ends up with 0 docs. Even if I run the
solr step inside the loop, I still get 0 docs in Solr.

However, if I switch the order of updatedb and parse, then it works as I
expect it to.

It would be nice to know what is going on here.

Re: Stepwise nutch execution order

Posted by Tejas Patil <te...@gmail.com>.
The correct order is:

inject
loop
  generate
  fetch
  parse
  updatedb
end loop
solr

The Nutch tutorial [0] and the crawl script [1] both use this order.

[0] : http://wiki.apache.org/nutch/NutchTutorial
[1] : http://svn.apache.org/viewvc/nutch/trunk/src/bin/crawl?view=markup
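
For illustration only, the order above can be written down as a small Python
sketch. The step names come from the list above; crawl_plan, run_crawl and
run_step are hypothetical names and not part of Nutch, and the sketch does not
invoke any Nutch command itself. It only fixes the sequence, with parse placed
before updatedb in every round.

```python
from typing import Callable, List

# Per-round steps in the order given above: parse runs before updatedb.
ROUND_STEPS = ("generate", "fetch", "parse", "updatedb")


def crawl_plan(rounds: int) -> List[str]:
    """Return the step names for a crawl of `rounds` iterations, in order."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    plan = ["inject"]            # inject runs once, before the loop
    for _ in range(rounds):
        plan.extend(ROUND_STEPS)
    plan.append("solr")          # indexing runs once, after the loop
    return plan


def run_crawl(rounds: int, run_step: Callable[[str], None]) -> None:
    """Call run_step once per step.

    run_step is a hypothetical hook (an assumption of this sketch); it would
    invoke whatever Nutch command matches the step name in your installation.
    """
    for step in crawl_plan(rounds):
        run_step(step)


if __name__ == "__main__":
    # Dry run: print the step order instead of calling Nutch.
    run_crawl(2, print)
```

Passing print as run_step gives a dry run that lists the steps, which is a
quick way to check a hand-written loop against the order above.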

