Posted to user@nutch.apache.org by h b <hb...@gmail.com> on 2013/07/03 17:46:09 UTC
Stepwise nutch execution order
In most documents and on the mailing list, I have seen the following crawl order
for Nutch with Solr:
inject
loop
generate
fetch
updatedb
parse
end loop
solr
When I follow this order, Solr always has 0 docs. Even if I run the solr step
inside the loop, I still get 0 docs in Solr.
However, if I swap updatedb and parse, it works as I
expect it to.
It would be nice to know what is going on here.
Re: Stepwise nutch execution order
Posted by Tejas Patil <te...@gmail.com>.
The correct order is:
inject
loop
generate
fetch
parse
updatedb
end loop
solr
The Nutch tutorial [0] and the crawl script [1] both use this order.
[0] : http://wiki.apache.org/nutch/NutchTutorial
[1] : http://svn.apache.org/viewvc/nutch/trunk/src/bin/crawl?view=markup
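The ordering rule from the reply can be written down as a small check: within each round, parse should run on the fetched segment before updatedb. The Python sketch below is a minimal illustration under that assumption. It only models step names and never calls Nutch. The helpers `crawl_plan` and `parse_precedes_updatedb` are hypothetical names chosen for this example. It does not try to explain Nutch's internal reason for the 0-docs result.

```python
from typing import List, Sequence

# Loop order given in the reply (and used by the tutorial and bin/crawl):
# parse runs before updatedb.
LOOP_STEPS = ("generate", "fetch", "parse", "updatedb")


def crawl_plan(rounds: int) -> List[str]:
    """Build the step sequence: inject, `rounds` loop iterations, then solr."""
    if rounds < 1:
        raise ValueError("rounds must be >= 1")
    plan = ["inject"]
    for _ in range(rounds):
        plan.extend(LOOP_STEPS)
    plan.append("solr")
    return plan


def parse_precedes_updatedb(plan: Sequence[str]) -> bool:
    """Return True if every updatedb follows a parse of the last fetched segment.

    This checks step order only (hypothetical model), not Nutch internals.
    """
    fetched = False  # a segment was fetched in the current round
    parsed = False   # that fetched segment has been parsed
    for step in plan:
        if step == "fetch":
            fetched, parsed = True, False
        elif step == "parse":
            if not fetched:
                return False  # parse with nothing fetched in this round
            parsed = True
        elif step == "updatedb":
            if not (fetched and parsed):
                return False  # updatedb before the segment was parsed
            fetched, parsed = False, False  # round finished
    return True
```

For example, `crawl_plan(1)` returns `['inject', 'generate', 'fetch', 'parse', 'updatedb', 'solr']`, which passes the check. The order from the original question (fetch, updatedb, parse) fails, because updatedb comes before parse.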