Posted to user@nutch.apache.org by Muhamad Muchlis <tr...@gmail.com> on 2015/10/01 05:26:32 UTC

Re-Crawling Basic Syntax - newbie

Hi,

I have a manual script for my first crawl. Can anyone explain these commands
step by step?

*Initialize the crawldb*
bin/nutch inject urls/
*Generate URLs from crawldb*
bin/nutch generate -topN 80
*Fetch generated URLs*
bin/nutch fetch -all
*Parse fetched URLs*
bin/nutch parse -all
*Update database from parsed URLs*
bin/nutch updatedb -all
*Index parsed URLs*
bin/nutch index -all
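
To make the question concrete, here is a rough sketch of how the steps above might be wrapped in a loop for re-crawling. It only reuses the commands listed above; the names NUTCH, crawl_round and recrawl are made up for illustration, and I am assuming that inject is only needed once and that the remaining steps can simply be repeated.

```shell
#!/bin/sh
# Sketch only: repeats the generate/fetch/parse/updatedb/index steps from this post.
# NUTCH, crawl_round and recrawl are hypothetical names, not part of Nutch.
NUTCH="${NUTCH:-bin/nutch}"

# Run one crawl round; $1 is the -topN value passed to generate.
crawl_round() {
    "$NUTCH" generate -topN "$1" || return 1
    "$NUTCH" fetch -all          || return 1
    "$NUTCH" parse -all          || return 1
    "$NUTCH" updatedb -all       || return 1
    "$NUTCH" index -all          || return 1
}

# Run several rounds in a row; $1 is the number of rounds, $2 is the -topN value.
recrawl() {
    rounds="$1"
    topn="$2"
    i=1
    while [ "$i" -le "$rounds" ]; do
        echo "round $i of $rounds"
        # Stop at the first failing step instead of continuing with stale data.
        crawl_round "$topn" || return 1
        i=$((i + 1))
    done
}
```

With this, I would run bin/nutch inject urls/ once and then call something like recrawl 3 80. As far as I understand, generate only selects pages that are due for fetching, so whether already-fetched pages are crawled again would depend on the configured fetch interval (I believe the property is db.fetch.interval.default).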

Can anyone help me write a re-crawling script?



Thanks


Regards,

Muchlis