Posted to user@nutch.apache.org by Muhamad Muchlis <tr...@gmail.com> on 2015/10/01 05:26:32 UTC
Re-Crawling Basic Syntax - newbie
Hi,
I have a manual script for my first crawl. Can anyone explain these commands
step by step?
*Initialize the crawldb*
bin/nutch inject urls/
*Generate URLs from crawldb*
bin/nutch generate -topN 80
*Fetch generated URLs*
bin/nutch fetch -all
*Parse fetched URLs*
bin/nutch parse -all
*Update database from parsed URLs*
bin/nutch updatedb -all
*Index parsed URLs*
bin/nutch index -all
Can anyone help me write a re-crawling script?
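Here is a rough sketch of what I have in mind. It assumes that re-crawling just means repeating the generate, fetch, parse, updatedb and index steps after a single inject, which may not be the whole story. It uses only the commands listed above. The names NUTCH, TOPN and recrawl are made up for this example and are not part of Nutch.

```shell
#!/bin/sh
# Sketch of a re-crawl loop built only from the commands in this post.
# NUTCH, TOPN and recrawl are hypothetical names chosen for illustration.
NUTCH="${NUTCH:-bin/nutch}"   # path to the nutch launcher script
TOPN="${TOPN:-80}"            # same -topN value as in the manual run

# Run the given number of generate/fetch/parse/updatedb/index rounds.
# Stops and returns non-zero as soon as any step fails.
recrawl() {
    rounds="${1:-1}"
    i=1
    while [ "$i" -le "$rounds" ]; do
        echo "round $i of $rounds"
        $NUTCH generate -topN "$TOPN" || return 1
        $NUTCH fetch -all             || return 1
        $NUTCH parse -all             || return 1
        $NUTCH updatedb -all          || return 1
        $NUTCH index -all             || return 1
        i=$((i + 1))
    done
}

# Example use (assumption: seed URLs are injected once, before the loop):
#   $NUTCH inject urls/ && recrawl 3
```

Stopping on the first failed step is a deliberate choice here, so that a broken fetch is not followed by an updatedb on incomplete data.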
Thanks
Regards,
Muchlis