Posted to commits@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2007/08/20 21:08:21 UTC
[Nutch Wiki] Trivial Update of "Crawl" by susam
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The following page has been changed by susam:
http://wiki.apache.org/nutch/Crawl
------------------------------------------------------------------------------
== Steps ==
The complete job of this script has been divided broadly into 8 steps.
- # Inject URLs
+ 1. Inject URLs
- # Generate, Fetch, Parse, Update Loop
+ 2. Generate, Fetch, Parse, Update Loop
- # Merge Segments
+ 3. Merge Segments
- # Invert Links
+ 4. Invert Links
- # Index
+ 5. Index
- # Dedup
+ 6. Dedup
- # Merge Indexes
+ 7. Merge Indexes
- # Reload index
+ 8. Reload index
== Modes of Execution ==
The script can be executed in two modes:-