Posted to dev@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2012/02/11 19:36:01 UTC

[Nutch Wiki] Trivial Update of "bin/nutch webgraph" by LewisJohnMcgibbney

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "bin/nutch webgraph" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/bin/nutch%20webgraph?action=diff&rev1=1&rev2=2

  The Inlink database is created from the Outlink database and is regenerated when the WebGraph is updated. The Node database is created from both the Inlink and Outlink databases. Because the Node database is overwritten when the WebGraph is updated, and because it holds the current scores for URLs, it is recommended that a crawl cycle (one or more full crawls) be fully complete before the WebGraph is updated. Some type of analysis, such as LinkRank, can then be run to update scores in the Node database in a stable fashion.
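
The relationship between the three databases can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Nutch's implementation: the function names, the dictionary layout, the per-node link counts, and the example URLs are all hypothetical. It shows the inlinks being derived by inverting the outlinks, and it shows why rebuilding the nodes discards any scores that an analysis step had written.

```python
from collections import defaultdict


def invert_outlinks(outlinks):
    """Derive an inlink map (target -> sources) from an outlink map (source -> targets)."""
    inlinks = defaultdict(set)
    for source, targets in outlinks.items():
        for target in targets:
            inlinks[target].add(source)
    return dict(inlinks)


def build_nodes(outlinks, inlinks, initial_score=0.0):
    """Create a fresh node record for every URL that appears in either map.

    The record layout (link counts plus a score) is an assumption made for
    illustration. Every call starts from scratch, so earlier scores are lost.
    """
    urls = set(outlinks) | set(inlinks)
    return {
        url: {
            "num_inlinks": len(inlinks.get(url, ())),
            "num_outlinks": len(outlinks.get(url, ())),
            "score": initial_score,
        }
        for url in urls
    }


# Hypothetical outlink data, as if gathered from a completed crawl cycle.
outlinks = {
    "http://a.example/": {"http://b.example/", "http://c.example/"},
    "http://b.example/": {"http://c.example/"},
}

inlinks = invert_outlinks(outlinks)
nodes = build_nodes(outlinks, inlinks)

# Stand-in for an analysis step (such as LinkRank) writing a score.
nodes["http://c.example/"]["score"] = 0.7

# Updating the graph again rebuilds the nodes, which resets that score.
rebuilt_nodes = build_nodes(outlinks, invert_outlinks(outlinks))
```

In this model the score written for `http://c.example/` does not survive the rebuild, which mirrors the reason given above for finishing a crawl cycle before updating the WebGraph and running the analysis.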
  
  Usage: 
+ 
  {{{
- bin/nutch webgraph 
+ bin/nutch webgraph (-segment <segment> | -segmentDir <segmentDir> | -webgraphdb <webgraphdb>) [-filter -normalize] | -help
- }}}
+ }}} 
+ 
+ '''-segment <segment>''': The location of one or more segments to read and obtain information from.
+ 
+ '''-segmentDir <segmentDir>''': The location of a directory of segments to read and obtain information from.
+ 
+ '''-webgraphdb <webgraphdb>''': The web graph database to use.
+ 
+ '''[-filter]''': Whether to use URLFilters on the URLs in the segment(s).
+ 
+ '''[-normalize]''': Whether to use URLNormalizers on the URLs in the segment(s).
+ 
+ '''-help''': Prints the usage message above. Even if this option is not included, the usage message is still displayed on the command line.
  
  
- CommandLineOptions
+ <<< CommandLineOptions