Posted to dev@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2011/07/02 17:17:57 UTC

[Nutch Wiki] Update of "bin/nutch_invertlinks" by LewisJohnMcgibbney

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "bin/nutch_invertlinks" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/bin/nutch_invertlinks

Comment:
Update to reflect Nutch 1.3 API

New page:
Invertlinks is an alias for org.apache.nutch.crawl.LinkDb

This class maintains an inverted link map, listing the incoming links for each URL. Its declaration is: public class LinkDb extends Configured implements Tool, Mapper<Text, ParseData, Text, Inlinks>
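To make the idea of an inverted link map concrete, here is a toy sketch in shell. It is not how LinkDb is implemented (per the declaration above, LinkDb is a Hadoop Mapper over segment parse data); the function name invert_links, the "<source> <target>" input format and the example URLs are all made up for illustration.

```shell
# invert_links: read "<source URL> <target URL>" pairs on stdin and print,
# for each target URL, the sources that link to it.
# Illustration only; this is not Nutch code.
invert_links() {
  awk '{ inlinks[$2] = inlinks[$2] " " $1 }
       END { for (url in inlinks) print url " <-" inlinks[url] }' | sort
}

# Toy outlink list (made-up URLs): a links to b and c, b links to c.
printf '%s\n' \
  'http://a.example/ http://b.example/' \
  'http://a.example/ http://c.example/' \
  'http://b.example/ http://c.example/' | invert_links
```

Each output line names one target URL followed by the pages that point to it, which is the kind of incoming-link listing the linkdb is described as holding.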

Usage:

{{{
bin/nutch invertlinks <linkdb> (-dir <segmentsDir> | <seg1> <seg2> ...) [-force] [-noNormalize] [-noFilter]
}}}

'''<linkdb>''': This should be the path to the output linkdb to create or update.

'''-dir <segmentsDir>''': The parent directory containing several segments, OR

'''<seg1> <seg2> ...''': A list of segment directories from which to create an inverted linkdb.

'''[-force]''': This argument forces an update even if the linkdb appears to be locked. /!\ CAUTION advised /!\

'''[-noNormalize]''': Pass this option to skip normalization of link URLs. This gives the linkdb an unaltered representation of the incoming links.

'''[-noFilter]''': Pass this option to skip applying any of the currently configured URLFilters to link URLs.
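As a rough illustration of how the options above combine, the sketch below wraps the command in a small shell function. The function name run_invertlinks, the NUTCH_BIN variable and the crawl/... paths are assumptions made for this example and are not part of Nutch; only the invertlinks arguments come from the usage line above.

```shell
# run_invertlinks: thin wrapper around "bin/nutch invertlinks".
# NUTCH_BIN is a hypothetical variable naming the launcher script; it
# defaults to bin/nutch, relative to the Nutch install directory.
# Setting NUTCH_BIN=echo gives a dry run that only prints the arguments.
run_invertlinks() {
  "${NUTCH_BIN:-bin/nutch}" invertlinks "$@"
}

# Example calls (made-up paths; uncomment to run against a real crawl):
#
# Build or update the linkdb from every segment under a parent directory:
# run_invertlinks crawl/linkdb -dir crawl/segments
#
# Use two explicitly named segments, without normalizing or filtering URLs:
# run_invertlinks crawl/linkdb crawl/segments/seg1 crawl/segments/seg2 -noNormalize -noFilter
```

A dry run with NUTCH_BIN=echo is a cheap way to check the argument list before touching an existing linkdb, particularly when -force is involved.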


CommandLineOptions