Posted to dev@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2011/07/02 17:17:57 UTC
[Nutch Wiki] Update of "bin/nutch_invertlinks" by LewisJohnMcgibbney
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "bin/nutch_invertlinks" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/bin/nutch_invertlinks
Comment:
Update to reflect Nutch 1.3 API
New page:
Invertlinks is an alias for org.apache.nutch.crawl.LinkDb.
This class maintains an inverted link map, listing the incoming links for each URL. Its declaration is: public class LinkDb extends Configured implements Tool, Mapper<Text, ParseData, Text, Inlinks>
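To illustrate what "inverted link map" means, here is a minimal sketch in plain Python. It is not the Nutch implementation (LinkDb is a Java class reading ParseData from segments); the function name invert_links and the sample URLs are hypothetical and exist only for this example.

```python
from collections import defaultdict


def invert_links(outlinks):
    """Turn {source_url: [target_url, ...]} into {target_url: [source_url, ...]}.

    Hypothetical helper for illustration only; not part of Nutch.
    """
    inlinks = defaultdict(list)
    for source, targets in outlinks.items():
        for target in targets:
            # Record the page that links TO the target.
            inlinks[target].append(source)
    return dict(inlinks)


# Made-up sample data: each page maps to the pages it links out to.
outlinks = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://c.example/"],
}

inlinks = invert_links(outlinks)
for url, sources in inlinks.items():
    print(url, "<-", sources)
```

Each key of the result is a URL and each value is the list of pages linking to it, which is the kind of per-URL incoming-link listing the description above refers to.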
Usage:
{{{
bin/nutch invertlinks <linkdb> (-dir <segmentsDir> | <seg1> <seg2> ...) [-force] [-noNormalize] [-noFilter]
}}}
'''<linkdb>''': The path to the output linkdb to create or update.
'''-dir <segmentsDir>''': The parent directory containing several segments, OR
'''<seg1> <seg2> ...''': A list of segment directories to create an inverted linkdb from.
'''[-force]''': This argument forces an update even if the linkdb appears to be locked. /!\ CAUTION advised. /!\
'''[-noNormalize]''': Pass this to skip normalization of link URLs, so that the linkdb records incoming links as they were originally written.
'''[-noFilter]''': This parameter skips applying the configured URLFilters to link URLs.
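The options above can be combined as in the following sketch, which assembles the argument list for the command from a script. The helper name build_invertlinks_command, its parameter names, and the paths crawl/linkdb and crawl/segments are assumptions made for this example; only the flags themselves come from the usage line.

```python
def build_invertlinks_command(linkdb, segments_dir=None, segments=(),
                              force=False, normalize=True, url_filter=True,
                              nutch="bin/nutch"):
    """Build the argv list for `bin/nutch invertlinks` (hypothetical helper).

    Exactly one of `segments_dir` (passed as -dir) or `segments`
    (explicit segment paths) must be supplied, mirroring the usage line.
    """
    if (segments_dir is None) == (not segments):
        raise ValueError("give either segments_dir or a list of segments")

    cmd = [nutch, "invertlinks", linkdb]
    if segments_dir is not None:
        cmd += ["-dir", segments_dir]
    else:
        cmd += list(segments)

    # Optional flags, in the order shown in the usage line.
    if force:
        cmd.append("-force")
    if not normalize:
        cmd.append("-noNormalize")
    if not url_filter:
        cmd.append("-noFilter")
    return cmd


# Example with made-up paths: invert every segment under crawl/segments.
cmd = build_invertlinks_command("crawl/linkdb", segments_dir="crawl/segments")
print(" ".join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True) from the Nutch installation directory. Building a list rather than a single string avoids shell quoting problems with paths that contain spaces.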
CommandLineOptions