Posted to commits@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2005/07/17 02:54:20 UTC

[Nutch Wiki] Update of "bin/nutch segread" by RobPettengill

The following page has been changed by RobPettengill:
http://wiki.apache.org/nutch/bin/nutch_segread

New page:
segread is an alias for net.nutch.segment.SegmentReader

This class holds together all data readers for an existing segment. It also provides convenience methods for reading from the segment and for repositioning the current pointer.

Usage: SegmentReader [-fix] [-dump] [-dumpsort] [-list] [-nocontent] [-noparsedata] [-noparsetext] (-dir segments | seg1 seg2 ...)[[BR]]
NOTE: at least one segment dir name is required, or '-dir' option.[[BR]]
-fix[[BR]]
  automatically fix corrupted segments[[BR]]
-dump[[BR]]
  dump segment data in human-readable format[[BR]]
-dumpsort[[BR]]
  dump segment data in human-readable format, sorted by URL[[BR]]
-list[[BR]]
  print useful information about segments[[BR]]
-nocontent[[BR]]
  ignore content data[[BR]]
-noparsedata[[BR]]
  ignore parse_data data[[BR]]
-noparsetext[[BR]]
  ignore parse_text data[[BR]]
-dir segments[[BR]]
  directory containing multiple segments[[BR]]
seg1 seg2 ...[[BR]]
  segment directories[[BR]]
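
As a rough illustration of the usage line above, the following sketch shows two possible invocations. The NUTCH_HOME variable, the wrapper function, and the segment names seg1 and seg2 are placeholders introduced here, not part of Nutch; it is also an assumption that the options combine as the usage line suggests. The wrapper only prints the command when bin/nutch is not found, so the sketch can be tried without an installation.

```shell
#!/bin/sh
# NUTCH_HOME is a hypothetical variable pointing at the Nutch install directory.
NUTCH_HOME="${NUTCH_HOME:-.}"

# Run bin/nutch if it is present; otherwise print the command (dry run).
nutch() {
  if [ -x "$NUTCH_HOME/bin/nutch" ]; then
    "$NUTCH_HOME/bin/nutch" "$@"
  else
    echo "bin/nutch $*"
  fi
}

# Print summary information for every segment under ./segments
nutch segread -list -dir segments

# Dump two named segments sorted by URL, skipping the content data
# (seg1 and seg2 are placeholder segment directory names)
nutch segread -dumpsort -nocontent segments/seg1 segments/seg2
```

The first form uses '-dir' to cover a directory of segments; the second names individual segment directories, which satisfies the same "at least one segment" requirement.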

[CommandLineOptions]