Posted to dev@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2018/07/26 12:17:02 UTC

[Nutch Wiki] Update of "bin/nutch fetch" by SebastianNagel

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The "bin/nutch fetch" page has been changed by SebastianNagel:
https://wiki.apache.org/nutch/bin/nutch%20fetch?action=diff&rev1=3&rev2=4

Comment:
Remove -noParsing (removed from Fetcher by NUTCH-1102)

  == Nutch 1.x ==
  
  {{{
- Usage: bin/nutch fetch <segment> [-threads n] [-noParsing]
+ Usage: bin/nutch fetch [-D...] <segment> [-threads n]
  }}}
  
  '''<segment>''': This is the path to the previously generated segment directory we wish to fetch.
  
  '''[-threads n]''': The number of fetcher threads that fetch URLs from the segment concurrently. This also determines the maximum number of requests made at once (each fetcher thread handles one connection).
  
- '''[-noParsing]''': If no argument is passed this value is the default, as set in nutch-default.xml. This is the case due to errors which can occur when parsing segments. If parsing errors occur then the results of the whole fetching process can be corrupted. Note that parsing will only follow meta-redirects coming from the original URL.
+ '''[-D...]''': Override a Nutch/Hadoop property from the command line, e.g.
+ 
+ '''[-Dfetcher.parse=true]''': Make the fetcher parse documents, overriding the default value defined in nutch-default.xml or the setting in nutch-site.xml.
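
A minimal shell sketch of the Nutch 1.x usage line, assuming a crawl whose segments live under crawl/segments/ and a thread count of 10 (both are assumptions to adjust). The helper name `fetch_latest_segment` and the `NUTCH` override variable are hypothetical, introduced only for illustration. It relies on segment directories being named by their generation timestamp, and it keeps the `-D` option before `<segment>`, as the usage line shows.

```shell
# Hypothetical helper (not part of Nutch): fetch the newest segment.
# Assumptions: segments live under crawl/segments/ by default and 10
# fetcher threads suit the target hosts; adjust both for your crawl.
fetch_latest_segment() {
  segments_dir=${1:-crawl/segments}
  threads=${2:-10}
  nutch=${NUTCH:-bin/nutch}   # NUTCH is a hypothetical override, e.g. NUTCH=echo
  # Segment directories are named by generation time (yyyyMMddHHmmss),
  # so ls's name order is chronological and the last entry is the newest.
  segment=$(ls -d "$segments_dir"/2* | tail -n 1)
  # As in the usage line, -D options go before <segment>.
  "$nutch" fetch -Dfetcher.parse=true "$segment" -threads "$threads"
}
```

For example, `fetch_latest_segment crawl/segments 10` fetches the newest segment with 10 threads and parsing enabled. Running it as `NUTCH=echo fetch_latest_segment` prints the command instead of executing it, which is a cheap way to check the argument order before starting a crawl.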
  
  == Nutch 2.x ==