Posted to dev@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2014/06/18 13:15:27 UTC

[Nutch Wiki] Update of "bin/nutch inject" by JulienNioche

The "bin/nutch inject" page has been changed by JulienNioche:
https://wiki.apache.org/nutch/bin/nutch%20inject?action=diff&rev1=2&rev2=3

  
  '''<url_dir>''': The directory containing our seed list (referred to above as 'flat file'), usually a text document containing URLs, one URL per line.
  
+ The injector uses the following configuration properties (see https://issues.apache.org/jira/browse/NUTCH-1405):
+ 
+ * db.injector.overwrite = [true|false] : Replaces the existing entries in the crawldb with the corresponding ones from the seed data and sets their status to UNFETCHED.
+ 
+ * db.injector.update = [true|false] : Keeps the existing entries in the crawldb but replaces their score and fetch interval with the values found for the corresponding entries in the seed data. Any metadata found for the seed entry is added. The status remains what it was in the original version of the crawldb, e.g. FETCHED.
+ 
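As a rough illustration of how the two properties above might be set, the fragment below shows them in the standard Hadoop-style property format used by Nutch configuration files. Placing them in conf/nutch-site.xml and the chosen values are assumptions made for this sketch; they are not part of the wiki change itself.

```xml
<!-- Sketch only: assumes these overrides live in conf/nutch-site.xml.
     The values are illustrative, not recommendations. -->
<configuration>
  <property>
    <name>db.injector.overwrite</name>
    <!-- false: do not replace existing crawldb entries with seed entries -->
    <value>false</value>
  </property>
  <property>
    <name>db.injector.update</name>
    <!-- true: keep existing entries and their status, but take score,
         fetch interval and metadata from the matching seed entries -->
    <value>true</value>
  </property>
</configuration>
```

With a combination like this, re-injecting a seed list would be expected to refresh the score, fetch interval and metadata of URLs already in the crawldb without resetting their status to UNFETCHED.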
  === Nutch 2.x ===
  {{{
  Usage: InjectorJob <url_dir> [-crawlId <id>]