Posted to commits@nutch.apache.org by Apache Wiki <wi...@apache.org> on 2005/11/16 01:12:34 UTC

[Nutch Wiki] Trivial Update of "OverviewDeploymentConfigs" by PaulBaclace

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The following page has been changed by PaulBaclace:
http://wiki.apache.org/nutch/OverviewDeploymentConfigs

The comment on the change is:
fixed up where rsync happens
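The diff below describes nutch_daemon.sh as a start|stop wrapper: it records the process pid at start, uses that pid at stop, and runs an rsync initiated on the slave that pulls from the master. The following is a rough sketch of that pattern, assuming a POSIX shell with nohup and rsync available. It is not the actual Nutch script. PID_FILE, MASTER_DIR, TARGET_DIR, daemon_start and daemon_stop are hypothetical names used only for illustration.

```shell
#!/bin/sh
# Minimal sketch of a start|stop daemon wrapper that keeps a pid file.
# Hypothetical names (not from the real nutch-daemon.sh):
#   PID_FILE   - file where the background process id is recorded
#   MASTER_DIR - if set, rsync source (e.g. master:/path) pulled from the slave
#   TARGET_DIR - local directory that receives the rsync'd jars and conf files

PID_FILE=${PID_FILE:-/tmp/sketch-daemon.pid}

daemon_start() {
  # Refuse to start a second copy if the recorded process is still alive.
  if [ -f "$PID_FILE" ] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null; then
    echo "already running as process $(cat "$PID_FILE")"
    return 1
  fi
  # The slave initiates the rsync, pulling from the master before starting.
  if [ -n "$MASTER_DIR" ]; then
    rsync -a --delete "$MASTER_DIR/" "$TARGET_DIR" || return 1
  fi
  # Run the command in the background and record its pid for later stop.
  nohup "$@" >/dev/null 2>&1 &
  echo $! > "$PID_FILE"
}

daemon_stop() {
  # Use the pid recorded at start; fail if there is nothing to stop.
  if [ -f "$PID_FILE" ] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null; then
    kill "$(cat "$PID_FILE")"
    rm -f "$PID_FILE"
  else
    echo "no process to stop"
    return 1
  fi
}
```

For example, `daemon_start sleep 30` launches a stand-in command and records its pid, and a later `daemon_stop` terminates it and removes the pid file. The `kill -0` check lets a stale pid file (left by a process that has already exited) be overwritten rather than blocking the next start.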

------------------------------------------------------------------------------
   1. start-all.sh or stop-all.sh - start and stop the whole ensemble.
   2. nutch-daemons.sh - run a Nutch command on all slave hosts.
   3. slaves.sh - run a shell command on all slave hosts.
-  4. nutch_daemon.sh - run a Nutch command as a daemon with a start|stop argument like a regular Unix/Linux /etc/rc.local script; the process pid is stored during start and used during stop.  Runs rsync at start.
+  4. nutch-daemon.sh - run a Nutch command as a daemon with a start|stop argument like a regular Unix/Linux /etc/rc.local script; the process pid is stored during start and used during stop.  At start, runs rsync, initiated on the slave, to pull from the master.
-  5. nutch - run a Nutch command using the JVM.
+  5. nutch - run a Nutch command, specified either as a command name or as a fully qualified class name, using the JVM.
  
  Depending upon the context of use, any level of these scripts can be handy on the command line.
  
@@ -56, +56 @@

  
   A. Cluster deployment with too many machines to customize (probably more than 4; 1000 machines should be possible):
  
-   6. bin/slaves.sh rsync-command is used as needed to update jars and conf files from master.
-   7. the ensemble starts by running bin/start-all.sh on the master.
+   6. the ensemble starts by running bin/start-all.sh on the master.
-   8. start-all.sh uses bin/nutch-daemons.sh run one datanode process on each slave (in the background without waiting, one daemon thread is started per comma-separated storage device, non-existent storage devices in the list are ignored).
+   7. start-all.sh uses bin/nutch-daemons.sh, which uses bin/nutch-daemon.sh to run rsync (to update jars and conf files from the master) and then start one datanode process on each slave (in the background without waiting; one daemon thread is started per comma-separated storage device, and non-existent storage devices in the list are ignored).
-   9. start-all.sh runs one namenode and one jobtracker on the master.
+   8. start-all.sh runs one namenode and one jobtracker on the master.
-   10. start-all.sh uses bin/nutch-daemons.sh run one tasktracker process on each slave (in the background without waiting).
+   9. start-all.sh uses bin/nutch-daemons.sh to run one tasktracker process on each slave (in the background without waiting).
  
  
   B. Cluster of a few machines: