Posted to commits@uima.apache.org by ch...@apache.org on 2013/06/11 22:16:14 UTC

svn commit: r1491938 - in /uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook: part1/ part2/ part2/cli/ part4/admin/

Author: challngr
Date: Tue Jun 11 20:16:14 2013
New Revision: 1491938

URL: http://svn.apache.org/r1491938
Log:
UIMA-2682 Duccbook updates.

Modified:
    uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part1/overview.tex
    uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part1/terminology.tex
    uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/cli/ducc-cancel.tex
    uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/cli/ducc-reserve.tex
    uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/cli/ducc-submit.tex
    uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/ducc-uguide.tex
    uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/admin/ducc-nodepools.tex

Modified: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part1/overview.tex
URL: http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part1/overview.tex?rev=1491938&r1=1491937&r2=1491938&view=diff
==============================================================================
--- uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part1/overview.tex (original)
+++ uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part1/overview.tex Tue Jun 11 20:16:14 2013
@@ -29,7 +29,7 @@
     order to achieve good scale-out these components must be constructed in a specific way.
 
     The Collection Reader builds input CASs and forwards them to the UIMA pipelines.  In the DUCC
-    model, the CR is run in a process separate from the rest of the pipeline. In face, in all but the
+    model, the CR is run in a process separate from the rest of the pipeline. In fact, in all but the
     smallest clusters it is run on a different physical machine than the rest of the pipeline.  To
     achieve scalability, the CR must create very small CASs that do not contain application data,
     but which contain references to data; for instance, file names.  Ideally, the CR should be
@@ -43,9 +43,8 @@
     cluster.
 
     DUCC does not provide any mechanism for receiving output CASs.  Each application must
-    supply its own CAS Consumer which serializes the output of the Analytic Engines for 
-    consumption by other entities (as serialized CASs, perhaps, or as some other form of
-    data, depending on what the other entities are.).
+    supply its own CAS Consumer which serializes the output of the pipelines for 
+    consumption by other entities (as serialized CASs, for example).
 
     A DUCC job therefore consists of a small specification containing the following items:
     
@@ -94,9 +93,9 @@
 
     \paragraph{UIMA-AS  Scaled Pipeline}
     With UIMA-AS the CR is separated into a discrete process and a CAS Multiplier is introduced 
-    into the analytic pipeline as an interface between the CR and the pipeline, as shown in
+    into the pipeline as an interface between the CR and the pipeline, as shown in
     \hyperref[fig:UIMA-AS-pipeline]{Figure ~\ref{fig:UIMA-AS-pipeline}} below.
-    Multiple analytic pipelines are serviced by the 
+    Multiple pipelines are serviced by the 
     CR and are scaled-out over a computing cluster.  The difficulty with this model is that each
     user is individually responsible for finding and scheduling computing nodes, installing
     communication software such as ActiveMQ, and generally managing the distributed job and
@@ -120,7 +119,7 @@
     threads per process (as indicated in the DUCC job parameters), and generates job-unique queues.
 
     Under DUCC, the Collection Reader is executed in a process called the Job Driver (or JD). The 
-    analytic pipelines are executed in one or more processes called Job Processes (or JPs). The JD 
+    pipelines are executed in one or more processes called Job Processes (or JPs). The JD 
     process provides a thin wrapper over the CR to enable communication with DUCC.  The JD uses the
     CR to implement a UIMA-AS client delivering CASs to the multiple (scaled-out) pipelines, 
     shown in \hyperref[fig:UIMA-AS-pipeline-DUCC]{Figure ~\ref{fig:UIMA-AS-pipeline-DUCC}} below.
@@ -152,33 +151,36 @@
       \item DUCC uses the UIMA-AS error-handling facilities to reflect errors from the Job Processes
         to the Job Drivers. The JD wrappers implement logic to enforce error thresholds, to identify
         and log errors, and to reflect job problems in the DUCC Web Server.  All error thresholds are
-        configurable globally, and on a per-job basis.
+        configurable both globally and on a per-job basis.
 
       \item Error and timeout thresholds are implemented for both the initialization phase of a pipeline
         and the execution phase.
     
       \item Retry-after-error is supported: if a process has a failure on some CAS after
-        initialization is successful, the process is terminated and the CAS retried, up to some
+        initialization is successful, the process is terminated and all affected CASs are retried, up to some
         configurable threshold.
 
       \item DUCC insures that processes can successfully initialize before fully scaling out a job,
         to insure a cluster is not overwhelmed with errant processes.
+
+      \item Various error conditions encountered  while a job is running will prevent the errant job
+        from continuing scale out, and can result in termination of the job.
       \end{itemize}
       
     \section{Cluster and Job Management}
-    DUCC provides significant support for managing multiple jobs and multiple users in a distributed cluster:
+    DUCC supports  management of multiple jobs and multiple users in a distributed cluster:
 
     \begin{description}
-        \item[Multiple User Support] DUCC runs all work under the identity of the submitting user to
-          provide security and privacy for each user and job. Logs are written with the
-          user's credentials into the user's file space designated at job submission.
+        \item[Multiple User Support] DUCC runs all work under the identity of the submitting user. Logs
+          are written with the user's credentials into the user's file space designated at job
+          submission.
 
         \item[Fair-Share Scheduling] DUCC provides a Fair-Share scheduler to equitably share
           resources among multiple users.  The scheduler also supports semi-permanent reservation of
           full or partial machines.
 
         \item[Service Management] DUCC provides a Service Manager capable of automatically starting, stopping, and
-          otherwise managing and querying services in support of jobs.
+          otherwise managing and querying both UIMA-AS and non-UIMA-AS services in support of jobs.
 
         \item[Job Lifetime Management and Orchestration] DUCC includes an Orchestrator to manage the
           lifetimes of all entities in the system.
@@ -188,7 +190,7 @@
           \begin{itemize}
             \item Monitors and reports node capabilities (memory, etc) and performance data (CPU busy,
               swap, etc).
-            \item Starts and stops all processes on behalf of users.
+            \item Starts, stops, and monitors all processes on behalf of users.
             \item Patrols the node for ``foreign'' (non-DUCC) processes, reporting them to the
               Web Server, and optionally reaping them.
             \item Insures job processes do not exceed their declared memory requirements
@@ -213,7 +215,7 @@
           \end{itemize}
 
 
-        \item[Cluster Management Support] DUCC provides rich scripting support to:
+        \item[Cluster Management Support] DUCC provides system management support to:
           \begin{itemize}
               \item Start, stop, and query full DUCC systems.
  

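Illustration of the retry-after-error item in the overview.tex hunk above: a
minimal Python sketch, assuming a hypothetical process_item callback and a
hypothetical max_retries threshold. It only models "retry the affected work,
up to a configurable threshold"; it is not DUCC's implementation and does not
model process termination.

```python
from collections import deque


def run_with_retries(work_items, process_item, max_retries=2):
    """Process every item, retrying failures up to max_retries extra attempts.

    Hypothetical helper: returns (completed, failed) lists.
    """
    queue = deque((item, 0) for item in work_items)
    completed, failed = [], []
    while queue:
        item, attempts = queue.popleft()
        try:
            process_item(item)
            completed.append(item)
        except Exception:
            if attempts < max_retries:
                # Put the affected item back for another attempt.
                queue.append((item, attempts + 1))
            else:
                # Threshold reached: give up on this item.
                failed.append(item)
    return completed, failed
```

An item that fails once and then succeeds ends up in the completed list; an
item that always fails is attempted max_retries + 1 times and then reported
as failed.
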
Modified: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part1/terminology.tex
URL: http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part1/terminology.tex?rev=1491938&r1=1491937&r2=1491938&view=diff
==============================================================================
--- uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part1/terminology.tex (original)
+++ uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part1/terminology.tex Tue Jun 11 20:16:14 2013
@@ -6,7 +6,7 @@
 \chapter{Glossary}
 
 \begin{description}
-\item[Autostart Service] An autostart service is a registered service that is started automatically
+\item[Autostarted Service] An autostarted service is a registered service that is started automatically
   by DUCC when the DUCC system is booted.
 
 \item[Dependent service or job] A dependent service or job is a service or job that specifies one
@@ -15,7 +15,7 @@
 
 \item[DUCC] Distributed UIMA Cluster Computing.
 
-\item[External service] An external service is a service that is started externally to DUCC but
+\item[Implicit service] An implicit service is a service that is started externally to DUCC but
   referenced by some dependent service or job.  DUCC will attempt to contact the service using
   the dependency string.  If contact is successful the job is started, otherwise it is 
   terminated before resources are allocated to it.
@@ -24,34 +24,27 @@
   saves the service specification and fully manages the service, insuring it is running when needed,
   and shutdown when not.
 
-\item[Start-by-Reference Service] An on-demand service is a registered service that is not started when DUCC
-  is started. Instead, the service is started when referenced in some job or services service
-  dependency, and stopped when the referencing entity exits.
-
 \item[Service Instance] A service instance is one physical process which runs a CUSTOM or UIMA-AS
-  service.  Note that UIMA-AS services may be scaled-out to comprise more than one service instance.
+  service.  UIMA-AS services are usually scaled-out with multiple instances implementing the
+  same underlying service logic.
 
-\item[Orchestrator (OR)] The Orchestrator manages the lifecycle of all entities within DUCC.
+\item[Orchestrator (OR)] The Orchestrator manages the life cycle of all entities within DUCC.
 
 \item[Process Manager (PM) ] The Process Manager coordinates distribution of work among the Agents.
 
 \item[Resource Manager (RM) ] The Resource Manager schedules physical resources for DUCC work.
 
-\item[Service Endpoint] In DUCC, the service endpoint provides a unique identifier for a service
-  and in the case of UIMA-AS services, a well-known address for contacting the service. For CUSTOM
-  services, the endpoint is of the form CUSTOM:string where string is any alphanumeric string
-  provided by the service owner. For UIMA-AS services, the endpoint is of the form UIMA-AS:queue
-  name:ActiveMQ-broker-URL.
+\item[Service Endpoint] In DUCC, the service endpoint provides a unique identifier for a service. In
+  the case of UIMA-AS services, the endpoint also serves as a well-known address for contacting the
+  service. 
 
 \item[Service Manager (SM)] The Service Manager manages the life-cycles of UIMA-AS and CUSTOM
   services. It coordinates registration of services, starting and stopping of services, and ensures
-  that services are available and remain available for the lifetime of the jobs.  Note that the
-  Orchestrator manages the individual service instances; the Service Manager manages the collection
-  of instances which comprise a service.
+  that services are available and remain available for the lifetime of the jobs.
 
 \item[Agent] DUCC Agent processes run on every node in the system. The Agent receives orders to
   start and stop processes on each node. Agents monitor nodes, sending heartbeat packets with node
-  statistics to interested components (such as the RM and web-server). If CGroups are intstalled in
+  statistics to interested components (such as the RM and web-server). If CGroups are installed in
   the cluster, the Agent is responsible for managing the CGroups for each job process. All processes
   other than the DUCC management processes are managed as children of the agents.
 
@@ -65,15 +58,15 @@
 
 \item[Job specification] The Job Specification is a collection of properties that describe work to be
   scheduled and deployed by DUCC. It
-  identifies the UIMA components (CR, AE, etc) that comprise the job and the ystem-wide
-  properties of the job (classpaths, RAM requirements, etc). 
+  identifies the UIMA components (CR, AE, etc) that comprise the job and the system-wide
+  properties of the job (CLASSPATHs, RAM requirements, etc). 
 
 \item[Job] A DUCC job consists of the components required to deploy and execute a UIMA pipeline over
   a computing cluster. It consists of a JD to run the Collection Reader, a set of JPs to run the UIMA
   AEs, and a Job Specification to describe how the parts fit together.
 
 \item[Share Quantum] The DUCC scheduler abstracts the nodes in the cluster as a single large
-  congomerate of resources: memory, processor cores, etc.  The scheduler logically decomposes 
+  conglomerate of resources: memory, processor cores, etc.  The scheduler logically decomposes 
   the collection of resources into some number of equal-sized atomic units.  Each unit of work requiring
   resources is apportioned one or more of these atomic units.  The smallest possible atomic 
   unit is called the {\em share quantum}, or simply, {\em share}.
@@ -85,13 +78,13 @@
 \item[Weighted Fair Share] A weighted fair share calculation is used to apportion resources
   equitably to the outstanding work in the system.  In a non-weighted fair-share system, all
   work requests are given equal consideration to all resources.  To provide some (``more important'')
-  work more than equal resources, weights are used to give larger proportions of the resources to
+  work more than equal resources, weights are used to bias the allotment of shares in favor of
   some classes of work.
 
-\item[Work Items] A DUCC work item is one unit of work to be completed in a single DUCC process. It is
-  usually initiated by the submission of a single CAS from the CR to a UIMA service. It could be
-  thought of as a single "question" to be answered by a UIMA analytic. Usually each DUCC JP executes
-  many work items per job.
+\item[Work Items] A DUCC work item is one unit of work to be completed in a single DUCC process. It
+  is usually initiated by the submission of a single CAS from the CR to a UIMA service. It could be
+  thought of as a single ``question'' to be answered by a UIMA analytic, or a single ``task'' to
+  complete. Usually each DUCC JP executes many work items per job.
 \end{description}
 
 

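Illustration of the "Share Quantum" and "Weighted Fair Share" glossary
entries above: a minimal Python sketch, assuming integer weights per class of
work and a fixed pool of equal-sized shares. The function name and the
largest-remainder rounding are assumptions made for this example; the glossary
does not describe the scheduler's actual algorithm.

```python
def weighted_fair_share(total_shares, weights):
    """Split total_shares among classes in proportion to integer weights.

    Hypothetical helper: whole shares are handed out first, and any
    leftover shares go to the classes with the largest remainders.
    """
    total_weight = sum(weights.values())
    alloc, remainders = {}, {}
    for name, weight in weights.items():
        alloc[name], remainders[name] = divmod(total_shares * weight, total_weight)
    leftover = total_shares - sum(alloc.values())
    by_remainder = sorted(weights, key=lambda n: remainders[n], reverse=True)
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc
```

With equal weights every class receives the same number of shares; raising
one class's weight biases the allotment in its favor without starving the
others.
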
Modified: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/cli/ducc-cancel.tex
URL: http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/cli/ducc-cancel.tex?rev=1491938&r1=1491937&r2=1491938&view=diff
==============================================================================
--- uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/cli/ducc-cancel.tex (original)
+++ uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/cli/ducc-cancel.tex Tue Jun 11 20:16:14 2013
@@ -24,7 +24,7 @@
         \item[--id {[jobid]}]
           The ID is the id of the job to cancel.
         \item[--reason {[quoted string]}]
-          Optional. This specifies the reason the job is canceled, for display in the web server. Note that
+          Optional. This specifies the reason the job is canceled for display in the web server. Note that
           the shell requires a quoted string.  Example:
 \begin{verbatim}
 ducc_cancel --id 12 --reason "This is a pretty good reason."
@@ -38,7 +38,7 @@ ducc_cancel --id 12 --reason "This is a 
           Prints the usage text to the console. 
         \item[--role\_administrator] The command is being issued in the role of a DUCC administrator.
           If the user is not also a registered administrator this flag is ignored.  (This helps to
-          protect administrators from inadvertantly canceling jobs they do not own.)
+          protect administrators from accidentally canceling jobs they do not own.)
      \end{description}
         
     \paragraph{Notes:}

Modified: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/cli/ducc-reserve.tex
URL: http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/cli/ducc-reserve.tex?rev=1491938&r1=1491937&r2=1491938&view=diff
==============================================================================
--- uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/cli/ducc-reserve.tex (original)
+++ uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/cli/ducc-reserve.tex Tue Jun 11 20:16:14 2013
@@ -12,7 +12,7 @@
     performed on an ``all-or-nothing'' basis: either the entire set of requested resources is reserved, 
     or the reservation request fails. 
 
-    All forms of the ducc\_reserve block until the reservation is complete at which point the DUCC
+    All forms of ducc\_reserve block until the reservation is complete (or fails), at which point the DUCC
     ID of the reservation and the names of the reserved nodes are printed to the console and the
     command returns.
 

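Illustration of the "all-or-nothing" wording in the ducc-reserve.tex hunk
above: a minimal Python sketch, assuming a hypothetical set of free node
names. It shows only the semantics (grant the whole request or nothing); it
is not how ducc_reserve is implemented.

```python
def reserve_all_or_nothing(free_nodes, count):
    """Reserve exactly `count` nodes from the set `free_nodes`, or none.

    Hypothetical helper: returns the list of granted node names, or None
    if the request cannot be satisfied in full. On failure the set of
    free nodes is left untouched.
    """
    if count > len(free_nodes):
        return None
    granted = sorted(free_nodes)[:count]
    free_nodes.difference_update(granted)
    return granted
```

A request that cannot be met in full returns None and leaves the pool
unchanged, so a partial reservation is never held.
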
Modified: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/cli/ducc-submit.tex
URL: http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/cli/ducc-submit.tex?rev=1491938&r1=1491937&r2=1491938&view=diff
==============================================================================
--- uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/cli/ducc-submit.tex (original)
+++ uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/cli/ducc-submit.tex Tue Jun 11 20:16:14 2013
@@ -22,56 +22,53 @@
         \paragraph{Options:}
            \begin{description}
 
-           \item[--all\_in\_one $<$local $|$ remote $>$]
+           \item[$--$all\_in\_one $<$local $|$ remote $>$]
                Run driver and pipeline in single process.  If {\em local} is specified, the
                process is executed on the local machine, for example, in the current Eclipse session.
                If {\em remote} is specified, the job is submitted to DUCC as a {\em managed reservation}
                and run on some (presumably larger) machine allocated by DUCC.
 
-           \item[--cancel\_job\_on\_interrupt].  If the job is started with --wait\_for\_completion, this
+           \item[$--$cancel\_on\_interrupt].  If the job is started with $--$wait\_for\_completion, this
              option causes the job to be canceled if the submit command is terminated,
-             e.g., with CTL-C. If --cancel\_job\_on\_interrupt is not
+             e.g., with CTL-C. If $--$cancel\_on\_interrupt is not
              specified, the job monitor will be terminated but the job will continue to run.
 
-             If --wait\_for\_completion is not specified this option is ignored. 
+             If $--$wait\_for\_completion is not specified this option is ignored. 
 
-           \item[--classpath] The CLASSPATH used for the job.  If specified, this is used
-             for both the driver and each process. If not specified the classpath found by the underlying
+           \item[$--$classpath] The CLASSPATH used for the job.  If specified, this is used
+             for both the Job Driver and each Job Process. If not specified the CLASSPATH found by the underlying
              {\tt DuccJobSubmit.main()} method is used.
 
-           \item[--classpath\_order {[UserBeforeDucc | DuccBeforeUser]} ]
-             When DUCC deploys a process, set the user-supplied classpath before DUCC-supplied
-             classpath, or the reverse.
+           \item[$--$classpath\_order {[UserBeforeDucc $|$ DuccBeforeUser]} ]
+             When DUCC deploys a process, set the user-supplied CLASSPATH before DUCC-supplied
+             CLASSPATH, or the reverse.
              
-           \item[--debug] Enable debugging messages. This is primarily for debugging DUCC itself.
+           \item[$--$debug] Enable debugging messages. This is primarily for debugging DUCC itself.
 
-           \item[--description {[text]}] The text is any string used to describe the job. It is
-             displayed in the Web Server.             
+           \item[$--$description {[text]}] The text is any string used to describe the job. It is
+             displayed in the Web Server. When specified on a command-line the text usually 
+             must be surrounded by quotes to protect it from the shell.
 
-           \item[--driver\_attach\_console] If specified, redirect remote job driver stdout and stderr
+           \item[$--$driver\_attach\_console] If specified, redirect remote job driver stdout and stderr
              to the local submitting console.
 
-           \item[--driver\_classpath {[classpath]}]
-             This is the classpath for the Job Driver, necessary for DUCC to find the Collection Reader. 
-
-           \item[--driver\_debug {[debugger-address]}] Append JVM debug flags to the JVM arguments
+           \item[$--$driver\_debug {[debugger-address]}] Append JVM debug flags to the JVM arguments
              to start the JobDriver in remote debug mode.  The remote process debugger will attempt
              to contact the specified port. The address is of the form {\tt host:port}.
 
-           \item[--driver\_descriptor\_CR {[descriptor.xml]} ] This is the XML descriptor for the
+           \item[$--$driver\_descriptor\_CR {[descriptor.xml]} ] This is the XML descriptor for the
              Collection Reader.  This descriptor is a resource that is searched for in the CLASSPATH
              and data path as described in the ~\hyperref[par:cli.submit.notes]{notes below}.
 
-           \item[--driver\_descriptor\_CR\_overrides {[list]} ]
-             
+           \item[$--$driver\_descriptor\_CR\_overrides {[list]} ]             
              This is the Job Driver collection reader configuration overrides. They are specified as 
-             name/value pairs in a comma-delimeted list. For example: 
+             name/value pairs in a comma-delimited list. For example: 
              \begin{verbatim}
---driver\_descriptor\_CR\_overrides name1=value1,name2=value2...
+--driver_descriptor_CR_overrides name1=value1,name2=value2...
              \end{verbatim}
              
              
-%           \item[--driver\_environment {[list]} ]
+%           \item[$--$driver\_environment {[list]} ]
 %
 %             This specifies environment parameters for the Job Driver. If present, they are added to the 
 %             Job Driver's environment as the process is spawned. It must be a quoted, blank-delimeted 
@@ -90,46 +87,45 @@
 %"--process\_environment TERM=xterm DISPLAY=:1.0 DUCC\_LD\_LIBRARY\_PATH=/my/own/
 %            \end{verbatim}
 
-           \item[--driver\_exception\_handler {[classname]}] This specifies a developer-supplied
-             exception handler for the Job Driver.  It must
-             implement org.apache.uima.ducc.common.jd.plugin.IJdProcessExceptionHandler.
+%           \item[$--$driver\_exception\_handler {[classname]}] This specifies a developer-supplied
+%             exception handler for the Job Driver.  It must
+%             implement org.apache.uima.ducc.common.jd.plugin.IJdProcessExceptionHandler.
 
-           \item[--driver\_jvm\_args {[list]} ]
+           \item[$--$driver\_jvm\_args {[list]} ]
 
-             This specifes extra JVM arguments to be provided to the Job Driver process. It is a blank delimeted 
+             This specifies extra JVM arguments to be provided to the Job Driver process. It is a blank delimited 
              list of strings. Example: 
              \begin{verbatim}
---driver\_jvm\_args -Xmx100M -Xms50M 
-             \end{verbatim}
-             
-           \item[--driver\_memory\_size {[size-in-GB]} ]
-
-             This specifies the size of memory for the Job Driver, in GB. Example: 
-             \begin{verbatim}
---driver\_memory\_size 16 
+--driver_jvm_args -Xmx100M -Xms50M 
              \end{verbatim}
 
-           \item[--environment {[env vars]}] Blank-delimeted list of environment variables. If
+             Note: When used as a CLI option, the JVM argument string must usually be
+             quoted to protect it from the shell.
+             
+           \item[$--$environment {[env vars]}] Blank-delimited list of environment variables. If
              specified, this is used for all DUCC processes in the job. Example:
 \begin{verbatim}
-             --environment "TERM=xterm DISPLAY=me.org.net:1.0". 
+             --environment TERM=xterm DISPLAY=me.org.net:1.0 
 \end{verbatim}
              
-             Note: On Secure Linux systems, the environemnt variable 
+             Note: On Secure Linux systems, the environment variable 
              LD\_LIBRARY\_PATH may not be passed to the user's program. If it is 
              necessary to pass LD\_LIBRARY\_PATH to the JP or JD processes, it must be 
              specified as DUCC\_LD\_LIBRARY\_PATH. Ducc (securely) passes this as 
              LD\_LIBRARY\_PATH, after the JP or JD has assumed the user's identity. For 
              example: 
              \begin{verbatim}
--environment TERM=xterm DISPLAY=:1.0 DUCC\_LD\_LIBRARY\_PATH=/my/own/path
+--environment TERM=xterm DISPLAY=:1.0 DUCC_LD_LIBRARY_PATH=/my/own/path
             \end{verbatim}
 
-           \item[--help ]
+             Note: When used as a CLI option, the environment string must usually be
+             quoted to protect it from the shell.
+
+           \item[$--$help ]
 
              Prints the usage text to the console. 
 
-           \item[--jvm {[path-to-java]}  ]
+           \item[$--$jvm {[path-to-java]}  ]
 
              States the JVM to use. If not specified, the same JVM used by the Agents is used.  This is
              the full path to the JVM, not the JAVA\_HOME.
@@ -138,211 +134,185 @@
 --jvm /share/jdk1.6/bin/java 
 \end{verbatim}
              
-           \item[--log\_directory {[path-to-log directory]} ]
+           \item[$--$log\_directory {[path-to-log-directory]} ]
 
              This specifies the path to the directory for the user logs. If not specified, the default is the 
              user's home directory. Example: 
              \begin{verbatim}
---log\_directory /home/bob 
+--log_directory /home/bob 
              \end{verbatim}
              
-             Within this directory DUCC creates a subdirectory for each job, using the numerical 
+             Within this directory DUCC creates a sub-directory for each job, using the unique numerical 
              ID of the job. The format of the generated log file names is described
              \hyperref[chap:job-logs]{here}.
              
-             Note: Note that --log\_directory specifies only the path to a directory where 
+             Note: $--$log\_directory specifies only the path to a directory where 
              logs are to be stored. In order to manage multiple processes running in multiple 
-             machines DUCC, sub-directory and file names are generated by DUCC and may 
+             machines, sub-directory and file names are generated by DUCC and may 
              not be directly specified. 
 
-           \item[--process\_attach\_console] If specified, redirect remote process (as
+           \item[$--$process\_attach\_console] If specified, redirect remote process (as
              opposed to driver) stdout and stderr to the local submitting console.
-
-           \item[--process\_classpath {[ClASSPATH]} ]
-
-             This specifies the Java CLASSPATH to use in each Job Process (JP) and must be 
-             specified. Example: 
-             \begin{verbatim}
---process\_classpath a.jar:b.jar 
-             \end{verbatim}
              
-           \item[--process\_DD {[DD descriptor]}  ]
+           \item[$--$process\_DD {[DD descriptor]}  ]
 
              This specifies a UIMA Deployment Descriptor for the job processes for DD-style jobs. 
-             This is mutually exclusive with --process\_descriptor\_AE, --process\_descriptor\_CM, 
-             and --process\_descriptor\_CC. This descriptor is a resource that is searched for in the 
+             This is mutually exclusive with $--$process\_descriptor\_AE, $--$process\_descriptor\_CM, 
+             and $--$process\_descriptor\_CC. This descriptor is a resource that is searched for in the 
              CLASSPATH and data path as described in the ~\hyperref[par:cli.submit.notes]{notes below}.
              For example:
              \begin{verbatim}
---process\_DD /home/billy/resource/DD\_foo.xml 
+--process_DD /home/billy/resource/DD_foo.xml 
              \end{verbatim}
 
-           \item[--process\_debug {[debugger-address]}] Append JVM debug flags to the JVM
-             arguments to start the Job Process in remobe debug mode.  The remote process will
-             start its debugger and attempt to contact the (Eclipse) debugger on the specified port.
-             The address is of the form {\tt host:port}.
+           \item[$--$process\_debug {[debugger-address]}] Append JVM debug flags to the JVM
+             arguments to start the Job Process in remote debug mode.  The remote process will start
+             its debugger and attempt to contact the debugger (usually Eclipse) on the specified
+             port.  The address is of the form {\tt host:port}.
              
-           \item[--process\_deployments\_max {[integer]} ]
+           \item[$--$process\_deployments\_max {[integer]} ]
 
-             This specifies the maximum nunber of Job Processes to deploy at any given time. If not 
-             specified, DUCC will attempt to provide the largest number of processes, within the 
-             constraints of fair\_share scheduling and the number of pending work items still to be done 
+             This specifies the maximum number of Job Processes to deploy at any given time. If not 
+             specified, DUCC will attempt to provide the largest number of processes within the 
+             constraints of fair\_share scheduling and the amount of work remaining
              in the job. Example:
              \begin{verbatim}
---process\_deployments\_max 66 
+--process_deployments_max 66 
              \end{verbatim}
 
 
-           \item[--process\_descriptor\_AE {[descriptor]}  ]
+           \item[$--$process\_descriptor\_AE {[descriptor]}  ]
 
-             This specifies Analysis Engine descriptor to be deployed in the Job Processes. This 
+             This specifies the Analysis Engine descriptor to be deployed in the Job Processes. This 
              descriptor is a resource that is searched for in the CLASSPATH and data path as described 
              in the ~\hyperref[par:cli.submit.notes]{notes below}.
-             It is mutually exclusive with --process\_DD For example: 
+             It is mutually exclusive with $--$process\_DD. For example: 
              \begin{verbatim}
---process\_descriptor\_AE /home/billy/resource/AE\_foo.xml 
+--process_descriptor_AE /home/billy/resource/AE_foo.xml 
              \end{verbatim}
 
 
-           \item[--process\_descriptor\_AE\_overrides {[list]}  ]
+           \item[$--$process\_descriptor\_AE\_overrides {[list]}  ]
 
-             This specifies AE overrides. It is a comma-delimeted list of name/value pairs. Example: 
+             This specifies AE overrides. It is a comma-delimited list of name/value pairs. Example: 
              \begin{verbatim}
---process\_descriptor\_AE\_Overrides name1=value1,name2=value2 
+--process_descriptor_AE_overrides name1=value1,name2=value2 
              \end{verbatim}
              
-           \item[--process\_descriptor\_CC {[descriptor]}  ]
+           \item[$--$process\_descriptor\_CC {[descriptor]}  ]
 
              This specifies the CAS Consumer descriptor to be deployed in the Job Processes. This 
              descriptor is a resource that is searched for in the CLASSPATH and data path as described 
              in the ~\hyperref[par:cli.submit.notes]{notes below}.
-             It is mutually exclusive with --process\_DD For example: 
+             It is mutually exclusive with $--$process\_DD. For example: 
              \begin{verbatim}
---process\_descriptor\_CC /home/billy/resourceCCE\_foo.xml 
+--process_descriptor_CC /home/billy/resourceCCE_foo.xml 
              \end{verbatim}
              
-           \item[--process\_descriptor\_CC\_overrides {[list]}  ]
+           \item[$--$process\_descriptor\_CC\_overrides {[list]}  ]
 
-             This specifies CC overrides. It is a comma-delimeted list of name/value pairs. Example: 
+             This specifies CC overrides. It is a comma-delimited list of name/value pairs. Example: 
              \begin{verbatim}
---process\_descriptor\_CC\_overrides name1=value1,name2=value2 
+--process_descriptor_CC_overrides name1=value1,name2=value2 
              \end{verbatim}
              
-           \item[--process\_descriptor\_CM {[descriptor]} ]
+           \item[$--$process\_descriptor\_CM {[descriptor]} ]
 
              This specifies the CAS Multiplier descriptor to be deployed in the Job Processes. This 
              descriptor is a resource that is searched for in the CLASSPATH and data path as described 
              in the ~\hyperref[par:cli.submit.notes]{notes below}.
-             It is mutually exclusive with --process\_DD For example: 
+             It is mutually exclusive with $--$process\_DD. For example: 
              \begin{verbatim}             
---process\_descriptor\_CM /home/billy/resource/CM\_foo.xml 
+--process_descriptor_CM /home/billy/resource/CM_foo.xml 
              \end{verbatim}
 
-           \item[--process\_descriptor\_CM\_overrides {[list]}  ]
+           \item[$--$process\_descriptor\_CM\_overrides {[list]}  ]
 
-             This specifies CM overrides. It is a comma-delimeted list of name/value pairs. Example: 
+             This specifies CM overrides. It is a comma-delimited list of name/value pairs. Example: 
              \begin{verbatim}
---process\_descriptor\_CM\_overrides name1=value1,name2=value2 
+--process_descriptor_CM_overrides name1=value1,name2=value2 
 \end{verbatim}
              
-           \item[--process\_environment {[environment]} ]
+           \item[$--$process\_failures\_limit {[integer]} ]
 
-             This specifies environment parameters for the Job Processes. If present, they are added 
-             to the Job Process environment as the process is spawned. It must be a quoted, blankdelimeted 
-             lsit of name-value pairs. For example: 
-             \begin{verbatim}
-
---process\_environment TERM=xterm DISPLAY=:1.0
-             \end{verbatim}
-  
-             Note: On Secure Linux systems, the environemnt variable 
-             LD\_LIBRARY\_PATH may not be passed to the user's program. If it is 
-             necessary to pass LD\_LIBRARY\_PATH to the JP or JD processes, it must be 
-             specified as DUCC\_LD\_LIBRARY\_PATH. Ducc (securely) passes this as 
-             LD\_LIBRARY\_PATH, after the JP or JD has assumed the user's identity. For 
-             example: 
-
-             \begin{verbatim}
---process\_environment TERM=xterm DISPLAY=:1.0 DUCC\_LD\_LIBRARY\_PATH=/my/own/
-             \end{verbatim}
-
-           \item[--process\_failures\_limit {[integer]} ]
-
-             This specifies the maximum number of individual Job Process (JP) failures that are to be 
-             tolerated before killing the job. The default is 15. If this limit is exceeded over the lifetime 
+             This specifies the maximum number of individual Job Process (JP) failures allowed
+             before killing the job. The default is fifteen (15). If this limit is exceeded over the lifetime 
              of a job, DUCC terminates the entire job. 
              \begin{verbatim}
---process\_failures\_limit 23
+--process_failures_limit 23
 \end{verbatim}
                           
-           \item[--process\_initialization\_failures\_cap {[integer]} ] This specifies the maximum
-             number of independent Job Process initialization failures (i.e.  System.exit(), kill
-             -9, Java Exceptions, etc.) before the number of Job Processes is capped at the number
-             in state Running currently.  One this limit is reached, the system will allow processes
-             which are already running to continue, but will assign no new processes to the job.
-             The default is 99. Example:
+           \item[$--$process\_initialization\_failures\_cap {[integer]} ] This specifies the maximum
+             number of failures during a UIMA process's initialization phase.  If the number is
+             exceeded the system will allow processes which are already running to continue, but
+             will assign no new processes to the job.  The default is ninety-nine (99). Example:
              \begin{verbatim}
---process\_initialization\_failures\_cap 62 
+--process_initialization_failures_cap 62 
              \end{verbatim}
              
              Note that the job is NOT killed if there are processes that have passed initialization and are 
              running. If this limit is reached, the only action is to not start new processes for the job. 
 
-           \item[--process\_initialization\_time\_max {[integer]}] This is the maximimum time a process
-             is allowed to remain in the ``initializing'' state, before DUCC terminates it.
+           \item[$--$process\_initialization\_time\_max {[integer]}] This is the maximum time a process
+             is allowed to remain in the ``initializing'' state, before DUCC terminates it.  The error
+             counts as an initialization error towards the initialization failure cap.
 
-           \item[--process\_jvm\_args {[list]} ] This specifies additinal arguments to be passed to
-             the Job Process JVM as a blank-delimeted list of strings. Example:
+           \item[$--$process\_jvm\_args {[list]} ] This specifies additional arguments to be passed to
+             the Job Process JVM as a blank-delimited list of strings. Example:
              \begin{verbatim}
---process\_jvm\_args -Xmx400M -Xms100M 
+--process_jvm_args -Xmx400M -Xms100M 
              \end{verbatim}
-             
-           \item[--process\_memory\_size {[size]} ] This specifies the maximum amount of RAM in GB
+
+             Note: When used as a CLI option, the argument string must usually be
+             quoted to protect it from the shell.
+                          
+           \item[$--$process\_memory\_size {[size]} ] This specifies the maximum amount of RAM in GB
              to be allocated to each Job Process.  This value is used by the Resource Manager to
              allocate resources.
 
-           \item[--process\_per\_item\_time\_max {[integer]} ] This specifies the maximum time in
+           \item[$--$process\_per\_item\_time\_max {[integer]} ] This specifies the maximum time in
              minutes that the Job Driver will wait for a Job Process to process a CAS. If a
              timeout occurs the process is terminated and the CAS marked in error (not retried). If
              not specified, the default is 1 minute. Example:
              \begin{verbatim}
---process\_per\_item\_time\_max 60 
+--process_per_item_time_max 60 
              \end{verbatim}
              
-           \item[--process\_thread\_count {[integer]} ] This specifies the number of threads per
+           \item[$--$process\_thread\_count {[integer]} ] This specifies the number of threads per
              process to be deployed. It is used by the Resource Manager to determine how many
-             processes are needed, by the Agent to determine howmany threads to spawn, and by the
-             Job Driver to determine how many CASs to dispatch. If not specified, the default is
-             4. Example:
+             processes are needed, by the Job Process wrapper to determine how many threads to
+             spawn, and by the Job Driver to determine how many CASs to dispatch. If not specified,
+             the default is 4. Example:
              \begin{verbatim}
---process\_thread\_count 7 
+--process_thread_count 7 
              \end{verbatim}
              
-           \item[--scheduling\_class {[classname]} ] This specifies the name of the scheuling class
+           \item[$--$scheduling\_class {[classname]} ] This specifies the name of the scheduling class
              the RM will use to determine the resource allocation for each process. The names of the
              classes are installation dependent. If not specified, the default is taken from the
              global DUCC configuration ducc.properties.  Example:
              \begin{verbatim}
---schedling\_class normal 
+--scheduling_class normal 
              \end{verbatim}
           
 
-           \item[--service\_dependency{[list]}] This specifies a comma-delimeted list of services the job
+           \item[$--$service\_dependency{[list]}] This specifies a comma-delimited list of services the job
              processes are dependent upon. Service dependencies are discussed in detail
              \hyperref[sec:service.endpoints]{here}. Example:
 \begin{verbatim}
 --service_dependency UIMA-AS:RandomSleepAE:tcp:bluej682:61616 UIMA-AS:OtherEp:tcp:bluej123:123 
 \end{verbatim}
 
-           \item[--specifiecaiton {[file]}  ]
+           \item[$--$specification, $-$f {[file]}  ]
 
              All the parameters used to submit a job may be placed in a standard Java properties file. 
              This file may then be used to submit the job (rather than providing all the parameters 
-             directory to submit). 
+             directly to submit). The leading $--$ is omitted from the keywords.
 
              For example, 
 \begin{verbatim}
-ducc\_submit --specification job.props 
+ducc_submit --specification job.props 
+ducc_submit -f job.props 
 \end{verbatim}
 
              where the job.props contains: 
@@ -370,17 +340,17 @@ process_memory_size                 = 15
              Note that properties in a specifications file may be overridden by other command-line
              parameters, as discussed \hyperref[chap:cli]{here}.
 
-           \item[--timestamp ]
+           \item[$--$time-stamp ]
 
              If specified, messages from the submit process are timestamped. This is intended primarily 
              for use with a monitor with --wait\_for\_completion. 
 
-           \item[--wait\_for\_completion ]             
+           \item[$--$wait\_for\_completion ]             
              If specified, the submit command monitors the job and prints periodic
              state and progress information to the console.  When the job completes, the monitor
              is terminated and the submit command returns.
              
-           \item[--working\_directory ]             
+           \item[$--$working\_directory ]             
              This specifies the working directory to be set by the Job Driver and Job Process processes. 
              If not specified, the current directory is used.
   \end{description}
@@ -388,7 +358,7 @@ process_memory_size                 = 15
   \paragraph{Notes:}
   \phantomsection\label{par:cli.submit.notes}
   When searching for UIMA XML resource files such as descriptors, DUCC searches both the 
-  classpath and the data path according to the following rules: 
+  CLASSPATH and the data path according to the following rules: 
   
   \begin{enumerate}
   \item If the resource ends in .xml it is assumed the resource is a file and the path is either an 
@@ -405,8 +375,10 @@ process_memory_size                 = 15
     fails and the job is terminated. 
     
     The resource search-order rules apply to all of the following submit parameters: 
-    ¥ --driver\_descriptor\_CR 
-    ¥ --process\_descriptor\_AE 
-    ¥ --process\_descriptor\_CC 
-    ¥ --process\_descriptor\_CM 
-  \end{enumerate}
+    \begin{itemize}
+    \item[]$--$driver\_descriptor\_CR 
+    \item[]$--$process\_descriptor\_AE 
+    \item[]$--$process\_descriptor\_CC 
+    \item[]$--$process\_descriptor\_CM 
+   \end{itemize}
+ \end{enumerate}
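
The relationship between specification-file keywords and command-line options described in this file's changes (the same keyword, with the leading $--$ added on the command line) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the helper name to_cli_args is hypothetical, and it is not how ducc_submit itself processes its input.

```python
def to_cli_args(spec):
    """Turn specification-file keywords into long-form CLI arguments.

    Hypothetical helper for illustration only; not part of DUCC.
    Each keyword is prefixed with "--"; keywords with an empty value
    (flag-style options) are emitted without a value.
    """
    args = []
    for key, value in spec.items():
        args.append("--" + key)
        if value != "":
            args.append(str(value))
    return args


# Keywords and values taken from the examples in the option descriptions.
spec = {
    "process_deployments_max": "66",
    "process_thread_count": "7",
    "scheduling_class": "normal",
}

print(" ".join(["ducc_submit"] + to_cli_args(spec)))
```

Values that contain blanks, such as a JVM argument list, would still need quoting when the printed command is pasted into a shell.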

Modified: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/ducc-uguide.tex
URL: http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/ducc-uguide.tex?rev=1491938&r1=1491937&r2=1491938&view=diff
==============================================================================
--- uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/ducc-uguide.tex (original)
+++ uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/ducc-uguide.tex Tue Jun 11 20:16:14 2013
@@ -12,8 +12,8 @@
     canceled, work is monitored, and work is queried with this interface.
 
     All parameters may be passed to all the CLI commands in the form of Unix-like ``long-form''
-    (key, value) pairs, in which the key is proceeded by the characters ``--''.  As well, the
-    parameters may be saved in a standard Java Properties file, without the leading ``--''
+    (key, value) pairs, in which the key is preceded by the characters ``$--$''.  As well, the
+    parameters may be saved in a standard Java Properties file, without the leading ``$--$''
     characters.  Both a properties file and command-line parameters may be passed to each CLI.  When
     both are present, the parameters on the command line take precedence.  Take, for example
     the following simple job properties file, call it {\tt 1.job}.
@@ -33,13 +33,13 @@ process_jvm_args               -Xmx100M 
 process_thread_count           2
 process_per_item_time_max      5
 process_get_meta_time_max      5
-process_environment            AE_INIT_TIME=5 AE_INIT_RANGE=5 INIT_ERROR=0 LD_LIBRARY_PATH=/yet/a/nother/dumb/path
+process_environment            AE_INIT_TIME=5 AE_INIT_RANGE=5 LD_LIBRARY_PATH=/a/nother/path
 process_deployments_max        999
 
 scheduling_class               normal
 \end{verbatim}
 
-    This can be submitted, overriding the scheduling class and memory thus:
+    This can be submitted, overriding the scheduling class and memory, thus:
 \begin{verbatim}
 ducc_job_submit --specification 1.job --process_memory_size 16 --scheduling_class high
 \end{verbatim}    
@@ -52,8 +52,8 @@ ducc_job_submit --specification 1.job --
     \begin{itemize}
       \item References to the various UIMA components required by the job (CR, CM, AE, CC, and maybe DD)
       \item Scale-out requirements: number of processes, number of threads per process, etc
-      \item Environment requirments: log directory, working directory, environment variables, etc,
-      \item JVM paramenters
+      \item Environment requirements: log directory, working directory, environment variables, etc
+      \item JVM parameters
       \item Scheduling class
       \item Error-handling preferences: acceptable failure counts, timeouts, etc
       \item Debugging and monitoring requirements and preferences
@@ -66,17 +66,17 @@ ducc_job_submit --specification 1.job --
     \begin{enumerate}
       \item A Java executable jar.
       \item A wrapper script around the executable jar.
-      \item Direct invocation of each commands's {\tt main} with the {\tt java} command.
+      \item Direct invocation of each command's {\tt main} with the {\tt java} command.
     \end{enumerate}
 
-    When using the executable jars and scripts the full execution environment is estableshed
+    When using the executable jars and scripts the full execution environment is established
     silently.  When directly invoking a command's {\tt main} one must set the java {\tt CLASSPATH} to
     specify the appropriate jar for the command, as described in subsequent sections.
 
     \paragraph{Provided Commands}
     The following commands are provided:
     \begin{description}
-    \item[ducc\_submit] Submit a job for ececution.
+    \item[ducc\_submit] Submit a job for execution.
     \item[ducc\_cancel] Cancel a job in progress.
     \item[ducc\_reserve] Request a reservation of full or partial machines.
     \item[ducc\_unreserve] Cancel a reservation.
@@ -86,7 +86,7 @@ ducc_job_submit --specification 1.job --
     \item[ducc\_service\_submit] Submit a (non-registered) service instance for execution.
     \item[ducc\_service\_cancel] Cancel a (non-registered) service instance.
     \item[ducc\_services] Register, unregister, start, stop, modify, and query a service.
-    \item[ducc\_view\_perf]y Fetch performance data from the log and history files for analysis
+    \item[ducc\_view\_perf] Fetch performance data from the log and history files for analysis
       by spreadsheets, etc.
     \end{description}
     
@@ -95,7 +95,7 @@ ducc_job_submit --specification 1.job --
     %% These all input sections
     \input{part2/cli/ducc-submit.tex}
     \input{part2/cli/ducc-cancel.tex}
-    \input{part2/cli/ducc-monitor.tex}
+    % \input{part2/cli/ducc-monitor.tex}
     \input{part2/cli/ducc-reserve.tex}
     \input{part2/cli/ducc-unreserve.tex}
     \input{part2/cli/ducc-service-submit.tex}
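
The precedence rule stated in the user guide text (when both a properties file and command-line parameters are given, the command line wins) can be sketched in Python as follows. The function names parse_spec and effective_parameters are hypothetical, the parser handles only a simplified subset of the Java properties format (no line continuations or escapes), and none of this is DUCC's actual implementation.

```python
import re


def parse_spec(text):
    """Parse a simplified subset of the Java properties format.

    Keys are separated from values by '=', ':' or whitespace.
    Blank lines and lines starting with '#' or '!' are skipped.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        match = re.match(r"([^\s=:]+)\s*(?:[=:]\s*)?(.*)", line)
        if match:
            props[match.group(1)] = match.group(2).rstrip()
    return props


def effective_parameters(spec, cli):
    """Merge the two sources; command-line values override the file."""
    merged = dict(spec)
    merged.update(cli)
    return merged


# A few lines in the style of the 1.job example.
JOB = """
process_thread_count           2
process_deployments_max        999
scheduling_class               normal
"""

# Overrides corresponding to:
#   --process_memory_size 16 --scheduling_class high
cli = {"process_memory_size": "16", "scheduling_class": "high"}

params = effective_parameters(parse_spec(JOB), cli)
for key in sorted(params):
    print(key, params[key])
```

Copying the file values first and then applying the command-line values keeps the rule in one place: any keyword present in both sources ends up with the command-line value.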

Modified: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/admin/ducc-nodepools.tex
URL: http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/admin/ducc-nodepools.tex?rev=1491938&r1=1491937&r2=1491938&view=diff
==============================================================================
--- uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/admin/ducc-nodepools.tex (original)
+++ uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/admin/ducc-nodepools.tex Tue Jun 11 20:16:14 2013
@@ -1,3 +1,3 @@
-\chapter{DUCC Nodepool Definitions}
+\section{DUCC Nodepool Definitions}
 
-   \todo This section got corrupted, rewrite
+    REWRITE