Posted to commits@uima.apache.org by ea...@apache.org on 2015/06/24 22:27:40 UTC
svn commit: r1687363 - in /uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook: images/ part1/
Author: eae
Date: Wed Jun 24 20:27:39 2015
New Revision: 1687363
URL: http://svn.apache.org/r1687363
Log:
UIMA-4109
Added:
uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/images/ducc-parallel.png (with props)
uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/images/ducc-sequential.png (with props)
uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/images/uima-as-pipeline.png (with props)
Removed:
uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/images/ducc-parallel.jpg
uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/images/ducc-sequential.jpg
uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/images/uima-as-pipeline.jpg
Modified:
uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part1/overview.tex
Added: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/images/ducc-parallel.png
URL: http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/images/ducc-parallel.png?rev=1687363&view=auto
==============================================================================
Binary file - no diff available.
Propchange: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/images/ducc-parallel.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/images/ducc-sequential.png
URL: http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/images/ducc-sequential.png?rev=1687363&view=auto
==============================================================================
Binary file - no diff available.
Propchange: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/images/ducc-sequential.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/images/uima-as-pipeline.png
URL: http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/images/uima-as-pipeline.png?rev=1687363&view=auto
==============================================================================
Binary file - no diff available.
Propchange: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/images/uima-as-pipeline.png
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Modified: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part1/overview.tex
URL: http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part1/overview.tex?rev=1687363&r1=1687362&r2=1687363&view=diff
==============================================================================
--- uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part1/overview.tex (original)
+++ uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part1/overview.tex Wed Jun 24 20:27:39 2015
@@ -39,12 +39,12 @@
\section{DUCC Job Model}
The Job Model defines the steps necessary to scale up a UIMA pipeline using DUCC. The goal of
- DUCC is to allow the application logic to be unchanged.
+ DUCC is to scale up any UIMA pipeline, including pipelines that must be deployed across multiple
+ machines using shared services.
The DUCC Job model consists of standard UIMA components: a Collection Reader (CR), a CAS
Multiplier (CM), application logic as implemented by one or more Analysis Engines (AE), and a CAS
- Consumer (CC). In theory, any CR, or CM will work with DUCC, but DUCC is all about scale-out. In
- order to achieve good scale-out these components must be constructed in a specific way.
+ Consumer (CC).
The Collection Reader builds input CASs and forwards them to the UIMA pipelines. In the DUCC
model, the CR is run in a process separate from the rest of the pipeline. In fact, in all but the
@@ -60,10 +60,6 @@
AE(s), and CC into a single process, multiple instances of which are then deployed over the
cluster.
- DUCC does not provide any mechanism for receiving output CASs. Each application must
- supply its own CAS Consumer which serializes the output of the pipelines for
- consumption by other entities (as serialized CASs, for example).
-
A DUCC job therefore consists of a small specification containing the following items:
\begin{itemize}
@@ -71,15 +67,13 @@
\item The name of a resource containing the CM descriptor.
\item The name of a resource containing the AE descriptor.
\item The name of a resource containing the CC descriptor.
- \item Other information required to parametrize the above and identify the job
- such as log directory, working directory, desired scale-out, etc. These are
- described in detail in subsequent sections.
+ \item Other information required to parameterize the above and identify the job
+ such as log directory, working directory, desired scale-out, classpath, etc.
+ These are described in detail in subsequent sections.
\end{itemize}
- On job submission, DUCC examines the job specification and automatically creates a scaled-out
- UIMA-AS service with a single process executing the CR as a UIMA-AS client and and as many
- processes as possible executing the combined CM, AE, and CC pipeline as UIMA-AS service
- instances.
+ On job submission, DUCC creates a single process executing the CR and
+ one or more processes containing the analysis pipeline.
DUCC provides other facilities in support of scale-out:
\begin{itemize}
@@ -99,19 +93,18 @@
\paragraph{UIMA Pipelines}
A normal UIMA pipeline
- contains a Collection Reader, one or more Analysis Engines connected in a pipeline, and a CAS
- Consumer as shown in \hyperref[fig:UIMA-pipeline]{Figure ~\ref{fig:UIMA-pipeline}}.
+ contains a Collection Reader (CR), one or more Analysis Engines (AE) connected in a pipeline, and a CAS
+ Consumer (CC) as shown in \hyperref[fig:UIMA-pipeline]{Figure~\ref{fig:UIMA-pipeline}}.
\begin{figure}[H]
\centering
-% \includegraphics[bb=0 0 575 310, width=5.5in]{images/uima-pipeline.jpg}
\includegraphics[width=5.5in]{images/uima-pipeline.jpg}
\caption{Standard UIMA Pipeline}
\label{fig:UIMA-pipeline}
\end{figure}
\paragraph{UIMA-AS Scaled Pipeline}
- With UIMA-AS the CR is separated into a discrete process and a CAS Multiplier is introduced
+ With UIMA-AS the CR is separated into a discrete process and a CAS Multiplier (CM) is introduced
into the pipeline as an interface between the CR and the pipeline, as shown in
\hyperref[fig:UIMA-AS-pipeline]{Figure~\ref{fig:UIMA-AS-pipeline}} below.
Multiple pipelines are serviced by the
@@ -122,46 +115,46 @@
\begin{figure}[H]
\centering
-% \includegraphics[bb=0 0 584 341, width=5.5in]{images/uima-as-pipeline.jpg}
- \includegraphics[width=5.5in]{images/uima-as-pipeline.jpg}
+ \includegraphics[width=5.5in]{images/uima-as-pipeline.png}
\caption{UIMA Pipeline As Scaled by UIMA-AS}
\label{fig:UIMA-AS-pipeline}
\end{figure}
- \paragraph{UIMA-AS Pipeline Scaled By DUCC}
+ \paragraph{UIMA Pipeline Scaled By DUCC}
DUCC is a UIMA and UIMA-AS-aware cluster manager. To scale out work under DUCC the developer
tells DUCC what the parts of the application are, and DUCC does the work to build the
scale-out via UIMA-AS, to find and schedule resources, to deploy the parts of the application
over the cluster, and to manage the job while it executes.
- On job submission, the DUCC Command Line Interface (CLI) inspects the XML defining the analytic
- and generates a UIMA-AS Deployment Descriptor (DD). The DD establishes some number of pipeline
- threads per process (as indicated in the DUCC job parameters), and generates job-unique queues.
-
- Under DUCC, the Collection Reader is executed in a process called the Job Driver (or JD). The
- pipelines are executed in one or more processes called Job Processes (or JPs). The JD
- process provides a thin wrapper over the CR to enable communication with DUCC. The JD uses the
- CR to implement a UIMA-AS client delivering CASs to the multiple (scaled-out) pipelines,
- shown in \hyperref[fig:UIMA-AS-pipeline-DUCC]{Figure ~\ref{fig:UIMA-AS-pipeline-DUCC}} below.
+ On job submission, the CR is wrapped with a DUCC main class and launched as a Job Driver (or JD).
+ The DUCC main class establishes communication with other DUCC components and instantiates the CR.
+ If the CR initializes successfully and indicates that there is at least one work item to process,
+ the specified CM, AE and CC components are assembled into an aggregate, wrapped with a DUCC main
+ class, and launched as a Job Process (or JP).
+
+ The JP will replicate the aggregate as many times as specified, each aggregate instance running
+ in a single thread. When the aggregate initializes, and whenever an aggregate thread needs work,
+ the JP wrapper will fetch the next work item from the JD, as shown in
+ \hyperref[fig:UIMA-AS-pipeline-DUCC]{Figure~\ref{fig:UIMA-AS-pipeline-DUCC}} below.
\begin{figure}[H]
\centering
-% \includegraphics[bb=0 0 571 311, width=5.5in]{images/ducc-sequential.jpg}
- \includegraphics[width=5.5in]{images/ducc-sequential.jpg}
+ \includegraphics[width=5.5in]{images/ducc-sequential.png}
\caption{UIMA Pipeline As Automatically Scaled Out By DUCC}
\label{fig:UIMA-AS-pipeline-DUCC}
\end{figure}
- \paragraph{UIMA-AS Pipeline with User-Supplied DD Scaled By DUCC}
+ \paragraph{UIMA Pipeline with User-Supplied DD Scaled By DUCC}
Application programmers may supply their own Deployment Descriptors to control intra-process
- threading and scale-out. If a DD is supplied in the job parameters, DUCC will use this instead
- of generating one as depicted in \hyperref[fig:UIMA-AS-pipeline-DUCC-DD]{Figure ~\ref{fig:UIMA-AS-pipeline-DUCC-DD}} below.
+ threading and scale-out. If a DD is specified in the job parameters, DUCC will launch each
+ JP with the specified UIMA-AS service instantiated in-process,
+ as depicted in \hyperref[fig:UIMA-AS-pipeline-DUCC-DD]{Figure~\ref{fig:UIMA-AS-pipeline-DUCC-DD}} below.
+ In this case the user can still specify how many work items to deliver to the service concurrently.
\begin{figure}[H]
\centering
-% \includegraphics[bb=0 0 571 316,width=5.5in]{images/ducc-parallel.jpg}
- \includegraphics[width=5.5in]{images/ducc-parallel.jpg}
+ \includegraphics[width=5.5in]{images/ducc-parallel.png}
\caption{UIMA Pipeline With User-Supplied DD as Automatically Scaled Out By DUCC}
\label{fig:UIMA-AS-pipeline-DUCC-DD}
\end{figure}
@@ -171,9 +164,9 @@
DUCC provides a number of facilities to assist error management:
\begin{itemize}
- \item DUCC uses the UIMA-AS error-handling facilities to reflect errors from the Job Processes
+ \item DUCC captures exceptions in the JPs and delivers them
to the Job Drivers. The JD wrappers implement logic to enforce error thresholds, to identify
- and log errors, and to reflect job problems in the DUCC Web Server. All error thresholds are
+ and log errors, and to reflect job problems in the DUCC Web Server. Error thresholds are
configurable both globally and on a per-job basis.
\item Error and timeout thresholds are implemented for both the initialization phase of a pipeline
@@ -183,10 +176,10 @@
initialization is successful, the process is terminated and all affected CASs are retried, up to some
configurable threshold.
- \item DUCC ensures that processes can successfully initialize before fully scaling out a job,
- to ensure a cluster is not overwhelmed with errant processes.
+ \item To avoid disrupting existing workloads by a job that will fail to run,
+ DUCC ensures that JD and JP processes can successfully initialize before fully scaling out a job.
- \item Various error conditions encountered while a job is running will prevent the errant job
+ \item Various error conditions encountered while a job is running will prevent a problematic job
from continuing to scale out, and can result in termination of the job.
\end{itemize}
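The revised overview text describes a pull model: the JP runs the CM/AE/CC aggregate in as many threads as specified, and whenever a thread is idle the JP wrapper fetches the next work item from the JD; the JP is only launched if the CR reports at least one work item. The Java sketch below illustrates that flow in a single process. It is not DUCC code: PullModelSketch, JobDriver, Pipeline and runJobProcess are hypothetical names, and an in-memory ConcurrentLinkedQueue stands in for the inter-process communication between the JD and the JPs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class PullModelSketch {

    /** Hypothetical stand-in for the Job Driver: hands out work items produced by the CR. */
    static class JobDriver {
        private final ConcurrentLinkedQueue<String> workItems = new ConcurrentLinkedQueue<>();

        JobDriver(List<String> itemsFromReader) {
            workItems.addAll(itemsFromReader);
        }

        /** Returns the next work item, or null once the collection is exhausted. */
        String next() {
            return workItems.poll();
        }
    }

    /** Hypothetical stand-in for one aggregate (CM + AE + CC) instance. */
    interface Pipeline {
        void process(String workItem);
    }

    /**
     * Hypothetical stand-in for a Job Process: creates one pipeline instance per thread,
     * and each thread pulls the next work item from the driver whenever it is idle.
     * Returns the total number of work items processed.
     */
    static int runJobProcess(JobDriver driver, int threads, Supplier<Pipeline> pipelineFactory)
            throws InterruptedException {
        AtomicInteger processed = new AtomicInteger();
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Pipeline pipeline = pipelineFactory.get(); // one aggregate instance per thread
            Thread worker = new Thread(() -> {
                String item;
                while ((item = driver.next()) != null) { // fetch work only when idle
                    pipeline.process(item);
                    processed.incrementAndGet();
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join(); // wait until every thread has drained the driver
        }
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> items = List.of("doc-1", "doc-2", "doc-3", "doc-4", "doc-5");
        if (items.isEmpty()) {
            // Mirrors the rule that no JP is launched when the CR reports zero work items.
            System.out.println("no work items: JP not launched");
            return;
        }
        JobDriver driver = new JobDriver(items);
        int done = runJobProcess(driver, 2, () -> item -> { /* analyze the work item here */ });
        System.out.println("processed " + done + " work items");
    }
}
```

Running main prints `processed 5 work items`. Pulling work on demand, rather than pushing a fixed share to each thread, means a slow thread simply takes fewer items while faster threads keep draining the driver.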