Posted to commits@uima.apache.org by ch...@apache.org on 2013/09/20 14:37:47 UTC
svn commit: r1524983 - in
/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook:
part2/services.tex part4/admin/ducc-classes.tex
Author: challngr
Date: Fri Sep 20 12:37:47 2013
New Revision: 1524983
URL: http://svn.apache.org/r1524983
Log:
UIMA-2682 Updates for new RM configuration.
Modified:
uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/services.tex
uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/admin/ducc-classes.tex
Modified: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/services.tex
URL: http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/services.tex?rev=1524983&r1=1524982&r2=1524983&view=diff
==============================================================================
--- uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/services.tex (original)
+++ uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part2/services.tex Fri Sep 20 12:37:47 2013
@@ -308,7 +308,7 @@ public class CustomPing
{
String host;
String port;
- public void init(String endpoint) throws Exception {
+ public void init(String args, String endpoint) throws Exception {
// Parse the service endpoint, which is a String of the form
// host:port
String[] parts = endpoint.split(":");
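
The updated two-argument init() evidently lets DUCC hand a custom pinger
user-supplied initialization arguments alongside the service endpoint. A
minimal sketch of the method after this change (the class body simply
restates the surrounding context; the use made of "args" is an assumption,
not shown in the hunk):

    public class CustomPing
    {
        String host;
        String port;

        // "args" presumably carries optional user-supplied pinger arguments;
        // "endpoint" is the service endpoint in host:port form.
        public void init(String args, String endpoint) throws Exception {
            // Parse the service endpoint, which is a String of the form
            // host:port
            String[] parts = endpoint.split(":");
            host = parts[0];
            port = parts[1];
        }
    }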
Modified: uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/admin/ducc-classes.tex
URL: http://svn.apache.org/viewvc/uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/admin/ducc-classes.tex?rev=1524983&r1=1524982&r2=1524983&view=diff
==============================================================================
--- uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/admin/ducc-classes.tex (original)
+++ uima/sandbox/uima-ducc/trunk/uima-ducc-duccdocs/src/site/tex/duccbook/part4/admin/ducc-classes.tex Fri Sep 20 12:37:47 2013
@@ -1,185 +1,196 @@
-\section{DUCC Class Definitions}
+\section{Scheduler Configuration: Classes and Nodepools}
\label{sec:ducc.classes}
- The class configuration file is used by the Resource Manager configure the rules used for job
- scheduling. See the Resource Manager chapter for a detailed description of the DUCC schedueler.
-
- The name of class configuration file is specified in ducc.properties. The default name is
- ducc.classes [105] and is specified by the property ducc.rm.class.definitions property.
-
- This file configures the classes and the associate scheduling rules of each class. It contains
- properties to declare the following:
+The class configuration file is used by the Resource Manager to configure the rules used for job
+scheduling. See the \hyperref[sec:]{Resource Manager chapter} for a detailed description of the DUCC
+scheduler, scheduling classes, and how classes are used to configure the scheduling process.
+
+The name of the scheduler configuration file is specified in ducc.properties by the
+property {\em ducc.rm.class.definitions}. The default name is ducc.classes.
+
+\subsection{Nodepools}
+
+\subsubsection{Overview}
+ A {\em nodepool} is a grouping of a subset of the physical nodes to allow differing
+ scheduling policies to be applied to different nodes in the system. Some typical
+ nodepool groupings might include:
\begin{enumerate}
- \item The names of each class.
- \item The default class to use if none is specified with the job.
- \item The names of all the nodepools.
- \item For each nodepool, the name of the file containing member nodes.
- \item A set of properties for each class, declaring the rules enforced by that class.
+ \item Group Intel and Power nodes separately so that users may submit jobs that run
+ only on the Intel architecture, only on Power, or on either (``don't care'').
+ \item Designate a group of nodes with large locally attached disks so that users
+ can run jobs that require those disks.
+ \item Designate a specific set of nodes with specialized hardware, such as a high-speed
+ network, so that jobs can be scheduled to run only on those nodes.
\end{enumerate}
- The general properties are as follows. The default values are the defaults in the system as initially
- installed.
-
- \begin{description}
+ A Nodepool is a subset of some larger collection of nodes. Nodepools themselves may be
+ further subdivided. Nodepools may not overlap: every node belongs to exactly
+ one nodepool. During system start-up the consistency of the nodepool definitions is checked
+ and the system will refuse to start if the configuration is incorrect.
+
+ For example, the diagram below is an abstract representation of all the nodes in a
+ system. There are five nodepools defined:
+ \begin{itemize}
+ \item Nodepool ``Default'' is subdivided into three pools, NP1, NP2, and NP3. All
+ the nodes not contained in NP1, NP2, or NP3 belong to the pool called ``Default''.
+ \item Nodepool NP1 is not further subdivided.
+ \item Nodepool NP2 is not further subdivided.
+ \item Nodepool NP3 is further subdivided to form NP4. All nodes within NP3 but
+ not in NP4 belong to NP3 itself.
+ \item Nodepool NP4 is not further subdivided.
+ \end{itemize}
+
+ \begin{figure}[H]
+ \centering
+ \includegraphics[bb=0 0 241 161, width=5.5in]{images/Nodepool1.jpg}
+ \caption{Nodepool Example}
+ \label{fig:Nodepools1}
+ \end{figure}
- \item[scheduling.class.set] \hfill \\
- This defines the set of class names for the installation. The names themselves are arbitrary
- and correspond to the rules defined in subsequent properties.
-
- \begin{description}
- \item[Default Value] background low normal high urgent weekly fixed reserve JobDriver
- \end{description}
-
- \item[scheduling.default.name] \hfill \\
- This is the default class that jobs are assigned to, when not otherwise designated in their
- submission properties.
- \begin{description}
- \item[Default Value] normal
- \end{description}
- \end{description}
+ In the figure below, the nodepools are incorrectly defined for two reasons:
+ \begin{enumerate}
+ \item NP1 and NP2 overlap.
+ \item NP4 overlaps both the ``Default'' nodepool and NP3.
+ \end{enumerate}
- Nodepools are declared with a set of properties to name each nodepool and to name a file for
- each pool that declares membership in the nodepool. For each nodepool a property of the form
- scheduling.nodepool.NODEPOOLNAME is declared, where NODEPOOLNAME is one of the
- declared nodepools.
-
- The property to declare nodepool names is as follows:
-
- \begin{description}
- \item[scheduling.nodepool] \hfill \\
- This is the list of nodepool names. For example:
-\begin{verbatim}
- scheduling.nodepool = res res1 res2
-\end{verbatim}
- \begin{description}
- \item[Default Value] reserve
- \end{description}
- \end{description}
+ \begin{figure}[H]
+ \centering
+ \includegraphics[bb=0 0 241 161, width=5.5in]{images/Nodepool2.jpg}
+ \caption{Nodepools: Overlapping Pools are Incorrect}
+ \label{fig:Nodepools2}
+ \end{figure}
+
+ Multiple ``top-level'' nodepools are allowed. A ``top-level'' nodepool has no containing
+ pool. Multiple top-level pools logically divide a cluster of machines into {\em multiple
+ independent clusters} from the standpoint of the scheduler. Work scheduled over one
+ pool in no way affects work scheduled over any other pool. The figure below shows an
+ abstract nodepool configuration with two top-level nodepools, ``Top-NP1'' and ``Top-NP2''.
+ \begin{figure}[H]
+ \centering
+ \includegraphics[bb=0 0 496 161, width=5.5in]{images/Nodepool3.jpg}
+ \caption{Nodepools: Multiple top-level Nodepools}
+ \label{fig:Nodepools3}
+ \end{figure}
+
+\subsubsection{Scheduling considerations}
+ A primary goal of the scheduler is to ensure that no resources are left idle if there
+ is pending work that is able to use those resources. Therefore, work scheduled to
+ a class defined over a specific nodepool (say, NpAllOfThem) may be scheduled on nodes
+ in any of the nodepools contained within NpAllOfThem. If work defined over a
+ subpool (such as NP1) arrives, processes on nodes in NP1 that were scheduled for
+ NpAllOfThem are considered ``squatters'' and are the most likely candidates for
+ eviction. (Processes assigned to their proper nodepools are considered ``residents''
+ and are evicted only after all ``squatters'' have been evicted.) The scheduler strives
+ to avoid creating ``squatters''.
+
+ Because non-preemptable processes can't be preempted, work submitted to a class
+ implementing one of the non-preemptable policies (FIXED or RESERVE) is never allowed
+ to ``squat'' in other nodepools and will be scheduled only on the nodes in its
+ proper nodepool.
+
+ In the case of multiple top-level nodepools: these nodepools and their subpools
+ form independent scheduling groups. Specifically, fair-share allocations over any
+ nodepool in one top-level pool do NOT affect the fair-share allocations for jobs
+ in any other top-level nodepool.
+
+\subsubsection{Configuration}
+ DUCC uses a simplified JSON-like structure to define nodepools.
+
+ At least one nodepool definition is required. This nodepool need not have any subpools or node
+ definitions. The first top-level nodepool is considered the ``default'' nodepool. A node that
+ checks in with DUCC but is not named specifically within one of the node files is assigned to
+ this first, or ``default'', nodepool.
+
+ Thus, if only one nodepool is defined with no other attributes, all nodes are
+ assigned to that pool.
+
+ A nodepool definition consists of the token ``Nodepool'' followed by its
+ name, followed by a block delimited with ``curly'' braces \{ and \}. This
+ block contains the attributes of the nodepool as key/value pairs.
+ Line ends are ignored. A semicolon ``;'' may optionally be used to
+ delimit key/value pairs for readability, and an equals sign ``='' may optionally
+ be used to delimit keys from values, also just for readability.
+
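+ For example, the following two definitions are equivalent (the pool and file
+ names here are purely illustrative):
+\begin{verbatim}
+ Nodepool example { nodefile = example.nodes ; parent = --default-- }
+
+ Nodepool example {
+     nodefile example.nodes
+     parent   --default--
+ }
+\end{verbatim}
+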
+ The attributes of a Nodepool are:
+ \begin{description}
+ \item[domain] This is valid only in the ``default'' nodepool. Any node
+ in any node file which does not have a domain, and any node which checks
+ in with the scheduler without a domain name, is assigned this domain name
+ so that the scheduler may deal entirely with fully-qualified node names.
+ \item[nodefile] This is the name of a file containing the names of the nodes
+ which are members of this nodepool.
+ \item[parent] This is used to indicate which nodepool is the logical parent.
+ Any nodepool without a ``parent'' is considered a top-level nodepool.
+ \end{description}
- This is an example of a declaration of three nodepools.
-
-\begin{verbatim}
-scheduling.nodepool = res res1 res1
-scheduling.nodepool.res = res.nodes
-scheduling.nodepool.res1 = res1.nodes
-scheduling.nodepool.res2 = res2.nodes
-\end{verbatim}
-
- There is no way to enforce priority assignment to any given nodepool. It is possible to declare a
- "preference", such that the resources in a given nodepool are considered first when searching for
- nodes. To configure a preference, use the order decorattion on a nodepool specificaion.
-
- To declare nodepool order, specify the property {\tt scheduling.nodepool.[poolname].order}. The
- nodepools are sorted numerically according to their order, and pools with lower order are
- searched before pools with higher order. The global nodepool always order "0" so it is usally
- searched first. For example, the pool configuration below establishes a search order of
-
+ The following example defines six nodepools,
\begin{enumerate}
- \item global
- \item res2
- \item res
- \item res1
+ \item A top-level nodepool called ``--default--'',
+ \item A top-level nodepool called ``jobdriver'',
+ \item A subpool of ``--default--'' called ``intel'',
+ \item A subpool of ``--default--'' called ``power'',
+ \item A subpool of ``intel'' called ``nightly-test'',
+ \item And a subpool of ``power'' called ``timing-p7''.
\end{enumerate}
- This is an example of a declaration of three nodepools.
-
\begin{verbatim}
-scheduling.nodepool = res res1 res1
-scheduling.nodepool.res = res.nodes
-scheduling.nodepool.res.order = 4
-scheduling.nodepool.res1 = res1.nodes
-scheduling.nodepool.res1.order = 7
-scheduling.nodepool.res2 = res2.nodes
-scheduling.nodepool.res2.order = 2
-\end{verbatim}
+ Nodepool --default-- { domain bluej.net }
+ Nodepool jobdriver { nodefile jobdriver.nodes }
- For each class named in scheduling.class.set a set of properties is specified, defining the rules
- implemented by that class. Each such property is of the form
+ Nodepool intel { nodefile intel.nodes ; parent --default-- }
+ Nodepool power { nodefile power.nodes ; parent --default-- }
-\begin{verbatim}
-scheduling.class.CLASSNAME.RULE = VALUE
+ Nodepool nightly-test { nodefile nightly-test.nodes ; parent intel }
+ Nodepool timing-p7 { nodefile timing-p7.nodes ; parent power }
\end{verbatim}
- where
- \begin{description}
- \item[CLASSNAME] specifies is the name of the class.
- \item[RULE] specifies rule. Rules are described below.
- \item[VALUE] specifies the value of the rule, as described below.
- \end{description}
-
- The rules are:
- \begin{description}
+\subsection{Class Definitions}
- \item[policy] \hfill \\
- This is the scheduling policy, required, and must be one of:
- \begin{itemize}
- \item[] FAIR\_SHARE
- \item[] FIXED\_SHARE
- \item[] RESERVE
- \end{itemize}
-
- \item[share\_weight] \hfill \\
- This is any integer. This is the weighted-fair-share weight for the class as discussed above. It is
- only used when policy = FAIR\_SHARE.
-
- \item[priority] \hfill \\
- This is the evaluation priority for the class as discussed above. This is used for all scheduling
- policies.
-
- \item[cap] \hfill \\
- This is an integer, or an integer with "\%" appended to denote a percentage. It is used for all
- scheduling classes.
-
- This is the class cap as discussed above. It may be an absolute value, in processes (which may
- comprise more than one share quanta), or it may be specified as a percentage by appending
- "\%" to the end. When specified as a percentage, it caps the shares allocated to this class as
- that percentage of the total shares remaining when the class is evaluated.. It does not consider
- shares that may have been available and assigned to higher-priority classes.
-
- \item[nodepool] \hfill \\
- This is the name of the nodepool associated with this class. It must be one of the names
- declared in the property scheduling.nodepool.
-
- \item[prediction] \hfill \\
- Acceptable values are true and false. When set to true the scheduler uses prediction when
- allocating shares. It is only used when policy = FAIR\_SHARE.
-
- \item[prediction.fudge] \hfill \\
- Acceptable values are any integer, denoting milliseconds. This is the prediction fudge as
- discussed above. It is only used when policy = FAIR\_SHARE.
-
- \item[expand.by.doubling] \hfill \\
- Acceptable values are true and false. When set to true the scheduler doubles a job's shares
- up to it's fair-share when possible, as discussed above. It is only used when policy =
- FAIR\_SHARE.
-
- \item[expand.by.doubling] \hfill \\
- Acceptable values are true and false. When set to true the scheduler doubles a job's shares up
- to it's fair-share when possible, as discussed above. When set in ducc.classes it overrides the
- defaults from ducc.properties. It is only used when policy = FAIR\_SHARE.
-
- \item[initialization.cap] \hfill \\
- Acceptable values are any integer. This is the maximum number of processes assigned to a job
- until the first process has successfully completed initialization. To disable the cap, set it to zero
- 0. It is only used when policy = FAIR\_SHARE.
-
- \item[max\_processes] \hfill \\
- Acceptable values are any integer. This is the maximum number of processes assigned to a
- FIXED\_SHARE request. If more are requested, the request is canceled. It is only used when
- policy = FIXED\_SHARE. If set to 0 or not specified, there is no enforced maximum.
-
- \item[max\_machines] \hfill \\
- Acceptable values are any integer. This is the maximum number of machines assigned to a
- RESERVE request. If more are requested, the request is canceled. It is only used when policy =
- RESERVE. If set to 0 or not specified, there is no enforced maximum.
-
- \item[enforce.memory] \hfill \\
- Acceptable values are true and false. When set to true the scheduler requires that any machine
- selected for a reservation matches the reservation's declared memory. The declared memory
- is converted to a number of quantum shares. Only machines whose memory, when converted
- to share quanta are selected. When set to false, any machine in the configured nodepool is
- selected. It is only used when policy = RESERVE.
- \end{description}
-
+ Scheduler classes are defined in the same simplified JSON-like language as
+ nodepools.
-
+ A simple inheritance (or ``template'') scheme is supported for classes. Any
+ class may be configured to ``derive'' from any other class. In this case, the
+ child class acquires all the attributes of the parent class, any of which may
+ be selectively overridden. Multiple inheritance is not supported but
+ nested inheritance is; that is, class A may inherit from class B which inherits
+ from class C and so on. In this way, generalized templates for the site's
+ class structure may be defined.
+
+ The general form of a class definition consists of the keyword Class, followed
+ by the name of the class, and then optionally by the name of a ``parent'' class
+ whose characteristics it inherits. Following the name (and optional parent class
+ name) are the attributes of the class, also within a \{ \} block.
+
+ The attributes defined for classes are:
+ \begin{description}
+ \item[abstract] If specified, this indicates this class is a template ONLY. It is used
+ as a model for other classes. Values are ``true'' or ``false''. The default is
+ ``false''.
+ \item[cap] This specifies the largest number of shares any job in this class
+ may be assigned. It may be an absolute number or a percentage. If specified as
+ a percentage (i.e. it contains a trailing \%), it specifies a percentage of the
+ total nodes in the containing nodepool.
+ \item[debug] FAIR\_SHARE only. This specifies the name of a class to substitute
+ for jobs submitted for debugging.
+ \item[expand-by-doubling] FAIR\_SHARE only. If ``true'', and the ``initialization-cap'' is
+ set, then after any process has initialized, the job will expand to its maximum allowable
+ shares by doubling in size each scheduling cycle.
+ \item[initialization-cap] FAIR\_SHARE only. If specified, this is the largest number of processes this job
+ may be assigned until at least one process has successfully completed initialization.
+ \item[max-processes] FIXED\_SHARE only. This is the largest number of FIXED\_SHARE,
+ non-preemptable shares any single job may be assigned.
+ \item[prediction-fudge] FAIR\_SHARE only. When the scheduler is considering expanding the
+ number of processes for a job, it tries to determine whether the job may complete before those
+ processes are allocated and initialized. The ``prediction-fudge'' adds some amount of
+ time (in milliseconds) to the projected completion time. This allows installations to
+ prevent jobs from expanding when they were otherwise going to end in a few minutes
+ anyway.
+ \item[nodepool] If specified, jobs for this class are assigned to nodes in this nodepool.
+ \item[policy] This is the scheduling policy, one of FAIR\_SHARE, FIXED\_SHARE, or RESERVE. This
+ attribute is required (there is no default).
+ \item[priority] This is the scheduling priority for jobs in this class.
+ \item[weight] FAIR\_SHARE only. This is the fair-share weight for jobs in this class.
+
+ \end{description}
+
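+ For illustration (the class names, weight values, and priorities below are
+ arbitrary examples, not shipped defaults), a site might define an abstract
+ template class and two concrete classes deriving from it:
+\begin{verbatim}
+ Class fair-base {
+       abstract  true
+       policy    FAIR_SHARE
+       nodepool  --default--
+       priority  10
+ }
+
+ Class normal fair-base { weight 100 }
+ Class urgent fair-base { weight 200 ; priority 5 }
+\end{verbatim}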