Posted to commits@uima.apache.org by sc...@apache.org on 2009/10/27 19:49:04 UTC
svn commit: r830289 [1/2] - in /incubator/uima/uima-as/trunk/uima-as-docbooks/src: docbook/uima_async_scaleout/ olink/uima_async_scaleout/

Author: schor
Date: Tue Oct 27 18:49:03 2009
New Revision: 830289

URL: http://svn.apache.org/viewvc?rev=830289&view=rev
Log:
UIMA-1635 add new uima as chapter on monitoring and tuning, and revise overview chapter

Added:
    incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/async.monitoring.and.tuning.xml
Modified:
    incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/async.overview.xml
    incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/uima_async_scaleout.xml
    incubator/uima/uima-as/trunk/uima-as-docbooks/src/olink/uima_async_scaleout/htmlsingle-target.db
    incubator/uima/uima-as/trunk/uima-as-docbooks/src/olink/uima_async_scaleout/pdf-target.db

Added: incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/async.monitoring.and.tuning.xml
URL: http://svn.apache.org/viewvc/incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/async.monitoring.and.tuning.xml?rev=830289&view=auto
==============================================================================
--- incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/async.monitoring.and.tuning.xml (added)
+++ incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/async.monitoring.and.tuning.xml Tue Oct 27 18:49:03 2009
@@ -0,0 +1,980 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
+       "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"[
+<!ENTITY % uimaents SYSTEM "../entities.ent">  
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<chapter id="ugr.async.mt">
+  <title>Monitoring and Tuning</title>
+  <para>
+    UIMA AS deployments can involve many separate parts running on many
+    different machines.  Monitoring facilities and tools built into UIMA AS help
+    in collecting information on the performance of these parts.  You can
+    use the monitoring information to identify deployment issues, such as
+    bottlenecks, and address these with various approaches that alter the
+    deployment choices; this is what we mean by "tuning the deployment".
+  </para>
+  
+  <para>
+    Monitoring happens in several parts:
+    <itemizedlist>
+      <listitem><para>Each node running a JVM hosting UIMA AS services or clients provides
+        JMX information tracking many items of interest.</para></listitem>
+      <listitem>
+        <para>UIMA AS services include some of these measurements in the information
+          passed back to their clients, along with the returned CAS.  This allows
+          clients to collect and aggregate measurements over a cluster of remotely-deployed
+          components.</para>
+      </listitem>
+      <listitem>
+        <para>UIMA AS includes a Monitor component that can optionally be turned on to
+          sample the JMX data at
+          a specified interval, and write the results into the UIMA log (or to the
+          console if no log is configured) in several formats, including one that is
+          convenient for reading and another that is convenient for importing into
+          a spreadsheet program.</para>
+      </listitem>
+    </itemizedlist>
+  </para>
+  
+  <para>Tuning a UIMA AS application is done using several approaches:
+    <itemizedlist>
+      <listitem><para>changing the topology of the scaleout - for instance, allocating more
+        nodes to some parts, fewer to others</para></listitem>
+      <listitem>
+        <para>adjusting deployment parameters, such as the number of CASes in a CasPool, or
+          the number of threads assigned to do various tasks</para>
+      </listitem>
+    </itemizedlist>
+  </para>
+  
+  <para>
+    In addition, tuning can involve changing the analytic algorithms
+    themselves, but that is beyond the scope of this chapter.
+  </para>
+    
+  
+  <section id="ugr.async.mt.monitoring">
+    <title>Monitoring</title>
+      
+    
+      <bridgehead>JMX</bridgehead>
+      <para>JMX (Java Management Extensions) is a standard Java mechanism that
+        is used to monitor and control Java applications.  A standard tool
+        provided with most Java distributions,
+        <code>jconsole</code>, is a GUI-based application that can connect to
+        a JVM, display the information JMX is providing, and control
+        the application in application-defined ways.</para>
+        
+      <para>JMX information is provided by a hierarchy of JMX Beans.  More
+        background and information on JMX and the jconsole tool is available on the web.</para>
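+      <para>As a minimal sketch of how such beans can be inspected programmatically
+        (an illustration only, not part of UIMA AS), the following Java code lists the
+        JMX Beans registered in the local JVM's platform MBean server and reads one
+        standard JVM attribute.  The class name <code>ListLocalMBeans</code> is
+        hypothetical, and no UIMA AS bean names are assumed; inspecting a remote JVM
+        would require a JMX connector rather than the local server used here.</para>
+      <programlisting><![CDATA[
```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class ListLocalMBeans {
    /** Returns the sorted canonical names of all beans registered in the given server. */
    public static Set<String> beanNames(MBeanServer server) {
        Set<String> names = new TreeSet<>();
        // Passing null for both arguments selects every registered bean
        for (ObjectName name : server.queryNames(null, null)) {
            names.add(name.getCanonicalName());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        for (String name : beanNames(server)) {
            System.out.println(name);
        }
        // Read one standard JVM attribute; other beans are read the same way
        ObjectName runtime = new ObjectName("java.lang:type=Runtime");
        System.out.println("Uptime (ms): " + server.getAttribute(runtime, "Uptime"));
    }
}
```
+]]></programlisting>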
+        
+      <para>This section will first describe the basic JMX Beans, and then
+        later describe a UIMA AS monitor tool that can sample the values of these beans at
+        a specified interval and write the results to the UIMA log in various
+        formats.</para>
+    
+    <section id="ugr.async.mt.jmx_monitoring">
+      <title>JMX Information from UIMA AS</title>
+      
+      <para>JMX information is provided by every UIMA AS service or client as it runs.
+        Each item provided is either an instantaneous measurement
+        (e.g., the number of items in a queue) or an accumulating measurement
+        (e.g., the number of CASes processed).  Accumulating measures
+        can be reset to 0 using standard JMX mechanisms.</para>
+        
+      <para>
+        JMX information is provided on a JVM basis; a JVM can be hosting 0 or more
+        UIMA AS Services and/or clients.  A UIMA AS Service is defined as a component
+        that connects to a queue
+        and accepts CASes to process.  A UIMA AS Client, in contrast, sends CASes to
+        be processed; it can be a top level client, or
+        a UIMA AS Service having one or more AS Aggregate delegates, to which it is
+        sending CASes to be processed.
+      </para>
+      
+      <para>
+        UIMA AS Services send
+        some of their measurements back to the UIMA AS Clients that sent them CASes; those
+        clients incorporate these measurements into aggregate statistics that they provide.
+        This allows accumulating information among components deployed over many nodes 
+        interconnected on a network.
+      </para>
+     
+      <para>
+        Some JMX measurement items are constant, and document various settings, descriptors, 
+        names, etc., in use by the (one or more) UIMA AS services and/or 
+        clients running on this JVM.</para>
+
+      <para>Some time measurements are associated with running some process.  These,
+        where possible, are cpu times, as measured by the thread or threads running the process, using the
+        ThreadMXBean class.  Some JVMs, however, do not support thread-based cpu time.  In that
+        case, wall-clock time is used instead.</para>
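+      <para>The fallback described above can be sketched as follows.  This is an
+        illustration under stated assumptions, not the UIMA AS implementation: the
+        class <code>ThreadTimer</code> and its methods are hypothetical names, and only
+        the standard <code>ThreadMXBean</code> calls are used.</para>
+      <programlisting><![CDATA[
```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadTimer {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    /** True when this JVM can report cpu time for the current thread. */
    public static boolean cpuTimeAvailable() {
        return THREADS.isCurrentThreadCpuTimeSupported() && THREADS.isThreadCpuTimeEnabled();
    }

    /** Current thread's cpu time in nanoseconds, or wall-clock nanoseconds as a fallback. */
    public static long nowNanos() {
        return cpuTimeAvailable() ? THREADS.getCurrentThreadCpuTime() : System.nanoTime();
    }

    /** Runs the task on the calling thread and returns the elapsed milliseconds. */
    public static long timeMillis(Runnable task) {
        long start = nowNanos();
        task.run();
        return (nowNanos() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        long ms = timeMillis(() -> {
            double x = 0;
            for (int i = 0; i < 5_000_000; i++) {
                x += Math.sqrt(i);
            }
            if (x < 0) {
                System.out.println(x); // never true; keeps the loop result in use
            }
        });
        System.out.println((cpuTimeAvailable() ? "cpu" : "wall-clock") + " ms: " + ms);
    }
}
```
+]]></programlisting>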
+        
+      <para>  
+        If the process is multi-threaded, and the cpu has multiple cores, 
+        you can get time measurements which exceed the wall clock interval, due to the process consuming
+        cpu time on multiple threads at once.</para>
+        
+      <para>Timing information not associated with running code, such as idle time, is measured as wall-clock time.</para>      
+            
+      <para>The following sections describe the JMX Beans implemented by UIMA AS.  The
+        Notes in the tables include the following flags:
+        
+        <itemizedlist>
+          <listitem>
+            <para><emphasis role="bold">inst/acc/const</emphasis> - instantaneous, accumulating, or constant measurement</para>
+          </listitem>
+          <listitem>
+            <para><emphasis role="bold">sent</emphasis> - sent up to the invoking client with returning CAS</para>
+          </listitem>
+        </itemizedlist>
+      </para>
+      
+      <section id="ugr.async.mt.jmx_monitoring.service">
+        <title>UIMA AS Services JMX measures</title>
+        <para>The next 4 tables detail the JMX measures provided by UIMA AS services.</para>
+        <section id="ugr.async.mt.jmx_monitoring.constant.service">
+          <title>Service information</title>
+          <informaltable frame="all">
+            <tgroup cols="4" colsep="1" rowsep="1">
+              <colspec colname="c1" colwidth="2*"/>
+              <colspec colname="c2" colwidth="5*"/>
+              <colspec colname="c3" colwidth="1*"/>
+              <colspec colname="c4" colwidth="1*"/>
+              <thead>
+                <row>
+                  <entry align="center">Name</entry>
+                  <entry align="center">Description</entry>
+                  <entry align="center">Units</entry>
+                  <entry align="center">Notes</entry>
+                </row>
+              </thead>
+              
+              <tbody>
+                <row>
+                  <entry>state</entry>
+                  <entry>The state of the service (Running, Initializing, Disabled, Stopping, Failed)</entry>
+                  <entry>string</entry>
+                  <entry>inst</entry>
+                </row>
+                <row>
+                  <entry>input queueName</entry>
+                  <entry>The name of the input queue</entry>
+                  <entry>string</entry>
+                  <entry>const</entry>
+                </row>
+                <row>
+                  <entry>reply queueName</entry>
+                  <entry>The internally generated name of the reply queue</entry>
+                  <entry>string</entry>
+                  <entry>const</entry>
+                </row>
+                <row>
+                  <entry>broker URL</entry>
+                  <entry>The URL of the JMS queue broker</entry>
+                  <entry>string</entry>
+                  <entry>const</entry>
+                </row>
+                <row>
+                  <entry>deployment descriptor</entry>
+                  <entry>The path to the deployment descriptor for this service</entry>
+                  <entry>string</entry>
+                  <entry>const</entry>
+                </row>
+                <row>
+                  <entry>is CAS Multiplier</entry>
+                  <entry>Whether this service is a CAS Multiplier</entry>
+                  <entry>boolean</entry>
+                  <entry>const</entry>
+                </row>
+                <row>
+                  <entry>is top level</entry>
+                  <entry>Whether this service is a top level service, meaning that it connects to
+                    an input queue on a queue broker</entry>
+                  <entry>boolean</entry>
+                  <entry>const</entry>
+                </row>
+                <row>
+                  <entry>service key</entry>
+                  <entry>The key name used in the associated Analysis Engine aggregate that specifies
+                    this as a delegate</entry>
+                  <entry>string</entry>
+                  <entry>const</entry>
+                </row>
+                <row>
+                  <entry>is Aggregate</entry>
+                  <entry>Whether this service is an AS Aggregate (i.e., has delegates and
+                    is marked async="true")</entry>
+                  <entry>boolean</entry>
+                  <entry>const</entry>
+                </row>
+                <row>
+                  <entry>analysisEngine instance count</entry>
+                  <entry>The number of replications of the AS Primitive</entry>
+                  <entry>count</entry>
+                  <entry>const</entry>
+                </row>          
+              </tbody>
+            </tgroup>
+          </informaltable>                  
+        </section>
+        
+        
+      <section id="ugr.async.mt.jmx_monitoring.service.performance">
+        <title>Service Performance Measurements</title>
+        <informaltable frame="all">
+          <tgroup cols="4" colsep="1" rowsep="1">
+            <colspec colname="c1" colwidth="2*"/>
+            <colspec colname="c2" colwidth="5*"/>
+            <colspec colname="c3" colwidth="1*"/>
+            <colspec colname="c4" colwidth="1*"/>
+            <thead>
+              <row>
+                <entry align="center">Name</entry>
+                <entry align="center">Description</entry>
+                <entry align="center">Units</entry>
+                <entry align="center">Notes</entry>
+              </row>
+            </thead>
+            <tbody>
+              <row>
+                <entry>number of CASes processed</entry>
+                <entry>The number of CASes processed by a component</entry>
+                <entry>count - CASes</entry>
+                <entry>acc</entry>
+              </row>
+             <row>
+                <entry>cas deserialization time</entry>
+                <entry>The thread time spent deserializing CASes (receiving, either from client, or replies from delegates)</entry>
+                <entry>milli seconds</entry>
+                <entry>acc</entry>
+              </row>
+             <row>
+                <entry>cas serialization time</entry>
+                <entry>The thread time spent serializing CASes (sending, either to delegates or back to client)</entry>
+                <entry>milli seconds</entry>
+                <entry>acc</entry>
+             </row>
+             <row>
+                <entry>analysis time</entry>
+                <entry>The thread time spent in AS Primitive analytics</entry>
+                <entry>milli seconds</entry>
+                <entry>acc</entry>
+             </row>
+             <row>
+                <entry>idle time</entry>
+                <entry>The wall clock time a service has been idle.  The measure runs from
+                  when a reply is sent until the next request is received, and excludes
+                  serialization/deserialization times.</entry>
+                <entry>milli seconds</entry>
+                <entry>acc</entry>
+             </row>
+             <row>
+                <entry>cas pool wait time</entry>
+                <entry>The time spent waiting for a CAS to become available in the CAS Pool</entry>
+                <entry>milli seconds</entry>
+                <entry>acc</entry>
+              </row>
+             <row>
+                <entry>shadow cas pool wait time</entry>
+                <entry>A shadow cas pool is established for services which are Cas Multipliers.  
+                  This is the time spent waiting for a CAS to become available in the Shadow CAS Pool.</entry>
+                <entry>milli seconds</entry>
+                <entry>acc</entry>
+              </row>
+              <row>
+                <entry>time spent in CM getNext</entry>
+                <entry>The time spent inside Cas Multipliers, getting another CAS.  
+                  This time (doesn't include / includes ????)
+                  the time 
+                  spent waiting for a CAS to become available in the CAS Pool</entry>
+                <entry>milli seconds</entry>
+                <entry>acc</entry>
+              </row>
+              <row>
+                <entry>process thread count</entry>
+                <entry>The number of threads available to process requests</entry>
+                <entry>count</entry>
+                <entry>inst</entry>
+              </row>
+            </tbody>
+          </tgroup>
+        </informaltable>        
+      </section>
+
+      <section id="ugr.async.mt.jmx_monitoring.service.internal.queues">
+        <title>Co-located Service Queues</title>
+        <para>Co-located services use light-weight, internal (not JMS) queues.  
+          These provide measures similar to those used with JMS queues; the
+          following measures are reported for both the input queues and the reply (output) queues:
+          <informaltable frame="all">
+            <tgroup cols="4" colsep="1" rowsep="1">
+              <colspec colname="c1" colwidth="2*"/>
+              <colspec colname="c2" colwidth="5*"/>
+              <colspec colname="c3" colwidth="1*"/>
+              <colspec colname="c4" colwidth="1*"/>
+              <thead>
+                <row>
+                  <entry align="center">Name</entry>
+                  <entry align="center">Description</entry>
+                  <entry align="center">Units</entry>
+                  <entry align="center">Notes</entry>
+                </row>
+              </thead>
+              <tbody>            
+                <row>
+                  <entry>consumer count</entry>
+                  <entry>The number of threads configured to read the queue</entry>
+                  <entry>count</entry>
+                  <entry>const</entry>
+                </row>
+                <row>
+                  <entry>dequeue count</entry>
+                  <entry>The number of CASes that have been read from this queue</entry>
+                  <entry>count</entry>
+                  <entry>acc</entry>
+                </row>
+                <row>
+                  <entry>queue size</entry>
+                  <entry>The number of CASes in the queue</entry>
+                  <entry>count</entry>
+                  <entry>inst</entry>
+                </row>
+              </tbody>
+            </tgroup>
+          </informaltable>   
+        </para>
+      </section>
+      
+      <section id="ugr.async.mt.jmx_monitoring.service.error">
+        <title>Service Error Measurements</title>
+        <informaltable frame="all">
+          <tgroup cols="4" colsep="1" rowsep="1">
+            <colspec colname="c1" colwidth="2*"/>
+            <colspec colname="c2" colwidth="5*"/>
+            <colspec colname="c3" colwidth="1*"/>
+            <colspec colname="c4" colwidth="1*"/>
+            <thead>
+              <row>
+                <entry align="center">Name</entry>
+                <entry align="center">Description</entry>
+                <entry align="center">Units</entry>
+                <entry align="center">Notes</entry>
+              </row>
+            </thead>
+            <tbody>            
+              <row>
+                <entry>process Errors</entry>
+                <entry>The number of process errors</entry>
+                <entry>count</entry>
+                <entry>acc</entry>
+              </row>
+              <row>
+                <entry>getMetadata Errors</entry>
+                <entry>The number of getMetadata errors</entry>
+                <entry>count</entry>
+                <entry>acc</entry>
+              </row>
+              <row>
+                <entry>cpc Errors</entry>
+                <entry>The number of Collection Process Complete (cpc) errors</entry>
+                <entry>count</entry>
+                <entry>acc</entry>
+              </row>
+            </tbody>
+          </tgroup>
+        </informaltable>   
+      </section>
+      
+    </section>
+
+        <section id="ugr.async.mt.jmx_monitoring.client">
+          <title>Application Client information</title>
+          <para>This section describes monitoring
+            information provided by the UIMA AS Client APIs. 
+            Any code that uses the <link linkend="ugr.ref.async.api.organization">UIMA AS Client APIs</link>, 
+            such as the example application
+            client <code>RunRemoteAsyncAE</code>, will have a set of these
+            JMX measures.  Currently no additional
+            tooling (beyond standard tools like <code>jconsole</code>) is provided to
+            view these.
+          </para>
+          
+          <section id="ugr.async.mt.jmx_monitoring.client.measures">
+          <title>Client Measurements</title>
+          <informaltable frame="all">
+            <tgroup cols="4" colsep="1" rowsep="1">
+              <colspec colname="c1" colwidth="2*"/>
+              <colspec colname="c2" colwidth="5*"/>
+              <colspec colname="c3" colwidth="1*"/>
+              <colspec colname="c4" colwidth="1*"/>
+              <thead>
+                <row>
+                  <entry align="center">Name</entry>
+                  <entry align="center">Description</entry>
+                  <entry align="center">Units</entry>
+                  <entry align="center">Notes</entry>
+                </row>
+              </thead>
+              <tbody>
+                
+                <row>
+                  <entry>application Name</entry>
+                  <entry>A user-supplied string identifying the application</entry>
+                  <entry>string</entry>
+                  <entry>const</entry>
+                </row>
+                <row>
+                  <entry>service queue name</entry>
+                  <entry>The name of the service queue this client connects to</entry>
+                  <entry>string</entry>
+                  <entry>const</entry>
+                </row>
+                <row>
+                  <entry>serialization method</entry>
+                  <entry>either xmi or binary. This is the serialization the client will use to send
+                    CASes to the service, and also tells the service which serialization to use
+                    in sending the CASes back.</entry>
+                  <entry>string</entry>
+                  <entry>const</entry>
+                </row>
+                <row>
+                  <entry>cas pool size</entry>
+                  <entry>This client's cas pool size, limiting the number of simultaneous outstanding requests in process</entry>
+                  <entry>count</entry>
+                  <entry>const</entry>
+                </row>                      
+              <row>
+                <entry>total number of CASes processed</entry>
+                <entry>count of the total number of CASes sent from this client.  Note: in the case
+                  where the service is a Cas Multiplier, the "child" CASes are not included in this count.</entry>
+                <entry>count</entry>
+                <entry>acc</entry>
+              </row>
+  
+              <row>
+                <entry>total time to process</entry>
+                <entry>total thread time spent in processing all CASes, including time in remote delegates</entry>
+                <entry>milli seconds</entry>
+                <entry>acc</entry>
+              </row>
+              <row>
+                <entry>average process time</entry>
+                  <entry>total time to process / total number of CASes processed</entry>
+                <entry>milli seconds</entry>
+                <entry>inst</entry>
+              </row>
+              <row>
+                <entry>max process time</entry>
+                <entry>maximum thread time spent in processing a CAS, including time in remote delegates</entry>
+                <entry>milli seconds</entry>
+                <entry>inst</entry>
+              </row>
+  
+              <row>
+                <entry>total serialization time</entry>
+                <entry>total thread time spent in serializing, both to delegates 
+                  (and recursively, to their delegates) and replies back to senders</entry>
+                <entry>milli seconds</entry>
+                <entry>acc</entry>
+              </row>
+              <row>
+                <entry>average serialization time</entry>
+                <entry>average thread time spent in serializing a CAS, both to delegates 
+                  (and recursively, to their delegates) and replies back to senders</entry>
+                <entry>milli seconds</entry>
+                <entry>inst</entry>
+              </row>
+              <row>
+                <entry>max serialization time</entry>
+                <entry>maximum thread time spent in serializing a CAS, both to delegates
+                  (and recursively, to their delegates) and replies back to senders</entry>
+                <entry>milli seconds</entry>
+                <entry>inst</entry>
+              </row>
+  
+              <row>
+                <entry>total deserialization time</entry>
+                <entry>total thread time spent in deserializing, both replies from delegates and CASes from upper
+                  level components being sent to lower level ones.</entry>
+                <entry>milli seconds</entry>
+                <entry>acc</entry>
+              </row>
+              <row>
+                <entry>average deserialization time</entry>
+                <entry>average thread time spent in deserializing, both replies from delegates and CASes from upper
+                  level components being sent to lower level ones.</entry>
+                <entry>milli seconds</entry>
+                <entry>inst</entry>
+              </row>
+              <row>
+                <entry>max deserialization time</entry>
+                <entry>maximum thread time spent in deserializing, both replies from delegates and CASes from upper
+                  level components being sent to lower level ones.</entry>
+                <entry>milli seconds</entry>
+                <entry>inst</entry>
+              </row>
+  
+              <row>
+                <entry>total idle time</entry>
+                <entry>total wall clock time a top-level service thread has been idle since the thread was last used.
+                  If there is more than one service thread, this number is the sum.</entry>
+                <entry>milli seconds</entry>
+                <entry>acc</entry>
+              </row>
+              <row>
+                <entry>average idle time</entry>
+                <entry>average wall clock time all top-level service threads have been idle since they were last used</entry>
+                <entry>milli seconds</entry>
+                <entry>inst</entry>
+              </row>
+              <row>
+                <entry>max idle time</entry>
+                <entry>maximum wall clock time a top-level service thread has been idle since the thread was last used</entry>
+                <entry>milli seconds</entry>
+                <entry>inst</entry>
+              </row>
+  
+              <row>
+                <entry>total time waiting for reply</entry>
+                <entry>total wall clock time, measured from the time a CAS is sent to the top-level queue, until that CAS
+                  is returned.  Any generated CASes from Cas Multipliers are not counted in this measurement.</entry>
+                <entry>milli seconds</entry>
+                <entry>acc</entry>
+              </row>
+              <row>
+                <entry>average time waiting for reply</entry>
+                <entry>average wall clock time from the time a CAS is sent until the reply is received</entry>
+                <entry>milli seconds</entry>
+                <entry>inst</entry>
+              </row>
+              <row>
+                <entry>max time waiting for reply</entry>
+                <entry>maximum wall clock time from the time a CAS is sent until the reply is received</entry>
+                <entry>milli seconds</entry>
+                <entry>inst</entry>
+              </row>
+  
+              <row>
+                <entry>total response latency time</entry>
+                <entry>total wall clock time, measured from the time a CAS is sent to the top-level queue, including
+                  the serialization and deserialization times at the client, until that CAS
+                  is returned.  Any generated CASes from Cas Multipliers are not counted in this measurement.</entry>
+                <entry>milli seconds</entry>
+                <entry>acc</entry>
+              </row>
+              <row>
+                <entry>average response latency time</entry>
+                <entry>average wall clock time, measured from the time a CAS is sent to the top-level queue, including
+                  the serialization and deserialization times at the client, until that CAS
+                  is returned.</entry>
+                <entry>milli seconds</entry>
+                <entry>inst</entry>
+              </row>
+              <row>
+                <entry>max response latency time</entry>
+                <entry>maximum wall clock time, measured from the time a CAS is sent to the top-level queue, including
+                  the serialization and deserialization times at the client, until that CAS
+                  is returned.</entry>
+                <entry>milli seconds</entry>
+                <entry>inst</entry>
+              </row>
+              
+              <row>
+                <entry>total time waiting for CAS</entry>
+                <entry>total wall-clock time spent waiting for a 
+                  free CAS to be available in the client's CAS pool, before
+                  sending the CAS to the input queue for the top level service. </entry>
+                <entry>milli seconds</entry>
+                <entry>acc</entry>
+              </row>
+              <row>
+                <entry>average time waiting for CAS</entry>
+                <entry>average wall-clock time spent waiting for a 
+                  free CAS to be available in the client's CAS pool</entry>
+                <entry>milli seconds</entry>
+                <entry>inst</entry>
+              </row>
+              <row>
+                <entry>max time waiting for CAS</entry>
+                <entry>maximum wall-clock time spent waiting for a 
+                  free CAS to be available in the client's CAS pool</entry>
+                <entry>milli seconds</entry>
+                <entry>inst</entry>
+              </row>
+              
+              <row>
+                <entry>total number of CASes requested</entry>
+                <entry>total number of CASes fetched from the CAS pool</entry>
+                <entry>count</entry>
+                <entry>acc</entry>
+              </row>
+            </tbody>
+          </tgroup>
+        </informaltable>         
+      </section>
+      
+      <section id="ugr.async.mt.jmx_monitoring.client.error">
+        <title>Client Error Measurements</title>
+        <informaltable frame="all">
+          <tgroup cols="4" colsep="1" rowsep="1">
+            <colspec colname="c1" colwidth="2*"/>
+            <colspec colname="c2" colwidth="5*"/>
+            <colspec colname="c3" colwidth="1*"/>
+            <colspec colname="c4" colwidth="1*"/>
+            <thead>
+              <row>
+                <entry align="center">Name</entry>
+                <entry align="center">Description</entry>
+                <entry align="center">Units</entry>
+                <entry align="center">Notes</entry>
+              </row>
+            </thead>
+            <tbody>
+        
+              <row>
+                <entry>getMeta Timeout Error Count</entry>
+                <entry>number of times a getMeta timed out</entry>
+                <entry>count</entry>
+                <entry>acc</entry>
+              </row>
+             
+              <row>
+                <entry>getMeta Error Count</entry>
+                <entry>number of times a getMeta request returned with an error</entry>
+                <entry>count</entry>
+                <entry>acc</entry>
+              </row>
+             
+              <row>
+                <entry>process Timeout Error Count</entry>
+                <entry>number of times a process call timed out</entry>
+                <entry>count</entry>
+                <entry>acc</entry>
+              </row>
+             
+              <row>
+                <entry>process Error Count</entry>
+                <entry>number of times a process call returned with an error</entry>
+                <entry>count</entry>
+                <entry>acc</entry>
+              </row>
+              
+            </tbody>
+          </tgroup>
+        </informaltable>
+      </section>      
+    </section>  
+    </section>  
+    
+    <section id="ugr.async.mt.jmx_sampling">
+      <title>Logging Sampled JMX information at intervals</title>
+      
+      <para>
+        A common tuning procedure is to run a deployment for a fairly long time with a
+        typical load, and to see what and where hot spots develop.  During this process, 
+        it is sometimes useful to convert accumulating measurements into averages, perhaps
+        averages per CAS processed.
+      </para>
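+      <para>
+        As a minimal sketch of such a conversion, the difference between two successive samples of an
+        accumulating measurement can be divided by the number of CASes processed between those samples.
+        The class, field, and method names below are hypothetical illustrations and are not part of UIMA AS:
+      </para>
+      <programlisting><![CDATA[/** Illustrative only: turns accumulating ("acc") measurements into a per-CAS average. */
+public class IntervalAverage {
+
+    /** One snapshot of two accumulating counters; the field names are hypothetical. */
+    static final class Sample {
+        final long casesProcessed;   // accumulated count of CASes processed
+        final long analysisMillis;   // accumulated analysis time, in milliseconds
+
+        Sample(long casesProcessed, long analysisMillis) {
+            this.casesProcessed = casesProcessed;
+            this.analysisMillis = analysisMillis;
+        }
+    }
+
+    /** Average analysis time per CAS between two samples; 0.0 if no CAS completed. */
+    static double avgAnalysisMillisPerCas(Sample previous, Sample current) {
+        long deltaCases = current.casesProcessed - previous.casesProcessed;
+        long deltaMillis = current.analysisMillis - previous.analysisMillis;
+        return deltaCases == 0 ? 0.0 : (double) deltaMillis / deltaCases;
+    }
+
+    public static void main(String[] args) {
+        Sample previous = new Sample(100, 5000);
+        Sample current = new Sample(150, 8000);
+        // 3000 ms of analysis spread over 50 CASes
+        System.out.println(avgAnalysisMillisPerCas(previous, current));
+    }
+}]]></programlisting>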
+      <para>
+        UIMA AS includes a monitor component, <code>org.apache.uima.aae.jmx.monitor.JmxMonitor</code>, 
+        that samples JMX measures at specified intervals,
+        computes various averages, and writes the results into the UIMA Log (or to the console 
+        if no log is configured).  The monitor can be enabled automatically for any deployed service
+        by specifying <code>-D</code> parameters on the JVM command 
+        line that launches the service, or it can be run stand-alone.  When run stand-alone, you provide an
+        argument identifying the JVM from which it is to obtain the JMX information.  It connects
+        to only one JVM per run; typically, you would connect it to the top-level service.  
+      </para>
+      
+      <para>
+        The monitor outputs information for that service and its immediate delegates (local or remote); however, it
+        includes information from the complete recursive chain of delegates when computing its measures.  You can
+        get detailed monitoring for sub-services by starting or attaching a monitor to those sub-services.
+      </para>
+      
+      <para>
+        ActiveMQ uses Queue Brokers to manage the JMS queues used by UIMA AS.  These brokers have JMX information
+        that is useful in tuning applications.  The Monitor program identifies the Queue Broker being used by the 
+        service, connects to it, and incorporates information about queue lengths (both the input queue 
+        and the reply queue) into its measurements.
+      </para>
+      
+      <section id="ugr.async.mt.jmx_sampling.configuring">
+        <title>Configuring JVM to run the monitor</title>
+        <para>Specify the following JVM system properties to configure a UIMA AS Client or Service to enable 
+          sampling and logging of JMX measures:
+          <itemizedlist>
+            <listitem><para><code>-Duima.jmx.monitor.interval=1000</code> - (optional; default is 1000) specifies the 
+              sampling interval in milliseconds</para></listitem>
+            <listitem><para><code>-Duima.jmx.monitor.formatter=&lt;CustomFormatterClassName></code> - specifies the
+              class used to format the logged output</para></listitem>
+            <listitem><para><code>-Dcom.sun.management.jmxremote</code> - enable JMX</para></listitem>            
+            <listitem><para><code>-Dcom.sun.management.jmxremote.port=8009</code></para></listitem>
+            <listitem><para><code>-Dcom.sun.management.jmxremote.authenticate=false</code></para></listitem>
+            <listitem><para><code>-Dcom.sun.management.jmxremote.ssl=false</code></para></listitem>
+          </itemizedlist>
+          
+          This configures JMX to run on port 8009 with neither authentication nor SSL, sets the sampling interval to 1 second,
+          and specifies a custom formatter class name.
+        </para>
+        
+        <para>Two formatter classes are provided with UIMA AS:
+          <itemizedlist>
+            <listitem><para><code>org.apache.uima.aae.jmx.monitor.BasicUimaJmxMonitorListener</code> -
+              a multi-line formatter that produces human-readable output</para></listitem>
+            <listitem><para><code>org.apache.uima.aae.jmx.monitor.SingleLineUimaJmxMonitorListener</code> -
+              a formatter that produces one line per interval, suitable for importing into 
+              a spreadsheet program.</para></listitem>           
+          </itemizedlist>
+          
+          Both of these log to the UIMA log at the INFO log level.
+        </para>
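+        <para>As an illustration only, these properties could be combined on Windows through the
+          <code>UIMA_JVM_OPTS</code> environment variable.  This sketch assumes that the service is launched by a 
+          script which passes <code>UIMA_JVM_OPTS</code> to the JVM; the port number and the choice of formatter 
+          are example values.  The value is entered on a single line; it is wrapped here for readability:</para>
+        <programlisting><![CDATA[set UIMA_JVM_OPTS=-Dcom.sun.management.jmxremote 
+    -Dcom.sun.management.jmxremote.port=8009 
+    -Dcom.sun.management.jmxremote.authenticate=false 
+    -Dcom.sun.management.jmxremote.ssl=false 
+    -Duima.jmx.monitor.interval=1000 
+    -Duima.jmx.monitor.formatter=org.apache.uima.aae.jmx.monitor.SingleLineUimaJmxMonitorListener]]></programlisting>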
+        
+        <para>You can also write your own formatter.  The monitor provides an API to plug in a custom formatter 
+          for displaying service metrics. A custom formatter must implement the <code>JmxMonitorListener</code> interface. 
+          See the method <code>startMonitor</code> in the class <code>UIMA_Service</code> for an
+          example of how custom JMX listeners are plugged into the monitor. 
+        </para> 
+      </section> 
+      
+      <section id="ugr.async.mt.jmx_sampling.standalone">
+        <title>Running the Monitor program standalone</title>
+        <para>The monitor program can be started separately and pointed to a running UIMA AS Client or Service.
+          To start the program, invoke Java with the following classpath and parameters:
+          <itemizedlist>
+            <listitem>
+              <para>ClassPath:</para>
+              <itemizedlist>
+                <listitem><para>%UIMA_HOME%/lib/uimaj-as-activemq.jar</para></listitem>
+                <listitem><para>%UIMA_HOME%/lib/uimaj-as-core.jar</para></listitem>
+                <listitem><para>%UIMA_HOME%/lib/uima-core.jar</para></listitem>
+                <listitem><para>%UIMA_HOME%/apache-activemq-4.1.1/apache-activemq-4.1.1.jar</para></listitem>
+              </itemizedlist>
+            </listitem>
+            <listitem>
+              <para>Parameters:</para>
+              <itemizedlist>
+                <listitem><para><code>-Djava.util.logging.config.file=%UIMA_HOME%/config/MonitorLogger.properties</code>
+                  - specifies the logging configuration file, which determines where the information is written</para></listitem>
+                <listitem><para><code>org.apache.uima.aae.jmx.monitor.JmxMonitor</code> -
+                  the class whose main method is invoked</para></listitem>
+                <listitem><para><code>uri</code> - the URI of the jmx instance to monitor.</para></listitem>
+                <listitem><para><code>interval</code> - the (optional) 
+                  sampling interval, in milliseconds (default = 1000)</para></listitem>
+              </itemizedlist>                 
+            </listitem>
+          </itemizedlist>
+        </para>
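+        <para>Put together, a Windows invocation might look like the following sketch.  The JMX URI shown 
+          is an assumption: it is the standard JMX service URL form for a JVM that exposes JMX on port 8009 of the 
+          local machine, and should be replaced with the address of the JVM you want to monitor.  The command 
+          is entered on a single line; it is wrapped here for readability:</para>
+        <programlisting><![CDATA[java -cp "%UIMA_HOME%/lib/uimaj-as-activemq.jar;%UIMA_HOME%/lib/uimaj-as-core.jar;
+    %UIMA_HOME%/lib/uima-core.jar;%UIMA_HOME%/apache-activemq-4.1.1/apache-activemq-4.1.1.jar"
+    -Djava.util.logging.config.file=%UIMA_HOME%/config/MonitorLogger.properties
+    org.apache.uima.aae.jmx.monitor.JmxMonitor
+    service:jmx:rmi:///jndi/rmi://localhost:8009/jmxrmi 1000]]></programlisting>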
+        
+        <para>When run in this manner, it is not (currently) possible to specify the
+          log message formatting class; the multi-line output format is always used.</para>
+      </section>     
+      
+      <section id="ugr.async.mt.jmx_sampling.output">
+        <title>Monitoring output</title>
+        <para>The monitoring program samples the accumulating JMX measures, including those of the associated
+          Queue Broker, at the specified sampling interval, and produces
+          the following outputs:
+          
+          <informaltable frame="all">
+            <tgroup cols="3" colsep="1" rowsep="1">
+              <colspec colname="c1" colwidth="2*"/>
+              <colspec colname="c2" colwidth="5*"/>
+              <colspec colname="c3" colwidth="1*"/>
+              <thead>
+                <row>
+                  <entry align="center">Name</entry>
+                  <entry align="center">Description</entry>
+                  <entry align="center">Units</entry>
+                </row>
+              </thead>
+              <tbody>
+                <row>
+                  <entry>Input queue depth</entry>
+                  <entry>number of CASes waiting to be processed by a service</entry>
+                  <entry>count</entry>
+                </row>
+                <row>
+                  <entry>Reply queue depth</entry>
+                  <entry>number of CASes returned to the client but not yet picked up by the client</entry>
+                  <entry>count</entry>
+                </row>
+                <row>
+                  <entry>CASes processed in interval</entry>
+                  <entry>Number of CASes processed in this sampling interval</entry>
+                  <entry>count</entry>
+                </row>
+                <row>
+                  <entry>Idle time in interval</entry>
+                  <entry>The total time this service has been idle during this interval</entry>
+                  <entry>milliseconds</entry>
+                </row>
+                <row>
+                  <entry>Analysis time in interval</entry>
+                  <entry>The sum of the times spent in analysis by the service during this interval, 
+                    including analysis time spent in delegates, recursively</entry>
+                  <entry>milliseconds</entry>
+                </row>
+                <row>
+                  <entry>Cas Pool free Cas Count</entry>
+                  <entry>Number of available CASes in the Cas Pool at the end of the interval</entry>
+                  <entry>count</entry>
+                </row>               
+              </tbody>
+            </tgroup>
+          </informaltable>
+        </para>
+        
+        <para>In addition to the performance metrics, the monitor also provides basic service information:
+          <itemizedlist>
+            <listitem>
+              <para>Service name</para>
+            </listitem>
+            <listitem>
+              <para>Is service top level</para>
+            </listitem>
+            <listitem>
+              <para>Is service remote</para>
+            </listitem>
+            <listitem>
+              <para>Is service a cas multiplier</para>
+            </listitem>
+            <listitem>
+              <para>Number of processing threads</para>
+            </listitem>
+            <listitem>
+              <para>Service uptime (milliseconds)</para>
+            </listitem>
+          </itemizedlist> 
+        </para>
+      </section>
+    </section>
+  </section>
+  
+  <section id="ugr.async.mt.tuning">
+    <title>Tuning</title>
+        
+    <section id="ugr.async.mt.tuning.approach">
+      <title>Tuning procedure</title>
+      <para>This section is a cookbook of best practices for tuning a UIMA AS deployment.  The summary information
+        provided by the Monitor program is used to guide the tuning.</para>
+
+    <para>The main metric for detecting an overloaded service is the input queue depth. If it is high or growing, 
+           more CASes are arriving at the queue than the service can process, and the service 
+           is not able to keep up with the load. 
+           Consider increasing the number of instances of the service within the JVM (if it is running on a multi-core machine with 
+           spare capacity), or deploying additional copies of the service.</para>
+
+    <para>The main metric for detecting an idle service is the idle time. If it is high, the service may not be
+          receiving enough CASes. This can be caused by a bottleneck in the service's client; supporting evidence for this
+          can be a high reply queue depth for the client, which indicates that the client is overloaded. 
+          Ideally, the idle time should be zero, which means that the service continually receives enough CASes 
+          to process.</para>
+
+    <para>A CasPool free Cas Count of 0 can point to a bottleneck in a service's client; supporting
+      evidence for this can be a high idle time. In this case, the service does not have enough CASes in its pool and is 
+          forced to wait. Remember that a CAS is not returned to the service's CAS pool until the client signals that it can be.
+          A typical cause is a slow client (look for evidence such as a high reply queue depth). Consider 
+          increasing the size of the service's CAS pool, and check the client's metrics to determine why it is slow.</para>
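+    <para>The checks described in this section can be summarized in code.  The following is a minimal sketch, not 
+          part of UIMA AS: the class name, the method signature, and the "idle for more than half the interval" 
+          threshold are all hypothetical, and for simplicity the sketch treats only a growing input queue 
+          (not a merely high one) as a sign of overload:</para>
+    <programlisting><![CDATA[/** Illustrative only: maps one monitor sample to a tuning hint. */
+public class TuningHints {
+
+    /**
+     * All parameters are values taken from two successive monitor samples;
+     * the thresholds used here are arbitrary examples, not recommendations.
+     */
+    static String diagnose(long prevInputQueueDepth, long inputQueueDepth,
+                           long idleMillis, long intervalMillis, int freeCasCount) {
+        if (inputQueueDepth > prevInputQueueDepth) {
+            // More CASes arrive than the service can process.
+            return "overloaded: add service instances";
+        }
+        if (freeCasCount == 0) {
+            // The service is waiting for a CAS to be returned to its pool.
+            return "CAS pool exhausted: check client, consider larger pool";
+        }
+        if (idleMillis * 2 > intervalMillis) {
+            // Idle for more than half of the sampling interval.
+            return "idle: check client for a bottleneck";
+        }
+        return "ok";
+    }
+
+    public static void main(String[] args) {
+        System.out.println(diagnose(5, 9, 0, 1000, 3));
+        System.out.println(diagnose(0, 0, 900, 1000, 2));
+    }
+}]]></programlisting>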
+      
+    </section>
+
+    <section id="ugr.async.mt.tuning.settings">
+      <title>Tuning Settings</title>
+      <para>This section lists the tuning parameters and describes what they do and how they interact.</para>
+      <informaltable frame="all">
+        <tgroup cols="2" colsep="1" rowsep="1">
+          <colspec colname="c1" colwidth="2*"/>
+          <colspec colname="c2" colwidth="4*"/>
+          <thead>
+            <row>
+              <entry align="center">Name</entry>
+              <entry align="center">Description</entry>
+            </row>
+          </thead>
+          <tbody>
+            <row>
+              <entry>number of services on different machines started</entry>
+              <entry>You can adjust the number of machines assigned to a particular service,
+                even dynamically, by just starting / stopping additional servers that specify
+                the same input queue.</entry>
+            </row>
+            <row>
+              <entry>number of instances of a service</entry>
+              <entry>This is similar to the number of services on different machines started, above, 
+                but specifies replication of an AS Primitive within one JVM.  This is useful for making
+                use of multi-core machines sharing a common memory - large tables that might be 
+                part of the analysis algorithm can be shared by all instances.</entry>
+            </row>
+            <row>
+              <entry>CAS pool size</entry>
+              <entry>This size limits the number of CASes being processed asynchronously.</entry>
+            </row>
+            <row>
+              <entry>casMultiplier poolSize</entry>
+              <entry>This size limits the number of CASes generated by a CAS Multiplier that are being processed asynchronously.</entry>
+            </row>
+            <row>
+              <entry>Service input queue prefetch</entry>
+              <entry>If set greater than 0, allows up to "n" CASes to be pulled into one service provider, at a time.
+                This can increase throughput, but can hurt latency, since one service may have several CASes pulled into it,
+                queued up, while another instance of the service could be "starved" and be sitting there idle. </entry>
+            </row>
+            <row>
+              <entry>Specifying async="true"/"false" on an aggregate</entry>
+              <entry>The default is false, because there is less overhead (no queues are set up, etc.).  Setting this to 
+                "true" allows multiple CASes to flow simultaneously in the aggregate.</entry>
+            </row>
+            <row>
+              <entry>remoteReplyQueueScaleout</entry>
+              <entry>This parameter indicates the number of threads that will be deployed to read from the remote reply queue.
+                Set it to a value greater than 1 if deserialization time of replies is a bottleneck.</entry>
+            </row>
+          </tbody>
+        </tgroup>
+      </informaltable>     
+      
+    </section>
+    
+  </section>
+  
+  <section id="ugr.async.mt.limits">
+    <title>Limitations</title>
+    <para>The current (2.3.0) implementation has the following limitations:
+      <itemizedlist>
+        <listitem><para>Monitoring program</para>
+          <itemizedlist>
+            <listitem><para>The monitoring program reads the JMS Queue Broker URL 
+              from the configuration information provided by JMX for the UIMA AS Service
+              being monitored.  It uses this information to connect to JMX on that broker, but
+              assumes that JMX is set up on the default port (1099).  This port is 
+              hardcoded into the Monitor program, so be aware of this if you 
+              change the port number for JMX on the JMS Queue Broker (a parameter in 
+              ActiveMQ's configuration for the broker).
+            </para></listitem>
+            <listitem><para>When the Monitor program is run as a stand-alone program, 
+              it is not (currently) possible to specify alternatives for the
+              log message formatting class; the multi-line output format is always used.</para></listitem>
+          </itemizedlist>      
+        </listitem>        
+      </itemizedlist>
+    </para>
+  </section>
+   
+</chapter>
\ No newline at end of file

Modified: incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/async.overview.xml
URL: http://svn.apache.org/viewvc/incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/async.overview.xml?rev=830289&r1=830288&r2=830289&view=diff
==============================================================================
--- incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/async.overview.xml (original)
+++ incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/async.overview.xml Tue Oct 27 18:49:03 2009
@@ -970,111 +970,17 @@
   <section id="ugr.async.ov.concepts.mc">
     <title>Monitoring and Controlling an AS application</title>
     <titleabbrev>Monitoring &amp; Controlling</titleabbrev>
-    <para>JMX (Java Management Extensions) are used for monitoring and controlling an AS application. This
-      capability is being staged; initial versions have some monitoring capability, but little controlling
-      capability.</para>
-    <para>The first versions of AS will use the standard GUI tooling available as part of Java 5 to display the JMX
-      results. Later versions may include additional UIMA-specific tooling for this.</para>
-    <section id="ugr.async.ov.concepts.mc.what">
-      <title>Instrumentation provided</title>
-         <para>UIMA AS Service has a built-in, JMX-based instrumentation that enables service monitoring. It 
-         provides service metrics collected in real-time at configurable checkpoint intervals (typically, every second). 
-         The main purpose 
-         of the monitor is to help in detecting overloaded and idle services. Overloaded services are those 
-         that are not able to keep up with the work load. Idle services are those that are not receiving enough 
-         work and stay idle. To detect both conditions, the monitor provides the following metrics: </para>
-         
-         <section id="ugr.async.ov.concepts.mc.checkpoint">
-           <title>Checkpoint intervals</title>
-           <para>
-            Many of the measures are done with respect to a checkpoint.  This is nothing more than a defined
-            interval of time (perhaps 1 second), 
-            and is configured with the Java startup -Duima.jmx.monitor.checkpoint.interval parameter.
-           </para>
-         </section>
+    <para>JMX (Java Management Extensions) is used for monitoring and controlling an AS application.
+      As of release 2.3.0, extensive monitoring facilities have been implemented; these are described
+      in <xref linkend="ugr.async.mt"/>.
+      The only controlling facility provided is to stop a service.</para>
+    
+           <para>In addition, a configurable Monitoring program is provided which works with the JMX-provided measurements,
+            aggregates and samples these over specified intervals, and creates monitoring entries in the
+            UIMA log for tuning purposes.  You can use this to detect overloaded and/or idle services;
+            see <xref linkend="ugr.async.mt"/> for details.</para>
          
 
-
-	<itemizedlist>
-          <listitem>
-            <para>Input queue depth: number of CASes waiting to be processed by a service</para>
-          </listitem>
-          <listitem>
-            <para>Reply queue depth: number of CASes returned to the client but not yet picked up by the client</para>
-          </listitem>
-          <listitem>
-            <para>Number of CASes processed since the last checkpoint</para>
-          </listitem>
-          <listitem>
-            <para>Idle time since the last checkpoint</para>
-          </listitem>
-          <listitem>
-            <para>Time spent in analysis since the last checkpoint</para>
-          </listitem>
-          <listitem>
-            <para>Number of un-checked-out CASes in the Cas Pool</para>
-          </listitem>
-    </itemizedlist> 
-
-    <para>In addition to the performance metrics the monitor also provides basic service information:</para>
-	<itemizedlist>
-          <listitem>
-            <para>Service name</para>
-          </listitem>
-          <listitem>
-            <para>Is service top level</para>
-          </listitem>
-          <listitem>
-            <para>Is service remote</para>
-          </listitem>
-          <listitem>
-            <para>Is service a cas multiplier</para>
-          </listitem>
-          <listitem>
-            <para>Number of processing threads</para>
-          </listitem>
-          <listitem>
-            <para>Service uptime</para>
-          </listitem>
-    </itemizedlist> 
-
-    <para>The main metric for detecting an overloaded service is the input queue depth. If it is growing or high, the service 
-           is not able to keep up with the load. There are more CASes arriving at the queue than the service can process. 
-           Consider increasing number of processig threads in the JVm or start another instance of the service.</para>
-
-    <para>The main metric for detecting idle service is the idle time. If it is high, it can indicate that the service is not 
-          receiving enough CASes. This can be caused by a bottleneck in the service's client; supporting evidence for this
-          can be a high reply queue depth for the client - indicating the client is overloaded. 
-          Ideally, the idle time should be at zero, which means that the service receives enough CASes 
-          to process. The idle time is shown as a delta from the last time the checkpoint was made.</para>
-
-    <para>Zero un-checked-out CASes in the Cas Pool can point to a bottleneck in a service's client; supporting
-      evidence for this can be a high idle time. In this case, the service does not have enough CASes in its pool and is 
-          forced to wait. Remember that a CAS is not returned to the Service's CAS pool until the client signals it can be.
-          A typical reason is a slow client (look for evidence such as a high reply queue depth). Consider 
-          incrementing service's Cas pool and check the client's metrics to determine a reason why it is slow.</para>
-
-    <para>The monitor provides an API to plug in a custom formatter for displaying service metrics. A default implementation 
-          is provided in the UIMA AS runtime. A custom formatter must implement JmxMonitorListener interface and can be 
-          plugged in to the monitor with the following API:
-          <programlisting><![CDATA[ jmxMonitor.addJmxMonitorListener(customJmxListener);]]></programlisting></para>
-
-
-
-    <para>To easiest way to start a service with JMX monitoring enabled is to configure UIMA's property: UIMA_JVM_OPTS:</para>
-
-    <programlisting>
-    <![CDATA[set UIMA_JVM_OPTS=-Dcom.sun.management.jmxremote 
-    -Dcom.sun.management.jmxremote.port=8009 
-    -Dcom.sun.management.jmxremote.authenticate=false 
-    -Dcom.sun.management.jmxremote.ssl=false 
-    -Djmx.monitor.frequency=1000 
-    -Djmx.monitor.formatter=<custom formatter class>]]></programlisting>
-
-	<para>The above configures JMX MBeanServer to run on port 8009 with no authentication. It also sets the checkpoint interval 
-	to 1 second and plugs in a custom formatter for displaying metrics. Please note that the default formatter logs the metrics in 
-	UIMA Log at INFO level.</para>      
-      
       <!--
       <para>The implementation provides the following kinds of instrumentation via JMX:
         <itemizedlist>
@@ -1106,9 +1012,7 @@
           </listitem>
           </itemizedlist> </para>
      -->     
-     </section>
-    <!-- of ugr.async.ov.concepts.mc.what --></section>
-  <!-- of ugr.async.ov.concepts.mc -->
+  </section><!-- of ugr.async.ov.concepts.mc -->
   <!-- ======================================================= -->
   <!-- |                 JMS Service Descriptor              | -->
   <!-- ======================================================= -->

Modified: incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/uima_async_scaleout.xml
URL: http://svn.apache.org/viewvc/incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/uima_async_scaleout.xml?rev=830289&r1=830288&r2=830289&view=diff
==============================================================================
--- incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/uima_async_scaleout.xml (original)
+++ incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/uima_async_scaleout.xml Tue Oct 27 18:49:03 2009
@@ -31,9 +31,10 @@
   <toc/>
 
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="async.overview.xml" />
-  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="async.errorhandling.xml"/> 
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="async.errorhandling.xml"/>
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ref.async.deployment.xml" />
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ref.async.api.xml" />
+  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="async.monitoring.and.tuning.xml"/> 
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="async.camel.driver.xml" />
 
   <!--