Posted to commits@uima.apache.org by sc...@apache.org on 2009/10/27 19:49:04 UTC
svn commit: r830289 [1/2] - in
/incubator/uima/uima-as/trunk/uima-as-docbooks/src:
docbook/uima_async_scaleout/ olink/uima_async_scaleout/
Author: schor
Date: Tue Oct 27 18:49:03 2009
New Revision: 830289
URL: http://svn.apache.org/viewvc?rev=830289&view=rev
Log:
UIMA-1635 add new uima as chapter on monitoring and tuning, and revise overview chapter
Added:
incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/async.monitoring.and.tuning.xml
Modified:
incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/async.overview.xml
incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/uima_async_scaleout.xml
incubator/uima/uima-as/trunk/uima-as-docbooks/src/olink/uima_async_scaleout/htmlsingle-target.db
incubator/uima/uima-as/trunk/uima-as-docbooks/src/olink/uima_async_scaleout/pdf-target.db
Added: incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/async.monitoring.and.tuning.xml
URL: http://svn.apache.org/viewvc/incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/async.monitoring.and.tuning.xml?rev=830289&view=auto
==============================================================================
--- incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/async.monitoring.and.tuning.xml (added)
+++ incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/async.monitoring.and.tuning.xml Tue Oct 27 18:49:03 2009
@@ -0,0 +1,980 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
+ "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd"[
+<!ENTITY % uimaents SYSTEM "../entities.ent">
+]>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<chapter id="ugr.async.mt">
+ <title>Monitoring and Tuning</title>
+ <para>
+ UIMA AS deployments can involve many separate parts running on many
+ different machines. Monitoring facilities and tools built into UIMA AS help
+ in collecting information on the performance of these parts. You can
+ use the monitoring information to identify deployment issues, such as
+ bottlenecks, and address these with various approaches that alter the
+ deployment choices; this is what we mean by "tuning the deployment".
+ </para>
+
+ <para>
+ Monitoring happens in several parts:
+ <itemizedlist>
+ <listitem><para>Each node running a JVM hosting UIMA AS services or clients provides
+ JMX information tracking many items of interest.</para></listitem>
+ <listitem>
+ <para>UIMA AS services include some of these measurements in the information
+ passed back to their clients, along with the returned CAS. This allows
+ clients to collect and aggregate measurements over a cluster of remotely-deployed
+ components.</para>
+ </listitem>
+ <listitem>
+ <para>UIMA AS includes a Monitor component that can optionally be turned on to
+ sample the JMX data at
+ a specified interval, and write the results into the UIMA log (or to the
+ console if no log is configured) in one of two formats: one
+ convenient for reading, the other convenient for importing into
+ a spreadsheet program.</para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>Tuning a UIMA AS application is done using several approaches:
+ <itemizedlist>
+ <listitem><para>changing the topology of the scaleout - for instance, allocating more
+ nodes to some parts, fewer to others</para></listitem>
+ <listitem>
+ <para>adjusting deployment parameters, such as the number of CASes in a CasPool, or
+ the number of threads assigned to do various tasks</para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>
+ In addition, tuning can involve changing the actual analytic algorithms
+ to tune them - but that is beyond the scope of this chapter.
+ </para>
+
+
+ <section id="ugr.async.mt.monitoring">
+ <title>Monitoring</title>
+
+
+ <bridgehead>JMX</bridgehead>
+ <para>JMX (Java Management Extensions) is a standard Java mechanism that
+ is used to monitor and control Java applications. A standard Java tool
+ provided with most Java distributions, called
+ <code>jconsole</code>, is a GUI-based application that can connect to
+ a JVM and display the information JMX is providing, and also control
+ the application in application-defined specific ways.</para>
+
+ <para>JMX information is provided by a hierarchy of JMX Beans. More
+ background and information on JMX and the jconsole tool is available on the web.</para>
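+
+ <para>As an illustration of how such beans can be read programmatically, the following minimal Java sketch
+ reads one attribute from one MBean using only the standard <code>javax.management</code> API. It is not part
+ of UIMA AS: the class name <code>JmxReadSketch</code> and the method <code>readAttribute</code> are hypothetical,
+ and the bean it queries (<code>java.lang:type=Runtime</code>) is a standard JVM bean, not a UIMA AS bean.</para>
+ <programlisting><![CDATA[
```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Hypothetical illustration class; not part of UIMA AS.
final class JmxReadSketch {

    /** Reads one attribute of one MBean through any MBeanServerConnection (local or remote). */
    public static Object readAttribute(MBeanServerConnection server, String beanName, String attribute) {
        try {
            return server.getAttribute(new ObjectName(beanName), attribute);
        } catch (JMException | IOException e) {
            // Wrap checked JMX and I/O failures so callers of this sketch stay simple.
            throw new IllegalStateException("Cannot read " + attribute + " of " + beanName, e);
        }
    }

    public static void main(String[] args) {
        // The platform MBean server of the current JVM.
        MBeanServerConnection server = ManagementFactory.getPlatformMBeanServer();
        // Every JVM registers java.lang:type=Runtime; its Uptime attribute is in milliseconds.
        Object uptime = readAttribute(server, "java.lang:type=Runtime", "Uptime");
        System.out.println("JVM uptime (ms): " + uptime);
    }
}
```
+ ]]></programlisting>
+ <para>The same <code>readAttribute</code> call should also work against a remote JVM, provided the connection is
+ obtained through <code>javax.management.remote.JMXConnectorFactory</code> rather than from the platform MBean server.</para>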
+
+ <para>This section will first describe the basic JMX Beans, and then
+ later describe a UIMA AS monitor tool that can sample the values of these beans at
+ a specified interval and write the results to the UIMA log in various
+ formats.</para>
+
+ <section id="ugr.async.mt.jmx_monitoring">
+ <title>JMX Information from UIMA AS</title>
+
+ <para>JMX information is provided by every UIMA AS service or client as it runs.
+ Each item provided is either an instantaneous measurement
+ (e.g., the number of items in a queue) or an accumulating measurement
+ (e.g., the number of CASes processed). Accumulating measures
+ can be reset to 0 using standard JMX mechanisms.</para>
+
+ <para>
+ JMX information is provided on a JVM basis; a JVM can be hosting 0 or more
+ UIMA AS Services and/or clients. A UIMA AS Service is defined as a component
+ that connects to a queue
+ and accepts CASes to process. A UIMA AS Client, in contrast, sends CASes to
+ be processed; it can be a top level client, or
+ a UIMA AS Service having one or more AS Aggregate delegates, to which it is
+ sending CASes to be processed.
+ </para>
+
+ <para>
+ UIMA AS Services send
+ some of their measurements back to the UIMA AS Clients that sent them CASes; those
+ clients incorporate these measurements into aggregate statistics that they provide.
+ This allows accumulating information among components deployed over many nodes
+ interconnected on a network.
+ </para>
+
+ <para>
+ Some JMX measurement items are constant, and document various settings, descriptors,
+ names, etc., in use by the (one or more) UIMA AS services and/or
+ clients running on this JVM.</para>
+
+ <para>Some time measurements are associated with running some process. These,
+ where possible, are cpu times, as measured by the thread or threads running the process, using the
+ <code>ThreadMXBean</code> interface. On some JVMs, however, thread-based cpu time may not be supported. In that
+ case, wall-clock time is used instead.</para>
+
+ <para>
+ If the process is multi-threaded, and the cpu has multiple cores,
+ you can get time measurements which exceed the wall clock interval, due to the process consuming
+ cpu time on multiple threads at once.</para>
+
+ <para>Timing information not associated with running code, such as idle time, is measured as wall-clock time.</para>
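+
+ <para>The choice between thread cpu time and wall-clock time described above can be sketched in Java as follows.
+ This is a minimal sketch built on the standard <code>java.lang.management.ThreadMXBean</code> interface, not the
+ UIMA AS implementation; the class <code>ThreadTimeSketch</code> and its methods are hypothetical names introduced
+ for illustration.</para>
+ <programlisting><![CDATA[
```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical illustration class; not part of UIMA AS.
final class ThreadTimeSketch {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    /** True when this JVM can report cpu time for the current thread and the feature is enabled. */
    public static boolean cpuTimeAvailable() {
        return THREADS.isCurrentThreadCpuTimeSupported() && THREADS.isThreadCpuTimeEnabled();
    }

    /**
     * Runs the task on the calling thread and returns the time it took in milliseconds:
     * thread cpu time when available, wall-clock time otherwise.
     */
    public static long timeMillis(Runnable task) {
        boolean useCpu = cpuTimeAvailable();
        long start = useCpu ? THREADS.getCurrentThreadCpuTime() : System.nanoTime();
        task.run();
        long end = useCpu ? THREADS.getCurrentThreadCpuTime() : System.nanoTime();
        return (end - start) / 1_000_000L; // both clocks report nanoseconds
    }

    public static void main(String[] args) {
        long millis = timeMillis(() -> {
            double sum = 0.0;
            for (int i = 1; i <= 5_000_000; i++) {
                sum += Math.sqrt(i);
            }
            if (sum < 0) {
                System.out.println(sum); // never true; keeps the loop from being optimized away
            }
        });
        System.out.println("thread cpu time available: " + cpuTimeAvailable());
        System.out.println("measured time (ms): " + millis);
    }
}
```
+ ]]></programlisting>
+ <para>Because cpu time is counted per thread, summing such measurements over several threads is what can make
+ a total exceed the wall-clock interval on a multi-core machine.</para>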
+
+ <para>The following sections describe the JMX Beans implemented by UIMA AS. The
+ Notes in the tables include the following flags:
+
+ <itemizedlist>
+ <listitem>
+ <para><emphasis role="bold">inst/acc/const</emphasis> - instantaneous, accumulating, or constant measurement</para>
+ </listitem>
+ <listitem>
+ <para><emphasis role="bold">sent</emphasis> - sent up to the invoking client with returning CAS</para>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <section id="ugr.async.mt.jmx_monitoring.service">
+ <title>UIMA AS Services JMX measures</title>
+ <para>The next 4 tables detail the JMX measures provided by UIMA AS services.</para>
+ <section id="ugr.async.mt.jmx_monitoring.constant.service">
+ <title>Service information</title>
+ <informaltable frame="all">
+ <tgroup cols="4" colsep="1" rowsep="1">
+ <colspec colname="c1" colwidth="2*"/>
+ <colspec colname="c2" colwidth="5*"/>
+ <colspec colname="c3" colwidth="1*"/>
+ <colspec colname="c4" colwidth="1*"/>
+ <thead>
+ <row>
+ <entry align="center">Name</entry>
+ <entry align="center">Description</entry>
+ <entry align="center">Units</entry>
+ <entry align="center">Notes</entry>
+ </row>
+ </thead>
+
+ <tbody>
+ <row>
+ <entry>state</entry>
+ <entry>The state of the service (Running, Initializing, Disabled, Stopping, Failed)</entry>
+ <entry>string</entry>
+ <entry>inst</entry>
+ </row>
+ <row>
+ <entry>input queueName</entry>
+ <entry>The name of the input queue</entry>
+ <entry>string</entry>
+ <entry>const</entry>
+ </row>
+ <row>
+ <entry>reply queueName</entry>
+ <entry>The internally generated name of the reply queue</entry>
+ <entry>string</entry>
+ <entry>const</entry>
+ </row>
+ <row>
+ <entry>broker URL</entry>
+ <entry>The URL of the JMS queue broker</entry>
+ <entry>string</entry>
+ <entry>const</entry>
+ </row>
+ <row>
+ <entry>deployment descriptor</entry>
+ <entry>The path to the deployment descriptor for this service</entry>
+ <entry>string</entry>
+ <entry>const</entry>
+ </row>
+ <row>
+ <entry>is CAS Multiplier</entry>
+ <entry>Whether this service is a CAS Multiplier</entry>
+ <entry>boolean</entry>
+ <entry>const</entry>
+ </row>
+ <row>
+ <entry>is top level</entry>
+ <entry>Whether this service is a top level service, meaning that it connects to
+ an input queue on a queue broker</entry>
+ <entry>boolean</entry>
+ <entry>const</entry>
+ </row>
+ <row>
+ <entry>service key</entry>
+ <entry>The key name used in the associated Analysis Engine aggregate that specifies
+ this as a delegate</entry>
+ <entry>string</entry>
+ <entry>const</entry>
+ </row>
+ <row>
+ <entry>is Aggregate</entry>
+ <entry>Whether this service is an AS Aggregate (i.e., has delegates and
+ is marked async="true")</entry>
+ <entry>boolean</entry>
+ <entry>const</entry>
+ </row>
+ <row>
+ <entry>analysisEngine instance count</entry>
+ <entry>The number of replications of the AS Primitive</entry>
+ <entry>count</entry>
+ <entry>const</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </informaltable>
+ </section>
+
+
+ <section id="ugr.async.mt.jmx_monitoring.service.performance">
+ <title>Service Performance Measurements</title>
+ <informaltable frame="all">
+ <tgroup cols="4" colsep="1" rowsep="1">
+ <colspec colname="c1" colwidth="2*"/>
+ <colspec colname="c2" colwidth="5*"/>
+ <colspec colname="c3" colwidth="1*"/>
+ <colspec colname="c4" colwidth="1*"/>
+ <thead>
+ <row>
+ <entry align="center">Name</entry>
+ <entry align="center">Description</entry>
+ <entry align="center">Units</entry>
+ <entry align="center">Notes</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>number of CASes processed</entry>
+ <entry>The number of CASes processed by a component</entry>
+ <entry>count - CASes</entry>
+ <entry>acc</entry>
+ </row>
+ <row>
+ <entry>cas deserialization time</entry>
+ <entry>The thread time spent deserializing CASes (receiving, either from client, or replies from delegates)</entry>
+ <entry>milli seconds</entry>
+ <entry>acc</entry>
+ </row>
+ <row>
+ <entry>cas serialization time</entry>
+ <entry>The thread time spent serializing CASes (sending, either to delegates or back to client)</entry>
+ <entry>milli seconds</entry>
+ <entry>acc</entry>
+ </row>
+ <row>
+ <entry>analysis time</entry>
+ <entry>The thread time spent in AS Primitive analytics</entry>
+ <entry>milli seconds</entry>
+ <entry>acc</entry>
+ </row>
+ <row>
+ <entry>idle time</entry>
+ <entry>The wall clock time a service has been idle, measured from
+ the time a reply is sent until the next request is received. It excludes
+ serialization/deserialization times.</entry>
+ <entry>milli seconds</entry>
+ <entry>acc</entry>
+ </row>
+ <row>
+ <entry>cas pool wait time</entry>
+ <entry>The time spent waiting for a CAS to become available in the CAS Pool</entry>
+ <entry>milli seconds</entry>
+ <entry>acc</entry>
+ </row>
+ <row>
+ <entry>shadow cas pool wait time</entry>
+ <entry>A shadow cas pool is established for services which are Cas Multipliers.
+ This is the time spent waiting for a CAS to become available in the Shadow CAS Pool.</entry>
+ <entry>milli seconds</entry>
+ <entry>acc</entry>
+ </row>
+ <row>
+ <entry>time spent in CM getNext</entry>
+ <entry>The time spent inside Cas Multipliers, getting another CAS.
+ This time (doesn't include / includes ????)
+ the time
+ spent waiting for a CAS to become available in the CAS Pool</entry>
+ <entry>milli seconds</entry>
+ <entry>acc</entry>
+ </row>
+ <row>
+ <entry>process thread count</entry>
+ <entry>The number of threads available to process requests</entry>
+ <entry>count</entry>
+ <entry>inst</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </informaltable>
+ </section>
+
+ <section id="ugr.async.mt.jmx_monitoring.service.internal.queues">
+ <title>Co-located Service Queues</title>
+ <para>Co-located services use light-weight, internal (not JMS) queues.
+ These provide measures similar to those used with JMS queues, and include
+ the following for both the input queues and the reply (output) queues:
+ <informaltable frame="all">
+ <tgroup cols="4" colsep="1" rowsep="1">
+ <colspec colname="c1" colwidth="2*"/>
+ <colspec colname="c2" colwidth="5*"/>
+ <colspec colname="c3" colwidth="1*"/>
+ <colspec colname="c4" colwidth="1*"/>
+ <thead>
+ <row>
+ <entry align="center">Name</entry>
+ <entry align="center">Description</entry>
+ <entry align="center">Units</entry>
+ <entry align="center">Notes</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>consumer count</entry>
+ <entry>The number of threads configured to read the queue</entry>
+ <entry>count</entry>
+ <entry>const</entry>
+ </row>
+ <row>
+ <entry>dequeue count</entry>
+ <entry>The number of CASes that have been read from this queue</entry>
+ <entry>count</entry>
+ <entry>acc</entry>
+ </row>
+ <row>
+ <entry>queue size</entry>
+ <entry>The number of CASes in the queue</entry>
+ <entry>count</entry>
+ <entry>inst</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </informaltable>
+ </para>
+ </section>
+
+ <section id="ugr.async.mt.jmx_monitoring.service.error">
+ <title>Service Error Measurements</title>
+ <informaltable frame="all">
+ <tgroup cols="4" colsep="1" rowsep="1">
+ <colspec colname="c1" colwidth="2*"/>
+ <colspec colname="c2" colwidth="5*"/>
+ <colspec colname="c3" colwidth="1*"/>
+ <colspec colname="c4" colwidth="1*"/>
+ <thead>
+ <row>
+ <entry align="center">Name</entry>
+ <entry align="center">Description</entry>
+ <entry align="center">Units</entry>
+ <entry align="center">Notes</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>process Errors</entry>
+ <entry>The number of process errors</entry>
+ <entry>count</entry>
+ <entry>acc</entry>
+ </row>
+ <row>
+ <entry>getMetadata Errors</entry>
+ <entry>The number of getMetadata errors</entry>
+ <entry>count</entry>
+ <entry>acc</entry>
+ </row>
+ <row>
+ <entry>cpc Errors</entry>
+ <entry>The number of Collection Process Complete (cpc) errors</entry>
+ <entry>count</entry>
+ <entry>acc</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </informaltable>
+ </section>
+
+ </section>
+
+ <section id="ugr.async.mt.jmx_monitoring.client">
+ <title>Application Client information</title>
+ <para>This section describes monitoring
+ information provided by the UIMA AS Client APIs.
+ Any code that uses the <xref linkend="ugr.ref.async.api.organization">UIMA AS Client APIs</xref>,
+ such as the example application
+ client <code>RunRemoteAsyncAE</code>, will have a set of these
+ JMX measures. Currently no additional
+ tooling (beyond standard tools like <code>jconsole</code>) is provided to
+ view these.
+ </para>
+
+ <section id="ugr.async.mt.jmx_monitoring.client.measures">
+ <informaltable frame="all">
+ <tgroup cols="4" colsep="1" rowsep="1">
+ <colspec colname="c1" colwidth="2*"/>
+ <colspec colname="c2" colwidth="5*"/>
+ <colspec colname="c3" colwidth="1*"/>
+ <colspec colname="c4" colwidth="1*"/>
+ <thead>
+ <row>
+ <entry align="center">Name</entry>
+ <entry align="center">Description</entry>
+ <entry align="center">Units</entry>
+ <entry align="center">Notes</entry>
+ </row>
+ </thead>
+ <tbody>
+
+ <row>
+ <entry>application Name</entry>
+ <entry>A user-supplied string identifying the application</entry>
+ <entry>string</entry>
+ <entry>const</entry>
+ </row>
+ <row>
+ <entry>service queue name</entry>
+ <entry>The name of the service queue this client connects to</entry>
+ <entry>string</entry>
+ <entry>const</entry>
+ </row>
+ <row>
+ <entry>serialization method</entry>
+ <entry>either xmi or binary. This is the serialization the client will use to send
+ CASes to the service, and also tells the service which serialization to use
+ in sending the CASes back.</entry>
+ <entry>string</entry>
+ <entry>const</entry>
+ </row>
+ <row>
+ <entry>cas pool size</entry>
+ <entry>This client's cas pool size, limiting the number of simultaneous outstanding requests in process</entry>
+ <entry>count</entry>
+ <entry>const</entry>
+ </row>
+ <row>
+ <entry>total number of CASes processed</entry>
+ <entry>count of the total number of CASes sent from this client. Note: in the case
+ where the service is a Cas Multiplier, the "child" CASes are not included in this count.</entry>
+ <entry>count</entry>
+ <entry>acc</entry>
+ </row>
+
+ <row>
+ <entry>total time to process</entry>
+ <entry>total thread time spent in processing all CASes, including time in remote delegates</entry>
+ <entry>milli seconds</entry>
+ <entry>acc</entry>
+ </row>
+ <row>
+ <entry>average process time</entry>
+ <entry>total time to process / total number of CASes processed</entry>
+ <entry>milli seconds</entry>
+ <entry>inst</entry>
+ </row>
+ <row>
+ <entry>max process time</entry>
+ <entry>maximum thread time spent in processing a CAS, including time in remote delegates</entry>
+ <entry>milli seconds</entry>
+ <entry>inst</entry>
+ </row>
+
+ <row>
+ <entry>total serialization time</entry>
+ <entry>total thread time spent in serializing, both to delegates
+ (and recursively, to their delegates) and replies back to senders</entry>
+ <entry>milli seconds</entry>
+ <entry>acc</entry>
+ </row>
+ <row>
+ <entry>average serialization time</entry>
+ <entry>average thread time spent in serializing a CAS, both to delegates
+ (and recursively, to their delegates) and replies back to senders</entry>
+ <entry>milli seconds</entry>
+ <entry>inst</entry>
+ </row>
+ <row>
+ <entry>max serialization time</entry>
+ <entry>maximum thread time spent in serializing a CAS, both to delegates
+ (and recursively, to their delegates) and replies back to senders</entry>
+ <entry>milli seconds</entry>
+ <entry>inst</entry>
+ </row>
+
+ <row>
+ <entry>total deserialization time</entry>
+ <entry>total thread time spent in deserializing, both replies from delegates and CASes from upper
+ level components being sent to lower level ones.</entry>
+ <entry>milli seconds</entry>
+ <entry>acc</entry>
+ </row>
+ <row>
+ <entry>average deserialization time</entry>
+ <entry>average thread time spent in deserializing, both replies from delegates and CASes from upper
+ level components being sent to lower level ones.</entry>
+ <entry>milli seconds</entry>
+ <entry>inst</entry>
+ </row>
+ <row>
+ <entry>max deserialization time</entry>
+ <entry>maximum thread time spent in deserializing, both replies from delegates and CASes from upper
+ level components being sent to lower level ones.</entry>
+ <entry>milli seconds</entry>
+ <entry>inst</entry>
+ </row>
+
+ <row>
+ <entry>total idle time</entry>
+ <entry>total wall clock time a top-level service thread has been idle since the thread was last used.
+ If there is more than one service thread, this number is the sum.</entry>
+ <entry>milli seconds</entry>
+ <entry>acc</entry>
+ </row>
+ <row>
+ <entry>average idle time</entry>
+ <entry>average wall clock time all top-level service threads have been idle since they were last used</entry>
+ <entry>milli seconds</entry>
+ <entry>inst</entry>
+ </row>
+ <row>
+ <entry>max idle time</entry>
+ <entry>maximum wall clock time a top-level service thread has been idle since the thread was last used</entry>
+ <entry>milli seconds</entry>
+ <entry>inst</entry>
+ </row>
+
+ <row>
+ <entry>total time waiting for reply</entry>
+ <entry>total wall clock time, measured from the time a CAS is sent to the top-level queue, until that CAS
+ is returned. Any generated CASes from Cas Multipliers are not counted in this measurement.</entry>
+ <entry>milli seconds</entry>
+ <entry>acc</entry>
+ </row>
+ <row>
+ <entry>average time waiting for reply</entry>
+ <entry>average wall clock time from the time a CAS is sent until the reply is received</entry>
+ <entry>milli seconds</entry>
+ <entry>inst</entry>
+ </row>
+ <row>
+ <entry>max time waiting for reply</entry>
+ <entry>maximum wall clock time from the time a CAS is sent until the reply is received</entry>
+ <entry>milli seconds</entry>
+ <entry>inst</entry>
+ </row>
+
+ <row>
+ <entry>total response latency time</entry>
+ <entry>total wall clock time, measured from the time a CAS is sent to the top-level queue, including
+ the serialization and deserialization times at the client, until that CAS
+ is returned. Any generated CASes from Cas Multipliers are not counted in this measurement.</entry>
+ <entry>milli seconds</entry>
+ <entry>acc</entry>
+ </row>
+ <row>
+ <entry>average response latency time</entry>
+ <entry>average wall clock time, measured from the time a CAS is sent to the top-level queue, including
+ the serialization and deserialization times at the client, until that CAS
+ is returned.</entry>
+ <entry>milli seconds</entry>
+ <entry>inst</entry>
+ </row>
+ <row>
+ <entry>max response latency time</entry>
+ <entry>maximum wall clock time, measured from the time a CAS is sent to the top-level queue, including
+ the serialization and deserialization times at the client, until that CAS
+ is returned.</entry>
+ <entry>milli seconds</entry>
+ <entry>inst</entry>
+ </row>
+
+ <row>
+ <entry>total time waiting for CAS</entry>
+ <entry>total wall-clock time spent waiting for a
+ free CAS to be available in the client's CAS pool, before
+ sending the CAS to the input queue for the top level service.</entry>
+ <entry>milli seconds</entry>
+ <entry>acc</entry>
+ </row>
+ <row>
+ <entry>average time waiting for CAS</entry>
+ <entry>average wall-clock time spent waiting for a
+ free CAS to be available in the client's CAS pool</entry>
+ <entry>milli seconds</entry>
+ <entry>inst</entry>
+ </row>
+ <row>
+ <entry>max time waiting for CAS</entry>
+ <entry>maximum wall-clock time spent waiting for a
+ free CAS to be available in the client's CAS pool</entry>
+ <entry>milli seconds</entry>
+ <entry>inst</entry>
+ </row>
+
+ <row>
+ <entry>total number of CASes requested</entry>
+ <entry>total number of CASes fetched from the CAS pool</entry>
+ <entry>count</entry>
+ <entry>acc</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </informaltable>
+ </section>
+
+ <section id="ugr.async.mt.jmx_monitoring.client.error">
+ <title>Client Error Measurements</title>
+ <informaltable frame="all">
+ <tgroup cols="4" colsep="1" rowsep="1">
+ <colspec colname="c1" colwidth="2*"/>
+ <colspec colname="c2" colwidth="5*"/>
+ <colspec colname="c3" colwidth="1*"/>
+ <colspec colname="c4" colwidth="1*"/>
+ <thead>
+ <row>
+ <entry align="center">Name</entry>
+ <entry align="center">Description</entry>
+ <entry align="center">Units</entry>
+ <entry align="center">Notes</entry>
+ </row>
+ </thead>
+ <tbody>
+
+ <row>
+ <entry>getMeta Timeout Error Count</entry>
+ <entry>number of times a getMeta timed out</entry>
+ <entry>count</entry>
+ <entry>acc</entry>
+ </row>
+
+ <row>
+ <entry>getMeta Error Count</entry>
+ <entry>number of times a getMeta request returned with an error</entry>
+ <entry>count</entry>
+ <entry>acc</entry>
+ </row>
+
+ <row>
+ <entry>process Timeout Error Count</entry>
+ <entry>number of times a process call timed out</entry>
+ <entry>count</entry>
+ <entry>acc</entry>
+ </row>
+
+ <row>
+ <entry>process Error Count</entry>
+ <entry>number of times a process call returned with an error</entry>
+ <entry>count</entry>
+ <entry>acc</entry>
+ </row>
+
+ </tbody>
+ </tgroup>
+ </informaltable>
+ </section>
+ </section>
+ </section>
+
+ <section id="ugr.async.mt.jmx_sampling">
+ <title>Logging Sampled JMX information at intervals</title>
+
+ <para>
+ A common tuning procedure is to run a deployment for a fairly long time with a
+ typical load, and to see what and where hot spots develop. During this process,
+ it is sometimes useful to convert accumulating measurements into averages, perhaps
+ averages per CAS processed.
+ </para>
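+
+ <para>Converting two samples of accumulating measurements into a per-interval count and a per-CAS average can be
+ sketched as follows. This is an illustration only: the class <code>IntervalAverages</code>, the nested
+ <code>Sample</code> type, and the sample values are hypothetical and do not come from UIMA AS.</para>
+ <programlisting><![CDATA[
```java
// Hypothetical illustration class; not part of UIMA AS.
final class IntervalAverages {

    /** One reading of two accumulating measures, taken at a single point in time. */
    static final class Sample {
        final long casesProcessed;  // accumulating count of CASes processed
        final long analysisMillis;  // accumulating analysis time, in milliseconds

        public Sample(long casesProcessed, long analysisMillis) {
            this.casesProcessed = casesProcessed;
            this.analysisMillis = analysisMillis;
        }
    }

    /** Number of CASes processed between the earlier and the later sample. */
    public static long casesInInterval(Sample earlier, Sample later) {
        return later.casesProcessed - earlier.casesProcessed;
    }

    /**
     * Average analysis time per CAS within the interval. Returns 0.0 when no CAS
     * completed, or when the counters were reset between the two samples.
     */
    public static double analysisMillisPerCas(Sample earlier, Sample later) {
        long cases = casesInInterval(earlier, later);
        if (cases <= 0) {
            return 0.0;
        }
        return (double) (later.analysisMillis - earlier.analysisMillis) / cases;
    }

    public static void main(String[] args) {
        Sample first = new Sample(100, 5_000);   // made-up readings
        Sample second = new Sample(140, 7_400);
        System.out.println(casesInInterval(first, second));       // 40
        System.out.println(analysisMillisPerCas(first, second));  // 60.0
    }
}
```
+ ]]></programlisting>
+ <para>In this made-up example, 40 CASes were processed in the interval and 2400 milliseconds of analysis time
+ accumulated, giving an average of 60 milliseconds per CAS.</para>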
+ <para>
+ UIMA AS includes a monitor component, org.apache.uima.aae.jmx.monitor.JmxMonitor,
+ to sample JMX measures at specified intervals,
+ compute various averages, and write the results into the UIMA Log (or on the console
+ if no log is configured). The monitor program can be automatically enabled for any deployed service
+ by specifying <code>-D</code> parameters on the JVM command
+ line which launches the service, or it can be run stand-alone; when run stand-alone, you provide an
+ argument specifying the JVM from which it is to get the JMX information. It only connects
+ to one JVM per run; typically, you would connect it to the top-level service.
+ </para>
+
+ <para>
+ The monitor outputs information for that service and its immediate delegates (local or remote); however, it
+ includes information from the complete recursive chain of delegates when computing its measures. You can
+ get detailed monitoring for sub-services by starting or attaching a monitor to those sub-services.
+ </para>
+
+ <para>
+ ActiveMQ uses Queue Brokers to manage the JMS queues used by UIMA AS. These brokers have JMX information
+ that is useful in tuning applications. The Monitor program identifies the Queue Broker being used by the
+ service, and connects to it and incorporates information about queue lengths (both the input queue
+ and the reply queue) into its measurements.
+ </para>
+
+ <section id="ugr.async.mt.jmx_sampling.configuring">
+ <title>Configuring JVM to run the monitor</title>
+ <para>Specify the following JVM system properties to configure a UIMA AS Client or Service to enable
+ sampling and logging of JMX measures:
+ <itemizedlist>
+ <listitem><para><code>-Duima.jmx.monitor.interval=1000</code> - (optional; default is 1000) specifies the
+ sampling interval in milliseconds</para></listitem>
+ <listitem><para><code>-Duima.jmx.monitor.formatter=&lt;CustomFormatterClassName&gt;</code></para></listitem>
+ <listitem><para><code>-Dcom.sun.management.jmxremote</code> - enable JMX</para></listitem>
+ <listitem><para><code>-Dcom.sun.management.jmxremote.port=8009</code></para></listitem>
+ <listitem><para><code>-Dcom.sun.management.jmxremote.authenticate=false</code></para></listitem>
+ <listitem><para><code>-Dcom.sun.management.jmxremote.ssl=false</code></para></listitem>
+ </itemizedlist>
+
+ This configures JMX to run on port 8009 with no authentication and no SSL, sets the sampling interval to 1 second,
+ and specifies a custom formatter class name.
+ </para>
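+
+ <para>Parameters given with <code>-D</code> become Java system properties of the launched JVM. The sketch below
+ shows how code running in that JVM could read the two <code>uima.jmx.monitor</code> properties. It illustrates
+ the mechanism only and is not taken from the monitor's source; the class <code>MonitorSettingsSketch</code> is
+ hypothetical, and its fallback value of 1000 simply mirrors the default stated above.</para>
+ <programlisting><![CDATA[
```java
// Hypothetical illustration class; not part of UIMA AS.
final class MonitorSettingsSketch {
    static final String INTERVAL_KEY = "uima.jmx.monitor.interval";
    static final String FORMATTER_KEY = "uima.jmx.monitor.formatter";

    /** Sampling interval in milliseconds; 1000 when the property is absent or not a number. */
    public static long intervalMillis() {
        return Long.getLong(INTERVAL_KEY, 1000L);
    }

    /** Formatter class name given on the command line, or null when none was given. */
    public static String formatterClassName() {
        return System.getProperty(FORMATTER_KEY);
    }

    public static void main(String[] args) {
        System.out.println("interval (ms): " + intervalMillis());
        System.out.println("formatter: " + formatterClassName());
    }
}
```
+ ]]></programlisting>
+ <para>Running this class with <code>-Duima.jmx.monitor.interval=250</code> on the command line would make
+ <code>intervalMillis()</code> return 250.</para>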
+
+ <para>There are two <code>formatter-classes</code> provided with UIMA AS:
+ <itemizedlist>
+ <listitem><para><code>org.apache.uima.aae.jmx.monitor.BasicUimaJmxMonitorListener</code> -
+ this is a multi-line formatter that formats for human-readable output</para></listitem>
+ <listitem><para><code>org.apache.uima.aae.jmx.monitor.SingleLineUimaJmxMonitorListener</code> -
+ this is a formatter that produces one line per interval, suitable for importing into
+ a spreadsheet program.</para></listitem>
+ </itemizedlist>
+
+ Both of these log to the UIMA log at the INFO log level.
+ </para>
+
+ <para>You can also write your own formatter. The monitor provides an API to plug in a custom formatter
+ for displaying service metrics. A custom formatter must implement the <code>JmxMonitorListener</code> interface.
+ See the method <code>startMonitor</code> in the class <code>UIMA_Service</code> for an
+ example of how custom JMX Listeners are plugged into the monitor.
+ </para>
+ </section>
+
+ <section id="ugr.async.mt.jmx_sampling.standalone">
+ <title>Running the Monitor program standalone</title>
+ <para>The monitor program can be started separately and pointed to a running UIMA AS Client or Service.
+ To start the program, invoke Java with the following classpath and parameters:
+ <itemizedlist>
+ <listitem>
+ <para>ClassPath:</para>
+ <itemizedlist>
+ <listitem><para>%UIMA_HOME%/lib/uimaj-as-activemq.jar</para></listitem>
+ <listitem><para>%UIMA_HOME%/lib/uimaj-as-core.jar</para></listitem>
+ <listitem><para>%UIMA_HOME%/lib/uima-core.jar</para></listitem>
+ <listitem><para>%UIMA_HOME%/apache-activemq-4.1.1/apache-activemq-4.1.1.jar</para></listitem>
+ </itemizedlist>
+ </listitem>
+ <listitem>
+ <para>Parameters:</para>
+ <itemizedlist>
+ <listitem><para><code>-Djava.util.logging.config.file=%UIMA_HOME%/config/MonitorLogger.properties</code>
+ - specifies the logging configuration file, which controls where the information is written</para></listitem>
+ <listitem><para><code>org.apache.uima.aae.jmx.monitor.JmxMonitor</code> -
+ the class whose main method is invoked</para></listitem>
+ <listitem><para><code>uri</code> - the URI of the jmx instance to monitor.</para></listitem>
+ <listitem><para><code>interval</code> - the (optional)
+ sampling interval, in milliseconds (default = 1000)</para></listitem>
+ </itemizedlist>
+ </listitem>
+ </itemizedlist>
+ </para>
+
+ <para>When run in this manner, it is not (currently) possible to specify the
+ log message formatting class; the multi-line output format is always used.</para>
+ </section>
+
+ <section id="ugr.async.mt.jmx_sampling.output">
+ <title>Monitoring output</title>
+ <para>The monitoring program combines information from the JMX measures, including the associated
+ Queue Broker, sampling accumulating measurements at the specified sampling interval, and produces
+ the following outputs:
+
+ <informaltable frame="all">
+ <tgroup cols="3" colsep="1" rowsep="1">
+ <colspec colname="c1" colwidth="2*"/>
+ <colspec colname="c2" colwidth="5*"/>
+ <colspec colname="c3" colwidth="1*"/>
+ <thead>
+ <row>
+ <entry align="center">Name</entry>
+ <entry align="center">Description</entry>
+ <entry align="center">Units</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>Input queue depth</entry>
+ <entry>number of CASes waiting to be processed by a service</entry>
+ <entry>count</entry>
+ </row>
+ <row>
+ <entry>Reply queue depth</entry>
+ <entry>number of CASes returned to the client but not yet picked up by the client</entry>
+ <entry>count</entry>
+ </row>
+ <row>
+ <entry>CASes processed in interval</entry>
+ <entry>Number of CASes processed in this sampling interval</entry>
+ <entry>count</entry>
+ </row>
+ <row>
+ <entry>Idle time in interval</entry>
+ <entry>The total time this service has been idle during this interval</entry>
+ <entry>milli seconds</entry>
+ </row>
+ <row>
+ <entry>Analysis time in interval</entry>
+ <entry>The sum of the times spent in analysis by the service during this interval,
+ including analysis time spent in delegates, recursively</entry>
+ <entry>milli seconds</entry>
+ </row>
+ <row>
+ <entry>Cas Pool free Cas Count</entry>
+ <entry>Number of available CASes in the Cas Pool at the end of the interval</entry>
+ <entry>count</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </informaltable>
+ </para>
+
+ <para>In addition to the performance metrics the monitor also provides basic service information:
+ <itemizedlist>
+ <listitem>
+ <para>Service name</para>
+ </listitem>
+ <listitem>
+ <para>Is service top level</para>
+ </listitem>
+ <listitem>
+ <para>Is service remote</para>
+ </listitem>
+ <listitem>
+ <para>Is service a cas multiplier</para>
+ </listitem>
+ <listitem>
+ <para>Number of processing threads</para>
+ </listitem>
+ <listitem>
+ <para>Service uptime (milliseconds)</para>
+ </listitem>
+ </itemizedlist>
+ </para>
+ </section>
+ </section>
+ </section>
+
+ <section id="ugr.async.mt.tuning">
+ <title>Tuning</title>
+
+ <section id="ugr.async.mt.tuning.approach">
+ <title>Tuning procedure</title>
+ <para>This section is a cookbook of best practices for tuning a UIMA AS deployment. The summary information
+ provided by the Monitor program is used to guide the tuning.</para>
+
+ <para>The main metric for detecting an overloaded service is the input queue depth. If it is growing or high, the service
+ is not able to keep up with the load: more CASes are arriving at the queue than the service can process.
+ Consider increasing the number of instances of the service within the JVM (if on a multi-core machine with
+ spare capacity), or deploying additional instances of the service.</para>
+
+ <para>The main metric for detecting an idle service is the idle time. If it is high, it can indicate that the service is not
+ receiving enough CASes. This can be caused by a bottleneck in the service's client; supporting evidence for this
+ can be a high reply queue depth for the client, indicating that the client is overloaded.
+ Ideally, the idle time should be zero, which means that the service continually receives enough CASes
+ to process.</para>
+
+ <para>A CasPool free Cas Count of 0 can point to a bottleneck in a service's client; supporting
+ evidence for this can be a high idle time. In this case, the service does not have enough CASes in its pool and is
+ forced to wait. Remember that a CAS is not returned to the service's CAS pool until the client signals that it can be.
+ A typical reason is a slow client (look for evidence such as a high reply queue depth). Consider
+ increasing the size of the service's CAS pool, and check the client's metrics to determine why it is slow.</para>
+
+ </section>
+
+ <section id="ugr.async.mt.tuning.settings">
+ <title>Tuning Settings</title>
+ <para>This section lists the tuning parameters and describes what they do and how they interact.</para>
+ <informaltable frame="all">
+ <tgroup cols="2" colsep="1" rowsep="1">
+ <colspec colname="c1" colwidth="2*"/>
+ <colspec colname="c2" colwidth="4*"/>
+ <thead>
+ <row>
+ <entry align="center">Name</entry>
+ <entry align="center">Description</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>Number of service instances started on different machines</entry>
+ <entry>You can adjust the number of machines assigned to a particular service,
+ even dynamically, simply by starting or stopping additional servers that specify
+ the same input queue.</entry>
+ </row>
+ <row>
+ <entry>Number of instances of a service</entry>
+ <entry>This is similar to the number of service instances started on different machines, above,
+ but specifies replication of an AS Primitive within one JVM. This is useful for making
+ use of multi-core machines sharing a common memory: large tables that might be
+ part of the analysis algorithm can be shared by all instances.</entry>
+ </row>
+ <row>
+ <entry>CAS pool size</entry>
+ <entry>This size limits the number of CASes being processed asynchronously.</entry>
+ </row>
+ <row>
+ <entry>casMultiplier poolSize</entry>
+ <entry>This size limits the number of CASes generated by a CAS Multiplier that are being processed asynchronously.</entry>
+ </row>
+ <row>
+ <entry>Service input queue prefetch</entry>
+ <entry>If set greater than 0, allows up to "n" CASes to be pulled into one service provider, at a time.
+ This can increase throughput, but can hurt latency, since one service may have several CASes pulled into it,
+ queued up, while another instance of the service could be "starved" and be sitting there idle. </entry>
+ </row>
+ <row>
+ <entry>Specifying async="true"/"false" on an aggregate</entry>
+ <entry>The default is false, because there is less overhead (no queues are set up, etc.). Setting this to
+ "true" allows multiple CASes to flow simultaneously in the aggregate.</entry>
+ </row>
+ <row>
+ <entry>remoteReplyQueueScaleout</entry>
+ <entry>This parameter indicates the number of threads that will be deployed to read from the remote reply queue.
+ Set this to a value greater than 1 if the deserialization time of replies is a bottleneck.</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </informaltable>
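+
+ <para>As a rough illustration, the fragment below sketches where these settings might appear in a
+ UIMA AS deployment descriptor. The queue names, broker URLs, descriptor location, delegate keys and all
+ numeric values are placeholders invented for this example, not recommendations; consult the deployment
+ descriptor reference for the authoritative element and attribute definitions.</para>
+ <programlisting><![CDATA[<analysisEngineDeploymentDescription
+     xmlns="http://uima.apache.org/resourceSpecifier">
+   <name>Tuning example (hypothetical)</name>
+   <deployment protocol="jms" provider="activemq">
+     <!-- CAS pool size: limits the CASes being processed asynchronously -->
+     <casPool numberOfCASes="5"/>
+     <service>
+       <!-- prefetch greater than 0 lets one service instance pull in several CASes -->
+       <inputQueue endpoint="exampleServiceQueue"
+                   brokerURL="tcp://localhost:61616" prefetch="0"/>
+       <topDescriptor>
+         <import location="ExampleAggregate.xml"/>
+       </topDescriptor>
+       <!-- async="true": multiple CASes can flow in the aggregate at once -->
+       <analysisEngine async="true">
+         <delegates>
+           <!-- co-located delegate replicated within this JVM -->
+           <analysisEngine key="ExampleAnnotator">
+             <scaleout numberOfInstances="4"/>
+           </analysisEngine>
+           <!-- CAS Multiplier delegate with its own pool of generated CASes -->
+           <analysisEngine key="ExampleSegmenter">
+             <casMultiplier poolSize="5"/>
+           </analysisEngine>
+           <!-- remote delegate: two threads read from its reply queue -->
+           <remoteAnalysisEngine key="ExampleRemoteAnnotator"
+                                 remoteReplyQueueScaleout="2">
+             <inputQueue endpoint="exampleRemoteQueue"
+                         brokerURL="tcp://localhost:61616"/>
+           </remoteAnalysisEngine>
+         </delegates>
+       </analysisEngine>
+     </service>
+   </deployment>
+ </analysisEngineDeploymentDescription>]]></programlisting>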
+
+ </section>
+
+ </section>
+
+ <section id="ugr.async.mt.limits">
+ <title>Limitations</title>
+ <para>The current (2.3.0) implementation has the following limitations:
+ <itemizedlist>
+ <listitem><para>Monitoring program</para>
+ <itemizedlist>
+ <listitem><para>The monitoring program reads the JMS Queue Broker URL
+ from the configuration information provided by JMX for the UIMA AS Service
+ being monitored. It uses this information to connect to JMX on that broker, but
+ currently assumes that JMX is set up on the default port (1099). This is
+ currently hardcoded into the Monitor program, so be aware of this if you
+ change the port number for JMX on the JMS Queue Broker (a parameter in
+ ActiveMQ's configuration for the broker).
+ </para></listitem>
+ <listitem><para>When the Monitor program is run as a stand-alone program,
+ it is not (currently) possible to specify alternatives for the
+ log message formatting class; the multi-line output format is always used.</para></listitem>
+ </itemizedlist>
+ </listitem>
+ </itemizedlist>
+ </para>
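+
+ <para>For reference, ActiveMQ's broker configuration file can set the JMX connector port through the
+ managementContext element. The fragment below is a minimal sketch, assuming an ActiveMQ 5.x style
+ configuration and a broker named "localhost" (both assumptions of this example); it keeps the connector
+ on port 1099, the port the Monitor program expects.</para>
+ <programlisting><![CDATA[<broker xmlns="http://activemq.apache.org/schema/core"
+         brokerName="localhost" useJmx="true">
+   <!-- keep connectorPort at 1099 so that the Monitor program can connect -->
+   <managementContext>
+     <managementContext createConnector="true" connectorPort="1099"/>
+   </managementContext>
+ </broker>]]></programlisting>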
+ </section>
+
+</chapter>
\ No newline at end of file
Modified: incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/async.overview.xml
URL: http://svn.apache.org/viewvc/incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/async.overview.xml?rev=830289&r1=830288&r2=830289&view=diff
==============================================================================
--- incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/async.overview.xml (original)
+++ incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/async.overview.xml Tue Oct 27 18:49:03 2009
@@ -970,111 +970,17 @@
<section id="ugr.async.ov.concepts.mc">
<title>Monitoring and Controlling an AS application</title>
 <titleabbrev>Monitoring &amp; Controlling</titleabbrev>
- <para>JMX (Java Management Extensions) are used for monitoring and controlling an AS application. This
- capability is being staged; initial versions have some monitoring capability, but little controlling
- capability.</para>
- <para>The first versions of AS will use the standard GUI tooling available as part of Java 5 to display the JMX
- results. Later versions may include additional UIMA-specific tooling for this.</para>
- <section id="ugr.async.ov.concepts.mc.what">
- <title>Instrumentation provided</title>
- <para>UIMA AS Service has a built-in, JMX-based instrumentation that enables service monitoring. It
- provides service metrics collected in real-time at configurable checkpoint intervals (typically, every second).
- The main purpose
- of the monitor is to help in detecting overloaded and idle services. Overloaded services are those
- that are not able to keep up with the work load. Idle services are those that are not receiving enough
- work and stay idle. To detect both conditions, the monitor provides the following metrics: </para>
-
- <section id="ugr.async.ov.concepts.mc.checkpoint">
- <title>Checkpoint intervals</title>
- <para>
- Many of the measures are done with respect to a checkpoint. This is nothing more than a defined
- interval of time (perhaps 1 second),
- and is configured with the Java startup -Duima.jmx.monitor.checkpoint.interval parameter.
- </para>
- </section>
+ <para>JMX (Java Management Extensions) is used for monitoring and controlling an AS application.
+ As of release 2.3.0, extensive monitoring facilities have been implemented; these are described
+ in a separate chapter on <xref linkend="ugr.async.mt">Monitoring and Tuning</xref>.
+ The only controlling facility provided is to stop a service.</para>
+
+ <para>In addition, a configurable Monitoring program is provided. It works with the measurements provided by JMX,
+ aggregating and sampling them over specified intervals, and creates monitoring entries in the
+ UIMA log for tuning purposes. You can use this to detect overloaded and/or idle services;
+ see the <xref linkend="ugr.async.mt">Monitoring and Tuning</xref> chapter for details.</para>
-
- <itemizedlist>
- <listitem>
- <para>Input queue depth: number of CASes waiting to be processed by a service</para>
- </listitem>
- <listitem>
- <para>Reply queue depth: number of CASes returned to the client but not yet picked up by the client</para>
- </listitem>
- <listitem>
- <para>Number of CASes processed since the last checkpoint</para>
- </listitem>
- <listitem>
- <para>Idle time since the last checkpoint</para>
- </listitem>
- <listitem>
- <para>Time spent in analysis since the last checkpoint</para>
- </listitem>
- <listitem>
- <para>Number of un-checked-out CASes in the Cas Pool</para>
- </listitem>
- </itemizedlist>
-
- <para>In addition to the performance metrics the monitor also provides basic service information:</para>
- <itemizedlist>
- <listitem>
- <para>Service name</para>
- </listitem>
- <listitem>
- <para>Is service top level</para>
- </listitem>
- <listitem>
- <para>Is service remote</para>
- </listitem>
- <listitem>
- <para>Is service a cas multiplier</para>
- </listitem>
- <listitem>
- <para>Number of processing threads</para>
- </listitem>
- <listitem>
- <para>Service uptime</para>
- </listitem>
- </itemizedlist>
-
- <para>The main metric for detecting an overloaded service is the input queue depth. If it is growing or high, the service
- is not able to keep up with the load. There are more CASes arriving at the queue than the service can process.
- Consider increasing number of processig threads in the JVm or start another instance of the service.</para>
-
- <para>The main metric for detecting idle service is the idle time. If it is high, it can indicate that the service is not
- receiving enough CASes. This can be caused by a bottleneck in the service's client; supporting evidence for this
- can be a high reply queue depth for the client - indicating the client is overloaded.
- Ideally, the idle time should be at zero, which means that the service receives enough CASes
- to process. The idle time is shown as a delta from the last time the checkpoint was made.</para>
-
- <para>Zero un-checked-out CASes in the Cas Pool can point to a bottleneck in a service's client; supporting
- evidence for this can be a high idle time. In this case, the service does not have enough CASes in its pool and is
- forced to wait. Remember that a CAS is not returned to the Service's CAS pool until the client signals it can be.
- A typical reason is a slow client (look for evidence such as a high reply queue depth). Consider
- incrementing service's Cas pool and check the client's metrics to determine a reason why it is slow.</para>
-
- <para>The monitor provides an API to plug in a custom formatter for displaying service metrics. A default implementation
- is provided in the UIMA AS runtime. A custom formatter must implement JmxMonitorListener interface and can be
- plugged in to the monitor with the following API:
- <programlisting><![CDATA[ jmxMonitor.addJmxMonitorListener(customJmxListener);]]></programlisting></para>
-
-
-
- <para>To easiest way to start a service with JMX monitoring enabled is to configure UIMA's property: UIMA_JVM_OPTS:</para>
-
- <programlisting>
- <![CDATA[set UIMA_JVM_OPTS=-Dcom.sun.management.jmxremote
- -Dcom.sun.management.jmxremote.port=8009
- -Dcom.sun.management.jmxremote.authenticate=false
- -Dcom.sun.management.jmxremote.ssl=false
- -Djmx.monitor.frequency=1000
- -Djmx.monitor.formatter=<custom formatter class>]]></programlisting>
-
- <para>The above configures JMX MBeanServer to run on port 8009 with no authentication. It also sets the checkpoint interval
- to 1 second and plugs in a custom formatter for displaying metrics. Please note that the default formatter logs the metrics in
- UIMA Log at INFO level.</para>
-
<!--
<para>The implementation provides the following kinds of instrumentation via JMX:
<itemizedlist>
@@ -1106,9 +1012,7 @@
</listitem>
</itemizedlist> </para>
-->
- </section>
- <!-- of ugr.async.ov.concepts.mc.what --></section>
- <!-- of ugr.async.ov.concepts.mc -->
+ </section><!-- of ugr.async.ov.concepts.mc -->
<!-- ======================================================= -->
<!-- | JMS Service Descriptor | -->
<!-- ======================================================= -->
Modified: incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/uima_async_scaleout.xml
URL: http://svn.apache.org/viewvc/incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/uima_async_scaleout.xml?rev=830289&r1=830288&r2=830289&view=diff
==============================================================================
--- incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/uima_async_scaleout.xml (original)
+++ incubator/uima/uima-as/trunk/uima-as-docbooks/src/docbook/uima_async_scaleout/uima_async_scaleout.xml Tue Oct 27 18:49:03 2009
@@ -31,9 +31,10 @@
<toc/>
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="async.overview.xml" />
- <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="async.errorhandling.xml"/>
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="async.errorhandling.xml"/>
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ref.async.deployment.xml" />
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="ref.async.api.xml" />
+ <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="async.monitoring.and.tuning.xml"/>
<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="async.camel.driver.xml" />
<!--