You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@logging.apache.org by rp...@apache.org on 2016/05/01 07:39:51 UTC

logging-log4j2 git commit: LOG4J2-1297 improved latency performance text and graphs

Repository: logging-log4j2
Updated Branches:
  refs/heads/master 21a9a1371 -> 79777a732


LOG4J2-1297 improved latency performance text and graphs


Project: http://git-wip-us.apache.org/repos/asf/logging-log4j2/repo
Commit: http://git-wip-us.apache.org/repos/asf/logging-log4j2/commit/79777a73
Tree: http://git-wip-us.apache.org/repos/asf/logging-log4j2/tree/79777a73
Diff: http://git-wip-us.apache.org/repos/asf/logging-log4j2/diff/79777a73

Branch: refs/heads/master
Commit: 79777a7326f3ed5a0c17273e8940682729916a3f
Parents: 21a9a13
Author: rpopma <rp...@apache.org>
Authored: Sun May 1 14:40:05 2016 +0900
Committer: rpopma <rp...@apache.org>
Committed: Sun May 1 14:40:05 2016 +0900

----------------------------------------------------------------------
 .../ResponseTimeAsyncClassicVsGcFree-label.png  | Bin 31517 -> 29217 bytes
 .../garbage-free2.6-SyncThroughputLinux.png     | Bin 70039 -> 64440 bytes
 src/site/xdoc/manual/garbagefree.xml            |  72 ++++++++++++++-----
 3 files changed, 54 insertions(+), 18 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/logging-log4j2/blob/79777a73/src/site/resources/images/ResponseTimeAsyncClassicVsGcFree-label.png
----------------------------------------------------------------------
diff --git a/src/site/resources/images/ResponseTimeAsyncClassicVsGcFree-label.png b/src/site/resources/images/ResponseTimeAsyncClassicVsGcFree-label.png
index f2f1c03..70353fb 100644
Binary files a/src/site/resources/images/ResponseTimeAsyncClassicVsGcFree-label.png and b/src/site/resources/images/ResponseTimeAsyncClassicVsGcFree-label.png differ

http://git-wip-us.apache.org/repos/asf/logging-log4j2/blob/79777a73/src/site/resources/images/garbage-free2.6-SyncThroughputLinux.png
----------------------------------------------------------------------
diff --git a/src/site/resources/images/garbage-free2.6-SyncThroughputLinux.png b/src/site/resources/images/garbage-free2.6-SyncThroughputLinux.png
index 5de5d9f..bea429d 100644
Binary files a/src/site/resources/images/garbage-free2.6-SyncThroughputLinux.png and b/src/site/resources/images/garbage-free2.6-SyncThroughputLinux.png differ

http://git-wip-us.apache.org/repos/asf/logging-log4j2/blob/79777a73/src/site/xdoc/manual/garbagefree.xml
----------------------------------------------------------------------
diff --git a/src/site/xdoc/manual/garbagefree.xml b/src/site/xdoc/manual/garbagefree.xml
index 2ecfaee..b879720 100644
--- a/src/site/xdoc/manual/garbagefree.xml
+++ b/src/site/xdoc/manual/garbagefree.xml
@@ -364,23 +364,63 @@ public void garbageFree() {
         <a name="Latency" />
         <h4>Response Time Latency</h4>
         <p>Response time is how long it takes to log a message under a certain load.
-          When measuring, each messsage has its own response time. Typically the vast majority of measurements
-          are in a certain low range, with a small number of large outliers. When talking about latency,
+          What is often reported as latency is actually <em>service time</em>: how long it took to perform the operation.
+          This hides the fact that a single spike in service time adds queueing delay for many of the subsequent operations.
+          Service time is easy to measure (and often looks good on paper) but is irrelevant for users since it
+          omits the time spent waiting for service.
+          For this reason we report
+          <a href="https://en.wikipedia.org/wiki/Response_time_(technology)">response time</a>:
+          service time plus wait time.
+        </p>
+        <!--
+        <p>Response time cannot be expressed in a single number: each message has its own response time.
+          Typically the vast majority of measurements
+          are in a certain (low) range, with a small number of large outliers. When talking about latency,
           we are usually interested in the outliers: how many there are and how large they are.
-          "Average latency" is meaningless.
+          There is no bell curve, so "average latency" and "standard deviation" are meaningless terms
+          when talking about latency. The graphs below shows a histogram with the entire percentile spectrum.
         </p>
-        <p>Commonly what is reported as latency is <em>service time</em>: how long it took to perform the operation.
-          This omits the queueing effect: we were trying to log at a certain rate, say 100,000 messages per second.
-          A single spike in service time can delay many of the subsequent messages.
-          <em>Response time</em> includes the effect of spikes.
+        -->
+        <table>
+          <tr><td>I encourage everyone to take the red pill and watch Gil Tene's presentation
+            <a href="http://www.infoq.com/presentations/latency-response-time">How NOT to measure latency</a>.
+            Be warned, this talk is about removing the wool from your eyes.
+            You will learn a lot but may not always like what you've learnt.</td></tr>
+        </table>
+        <p>The response time test results below were all derived from running the ResponseTimeTest class
+          which can be found in the Log4j 2 unit test source directory. If you want to run these tests yourself,
+          here are the command line options we used:
         </p>
+        <ul>
+          <li>-Xms1G -Xmx1G (prevent heap resizing during the test)</li>
+          <li>-XX:+UnlockDiagnosticVMOptions -XX:GuaranteedSafepointInterval=500000 (by default Hotspot schedules a
+            safepoint pause every second. Reduce jitter by postponing this for the duration of the test.)</li>
+          <li>-DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector
+            -DAsyncLogger.WaitStrategy=busyspin (to use Async Loggers. The BusySpin wait strategy reduces some jitter.)</li>
+          <li><b>classic mode: </b>-Dlog4j2.enable.threadlocals=false -Dlog4j2.enable.direct.encoders=false<br />
+            <b>garbage-free mode: </b>-Dlog4j2.enable.threadlocals=true -Dlog4j2.enable.direct.encoders=true</li>
+          <li>-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution
+            -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime (to eyeball GC and safepoint pauses)</li>
+        </ul>
+        <!--
+        <pre style="overflow: auto; white-space: pre-wrap; word-wrap: normal;">java -Xms1G -Xmx1G -XX:+UnlockDiagnosticVMOptions -verbose:gc -XX:+PrintGCDetails
+          -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationConcurrentTime
+          -XX:+PrintGCApplicationStoppedTime -XX:GuaranteedSafepointInterval=500000
+          -XX:CompileCommand=dontinline,org.apache.logging.log4j.core.async.perftest.NoOpIdleStrategy::idle
+          -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector
+          -Dlog4j2.enable.threadlocals=false -Dlog4j2.enable.direct.encoders=false -DAsyncLogger.WaitStrategy=busyspin
+          -cp .:HdrHistogram-2.1.8.jar:disruptor-3.3.4.jar:log4j-1.2.17.jar:slf4j-api-1.7.13.jar:slf4j-ext-1.7.13.jar:
+          logback-core-1.1.3.jar:logback-classic-1.1.3.jar:log4j-api-2.6-SNAPSHOT.jar:log4j-core-2.6-SNAPSHOT.jar:
+          log4j-core-2.6-SNAPSHOT-tests.jar org.apache.logging.log4j.core.async.perftest.ResponseTimeTest 1 50000</pre>
+        -->
         <h4>Async Loggers</h4>
         <p>The graph below compares "classic" logging to
           garbage-free logging response time behaviour for Log4j's Async Loggers.
           In the graph, "100k" means logging at a sustained load of 100,000 messages/second, "800k" is
-          800,000 messages/second.
+          a sustained load of 800,000 messages/second.
         </p>
-        <p>In classic mode we see numerous minor garbage collections which pause the application threads
+        <p><img src="../images/ResponseTimeAsyncClassicVsGcFree-label.png" /></p>
+        <p>In <b>classic</b> mode we see numerous minor garbage collections which pause the application threads
           for 3 milliseconds or more. This quickly adds up to response time delays of almost 10
           milliseconds.
           As you can see in the graph, increasing the load shifts the curve to the left (there are more spikes).
@@ -388,28 +428,24 @@ public void garbageFree() {
           We experimented a little with reducing the load to 50,000 or even 5000 messages/second,
           but this did not eliminate the 3 millisecond pauses, it just made them occur less frequently.
           Note that all GC pauses in this test are minor GC pauses. We did not see any full garbage collections.</p>
-        <p>In garbage-free mode, maximum response time remains well below 1 millisecond under a wide range of loads.
+        <p>In <b>garbage-free</b> mode, maximum response time remains well below 1 millisecond under a wide range of loads.
           (Max 780 us at 800,000 messages/sec, max 407 us at 600,000 messages/sec, with the 99% around 5 us for
           all loads up to 800,000 messages/sec.) Increasing or decreasing the load does not change the response time
           behaviour. We did not investigate the cause of the 200-300 microsecond pauses we saw in these tests.
         </p>
         <p>
-          TODO title should include "lower is better".
-          TODO legend should be "microseconds" (full)
-        </p>
-
-        <p><img src="../images/ResponseTimeAsyncClassicVsGcFree-label.png" /></p>
-        <p>
           When we increased the load further we begin to see larger response time pauses for both classic and
           garbage-free logging.
           At sustained loads of 1 million messages/second or more we start to approach the maximum throughput of
           the underlying RandomAccessFile Appender (see the synchronous logging throughput chart below).
           At these loads the ringbuffer starts to fill up and backpressure kicks in: attempting to add another message
-          when the ringbuffer is full will block until a free slot becomes available. We start to see 20 millisecond
-          response times or more; and attempting to increase the load even more results in larger and larger
+          when the ringbuffer is full will block until a free slot becomes available. We start to see response times
+          of tens of milliseconds or more; and attempting to increase the load even more results in larger and larger
           response time spikes.
         </p>
         <h4>Synchronous Logging</h4>
+        <p>With synchronous logging, garbage-free logging still performs better, but the difference is less pronounced.
+        </p>
         <p>For both classic and garbage-free logging, synchronous logging showed response times of 3-5 milliseconds
         even at loads of 100,000 messages/second. It may be possible to improve on this, we did not investigate
         further yet.</p>