You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@bookkeeper.apache.org by GitBox <gi...@apache.org> on 2018/04/23 21:18:49 UTC

[GitHub] nicmichael opened a new pull request #1364: Issue #1363: Fast and Garbage-Free Codahale Statistics Timers

nicmichael opened a new pull request #1364: Issue #1363: Fast and Garbage-Free Codahale Statistics Timers
URL: https://github.com/apache/bookkeeper/pull/1364
 
 
   This change introduces a new Codahale-based metrics provider "FastCodahaleMetricsProvider" in the codahale-metrics-provider artifact, which can be activated through the "statsProviderClass" config parameter (no changes made to the existing "CodahaleMetricsProvider"). This new "FastCodahaleMetricsProvider" brings a new timer implementation "FastTimer", which extends Codahale's Timer class, provides an identical API, is registered in Codahale's timer registry, and can be queried through the existing JettyServices implementation. FastTimer provides similar functionality as Codahale's default timer, but is significantly faster and (nearly) garbage free.
   
   The improvements of FastTimer over Codahale default Timer with ExponentiallyDecayingReservoirs are:
   * entirely garbage-free Timer updates (only snapshot creations when querying a timer allocate a small number of Java objects)
   * 3x faster Timer updates (i.e. 3x shorter code-path to update a timer)
   * half the memory footprint (in default configuration)
   
   FastTimer reduces GC pause times in our experiments as described in Issue 1363 by half (same as entirely deactivating all timers).
   
   The primary functional differences between FastTimer and Codahale default Timer with ExponentiallyDecayingReservoirs are:
   * FastTimer keeps values for a fixed, configurable time period (default: 60 seconds) rather than exponentially decaying them
   * FastTimer bucketizes event times rather than storing discrete values. Percentile values depend on the bucket resolution.
   
   Design Considerations
   ---------------------
   
   The design goals of this timer implementation are for it to be
    - fast (i.e. few instructions to update a timer)
    - scalable (i.e. little synchronization cost for concurrent timer updates)
    - garbage-free for timer updates (i.e. no object allocation for timer updates)
    - space-efficient (i.e. as little memory footprint as possible while achieving first three goals)
    - provide similar functionality as Codahale's default timers with ExponentiallyDecayingReservoirs
   
   This implementation provides rate and response times over a configurable sliding time window. Data
   is stored in upfront allocated circular arrays, in which each array element holds data
   for one second. Data is overwritten in a circular fashion without the allocation of new data
   structures and is therefore garbage-free for all timer updates.
   
   This implementation does not store individual response times, but instead allocates bucketized counters
   upfront, which are incremented for any event falling into a particular response time bucket. A
   fine-grained bucket definition (intended for capturing successsful events) and a coarse-grained
   bucket definition (intended to capture failure or timed-out events) are provided.
   
   To improve scalability of concurrent timer updates, most data structures are replicated HASH_SIZE
   times, and calling threads updating a timer are hashed to individual instances. Performance tests
   (see below) have shown that this implementation is light-weight enough to achieve slightly better
   scalability than Codahale's default timers even without hashing, and can further improve scalability
   if hashing is used.
   
   Trading off performance and scalability vs. memory footprint, we need to be conservative in the hash
   size we chose. Different flavors of this timer implementation have been evaluated using JMH
   micro-benchmarks (see microbenchmarks/src/main/java/org/apache/bookkeeper/stats/TimerBenchmark.java),
   comparing implementations of FastTimer with a time window of 60 seconds and
   - (DEV1)   a HASH_SIZE of 3 for all data structures (meters, counters, min/max, and response time buckets)
   - (DEV2)   a HASH_SIZE of 1 for all data structures (meters, counters, min/max, and response time buckets)
   - (FINAL)  a HASH_SIZE of 3 for meters, counters, min/max, and no hashing for response time buckets
   to the default timer implementation
   - (BASE-E) Codahale Timer with ExponentiallyDecayingReservoir (default as used by bookkeeper code)
   - (BASE-T) Codahale Timer with SlidingTimeWindowReservoir configured to 60 seconds
   - (BASE-S) Codahale Timer with SlidingWindowReservoir configured to hold 100,000 events.
   
   Based on results below, implementation (FINAL) was chosen as the final FastTimer implementation, as it
   achieves nearly the same throughput as (DEV1) at nearly the same memory footprint as (DEV2), and
   ultimately achieves roughly 3x higher throughput and scalability that Codahale's default implementation
   at around half the memory footprint.
   
   The following results have been collected on an eight core x86 server running at 3.2 GHz (updated
   timers are shared across 4 threads):
   
   ```
   Config   Timer Impl             Timers   Threads      ops/ms   Alloc B/op   Kb/TimerPair
   ----------------------------------------------------------------------------------------
    DEV1    FastTimer (Hash 3)          1         4   11487.904        0                253
    DEV1    FastTimer (Hash 3)         10         4   22621.702        0                253
    DEV1    FastTimer (Hash 3)        100         4   21781.319        0                253
    DEV2    FastTimer (Hash 1)          1         4    5138.143        0                 88
    DEV2    FastTimer (Hash 1)         10         4   22902.195        0                 88
    DEV2    FastTimer (Hash 1)        100         4   19173.085        0                 88
    FINAL   FastTimer (Hash 3/1)        1         4    9291.002        0                 99
    FINAL   FastTimer (Hash 3/1)       10         4   16379.940        0                 99
    FINAL   FastTimer (Hash 3/1)      100         4   16751.020        0                 99
    BASE-E  CodahaleTimer               1         4    3845.187       82.609            189
    BASE-E  CodahaleTimer              10         4    7262.445       35.035            189
    BASE-E  CodahaleTimer             100         4    7051.77        32.843            189
    BASE-T  CodahaleTimer/TimeWindow    1         4     102.479       90.851            174
    BASE-T  CodahaleTimer/TimeWindow   10         4      68.852       84.812            174
    BASE-T  CodahaleTimer/TimeWindow  100         4     153.444      136.436            174
    BASE-S  CodahaleTimer/SlidingWdw    1         4    4670.543        0               2103
    BASE-S  CodahaleTimer/SlidingWdw   10         4   13696.168        0               2103
    BASE-S  CodahaleTimer/SlidingWdw  100         4   12541.936        0               2103
   ```
   
   - ops/ms is the number of timer updates per millisecond.
   - Alloc B/op is the number of bytes allocated per timer update
   - Kb/TimerPair is the heap footprint per pair of timers (one with fine-grained, one with coarse-grained buckets)
   
   The following test results include snapshot creation every 109 timer updates (typically, we would assume
   snapshot creation to be much less frequent), and show that also with snapshots in the mix, FastTimer outperforms
   Codahale default Timers both with respect to throughput and scalability as well as object allocation:
   
   ```
   Config   Timer Impl             Timers   Threads      ops/ms   Alloc B/op
   -------------------------------------------------------------------------
    FINAL   FastTimer (Hash 3/1)        1         4    1569.953       23.707
    FINAL   FastTimer (Hash 3/1)       10         4    7316.794       24.073
    FINAL   FastTimer (Hash 3/1)      100         4    6498.215       24.073
    BASE-E  CodahaleTimer               1         4     246.953      481.771
    BASE-E  CodahaleTimer              10         4    1989.134      476.807
    BASE-E  CodahaleTimer             100         4    1514.729      468.624
    BASE-T  CodahaleTimer/TimeWindow    1         4       6.063    43795.810
    BASE-T  CodahaleTimer/TimeWindow   10         4      44.651    33916.315
    BASE-T  CodahaleTimer/TimeWindow  100         4     180.431    12330.939
    BASE-S  CodahaleTimer/SlidingWdw    1         4      17.439    14683.756
    BASE-S  CodahaleTimer/SlidingWdw   10         4     107.257    14683.745
    BASE-S  CodahaleTimer/SlidingWdw  100         4     236.538     9767.106
   ```
   
   Unfortunately Codahale does not have a Timer interface we can implement, and some Codahale
   base classes are assuming instances of Timer (for example, our JettyServices instantiate a
   Codahale MetricsServlet, which instantiates a Codahale MetricsModule, which only serializes
   timers that are instances of Timer class into the json output stream). Unless we wanted to
   reimplement or override all these base classes, we can't just implement Codahale's Metered and Sampling
   interfaces. Instead we have to extend its Timer class, even though we're not using any of its
   inherited functionality or data structures. The inherited (unused) member variables of Codahale Timer
   consume slightly less than 512 byte per FastTimer (measured around 425 byte in Codahale 3.1).
   Above memory footprint results include ~ 1 kb of inherited (unused) data structures, which comprise
   around 1% of FastTimer's overall memory footprint.
   
   In terms of functionality, FastTimer provides the same functionality as Codahale's timers
   (in default configuration with ExponentiallyDecayingReservoirs), with the following exceptions:
   - Statistics are kept for a fixed amount of time (rather than exponentially decayed), by
     default 60 seconds. As a consequence, getMeanRate(), getOneMinuteRate(),  getFiveMinuteRate()
     and getFifteenMinuteRate() all return the same value if FastTimer is configured to use a
     60 second time window.
   - FastTimer and FastSnapshot only record bucketized instead of discrete response times. As a
     consequence, the accuracy of percentiles depends on bucket granularity. FastSnapshot also
     can't return discrete values: getValues() returns an empty array, and size returns 0.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services