Posted to gitbox@activemq.apache.org by GitBox <gi...@apache.org> on 2021/03/07 17:01:33 UTC

[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3479: ARTEMIS-3163 Experimental support for Netty IO_URING incubator

franz1981 edited a comment on pull request #3479:
URL: https://github.com/apache/activemq-artemis/pull/3479#issuecomment-792311156


   These are my results using a single, single-threaded acceptor for both clients and replication (on the live broker), to fairly compare epoll vs io_uring under load.
   The test is similar to the one on https://issues.apache.org/jira/browse/ARTEMIS-2852, with 32 JMS core clients and 100-byte persistent messages; the IO_URING transport has been used *only* on the live server, leaving the rest (backup + clients) as-is.
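   
   To make the transport side concrete, this is roughly how a Netty server ends up on the io_uring incubator transport, falling back to epoll when it isn't available (a minimal sketch using the public classes of netty-incubator-transport-io_uring, not the PR's actual wiring; `threads = 1` matches the single-threaded setup above):
   ```java
   import io.netty.bootstrap.ServerBootstrap;
   import io.netty.channel.EventLoopGroup;
   import io.netty.channel.ServerChannel;
   import io.netty.channel.epoll.EpollEventLoopGroup;
   import io.netty.channel.epoll.EpollServerSocketChannel;
   import io.netty.incubator.channel.uring.IOUring;
   import io.netty.incubator.channel.uring.IOUringEventLoopGroup;
   import io.netty.incubator.channel.uring.IOUringServerSocketChannel;
   
   public final class TransportSelection {
   
      // build a server bootstrap on io_uring when the native transport is
      // usable on this kernel, otherwise fall back to epoll
      public static ServerBootstrap bootstrap(int threads) {
         final EventLoopGroup group;
         final Class<? extends ServerChannel> channelType;
         if (IOUring.isAvailable()) {
            group = new IOUringEventLoopGroup(threads);
            channelType = IOUringServerSocketChannel.class;
         } else {
            group = new EpollEventLoopGroup(threads);
            channelType = EpollServerSocketChannel.class;
         }
         return new ServerBootstrap().group(group).channel(channelType);
      }
   }
   ```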
   
   NOTE: These are just preliminary results, so I won't share the HW configuration or anything needed to make this reproducible, but they should give a sense of the magnitude of the improvement offered by io_uring.
   
   `master`:
   ```
   **************
   EndToEnd Throughput: 22582 ops/sec
   **************
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean               1410.83
   min                 333.82
   50.00%             1368.06
   90.00%             1679.36
   99.00%             2293.76
   99.90%             3489.79
   99.99%            13107.20
   max               16187.39
   count               320000
   ```
   `this pr`:
   ```
   **************
   EndToEnd Throughput: 30540 ops/sec
   **************
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean               1052.52
   min                 329.73
   50.00%             1007.62
   90.00%             1286.14
   99.00%             1736.70
   99.90%             4653.06
   99.99%            13893.63
   max               16711.68
   count               320000
   ```
   The profile data collected with https://github.com/jvm-profiling-tools/async-profiler/ is attached to https://issues.apache.org/jira/browse/ARTEMIS-3163
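   
   For reference, a profile like the attached ones can also be captured in-process through async-profiler's Java API (a hedged sketch: the comment doesn't say exactly how the profiles were collected, and attaching profiler.sh to the broker PID works just as well):
   ```java
   import one.profiler.AsyncProfiler;
   import one.profiler.Counter;
   import one.profiler.Events;
   
   public final class ProfileCapture {
   
      public static void main(String[] args) throws Exception {
         // requires libasyncProfiler.so to be loadable by the JVM
         AsyncProfiler profiler = AsyncProfiler.getInstance();
         profiler.start(Events.CPU, 1_000_000L); // one CPU sample every 1ms
         Thread.sleep(30_000);                   // keep the benchmark load running
         profiler.stop();
         // collapsed stacks: syscall frames such as epoll_wait vs
         // io_uring_enter show up as native frames in this output
         System.out.print(profiler.dumpCollapsed(Counter.SAMPLES));
      }
   }
   ```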
   
   But the important bits are:
   
   - Replication event loop thread: 935 samples (epoll) vs 775 samples (io_uring) -> ~94% CPU usage vs ~78% CPU usage
   - Syscall samples:
   `epoll`: ~61% samples
   ![image](https://user-images.githubusercontent.com/13125299/110247754-14618e80-7f6e-11eb-8114-ff642fe7bf66.png)
   `io_uring`: ~31% samples
   ![image](https://user-images.githubusercontent.com/13125299/110247721-f1cf7580-7f6d-11eb-8801-1bd1f0e314e1.png)
   
   The io_uring version is far more resource-efficient than epoll, even though our replication process already tries to batch writes as much as possible to amortize the syscall cost: it would be interesting to compare epoll with some OpenOnload kernel-bypass driver vs io_uring :P
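   
   For context, the batching mentioned above is the usual Netty write/flush coalescing pattern, sketched here in a simplified form (not the actual replication code):
   ```java
   import io.netty.buffer.ByteBuf;
   import io.netty.channel.Channel;
   
   final class BatchedWriter {
   
      // write() only stages each buffer in the channel's outbound buffer;
      // the single flush() hands the whole batch to the transport, so both
      // epoll and io_uring pay roughly one submission per batch, not per write
      static void writeBatch(Channel channel, Iterable<ByteBuf> packets) {
         for (ByteBuf packet : packets) {
            channel.write(packet);
         }
         channel.flush();
      }
   }
   ```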
   
   *IMPORTANT*:
   Why have I chosen to use a single thread for everything?
   Don't be tempted to leave the default configuration, because it uses 3 * available cores for the replication/client acceptors: the io_uring version is so much more efficient than epoll that the Netty event loops tend to go idle most of the time and need to be woken up, making application threads always pay the cost of waking up the event loop threads... this can make the io_uring version look worse than epoll, while it's exactly the opposite(!!)
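   
   Concretely, pinning the acceptor to one thread goes through the remotingThreads URL parameter; a minimal sketch with an embedded-broker configuration (the same parameter applies to the acceptor URL in broker.xml):
   ```java
   import org.apache.activemq.artemis.core.config.Configuration;
   import org.apache.activemq.artemis.core.config.impl.ConfigurationImpl;
   
   public final class SingleThreadedAcceptorConfig {
   
      public static Configuration configure() throws Exception {
         Configuration config = new ConfigurationImpl();
         // remotingThreads=1 pins the acceptor's Netty event-loop group to a
         // single thread instead of the default 3 * available cores, keeping
         // the loop busy enough that it never pays the wakeup cost
         config.addAcceptorConfiguration("netty",
            "tcp://0.0.0.0:61616?remotingThreads=1");
         return config;
      }
   }
   ```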

