You are viewing a plain text version of this content. The canonical link for it is here.

Posted to gitbox@activemq.apache.org by GitBox <gi...@apache.org> on 2021/05/11 08:26:12 UTC

[GitHub] [activemq-artemis] franz1981 opened a new pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

franz1981 opened a new pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572


   https://issues.apache.org/jira/browse/ARTEMIS-3289


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 edited a comment on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841788187


   @clebertsuconic @michaelandrepearce @gtully 
   One important note related the default thread pool(s) configuration of the broker.
   Just to have some fun, I've played a bit with the thread configuration in order to reduce the number of context switched of master and this is the improvement I got:
   ```
   **************
   RUN 1	EndToEnd Throughput: 51031 ops/sec
   **************
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean                346.02
   min                  92.67
   50.00%              321.54
   90.00%              444.42
   99.00%              626.69
   99.90%             4014.08
   99.99%            14942.21
   max               22020.10
   count              1600000
   ```
   Same test as https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841677475, where I was getting `37770 ops/sec`
   
   I've just reduced the acceptors/netty thread pool size to be 1/2 of the available cores (instead of 3X), while leaving the rest to the `thread-pool-max-size` (instead of having it to be `30`).
   
   eg on a 16 cores machines -> 8 threads for Netty + 8 threads for Artemis generic thread pool.
   Given that core protocol perform very few operations on the Netty threads I've decided to split the CPU capacity in 2.
   The boost is A LOT, hence I think we should re-think the default parameters we set for the broker wdyt? any concern?
   I'm going to file a new issue if you agree: both public benchmarks and cloud instances will get a huge boost by having proper default values, because users won't know upfront the machines they're going to use.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 edited a comment on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841677475


   I've run some benchmarks vs this version getting these results on the replica broker:
   
   `main` total broker CPU usage = ~1000 samples. 
   ```
   430 samples on (5) I/O threads + 570 samples on replication endpoint thread
   
   155 samples/ 570 samples on replication endpoint thread are spent to awake the I/O threads.
   ```
    
   `this PR` total broker CPU usage = ~664 samples
   ```
   171 samples on a single I/O thread + 492 samples on replication endpoint thread
   
   91 samples/492 samples on replication endpoint thread are spent to awake the single I/O thread.
   ```
   TLDR: *The improvement in term of CPU usage is ~30% (!!!!)*
   
   Throughput is slightly improved for the single producer/single consumer case, but not dramatically.
   The improved CPU usage instead means having more room to handle I/O, but start to become interesting while increasing the scale of connected clients:
   
   `main` with 16 producers/consumers
   
   ```
   **************
   RUN 1	EndToEnd Throughput: 33265 ops/sec
   **************
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean                582.69
   min                 108.03
   50.00%              462.85
   90.00%              610.30
   99.00%             4554.75
   99.90%             9699.33
   99.99%            15073.28
   max               23855.10
   count              1600000
   ```
   
   While `this PR`:
   ```
   **************
   RUN 1	EndToEnd Throughput: 37770 ops/sec
   **************
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean                471.10
   min                 116.22
   50.00%              411.65
   90.00%              544.77
   99.00%             2326.53
   99.90%             7274.50
   99.99%            12648.45
   max               28442.62
   count              1600000
   ```
   
   
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 edited a comment on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841677475


   I've run some benchmarks vs this version getting these results on the replica broker:
   
   `main` total broker CPU usage = ~1000 samples. 
   
   430 samples on (5) I/O threads + 570 samples on replication endpoint thread
   
   155 samples/ 570 samples on replication endpoint thread are spent to awake the I/O threads.
    
   `this PR` total broker CPU usage = ~664 samples
   
   171 samples on a single I/O thread + 492 samples on replication endpoint thread
   
   91 samples/492 samples on replication endpoint thread are spent to awake the single I/O thread.
   
   TLDR: *The improvement in term of CPU usage is ~30% (!!!!)*
   
   Throughput is slightly improved for the single producer/single consumer case, but not dramatically.
   The improved CPU usage instead means having more room to handle I/O, but start to become interesting while increasing the scale of clients connected eg
   
   `main` with 16 producers/consumers
   
   ```
   **************
   RUN 1	EndToEnd Throughput: 33265 ops/sec
   **************
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean                582.69
   min                 108.03
   50.00%              462.85
   90.00%              610.30
   99.00%             4554.75
   99.90%             9699.33
   99.99%            15073.28
   max               23855.10
   count              1600000
   ```
   
   While `this PR`:
   ```
   **************
   RUN 1	EndToEnd Throughput: 37770 ops/sec
   **************
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean                471.10
   min                 116.22
   50.00%              411.65
   90.00%              544.77
   99.00%             2326.53
   99.90%             7274.50
   99.99%            12648.45
   max               28442.62
   count              1600000
   ```
   
   
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 commented on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 commented on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841792315


   By using the improved broker configuration on https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841788187, I've been able to run some tests with 32 producers/consumers and this PR is indeed delivering a huge speedup under heavy load
   
   `main`:
   ```
   **************
   RUN 1	EndToEnd Throughput: 51037 ops/sec
   **************
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean               2740.07
   min                 102.91
   50.00%              565.25
   90.00%             6258.69
   99.00%            48496.64
   99.90%            82837.50
   99.99%            95944.70
   max               96993.28
   count              3200000
   ```
   `this PR`:
   ```
   **************
   RUN 1	EndToEnd Throughput: 60405 ops/sec
   **************
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean                611.91
   min                  95.23
   50.00%              471.04
   90.00%              700.42
   99.00%             4358.14
   99.90%            10158.08
   99.99%            18350.08
   max               29097.98
   count              3200000
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 edited a comment on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841681576


   @clebertsuconic I haven't been able to run scale tests with more then 32 clients because I was hitting the issue reported here https://github.com/apache/activemq-artemis/pull/3558#issuecomment-841664877 with both this PR and `main` too, so I suppose it worth to investigate on it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 closed pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 closed pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@activemq.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 commented on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 commented on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-846778250


   I've cleanup machines and performed some tests (again) with the previous version of this PR vs last commit, without getting any evident regressions. I need to cleanup the exclusive executor impl putting it into a separate package (+ some tests) and we're ready to go: in the best case this PR is delivering a nice speedup in both throughput and latencies but the most noticeable benefit come from CPU usage, 30% less under mild load for backup.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 edited a comment on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841788187


   @clebertsuconic @michaelandrepearce @gtully 
   One important note related the default thread pool(s) configuration of the broker.
   Just to have some fun, I've played a bit with the thread configuration in order to reduce the number of context switched of master and this is the improvement I got:
   ```
   **************
   RUN 1	EndToEnd Throughput: 51031 ops/sec
   **************
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean                346.02
   min                  92.67
   50.00%              321.54
   90.00%              444.42
   99.00%              626.69
   99.90%             4014.08
   99.99%            14942.21
   max               22020.10
   count              1600000
   ```
   Same test as https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841677475, where I was getting `37770 ops/sec`
   
   I've just reduced the acceptors/netty thread pool size to be 1/2 of the available cores (instead of 3X), while leaving the rest to the `thread-pool-max-size` (instead of having it to be `30`).
   
   eg on a 16 cores machines -> 8 threads for Netty + 8 threads for Artemis generic thread pool.
   Given that core protocol perform very few operations on the Netty threads I've decided to split the CPU capacity in 2.
   The boost is A LOT, hence I think we should re-think the default parameters we set for the broker wdyt?
   I'm going to file a new issue if you agree 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 edited a comment on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841618355


   @TomasHofman I see that you've worked on https://issues.apache.org/jira/browse/ARTEMIS-3037.
   My changes for https://issues.apache.org/jira/browse/ARTEMIS-3290  are merging the append attempt with `checkKnowRecordID` on the appender thread.
   
   I'm not sure I can see the issue re https://issues.apache.org/jira/browse/ARTEMIS-3037 so I don't know if/how I should fix it now.
   I see that there is no way to make read lock lock to throw any exception, see:
   ```java
      public static void main(String[] args) throws InterruptedException {
         ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
         lock.writeLock().lock();
         CountDownLatch started = new CountDownLatch(1);
         final AtomicBoolean tryLock = new AtomicBoolean();
         Thread t = new Thread(() -> {
            started.countDown();
            while (!tryLock.get()) {
   
            }
            System.out.println("try lock");
            try {
               lock.readLock().lock();
            } catch (Throwable err) {
               err.printStackTrace();
            } finally {
               System.out.println("EXIT!");
            }
         });
         t.start();
         started.await();
         t.interrupt();
         tryLock.set(true);
         System.out.println("INTERRUPT!");
      }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 commented on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 commented on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-914045280


   Closing this to be reopened in the future


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@activemq.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 commented on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 commented on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841788187






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 edited a comment on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841677475


   I've run some benchmarks vs this version getting these results on the replica:
   
   `main` total broker CPU usage = ~1000 samples. 
   
   430 samples on (5) I/O threads + 570 samples on replication endpoint thread
   
   155 samples/ 570 samples on replication endpoint thread are spent to awake the I/O threads.
    
   `this PR` total broker CPU usage = ~664 samples
   
   171 samples on a single I/O thread + 492 samples on replication endpoint thread
   
   91 samples/492 samples on replication endpoint thread are spent to awake the single I/O thread.
   
   TLDR: *The improvement in term of CPU usage is ~30% (!!!!)*
   
   Throughput is slightly improved for the single producer/single consumer case, but not dramatically.
   The improved CPU usage instead means having more room to handle I/O, but I don't have a big enough machine to test this case.
   
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 commented on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 commented on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-845691579


   I've reverted the executor change, re-applying the original `ExclusiveExecutor` code and removing the usage of JCTools qs, getting back just some of the original performance of https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841792315 but still with ugly 90 percentiles: still investigating what's going on
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 edited a comment on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-845710520


   I see that I'm not getting anymore the improvement of the first version of this PR (removing the correct thread pool sizing): 
   ```
   **************
   RUN 1	EndToEnd Throughput: 35957 ops/sec
   **************
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean                649.70
   min                 107.01
   50.00%              544.77
   90.00%              729.09
   99.00%             4194.30
   99.90%             9043.97
   99.99%            14811.14
   max               24510.46
   count              1600000
   ```
   if compared to https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841677475
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 commented on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 commented on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841655602


   @clebertsuconic @gtully 
   I'm not totally sold about the current semantic while checking for record ID presence: if `sync == false` (as `ReplicationEndpoint`) it seems that `JournalBase` is going to use `DummyCallback.getInstance()` as callback, meaning that it won't expect to block *ever*, but the existing semantic that check for record presence will make the caller thread to block until the record presence is verified, is it correct? 
   I would expect instead that if `sync == false` the caller thread won't get any error back, if not for something that can be verified on the same thread eg the new lock-free `knownRecordID` check.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 commented on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 commented on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-846790513


   This is the last comparison of `main` vs `this PR` with 32 producers.
   `main`:
   ```
   **************
   RUN 1	EndToEnd Throughput: 52667 ops/sec
   **************
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean                638.65
   min                 104.45
   50.00%              573.44
   90.00%              884.74
   99.00%             2015.23
   99.90%             7045.12
   99.99%            11141.12
   max               21233.66
   count              3200000
   ```
   `this PR`
   ```
   **************
   RUN 1	EndToEnd Throughput: 57492 ops/sec
   **************
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean                725.65
   min                  94.72
   50.00%              501.76
   90.00%             1286.14
   99.00%             4685.82
   99.90%            10616.83
   99.99%            17563.65
   max               25690.11
   count              3200000
   ```
   Despite what looks to be a better latency profile for `main` the reality is that the new version is capable of pushing a bit more throughput with a better CPU usage overall, that could be used elsewhere (more load or GC).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 commented on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 commented on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841660131


   I need to run some benchmark using replication to see how much it worths both changes :+1: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 commented on a change in pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 commented on a change in pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#discussion_r630026428



##########
File path: artemis-journal/src/main/java/org/apache/activemq/artemis/core/io/mapped/MappedFile.java
##########
@@ -48,7 +48,8 @@ private MappedFile(FileChannel channel, MappedByteBuffer byteBuffer, int positio
       this.position = position;
       this.length = length;
       this.byteBufWrapper = Unpooled.wrappedBuffer(buffer);
-      this.channelBufferWrapper = new ChannelBufferWrapper(this.byteBufWrapper, false);
+      // we control channelBufferWrapper life-cycle
+      this.channelBufferWrapper = new ChannelBufferWrapper(this.byteBufWrapper, true);

Review comment:
       you're right :P 
   This was an old branch and I've probably missed to remove some of the changes I made, let me remove this, thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 commented on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 commented on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-845435198


   I see that the very last version that's using a ThreadPoolExecutor + an ordered executor isn't getting the same performance of the `ExclusiveExecutor` I've built https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841792315, hence I'm going to revert to the original abstraction
   ```
   **************
   RUN 1	EndToEnd Throughput: 54793 ops/sec
   **************
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean              20899.53
   min                  92.67
   50.00%              561.15
   90.00%            84410.37
   99.00%           211812.35
   99.90%           333447.17
   99.99%           385875.97
   max              404750.34
   count              3200000
   
   ```
   Going to try some more tests to be sure of this 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 commented on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 commented on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-845710520


   I see that I'm not getting anymore the improvement of the first version of this PR (removing the correct thread pool sizing): 
   ```
   **************
   RUN 2	EndToEnd Throughput: 33550 ops/sec
   **************
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean                728.90
   min                 135.17
   50.00%              565.25
   90.00%              819.20
   99.00%             5210.11
   99.90%            10420.22
   99.99%            23855.10
   max               24510.46
   count               160000
   
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 commented on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 commented on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841681576


   @clebertsuconic I haven't been able to run scale tests with more then 32 clients because I was hitting the issue reported here https://github.com/apache/activemq-artemis/pull/3558#issuecomment-841664877 with both this PR and `main` too, so I suppose it worth to address it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 commented on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 commented on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841788187


   @clebertsuconic @michaelandrepearce @gtully 
   One important note related the default thread pool(s) configuration of the broker.
   Just to have some fun, I've played a bit with the thread configuration in order to reduce the number of context switched of master and this is the improvement I got:
   ```
   **************
   RUN 1	EndToEnd Throughput: 51031 ops/sec
   **************
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean                346.02
   min                  92.67
   50.00%              321.54
   90.00%              444.42
   99.00%              626.69
   99.90%             4014.08
   99.99%            14942.21
   max               22020.10
   count              1600000
   ```
   Same test as https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841677475, where I was getting `37770 ops/sec`
   
   I've just reduced the acceptor thread pool size to be 1/2 of the available cores (instead of 3X), while leaving the rest to the `thread-pool-max-size` (instead of having it to be `30`).
   
   eg on a 16 cores machines -> 8 threads for Netty + 8 threads for Artemis generic thread pool.
   Given that core protocol perform very few operations on the Netty threads I've decided to split the CPU capacity in 2.
   The boost is A LOT, hence I think we should re-think the default parameters we set for the broker wdyt?
   I'm going to file a new issue if you agree 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 commented on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 commented on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841677475


   I've run some benchmarks vs this version getting these results on the replica:
   
   `main` total broker CPU usage = ~1000 samples. 
   
   430 samples on (5) I/O threads + 570 samples on replication endpoint thread
   
   155 samples/ 570 samples on replication endpoint thread are spent to awake the I/O threads.
    
   `this PR` total broker CPU usage = ~664 samples
   
   171 samples on a single I/O thread + 492 samples on replication endpoint thread
   
   91 samples/570 samples on replication endpoint thread are spent to awake the single I/O thread.
   
   TLDR: *The improvement in term of CPU usage is ~30% (!!!!)*
   
   Throughput is slightly improved for the single producer/single consumer case, but not dramatically.
   The improved CPU usage instead means having more room to handle I/O, but I don't have a big enough machine to test this case.
   
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 edited a comment on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841618355


   @TomasHofman I see that you've worked on https://issues.apache.org/jira/browse/ARTEMIS-3037.
   My changes for https://issues.apache.org/jira/browse/ARTEMIS-3290  are merging the append attempt with `checkKnowRecordID` on the appender thread.
   
   I'm not sure I can see the issue re https://issues.apache.org/jira/browse/ARTEMIS-3037 so I don't know if/how I should fix it now.
   I see that there is no way to make read lock lock to throw any exception, see:
   ```java
      public static void main(String[] args) throws InterruptedException {
         ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
         lock.writeLock().lock();
         CountDownLatch started = new CountDownLatch(1);
         final AtomicBoolean tryLock = new AtomicBoolean();
         Thread t = new Thread(() -> {
            started.countDown();
            while (!tryLock.get()) {
   
            }
            System.out.println("try lock");
            try {
               lock.readLock().lock();
            } catch (Throwable err) {
               err.printStackTrace();
            } finally {
               System.out.println("EXIT!");
            }
         });
         t.start();
         started.await();
         t.interrupt();
         tryLock.set(true);
         System.out.println("INTERRUPT!");
      }
   ```
   So, in short, if the read lock isn't using https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/Lock.html#lockInterruptibly() it cannot detect a Thread interruption and cannot propagate it to anyone waiting for it.
   I'm going to send a separate commit that's going to fix it, by referencing your same issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 commented on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 commented on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841618355


   @TomasHofman I see that you've worked on https://issues.apache.org/jira/browse/ARTEMIS-3037.
   My changes for https://issues.apache.org/jira/browse/ARTEMIS-3290  are merging the append attempt with the `checkKnowRecordID` while happening on the appender thread.
   But I see that the appender methods are not handling Thread interruption while awaiting to enter the read lock(!!!) that seems like a bug of the original implementation too, agree?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 edited a comment on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841677475


   I've run some benchmarks vs this version getting these results on the replica:
   
   `main` total broker CPU usage = ~1000 samples. 
   
   430 samples on (5) I/O threads + 570 samples on replication endpoint thread
   
   155 samples/ 570 samples on replication endpoint thread are spent to awake the I/O threads.
    
   `this PR` total broker CPU usage = ~664 samples
   
   171 samples on a single I/O thread + 492 samples on replication endpoint thread
   
   91 samples/492 samples on replication endpoint thread are spent to awake the single I/O thread.
   
   TLDR: *The improvement in term of CPU usage is ~30% (!!!!)*
   
   Throughput is slightly improved for the single producer/single consumer case, but not dramatically.
   The improved CPU usage instead means having more room to handle I/O, but start to become interesting while increasing the scale of clients connected eg
   
   `main` with 16 producers/consumers
   
   ```
   **************
   RUN 1	EndToEnd Throughput: 31101 ops/sec
   **************
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean                645.23
   min                 105.98
   50.00%              489.47
   90.00%              671.74
   99.00%             5439.49
   99.90%            11010.05
   99.99%            20185.09
   max               35913.73
   count              1600000
   ```
   
   While `this PR`:
   ```
   **************
   RUN 1	EndToEnd Throughput: 37770 ops/sec
   **************
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean                471.10
   min                 116.22
   50.00%              411.65
   90.00%              544.77
   99.00%             2326.53
   99.90%             7274.50
   99.99%            12648.45
   max               28442.62
   count              1600000
   ```
   
   
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 edited a comment on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841788187


   @clebertsuconic @michaelandrepearce @gtully 
   One important note related the default thread pool(s) configuration of the broker.
   Just to have some fun, I've played a bit with the thread configuration in order to reduce the number of context switched of master and this is the improvement I got:
   ```
   **************
   RUN 1	EndToEnd Throughput: 51031 ops/sec
   **************
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean                346.02
   min                  92.67
   50.00%              321.54
   90.00%              444.42
   99.00%              626.69
   99.90%             4014.08
   99.99%            14942.21
   max               22020.10
   count              1600000
   ```
   Same test as https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841677475, where I was getting `37770 ops/sec`
   
   Just as a side note `main` behaviour after fine tuning the box is getting
   ```
   **************
   RUN 1	EndToEnd Throughput: 46643 ops/sec
   **************
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean                358.92
   min                  98.30
   50.00%              344.06
   90.00%              458.75
   99.00%              618.50
   99.90%             3276.80
   99.99%            11599.87
   max               27394.05
   count              1600000
   ```
   vs `33265 ops/sec`
   
   The improvement delivered by this PR is still there `51031 ops/sec vs 46643 ops/sec`
   
   I've just reduced the acceptors/netty thread pool size to be 1/2 of the available cores (instead of 3X), while leaving the rest to the `thread-pool-max-size` (instead of having it to be `30`).
   
   eg on a 16 cores machines -> 8 threads for Netty + 8 threads for Artemis generic thread pool.
   Given that core protocol perform very few operations on the Netty threads I've decided to split the CPU capacity in 2.
   The boost is A LOT, hence I think we should re-think the default parameters we set for the broker wdyt? any concern?
   I'm going to file a new issue if you agree: both public benchmarks and cloud instances will get a huge boost by having proper default values, because users won't know upfront the machines they're going to use.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 edited a comment on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841618355


   @TomasHofman I see that you've worked on https://issues.apache.org/jira/browse/ARTEMIS-3037.
   My changes for https://issues.apache.org/jira/browse/ARTEMIS-3290  are merging the append attempt with the `checkKnowRecordID` while happening on the appender thread.
   But I see that the existing appender methods are not handling Thread interruption while awaiting to enter the read lock(!!!) that seems like a bug of the original implementation too, agree?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 commented on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 commented on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-845731469


   Found the reason of the regression: the changes for https://issues.apache.org/jira/browse/ARTEMIS-3290 are impacting differently from the original runs and I'm getting again some sync check for known record ID that slow down replica append update records.
   
   Probably worth re-thinking https://issues.apache.org/jira/browse/ARTEMIS-3290 to save the sync call to happen as a whole.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] gtully commented on a change in pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

gtully commented on a change in pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#discussion_r629975131



##########
File path: artemis-journal/src/main/java/org/apache/activemq/artemis/core/io/mapped/MappedFile.java
##########
@@ -48,7 +48,8 @@ private MappedFile(FileChannel channel, MappedByteBuffer byteBuffer, int positio
       this.position = position;
       this.length = length;
       this.byteBufWrapper = Unpooled.wrappedBuffer(buffer);
-      this.channelBufferWrapper = new ChannelBufferWrapper(this.byteBufWrapper, false);
+      // we control channelBufferWrapper life-cycle
+      this.channelBufferWrapper = new ChannelBufferWrapper(this.byteBufWrapper, true);

Review comment:
       I don't see the relevance of this change? can you explain the implication?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 edited a comment on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841660131


   I need to run some benchmark using replication to see how much worth these changes :+1: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 edited a comment on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-845731469


   Found the reason of the regression: the changes for https://issues.apache.org/jira/browse/ARTEMIS-3290 are impacting differently from the original runs and I'm getting again some sync check for known record ID that slow down replica append update records.
   
   Probably worth re-thinking https://issues.apache.org/jira/browse/ARTEMIS-3290 to save the sync call to happen as a whole.
   The improvement on CPU usage, instead, is still there and it's the motivation behind this PR, so worth to bring in.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 edited a comment on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-845710520


   I see that I'm not getting anymore the improvement of the first version of this PR (removing the correct thread pool sizing): 
   ```
   **************
   RUN 2	EndToEnd Throughput: 33550 ops/sec
   **************
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean                728.90
   min                 135.17
   50.00%              565.25
   90.00%              819.20
   99.00%             5210.11
   99.90%            10420.22
   99.99%            23855.10
   max               24510.46
   count               160000
   ```
   if compared to https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841677475
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 edited a comment on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841677475


   I've run some benchmarks vs this version getting these results on the replica broker:
   
   `main` total broker CPU usage = ~1000 samples. 
   
   430 samples on (5) I/O threads + 570 samples on replication endpoint thread
   
   155 samples/ 570 samples on replication endpoint thread are spent to awake the I/O threads.
    
   `this PR` total broker CPU usage = ~664 samples
   
   171 samples on a single I/O thread + 492 samples on replication endpoint thread
   
   91 samples/492 samples on replication endpoint thread are spent to awake the single I/O thread.
   
   TLDR: *The improvement in term of CPU usage is ~30% (!!!!)*
   
   Throughput is slightly improved for the single producer/single consumer case, but not dramatically.
   The improved CPU usage instead means having more room to handle I/O, but start to become interesting while increasing the scale of clients connected eg
   
   `main` with 16 producers/consumers
   
   ```
   **************
   RUN 1	EndToEnd Throughput: 31101 ops/sec
   **************
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean                645.23
   min                 105.98
   50.00%              489.47
   90.00%              671.74
   99.00%             5439.49
   99.90%            11010.05
   99.99%            20185.09
   max               35913.73
   count              1600000
   ```
   
   While `this PR`:
   ```
   **************
   RUN 1	EndToEnd Throughput: 37770 ops/sec
   **************
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean                471.10
   min                 116.22
   50.00%              411.65
   90.00%              544.77
   99.00%             2326.53
   99.90%             7274.50
   99.99%            12648.45
   max               28442.62
   count              1600000
   ```
   
   
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 edited a comment on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841618355


   @TomasHofman I see that you've worked on https://issues.apache.org/jira/browse/ARTEMIS-3037.
   My changes for https://issues.apache.org/jira/browse/ARTEMIS-3290  are merging the append attempt with the `checkKnowRecordID` while happening on the appender thread.
   
   I'm not sure I can see the issue re https://issues.apache.org/jira/browse/ARTEMIS-3037 so I don't know if/how I should fix it now.
   I see that there is no way to make read lock lock to throw any exception, see:
   ```java
      public static void main(String[] args) throws InterruptedException {
         ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
         lock.writeLock().lock();
         CountDownLatch started = new CountDownLatch(1);
         final AtomicBoolean tryLock = new AtomicBoolean();
         Thread t = new Thread(() -> {
            started.countDown();
            while (!tryLock.get()) {
   
            }
            System.out.println("try lock");
            try {
               lock.readLock().lock();
            } catch (Throwable err) {
               err.printStackTrace();
            } finally {
               System.out.println("EXIT!");
            }
         });
         t.start();
         started.await();
         t.interrupt();
         tryLock.set(true);
         System.out.println("INTERRUPT!");
      }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 edited a comment on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841677475


   I've run some benchmarks vs this version getting these results on the replica broker:
   
   `main` total broker CPU usage = ~1000 samples. 
   
   430 samples on (5) I/O threads + 570 samples on replication endpoint thread
   
   155 samples/ 570 samples on replication endpoint thread are spent to awake the I/O threads.
    
   `this PR` total broker CPU usage = ~664 samples
   
   171 samples on a single I/O thread + 492 samples on replication endpoint thread
   
   91 samples/492 samples on replication endpoint thread are spent to awake the single I/O thread.
   
   TLDR: *The improvement in term of CPU usage is ~30% (!!!!)*
   
   Throughput is slightly improved for the single producer/single consumer case, but not dramatically.
   The improved CPU usage instead means having more room to handle I/O, but start to become interesting while increasing the scale of connected clients:
   
   `main` with 16 producers/consumers
   
   ```
   **************
   RUN 1	EndToEnd Throughput: 33265 ops/sec
   **************
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean                582.69
   min                 108.03
   50.00%              462.85
   90.00%              610.30
   99.00%             4554.75
   99.90%             9699.33
   99.99%            15073.28
   max               23855.10
   count              1600000
   ```
   
   While `this PR`:
   ```
   **************
   RUN 1	EndToEnd Throughput: 37770 ops/sec
   **************
   EndToEnd SERVICE-TIME Latencies distribution in MICROSECONDS
   mean                471.10
   min                 116.22
   50.00%              411.65
   90.00%              544.77
   99.00%             2326.53
   99.90%             7274.50
   99.99%            12648.45
   max               28442.62
   count              1600000
   ```
   
   
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3572: ARTEMIS-3289 Reduce journal appender executor Thread wakeup cost

Posted by GitBox <gi...@apache.org>.

franz1981 edited a comment on pull request #3572:
URL: https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841788187






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org