Posted to gitbox@activemq.apache.org by GitBox <gi...@apache.org> on 2021/05/17 13:56:30 UTC

[GitHub] [activemq-artemis] franz1981 opened a new pull request #3584: ARTEMIS-3303 Default thread pool size is too generous

franz1981 opened a new pull request #3584:
URL: https://github.com/apache/activemq-artemis/pull/3584


   https://issues.apache.org/jira/browse/ARTEMIS-3303


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] michaelpearce-gain commented on pull request #3584: ARTEMIS-3303 Default thread pool size is too generous

Posted by GitBox <gi...@apache.org>.

michaelpearce-gain commented on pull request #3584:
URL: https://github.com/apache/activemq-artemis/pull/3584#issuecomment-842715440


   I'm always cautious about such changes; after all, a default that suits one org may not suit another org's use case.



[GitHub] [activemq-artemis] franz1981 commented on pull request #3584: ARTEMIS-3303 Default thread pool size is too generous

Posted by GitBox <gi...@apache.org>.

franz1981 commented on pull request #3584:
URL: https://github.com/apache/activemq-artemis/pull/3584#issuecomment-843023532


   In short, I see no harm in slightly reducing the requirements for both thread pools, at least getting:
   
   - netty to use N cores
   - global thread pool to use N cores
   
   This leaves the OS to handle, via context switches, the fairness of task execution between the two.
   I've decided (in this PR) to use half the cores for each, because of GC threads and the background OS threads that handle disk/networking tasks as well.
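
As a rough illustration of the two sizing options above: a minimal sketch only, where the class and method names are hypothetical and are not taken from the PR's actual code.

```java
// Hypothetical helper (not the PR's code): derives a pool size from the core count.
final class PoolSizingSketch {

   // "N cores" option: one thread per available core, never below 1.
   static int oneThreadPerCore(int cores) {
      return Math.max(1, cores);
   }

   // "Half the cores" option: leaves headroom for GC and OS threads, never below 1.
   static int halfTheCores(int cores) {
      return Math.max(1, cores / 2);
   }

   public static void main(String[] args) {
      int cores = Runtime.getRuntime().availableProcessors();
      System.out.println("cores=" + cores
         + " perCore=" + oneThreadPerCore(cores)
         + " half=" + halfTheCores(cores));
   }
}
```

The same value would be used for both the Netty threads and the global thread pool under the "half the cores" choice described above.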



[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3584: ARTEMIS-3303 Default thread pool size is too generous

Posted by GitBox <gi...@apache.org>.

franz1981 edited a comment on pull request #3584:
URL: https://github.com/apache/activemq-artemis/pull/3584#issuecomment-843013239


   All fair points, and indeed I believe this should be a cautious and more conservative change. Still, there are some historical motivations and experimental facts suggesting that what we set by default is no longer valid/useful, and that it is optimized neither for context switches nor for throughput/latency trade-offs.
   
   1. Re the Netty event loop sizing
   
   - historical facts: HornetQ and earlier versions of Artemis were blocking Netty threads, but that's no longer true. We can even choose to use `BlockHound` to enforce/check it on our CI, see https://github.com/netty/netty/pull/9687
   - experimental facts: generating a uniformly distributed load with clients >= cores using Core clients showed that the default configuration of the Netty thread pool (3X the number of cores) prevents scaling and hurts both throughput and latencies. See https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841788187 for some more details about it.
   
   The explanation for the experimental facts seems related to how the Netty event loop group works: 
   - Netty assigns client connections in round-robin fashion to the configured Netty threads
   - each client connection can issue write/read events on the event loop's (single) selector to wake it up for any work to do
   - if the number of Netty threads exceeds the number of cores and the number of clients is <= Netty threads, each time such a notification happens there is some chance (2 in 3 with the 3X default) that the thread that's going to handle it won't be on CPU (because threads outnumber cores), and the OS is forced to deschedule some (random) thread in order to run the Netty thread responsible for handling the interrupt, causing unnecessary context switches.
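
The 2-in-3 figure can be reproduced with a back-of-the-envelope model. This is a sketch under a loud assumption: at most `cores` threads run at any instant and every Netty thread is equally likely to be one of them. The class and method names are hypothetical.

```java
// Simplified model (an assumption, not a measurement): at most `cores` threads are
// on CPU at once and each Netty thread is equally likely to be among them.
final class EventLoopOversubscriptionSketch {

   // Round-robin assignment of the i-th accepted connection to an event loop thread.
   static int eventLoopFor(int connectionIndex, int nettyThreads) {
      return connectionIndex % nettyThreads;
   }

   // Chance that the thread owning a connection is not currently on a CPU.
   static double offCpuChance(int cores, int nettyThreads) {
      if (nettyThreads <= cores) {
         return 0.0; // every thread can have its own core
      }
      return 1.0 - (double) cores / nettyThreads;
   }

   public static void main(String[] args) {
      int cores = 12;
      System.out.println("3X cores: " + offCpuChance(cores, 3 * cores));
      System.out.println("2X cores: " + offCpuChance(cores, 2 * cores));
      System.out.println("1X cores: " + offCpuChance(cores, cores));
   }
}
```

Under this model the chance drops from 2/3 at 3X to 1/2 at 2X and to zero once the Netty threads no longer outnumber the cores; real schedulers are of course less uniform than this.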
   
   The Netty default is 2X the number of cores, meant for applications that rely heavily on just event loop processing, but Artemis is not one of them: even AMQP uses I/O threads, and needs GC, compiler threads and sometimes global threads to do its job. Just using 3X is a waste of resources for the current Artemis version.
   
   2. Re the global thread pool sizing
   
   That's a bit more complex and depends on how `ActiveMQThreadPoolExecutor` works.
   Writing a simple program helps to spot what the problem is (very similar to the Netty one, but not the same).
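   ```java
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.concurrent.ConcurrentMap;
   import java.util.concurrent.ThreadFactory;
   import java.util.concurrent.ThreadPoolExecutor;
   import java.util.concurrent.TimeUnit;
   import java.util.concurrent.atomic.AtomicLong;

   import org.apache.activemq.artemis.utils.ActiveMQThreadPoolExecutor;
   import org.apache.activemq.artemis.utils.ExecutorFactory;
   import org.apache.activemq.artemis.utils.actors.ArtemisExecutor;
   import org.apache.activemq.artemis.utils.actors.OrderedExecutorFactory;
   import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet;

   // The class name is arbitrary; only main() matters.
   public class GlobalPoolSpread {

      public static void main(String[] args) throws InterruptedException {
         // Pool with 0 core threads, at most 30 threads and a 60 s keep-alive.
         ThreadPoolExecutor executor = new ActiveMQThreadPoolExecutor(0, 30, 60L, TimeUnit.SECONDS, new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
               Thread t = new Thread(r);
               System.err.println("created new thread: " + t);
               return t;
            }
         });
         ExecutorFactory factory = new OrderedExecutorFactory(executor);
         final int clients = 30;
         final int bursts = 100;
         // One ordered executor per simulated client, plus the set of pool threads that served it.
         ConcurrentHashSet[] executingThreads = new ConcurrentHashSet[clients];
         ArtemisExecutor[] artemisExecutor = new ArtemisExecutor[clients];
         for (int i = 0; i < clients; i++) {
            artemisExecutor[i] = factory.getExecutor();
            executingThreads[i] = new ConcurrentHashSet();
         }
         // Number of tasks executed by each pool thread.
         ConcurrentMap<Thread, AtomicLong> executingT = new ConcurrentHashMap<>();
         for (int j = 0; j < bursts; j++) {
            for (int i = 0; i < clients; i++) {
               ConcurrentHashSet threadsSeen = executingThreads[i];
               artemisExecutor[i].execute(() -> {
                  try {
                     // Each task takes ~1 ms.
                     TimeUnit.MILLISECONDS.sleep(1);
                  } catch (InterruptedException e) {
                     e.printStackTrace();
                  }
                  threadsSeen.add(Thread.currentThread());
                  executingT.computeIfAbsent(Thread.currentThread(), t -> new AtomicLong()).incrementAndGet();
               });
            }
            // Pause between bursts, standing in for a GC pause.
            System.out.println("GC pause");
            Thread.sleep(100);
         }
         for (int i = 0; i < clients; i++) {
            artemisExecutor[i].flush(60, TimeUnit.SECONDS);
         }
         executor.shutdown();
         executor.awaitTermination(70, TimeUnit.SECONDS);
         System.out.println("Executing threads: " + executingT);
         System.out.println("Workload distribution per artemis executor:");
         for (int i = 0; i < clients; i++) {
            System.out.println("[" + (i + 1) + "] - " + executingThreads[i].size());
         }
      }
   }
   ```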
   On my machine (12 cores with HT - 6 real cores) it prints 30 times 
   ```created new thread: ...```
   and 
   ```
   Executing threads: 
   {Thread[Thread-1,5,]=103, 
   Thread[Thread-20,5,]=99, 
   Thread[Thread-17,5,]=99, 
   Thread[Thread-11,5,]=101, 
   Thread[Thread-18,5,]=99, 
   Thread[Thread-14,5,]=100, 
   Thread[Thread-13,5,]=100, 
   Thread[Thread-21,5,]=99, 
   Thread[Thread-24,5,]=98, 
   Thread[Thread-28,5,]=98, 
   Thread[Thread-5,5,]=103, 
   Thread[Thread-30,5,]=97, 
   Thread[Thread-27,5,]=97, 
   Thread[Thread-6,5,]=103, 
   Thread[Thread-4,5,]=102, 
   Thread[Thread-23,5,]=98, 
   Thread[Thread-25,5,]=98, 
   Thread[Thread-8,5,]=102, 
   Thread[Thread-7,5,]=102,
   Thread[Thread-3,5,]=103,
   Thread[Thread-9,5,]=101, 
   Thread[Thread-10,5,]=102, 
   Thread[Thread-19,5,]=99, 
   Thread[Thread-12,5,]=101, 
   Thread[Thread-15,5,]=100, 
   Thread[Thread-26,5,]=97, 
   Thread[Thread-29,5,]=97, 
   Thread[Thread-2,5,]=103,
   Thread[Thread-16,5,]=100, 
   Thread[Thread-22,5,]=99}
   Workload distribution per artemis executor:
   [1] - 17
   [2] - 18
   [3] - 17
   [4] - 15
   [5] - 13
   [6] - 13
   [7] - 17
   [8] - 17
   [9] - 18
   [10] - 14
   [11] - 13
   [12] - 17
   [13] - 15
   [14] - 14
   [15] - 17
   [16] - 18
   [17] - 16
   [18] - 12
   [19] - 17
   [20] - 16
   [21] - 16
   [22] - 14
   [23] - 17
   [24] - 17
   [25] - 17
   [26] - 21
   [27] - 20
   [28] - 19
   [29] - 22
   [30] - 18
   ```
   This gives some important info for understanding how this thread pool works.
   With bursts of small enough tasks (but not super small: ~1 ms each), issued by several core clients (30 for this test) with some pauses in between (100 ms, standing in for a GC pause; G1's default pause target is 200 ms):
   
   - the load is spread among all threads, i.e. each thread gets ~100 tasks
   - each executor (client) gets its tasks executed by many different threads (12 to 22 of the 30 available)
   - the number of created threads depends on how busy the existing ones are
   
   In short, if the global thread executor is going to perform mostly non-blocking operations (NOTE: the I/O executor is responsible for blocking I/O ops), with enough clients (clients > available cores) we're going to use all the threads configured on the pool. 
   But if the max pool size exceeds the available cores we will end up, similarly to the Netty case, descheduling some thread at random just to wake up the next one in charge of handling a specific task.
   
   There are a few assumptions to be verified (what if an `ArtemisExecutor` keeps a specific thread busy for too long? can global thread pool tasks really never block? etc.) and more tests to be performed, but this shouldn't stop us from searching for a better adaptive (based on the machine spec) default, IMO.
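
The "number of created threads depends on how busy the existing ones are" point can be seen in isolation with a plain JDK pool. This is only a stand-in sketch: it uses `java.util.concurrent.ThreadPoolExecutor` with a `SynchronousQueue`, not `ActiveMQThreadPoolExecutor`, and the class and method names are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Stand-in using the plain JDK pool (NOT ActiveMQThreadPoolExecutor): with a
// SynchronousQueue a task goes to an idle worker if one is waiting, otherwise a
// new thread is started, up to maxThreads.
final class PoolGrowthSketch {

   // Submits maxThreads tasks that all overlap in time and returns how many
   // distinct threads ended up running them.
   static int threadsUsedByOverlappingTasks(int maxThreads) {
      Set<Thread> seen = ConcurrentHashMap.newKeySet();
      CountDownLatch started = new CountDownLatch(maxThreads);
      CountDownLatch release = new CountDownLatch(1);
      ThreadPoolExecutor pool = new ThreadPoolExecutor(
         0, maxThreads, 60L, TimeUnit.SECONDS, new SynchronousQueue<>());
      try {
         for (int i = 0; i < maxThreads; i++) {
            pool.execute(() -> {
               seen.add(Thread.currentThread());
               started.countDown();
               awaitQuietly(release); // stay busy until every task has started
            });
         }
         awaitQuietly(started);
      } finally {
         release.countDown();
         pool.shutdown();
      }
      try {
         pool.awaitTermination(10, TimeUnit.SECONDS);
      } catch (InterruptedException e) {
         Thread.currentThread().interrupt();
      }
      return seen.size();
   }

   private static void awaitQuietly(CountDownLatch latch) {
      try {
         latch.await();
      } catch (InterruptedException e) {
         Thread.currentThread().interrupt();
      }
   }

   public static void main(String[] args) {
      System.out.println("threads used: " + threadsUsedByOverlappingTasks(30));
   }
}
```

Because every worker is still busy when the next task arrives, the pool grows to its maximum regardless of how many cores the machine has, which is the situation where threads start competing for CPU.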
   
   
   
   



[GitHub] [activemq-artemis] michaelpearce-gain commented on pull request #3584: ARTEMIS-3303 Default thread pool size is too generous

Posted by GitBox <gi...@apache.org>.

michaelpearce-gain commented on pull request #3584:
URL: https://github.com/apache/activemq-artemis/pull/3584#issuecomment-843036436


   I do indeed trust 100% your intent and belief that this is a great improvement; I'm just saying we need some stats and testing to back it all up. In theory, yes, perfect, but as we've learnt before, the best of intentions have caused issues for others.



[GitHub] [activemq-artemis] michaelpearce-gain edited a comment on pull request #3584: ARTEMIS-3303 Default thread pool size is too generous

Posted by GitBox <gi...@apache.org>.

michaelpearce-gain edited a comment on pull request #3584:
URL: https://github.com/apache/activemq-artemis/pull/3584#issuecomment-842715440


   I'm always cautious about such changes; after all, a default that suits one org may not suit another org's use case. And those who use the defaults and have tested with them will get sudden unexpected shocks should they change.



[GitHub] [activemq-artemis] michaelpearce-gain edited a comment on pull request #3584: ARTEMIS-3303 Default thread pool size is too generous

Posted by GitBox <gi...@apache.org>.

michaelpearce-gain edited a comment on pull request #3584:
URL: https://github.com/apache/activemq-artemis/pull/3584#issuecomment-842715440


   I'm always cautious about such changes; after all, a default that suits one org may not suit another org's use case. And those who use the defaults and have tested with them will get sudden unexpected shocks should they change. E.g. you normally trade throughput for latency, or x for y... it's always a trade, unless there are stats showing performance improvement across the board for all the main types of setups our users have, e.g. high throughput users, low throughput users, low latency users, MQTT users, AMQP users, core users, bare metal users vs virtualised users, users who care about the 50th percentile, users who care about the fat tails (99.99th percentile and max).
   
   I'd much rather favour documenting tuning better, and tools that auto-tune / give better defaults to the broker for certain known use cases / setups, which can be applied during broker creation.



[GitHub] [activemq-artemis] clebertsuconic commented on pull request #3584: ARTEMIS-3303 Default thread pool size is too generous

Posted by GitBox <gi...@apache.org>.

clebertsuconic commented on pull request #3584:
URL: https://github.com/apache/activemq-artemis/pull/3584#issuecomment-846301161


   Can't you change `./artemis create` to compute the correct count, and apply the new defaults to newly created servers?
   
   Users moving older, tested configurations wouldn't be affected by anything... new brokers would then use the "new defaults" and it would be up to the user to decide what to do...
   
   We can always log.info("We recommend new values now... please change your settings");
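
A minimal sketch of what such a recommendation check might look like. Everything here is hypothetical (class, method, setting name and message wording); it does not reflect any existing Artemis API, and the real thing would go through the broker's logger rather than `System.out`.

```java
import java.util.Optional;

// Hypothetical sketch: compares a configured value with a recommended one and
// produces an informational message only when they differ.
final class PoolSizeAdvisorSketch {

   static Optional<String> advise(String setting, int configured, int recommended) {
      if (configured == recommended) {
         return Optional.empty(); // nothing to say
      }
      return Optional.of("We recommend " + setting + "=" + recommended
         + " now (configured: " + configured + "); please review your settings");
   }

   public static void main(String[] args) {
      int cores = Runtime.getRuntime().availableProcessors();
      // "example-pool-size" is a placeholder name, not a real configuration key.
      advise("example-pool-size", 3 * cores, Math.max(1, cores / 2))
         .ifPresent(System.out::println);
   }
}
```

Existing configurations keep their values; the message only nudges the user, which matches the idea of leaving the decision to them.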



[GitHub] [activemq-artemis] franz1981 commented on pull request #3584: ARTEMIS-3303 Default thread pool size is too generous

Posted by GitBox <gi...@apache.org>.

franz1981 commented on pull request #3584:
URL: https://github.com/apache/activemq-artemis/pull/3584#issuecomment-842346795


   Need some discussion about:
   
   - min pool size
   - default pool size
   
   What happens when different acceptors are being used?
   This behaviour is still a bit static.



[GitHub] [activemq-artemis] michaelpearce-gain edited a comment on pull request #3584: ARTEMIS-3303 Default thread pool size is too generous

Posted by GitBox <gi...@apache.org>.

michaelpearce-gain edited a comment on pull request #3584:
URL: https://github.com/apache/activemq-artemis/pull/3584#issuecomment-842715440


   I'm always cautious about such changes; after all, a default that suits one org may not suit another org's use case. And those who use the defaults and have tested with them will get sudden unexpected shocks should they change. E.g. you normally trade throughput for latency, or x for y... it's always a trade, unless there are stats showing performance improvement for all the main types of setups our users have, e.g. high throughput users, low throughput users, low latency users, MQTT users, AMQP users, core users, bare metal users vs virtualised users...



[GitHub] [activemq-artemis] michaelpearce-gain edited a comment on pull request #3584: ARTEMIS-3303 Default thread pool size is too generous

Posted by GitBox <gi...@apache.org>.

michaelpearce-gain edited a comment on pull request #3584:
URL: https://github.com/apache/activemq-artemis/pull/3584#issuecomment-842715440

I'm always cautious about such changes; as you know, we've been hit a few times by this sort of change, which impacted users and which we reverted in later releases.

After all, a default that suits one org may not suit another org's use case. And those who use the defaults and have tested with them will get sudden unexpected shocks should they change. E.g. you normally trade throughput for latency, or x for y... it's always a trade, unless there are stats showing performance improvement across the board for all the main types of setups our users have, e.g. high throughput users, low throughput users, low latency users, MQTT users, AMQP users, core users, bare metal users vs virtualised users, users who care about the 50th percentile, users who care about the fat tails (99.99th percentile and max), high client use cases, low client use cases.


[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3584: ARTEMIS-3303 Default thread pool size is too generous

Posted by GitBox <gi...@apache.org>.

franz1981 edited a comment on pull request #3584:
URL: https://github.com/apache/activemq-artemis/pull/3584#issuecomment-843013239


   All fair points, and indeed I believe this should be a cautious and more conservative change. Still, there are some historical motivations and experimental facts suggesting that what we set by default is no longer valid/useful, and that it is optimized neither for context switches nor for throughput/latency trade-offs.
   
   1. Re the Netty event loop sizing
   
   - historical facts: HornetQ and earlier versions of Artemis were blocking Netty threads, but that's no longer true. We can even choose to use `BlockHound` to enforce/check it on our CI, see https://github.com/netty/netty/pull/9687
   - experimental facts: generating a uniformly distributed load with clients >= cores using Core clients showed that the default configuration of the Netty thread pool (3X the number of cores) prevents scaling and hurts both throughput and latencies. See https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841788187 for some more details about it.
   
   The explanation for the experimental facts seems related to how the Netty event loop group works: 
   - Netty assigns client connections in round-robin fashion to the configured Netty threads
   - each client connection can issue write/read events on the event loop's (single) selector to wake it up for any work to do
   - if the number of Netty threads exceeds the number of cores and the number of clients is <= Netty threads, each time such a notification happens there is some chance (2 in 3 with the 3X default) that the thread that's going to handle it won't be on CPU (because threads outnumber cores), and the OS is forced to deschedule some (random) thread in order to run the Netty thread responsible for handling the interrupt, causing unnecessary context switches.
   
   The Netty default is 2X the number of cores, meant for applications that rely heavily on just event loop processing, but Artemis is not one of them: even AMQP uses I/O threads, and needs GC, compiler threads and sometimes global threads to do its job. Just using 3X is a waste of resources for the current Artemis version.
   
   2. Re the global thread pool sizing
   
   That's a bit more complex and depends on how `ActiveMQThreadPoolExecutor` works.
   Writing a simple program helps to spot what the problem is (very similar to the Netty one, but not the same).
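   ```java
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.concurrent.ConcurrentMap;
   import java.util.concurrent.ThreadFactory;
   import java.util.concurrent.ThreadPoolExecutor;
   import java.util.concurrent.TimeUnit;
   import java.util.concurrent.atomic.AtomicLong;

   import org.apache.activemq.artemis.utils.ActiveMQThreadPoolExecutor;
   import org.apache.activemq.artemis.utils.ExecutorFactory;
   import org.apache.activemq.artemis.utils.actors.ArtemisExecutor;
   import org.apache.activemq.artemis.utils.actors.OrderedExecutorFactory;
   import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet;

   // The class name is arbitrary; only main() matters.
   public class GlobalPoolSpread {

      public static void main(String[] args) throws InterruptedException {
         // Pool with 0 core threads, at most 30 threads and a 60 s keep-alive.
         ThreadPoolExecutor executor = new ActiveMQThreadPoolExecutor(0, 30, 60L, TimeUnit.SECONDS, new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
               Thread t = new Thread(r);
               System.err.println("created new thread: " + t);
               return t;
            }
         });
         ExecutorFactory factory = new OrderedExecutorFactory(executor);
         final int clients = 30;
         final int bursts = 100;
         // One ordered executor per simulated client, plus the set of pool threads that served it.
         ConcurrentHashSet[] executingThreads = new ConcurrentHashSet[clients];
         ArtemisExecutor[] artemisExecutor = new ArtemisExecutor[clients];
         for (int i = 0; i < clients; i++) {
            artemisExecutor[i] = factory.getExecutor();
            executingThreads[i] = new ConcurrentHashSet();
         }
         // Number of tasks executed by each pool thread.
         ConcurrentMap<Thread, AtomicLong> executingT = new ConcurrentHashMap<>();
         for (int j = 0; j < bursts; j++) {
            for (int i = 0; i < clients; i++) {
               ConcurrentHashSet threadsSeen = executingThreads[i];
               artemisExecutor[i].execute(() -> {
                  try {
                     // Each task takes ~1 ms.
                     TimeUnit.MILLISECONDS.sleep(1);
                  } catch (InterruptedException e) {
                     e.printStackTrace();
                  }
                  threadsSeen.add(Thread.currentThread());
                  executingT.computeIfAbsent(Thread.currentThread(), t -> new AtomicLong()).incrementAndGet();
               });
            }
            // Pause between bursts, standing in for a GC pause.
            System.out.println("GC pause");
            Thread.sleep(100);
         }
         for (int i = 0; i < clients; i++) {
            artemisExecutor[i].flush(60, TimeUnit.SECONDS);
         }
         executor.shutdown();
         executor.awaitTermination(70, TimeUnit.SECONDS);
         System.out.println("Executing threads: " + executingT);
         System.out.println("Workload distribution per artemis executor:");
         for (int i = 0; i < clients; i++) {
            System.out.println("[" + (i + 1) + "] - " + executingThreads[i].size());
         }
      }
   }
   ```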
   On my machine (12 cores with HT - 6 real cores) it prints 30 times 
   ```created new thread: ...```
   and 
   ```
   Executing threads: 
   {Thread[Thread-1,5,]=103, 
   Thread[Thread-20,5,]=99, 
   Thread[Thread-17,5,]=99, 
   Thread[Thread-11,5,]=101, 
   Thread[Thread-18,5,]=99, 
   Thread[Thread-14,5,]=100, 
   Thread[Thread-13,5,]=100, 
   Thread[Thread-21,5,]=99, 
   Thread[Thread-24,5,]=98, 
   Thread[Thread-28,5,]=98, 
   Thread[Thread-5,5,]=103, 
   Thread[Thread-30,5,]=97, 
   Thread[Thread-27,5,]=97, 
   Thread[Thread-6,5,]=103, 
   Thread[Thread-4,5,]=102, 
   Thread[Thread-23,5,]=98, 
   Thread[Thread-25,5,]=98, 
   Thread[Thread-8,5,]=102, 
   Thread[Thread-7,5,]=102,
   Thread[Thread-3,5,]=103,
   Thread[Thread-9,5,]=101, 
   Thread[Thread-10,5,]=102, 
   Thread[Thread-19,5,]=99, 
   Thread[Thread-12,5,]=101, 
   Thread[Thread-15,5,]=100, 
   Thread[Thread-26,5,]=97, 
   Thread[Thread-29,5,]=97, 
   Thread[Thread-2,5,]=103,
   Thread[Thread-16,5,]=100, 
   Thread[Thread-22,5,]=99}
   Workload distribution per artemis executor:
   [1] - 17
   [2] - 18
   [3] - 17
   [4] - 15
   [5] - 13
   [6] - 13
   [7] - 17
   [8] - 17
   [9] - 18
   [10] - 14
   [11] - 13
   [12] - 17
   [13] - 15
   [14] - 14
   [15] - 17
   [16] - 18
   [17] - 16
   [18] - 12
   [19] - 17
   [20] - 16
   [21] - 16
   [22] - 14
   [23] - 17
   [24] - 17
   [25] - 17
   [26] - 21
   [27] - 20
   [28] - 19
   [29] - 22
   [30] - 18
   ```
   This tells us a few important things about how this thread pool works.
   With bursts of small tasks (but not tiny: ~1 ms each), issued by several core clients (30 in this test) and separated by pauses (100 ms here, the same order of magnitude as a GC pause; the G1 default pause target is 200 ms):
   
   - the load is spread among all threads, i.e. each thread gets ~100 tasks
   - each executor (client) has its tasks executed by many different threads (from 12 to 22 of the 30 available)
   - the number of created threads depends on how busy the existing ones are
   
   In short, if the global thread executor is going to perform mostly non-blocking operations (NOTE: the I/O executor is responsible for blocking I/O ops), with enough clients (clients > available cores) we're going to use every thread configured on the pool. 
   This is OK, given that it's what we expect by setting 30 as the max thread pool size.
   But if the max thread pool size exceeds the available cores, we will end up, similarly to the Netty case, descheduling some thread at random just to wake up the one in charge of a specific task.
   On top of this, there's another problem related to the `Workload distribution`: having each client's tasks handled by different threads is OK, but it can cause many cache misses, because each new thread picking up the workload has none of the task's execution context in its cache. Reusing the same thread (in a more "sticky" way) makes CPU-bound computations go faster, as thread-per-core designs often advocate. 
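   
   To make the "sticky" idea concrete, here is a minimal sketch using only the JDK. It is not how `ArtemisExecutor` or `OrderedExecutorFactory` work; the `StickyDispatch` class, the "lane" terminology and the modulo routing are assumptions made up for this example. Each client is pinned to one single-threaded lane, so all of its tasks run on the same thread.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.Set;
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   import java.util.concurrent.TimeUnit;

   public class StickyDispatch {
      // Runs `bursts` tasks per client, always on the lane picked by the client index,
      // and returns how many distinct threads served each client.
      public static int[] distinctThreadsPerClient(int lanes, int clients, int bursts) {
         ExecutorService[] pool = new ExecutorService[lanes];
         for (int i = 0; i < lanes; i++) {
            pool[i] = Executors.newSingleThreadExecutor();
         }
         List<Set<Thread>> seen = new ArrayList<>(clients);
         for (int i = 0; i < clients; i++) {
            seen.add(ConcurrentHashMap.<Thread>newKeySet());
         }
         for (int j = 0; j < bursts; j++) {
            for (int i = 0; i < clients; i++) {
               Set<Thread> threadsSeen = seen.get(i);
               // Sticky routing (hypothetical policy): client i is always served by lane i % lanes.
               pool[i % lanes].execute(() -> threadsSeen.add(Thread.currentThread()));
            }
         }
         for (ExecutorService lane : pool) {
            lane.shutdown();
         }
         try {
            for (ExecutorService lane : pool) {
               if (!lane.awaitTermination(10, TimeUnit.SECONDS)) {
                  throw new IllegalStateException("lane did not terminate in time");
               }
            }
         } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
         }
         int[] result = new int[clients];
         for (int i = 0; i < clients; i++) {
            result[i] = seen.get(i).size();
         }
         return result;
      }

      public static void main(String[] args) {
         int lanes = Math.max(1, Runtime.getRuntime().availableProcessors());
         int[] perClient = distinctThreadsPerClient(lanes, 30, 100);
         for (int i = 0; i < perClient.length; i++) {
            System.out.println("[" + (i + 1) + "] - " + perClient[i]);
         }
      }
   }
   ```
   
   The trade-off is that such routing gives up load balancing: a slow client delays every other client that shares its lane, which is one of the things that would need measuring before going this way.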
   
   There are a few assumptions to be verified (what if an `ArtemisExecutor` keeps a specific thread busy for too long? Is it true that global thread pool tasks cannot block? etc.) and more tests to be performed, but this shouldn't stop us from searching for a better adaptive (based on the machine spec) default IMO.
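   
   A minimal sketch of what a machine-derived default could look like, using the "half the cores for each pool" split mentioned earlier in this thread. This is only an illustration, not the code in this PR; the class and method names are hypothetical.
   
   ```java
   public class AdaptivePoolDefaults {
      // Half of the available cores, never less than one thread.
      // Assumption: the other half is left to GC, JIT compiler and OS background threads.
      public static int halfOfCores(int availableCores) {
         return Math.max(1, availableCores / 2);
      }

      public static void main(String[] args) {
         int cores = Runtime.getRuntime().availableProcessors();
         System.out.println("available cores:        " + cores);
         System.out.println("netty threads (sketch): " + halfOfCores(cores));
         System.out.println("global pool (sketch):   " + halfOfCores(cores));
      }
   }
   ```
   
   Note that `availableProcessors()` counts hardware threads, so on a 12-thread/6-core machine like the one above it would yield 6 for each pool.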
   
   
   
   



[GitHub] [activemq-artemis] michaelpearce-gain edited a comment on pull request #3584: ARTEMIS-3303 Default thread pool size is too generous

Posted by GitBox <gi...@apache.org>.

michaelpearce-gain edited a comment on pull request #3584:
URL: https://github.com/apache/activemq-artemis/pull/3584#issuecomment-842715440


   I'm always cautious about such changes; after all, one default that suits one org differs from another org's use case. Those who use the defaults and tested with them will get sudden unexpected shocks should they change, e.g. you normally trade throughput for latency, or x for y... it's always a trade... unless there are stats showing a performance improvement across the board for all the main types of setups our users have, e.g. high throughput users, low throughput users, low latency users, mqtt users, amqp users, core users, bare metal users vs virtualised users...



[GitHub] [activemq-artemis] franz1981 commented on pull request #3584: ARTEMIS-3303 Default thread pool size is too generous

Posted by GitBox <gi...@apache.org>.

franz1981 commented on pull request #3584:
URL: https://github.com/apache/activemq-artemis/pull/3584#issuecomment-914046144


   I'm keen to re-implement it as suggested by @clebertsuconic, but only after some extensive performance testing to be sure it's the right choice for most users. Closing this, to be reopened in the future.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@activemq.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [activemq-artemis] franz1981 closed pull request #3584: ARTEMIS-3303 Default thread pool size is too generous

Posted by GitBox <gi...@apache.org>.

franz1981 closed pull request #3584:
URL: https://github.com/apache/activemq-artemis/pull/3584


   



[GitHub] [activemq-artemis] michaelpearce-gain edited a comment on pull request #3584: ARTEMIS-3303 Default thread pool size is too generous

Posted by GitBox <gi...@apache.org>.

michaelpearce-gain edited a comment on pull request #3584:
URL: https://github.com/apache/activemq-artemis/pull/3584#issuecomment-842715440

I'm always cautious about such changes; as you know, we've been hit a few times by this sort of change, which impacted users and which we reverted in later releases.


[GitHub] [activemq-artemis] michaelpearce-gain edited a comment on pull request #3584: ARTEMIS-3303 Default thread pool size is too generous

Posted by GitBox <gi...@apache.org>.

michaelpearce-gain edited a comment on pull request #3584:
URL: https://github.com/apache/activemq-artemis/pull/3584#issuecomment-842715440

I'm always cautious about such changes; as you know, we've been hit a few times by this sort of change, which impacted users and which we reverted in later releases.

I'd much rather favour documenting tuning better, and tools that auto-tune / give better defaults to the broker for different known use cases / setups, that can be set up during broker creation. Akin to the tool we have for the journal, which perf tests the disk and then sets suitable values based on the observed kit. TBH I love the journal tuning tool; it's the best tool we have. I actually use it sometimes just to test hardware kit for other non-ActiveMQ projects/systems!


[GitHub] [activemq-artemis] michaelpearce-gain edited a comment on pull request #3584: ARTEMIS-3303 Default thread pool size is too generous

Posted by GitBox <gi...@apache.org>.

michaelpearce-gain edited a comment on pull request #3584:
URL: https://github.com/apache/activemq-artemis/pull/3584#issuecomment-842715440


   I'm always cautious about such changes; as you know, we've been hit a few times by this sort of change, which impacted users and which we reverted in later releases.
   
   After all, one default that suits one org differs from another org's use case. Those who use the defaults and tested with them will get sudden unexpected shocks should they change, e.g. you normally trade throughput for latency, or x for y... it's always a trade... unless there are stats showing a performance improvement across the board for all the main types of setups our users have, e.g. high throughput users, low throughput users, low latency users, mqtt users, amqp users, core users, bare metal users vs virtualised users, users who care about the 50th percentile, users who care about the fat tails (99.99th percentile and max).
   
   I'd much rather favour documenting tuning better, and tools that auto-tune / give better defaults to the broker for different known use cases / setups, that can be set up during broker creation.



[GitHub] [activemq-artemis] michaelpearce-gain commented on pull request #3584: ARTEMIS-3303 Default thread pool size is too generous

Posted by GitBox <gi...@apache.org>.

michaelpearce-gain commented on pull request #3584:
URL: https://github.com/apache/activemq-artemis/pull/3584#issuecomment-846319302


   This sounds like a good suggestion to me: it avoids existing users getting any surprises from defaults changing beneath their feet, while new servers get the new sparkly settings.
   
   Likewise, I like the idea that if someone is using a default, i.e. not explicitly setting a value, then an info or warn log could alert the user to this tuning, so they can, at their own will, update their settings to the new sparkly settings.



[GitHub] [activemq-artemis] michaelandrepearce commented on pull request #3584: ARTEMIS-3303 Default thread pool size is too generous

Posted by GitBox <gi...@apache.org>.

michaelandrepearce commented on pull request #3584:
URL: https://github.com/apache/activemq-artemis/pull/3584#issuecomment-913536656


   @franz1981 what's occurring with this? I think an agreeable way forward was suggested by @clebertsuconic, but I haven't seen any further updates since. I want to start clearing down old / stagnant PRs, a bit of spring cleaning so to say, so we focus on stuff that's actively being worked on / relevant.
   
   



[GitHub] [activemq-artemis] gtully commented on pull request #3584: ARTEMIS-3303 Default thread pool size is too generous

Posted by GitBox <gi...@apache.org>.

gtully commented on pull request #3584:
URL: https://github.com/apache/activemq-artemis/pull/3584#issuecomment-843043516


   I too would take the conservative approach on this change. I think it would make a great documentation update, and we can recommend explicitly setting values to balance io and workers across cores. It makes good sense.
   Following happy user feedback, we can work to change the defaults in 2.19 or 3, i.e. at some point in the future.



[GitHub] [activemq-artemis] michaelpearce-gain edited a comment on pull request #3584: ARTEMIS-3303 Default thread pool size is too generous

Posted by GitBox <gi...@apache.org>.

michaelpearce-gain edited a comment on pull request #3584:
URL: https://github.com/apache/activemq-artemis/pull/3584#issuecomment-842715440


   I'm always cautious about such changes; after all, one default that suits one org differs from another org's use case. Those who use the defaults and tested with them will get sudden unexpected shocks should they change, e.g. you normally trade throughput for latency, or x for y... it's always a trade...



[GitHub] [activemq-artemis] franz1981 edited a comment on pull request #3584: ARTEMIS-3303 Default thread pool size is too generous

Posted by GitBox <gi...@apache.org>.

franz1981 edited a comment on pull request #3584:
URL: https://github.com/apache/activemq-artemis/pull/3584#issuecomment-843013239


   All fair points, and indeed I believe this should be a cautious and more conservative change. Still, there are some historical motivations and experimental facts suggesting that what we set by default is no longer valid/useful, and that it was not optimizing for context switches, nor for throughput or latency trade-offs.
   
   1. Re the Netty event loop sizing
   
   - historical facts: HornetQ and earlier versions of Artemis were blocking Netty threads, but that's no longer true. We can even choose to use `BlockHound` to enforce/check it on our CI, see https://github.com/netty/netty/pull/9687
   - experimental facts: generating a uniformly distributed load with clients >= cores using Core clients showed that the default configuration of the Netty thread pool (3X the number of cores) prevents scaling and hurts both throughput and latencies. See https://github.com/apache/activemq-artemis/pull/3572#issuecomment-841788187 for some more details about it.
   
   The explanation for the experimental facts seems related to how the Netty event loop group works: 
   - Netty assigns client connections in round-robin fashion to the configured Netty threads
   - each client connection can issue write/read events on the event loop's (single) selector to wake it up for any work to do
   - if the number of Netty threads exceeds the number of cores and the number of clients is <= Netty threads, each time such a notification happens there is some chance (2 in 3 with the 3X default) that the thread that's going to handle it won't be on CPU (because threads outnumber cores), and the OS is forced to deschedule some (random) thread in order to run the Netty thread responsible for handling the interrupt, causing unnecessary context switches.
   
   Netty's default of 2X the number of cores suits applications that rely almost exclusively on event loop processing, but Artemis is not one of them: even AMQP uses I/O threads, and the broker needs GC threads, compiler threads and sometimes global threads to do its job. Using 3X is a waste of resources for the current Artemis version.
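   
   As a rough illustration of where the 2/3 figure comes from (a back-of-the-envelope sketch, not Netty or Artemis code): assuming every core is busy and each thread is equally likely to be the one scheduled, the fraction of event loop threads that cannot be on CPU at a given instant is `(threads - cores) / threads`. The class and method names are made up for this example.
   
   ```java
   import java.util.Locale;

   public class OffCpuOdds {
      // Fraction of threads that cannot be running at a given instant.
      // Assumption: all cores are busy and every thread is equally likely to be scheduled.
      public static double offCpuFraction(int cores, int threads) {
         if (threads <= cores) {
            return 0.0;
         }
         return (threads - cores) / (double) threads;
      }

      public static void main(String[] args) {
         int cores = Runtime.getRuntime().availableProcessors();
         for (int factor = 1; factor <= 3; factor++) {
            System.out.println(String.format(Locale.ROOT, "%dX cores -> %.2f of the threads off CPU",
                                             factor, offCpuFraction(cores, factor * cores)));
         }
      }
   }
   ```
   
   Under these simplifying assumptions the fraction is 2/3 with 3X, 1/2 with 2X and 0 with 1X, regardless of the core count.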
   
   2. Re the global thread pool sizing
   
   That's a bit more complex and depends on how `ActiveMQThreadPoolExecutor` works.
   A simple program helps to spot the problem with it (very similar to the Netty one, but not the same).
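   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.Set;
   import java.util.concurrent.ConcurrentHashMap;
   import java.util.concurrent.ConcurrentMap;
   import java.util.concurrent.ThreadPoolExecutor;
   import java.util.concurrent.TimeUnit;
   import java.util.concurrent.atomic.AtomicLong;

   import org.apache.activemq.artemis.utils.ActiveMQThreadPoolExecutor;
   import org.apache.activemq.artemis.utils.ExecutorFactory;
   import org.apache.activemq.artemis.utils.actors.ArtemisExecutor;
   import org.apache.activemq.artemis.utils.actors.OrderedExecutorFactory;
   import org.apache.activemq.artemis.utils.collections.ConcurrentHashSet;

   // Class name is arbitrary; the snippet only needs a main method.
   public class GlobalPoolSpread {
      public static void main(String[] args) throws InterruptedException {
         // Global pool stand-in: 0 core threads, max 30, 60 s keep-alive; log each thread creation.
         ThreadPoolExecutor executor = new ActiveMQThreadPoolExecutor(0, 30, 60L, TimeUnit.SECONDS, r -> {
            Thread t = new Thread(r);
            System.err.println("created new thread: " + t);
            return t;
         });
         ExecutorFactory factory = new OrderedExecutorFactory(executor);
         final int clients = 30;
         final int bursts = 100;
         // One ordered executor per simulated client, plus the set of threads that served it.
         List<ArtemisExecutor> artemisExecutors = new ArrayList<>(clients);
         List<Set<Thread>> executingThreads = new ArrayList<>(clients);
         for (int i = 0; i < clients; i++) {
            artemisExecutors.add(factory.getExecutor());
            executingThreads.add(new ConcurrentHashSet<Thread>());
         }
         // Number of tasks executed by each pool thread.
         ConcurrentMap<Thread, AtomicLong> executingT = new ConcurrentHashMap<>();
         for (int j = 0; j < bursts; j++) {
            for (int i = 0; i < clients; i++) {
               Set<Thread> threadsSeen = executingThreads.get(i);
               artemisExecutors.get(i).execute(() -> {
                  try {
                     // Simulate ~1 ms of work.
                     TimeUnit.MILLISECONDS.sleep(1);
                  } catch (InterruptedException e) {
                     e.printStackTrace();
                  }
                  threadsSeen.add(Thread.currentThread());
                  executingT.computeIfAbsent(Thread.currentThread(), t -> new AtomicLong()).incrementAndGet();
               });
            }
            // Idle gap between bursts, standing in for a GC pause.
            System.out.println("GC pause");
            Thread.sleep(100);
         }
         for (ArtemisExecutor artemisExecutor : artemisExecutors) {
            artemisExecutor.flush(60, TimeUnit.SECONDS);
         }
         executor.shutdown();
         executor.awaitTermination(70, TimeUnit.SECONDS);
         System.out.println("Executing threads: " + executingT);
         System.out.println("Workload distribution per artemis executor:");
         for (int i = 0; i < clients; i++) {
            System.out.println("[" + (i + 1) + "] - " + executingThreads.get(i).size());
         }
      }
   }
   ```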
   On my machine (12 cores with HT - 6 real cores) it prints 30 times 
   ```created new thread: ...```
   and 
   ```
   Executing threads: 
   {Thread[Thread-1,5,]=103, 
   Thread[Thread-20,5,]=99, 
   Thread[Thread-17,5,]=99, 
   Thread[Thread-11,5,]=101, 
   Thread[Thread-18,5,]=99, 
   Thread[Thread-14,5,]=100, 
   Thread[Thread-13,5,]=100, 
   Thread[Thread-21,5,]=99, 
   Thread[Thread-24,5,]=98, 
   Thread[Thread-28,5,]=98, 
   Thread[Thread-5,5,]=103, 
   Thread[Thread-30,5,]=97, 
   Thread[Thread-27,5,]=97, 
   Thread[Thread-6,5,]=103, 
   Thread[Thread-4,5,]=102, 
   Thread[Thread-23,5,]=98, 
   Thread[Thread-25,5,]=98, 
   Thread[Thread-8,5,]=102, 
   Thread[Thread-7,5,]=102,
   Thread[Thread-3,5,]=103,
   Thread[Thread-9,5,]=101, 
   Thread[Thread-10,5,]=102, 
   Thread[Thread-19,5,]=99, 
   Thread[Thread-12,5,]=101, 
   Thread[Thread-15,5,]=100, 
   Thread[Thread-26,5,]=97, 
   Thread[Thread-29,5,]=97, 
   Thread[Thread-2,5,]=103,
   Thread[Thread-16,5,]=100, 
   Thread[Thread-22,5,]=99}
   Workload distribution per artemis executor:
   [1] - 17
   [2] - 18
   [3] - 17
   [4] - 15
   [5] - 13
   [6] - 13
   [7] - 17
   [8] - 17
   [9] - 18
   [10] - 14
   [11] - 13
   [12] - 17
   [13] - 15
   [14] - 14
   [15] - 17
   [16] - 18
   [17] - 16
   [18] - 12
   [19] - 17
   [20] - 16
   [21] - 16
   [22] - 14
   [23] - 17
   [24] - 17
   [25] - 17
   [26] - 21
   [27] - 20
   [28] - 19
   [29] - 22
   [30] - 18
   ```
   This tells us a few important things about how this thread pool works.
   With bursts of small tasks (but not tiny: ~1 ms each), issued by several core clients (30 in this test) and separated by pauses (100 ms here, the same order of magnitude as a GC pause; the G1 default pause target is 200 ms):
   
   - the load is spread among all threads, i.e. each thread gets ~100 tasks
   - each executor (client) has its tasks executed by many different threads (from 12 to 22 of the 30 available)
   - the number of created threads depends on how busy the existing ones are
   
   In short, if the global thread executor is going to perform mostly non-blocking operations (NOTE: the I/O executor is responsible for blocking I/O ops), with enough clients (clients > available cores) we're going to use every thread configured on the pool. 
   But if the max pool size exceeds the available cores, we will end up, similarly to the Netty case, descheduling some thread at random just to wake up the one in charge of a specific task.
   
   There are a few assumptions to be verified (what if an `ArtemisExecutor` keeps a specific thread busy for too long? Is it true that global thread pool tasks cannot block? etc.) and more tests to be performed, but this shouldn't stop us from searching for a better adaptive (based on the machine spec) default IMO.
   
   
   
   

