You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/02/16 13:03:22 UTC

[GitHub] [pulsar] lhotari opened a new pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

lhotari opened a new pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320


   ### Motivation
   
   Since Pulsar Admin API uses the blocking servlet API, it is possible that all Jetty threads are occupied and this causes unavailability on the Pulsar Admin API. The default value for the maximum number of threads for Jetty is too low in Pulsar. That is the root cause of many problems where Pulsar Admin API is unavailable when all threads are in use.
   
   ### Additional context
   
   Mailing list thread about "make async" changes: https://lists.apache.org/thread/tn7rt59cd1k724l4ytfcmzx1w2sbtw7l
   - Related issues/PRs #13666 #4756 #10619
   
   ### Modification 
   
   - Jetty defaults to 200 maximum threads, to prevent thread pool starvation. Make Pulsar use the same default value by setting `numHttpServerThreads=200`.
   - Update the documentation for `numHttpServerThreads`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] nodece edited a comment on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
nodece edited a comment on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1042715597


   The unavailability of the Admin API is not caused by the HTTP server thread, the root cause is that the ZK callback thread is blocked. 
   
   When an admin API calls the ZK metadatastore API, it gets the ZK data by call the `CompletableFuture`, note that we did not use the executor to execute the `CompletableFuture#complete()` in [ZKMetadataStore.java#L171](https://github.com/apache/pulsar/blob/master/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/ZKMetadataStore.java#L171). In ZK callback thread, once the caller converts async to sync calls then the ZK callback thread will be blocked, this code so like: `metadata.getAsync().get(30, TimeUnit.SECONDS)`.
   
   How to solve this problem?
   1. Use an executor to execute the callback that passes data to Pulsar in ZK callback
   2. Don't convert async to sync calls
   
   
   How to reproduce the ZK callback thread is blocked:
   ```
   docker run -d -p 2181:2181 --name test-zookeeper zookeeper
   ```
   ```java
   public class Main {
       private static final long CACHE_REFRESH_TIME_MILLIS = TimeUnit.MINUTES.toMillis(5);
   
       public static void printThread(String name) {
           System.out.println(name + " thread name -> " + Thread.currentThread().getName());
       }
   
       public static void main(String[] args) throws Exception {
           ZooKeeper zkc = new ZooKeeper("localhost:2181", 60_000, null);
   
           System.out.println("Check the zk connect");
           CountDownLatch zkLatch = new CountDownLatch(1);
           new Thread(() -> {
               while (true) {
                   if (zkc.getState().isConnected()) {
                       zkLatch.countDown();
                       break;
                   }
               }
           }).start();
           if (!zkLatch.await(5, TimeUnit.SECONDS)) {
               throw new Exception("zk connect failed");
           }
   
           AsyncLoadingCache<String, byte[]> objCache = Caffeine.newBuilder()
                   .refreshAfterWrite(CACHE_REFRESH_TIME_MILLIS, TimeUnit.MILLISECONDS)
                   .buildAsync((key, executor) -> {
                       CompletableFuture<byte[]> future = new CompletableFuture<>();
                       zkc.multi(Lists.newArrayList(Op.getData("/")), (rc, path, ctx, opResults) -> {
                           printThread("zk callback");
                           future.complete(null);
                       }, null);
                       return future;
                   });
   
           CountDownLatch countDownLatch = new CountDownLatch(1);
   
           // Reproduce the ZK callback is blocked
           System.out.println("async get start");
           objCache.get("/").whenComplete((unused, ignored) -> {
               printThread("async get done");
               try {
                   System.out.println("zk thread will blocked after sync get");
                   System.out.println("sync get start");
                   objCache.get("/1").get(5, TimeUnit.SECONDS);
                   // Unreachable
                   printThread("sync get done");
                   countDownLatch.countDown();
               } catch (Exception e) {
                   e.printStackTrace();
               } finally {
                   countDownLatch.countDown();
               }
           });
   
           countDownLatch.await();
       }
   }
   ```
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] hangc0276 commented on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
hangc0276 commented on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1041554631


   > These are threads. Jetty defaults to 200 maximum threads
   
   @lhotari  Do we need to take the number of availableProcessors into consideration for the maximum threads of the thread pool?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1041581863


   > > These are threads. Jetty defaults to 200 maximum threads
   > 
   > @lhotari Do we need to take the number of availableProcessors into consideration for the maximum threads of the thread pool?
   
   No. Jetty is different from Netty in this aspect. In Netty, everything should be asynchronous and "thou shall never block". In Jetty, the maximum number of threads for the thread pool should be set to 50-500 threads and blocking operations are fine. 
   
   The recommendation for the thread pool is explained in Jetty documentation https://www.eclipse.org/jetty/documentation/jetty-9/index.html#_thread_pool
   > Thread Pool
   > Configure with goal of limiting memory usage maximum available. Typically this is >50 and <500
   
   In Jetty there's also a "acceptor" and "selector" thread count settings. These have been fixed to 1 in Pulsar.
   > Acceptors
   > The standard rule of thumb for the number of Accepters to configure is one per CPU on a given machine.
   
   @Shoothzj  The number of acceptors should take the system cpu cores into account. However that is a different setting and Pulsar now uses a fixed number there: 1. 
   
   http port:
   https://github.com/apache/pulsar/blob/b540523b474e4194e30c1acab65dfafdd11d3210/pulsar-broker/src/main/java/org/apache/pulsar/broker/web/WebService.java#L88
   
   https port:
   https://github.com/apache/pulsar/blob/b540523b474e4194e30c1acab65dfafdd11d3210/pulsar-broker/src/main/java/org/apache/pulsar/broker/web/WebService.java#L125
   
   acceptors and selectors should be both set to -1. Jetty would pick the recommended count based on cores in that case. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1042835728


   I'll continue the discussion in the PIP-142 discussion email.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1041536494


   > @lhotari If this is jetty defaults. Can we just leave it blank?
   
   @Jason918 no. Pulsar overrides [the default](https://github.com/eclipse/jetty.project/blob/4a0c91c0be53805e3fcffdcdcc9587d5301863db/jetty-util/src/main/java/org/eclipse/jetty/util/thread/ExecutorThreadPool.java#L57) with the value set in numHttpServerThreads in the configuration.
   
   Pulsar code locations:
   
   https://github.com/apache/pulsar/blob/b540523b474e4194e30c1acab65dfafdd11d3210/pulsar-broker/src/main/java/org/apache/pulsar/broker/web/WebService.java#L79-L81
   
   https://github.com/apache/pulsar/blob/adcbe0f118ece0999b8603f37010194b44c241b4/pulsar-broker-common/src/main/java/org/apache/pulsar/broker/web/WebExecutorThreadPool.java#L33-L36
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari edited a comment on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
lhotari edited a comment on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1041918413


   > We shouldn't add some temporary PRs during the release phase unless they are really important.
   
   @BewareMyPower  This PR contains a very important change. The recommended maximum thread count for Jetty thread pool is 50-500. Pulsar current uses an invalid valid. The sync->async changes in the Pulsar Admin API are not needed when a proper value is used.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1041918413


   > We shouldn't add some temporary PRs during the release phase unless they are really important.
   
   @BewareMyPower  This PR contains a very important change. The recommended maximum thread count for Jetty thread pool is 50-500. Pulsar current uses an invalid valid. The sync->async changes are not needed when a proper value is used.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1042545245


   > > We shouldn't add some temporary PRs during the release phase unless they are really important.
   > 
   > @BewareMyPower This PR contains a very important change. The recommended maximum thread count for Jetty thread pool is 50-500. Pulsar current uses an invalid valid. The sync->async changes in the Pulsar Admin API are not needed when a proper value is used.
   
   I think I need to explain more for the **important** term I used. IMO, a PR that could block a release during the release phase must match following rules:
   1. It must be a bug fix.
   2. The bug was introduced from the current release, i.e. it's a regression.
   3. There is no workaround.
   
   It's only my opinion. I think our release phase missed something like this.
   
   Let's look back to this PR. First, I don't think a change to the default configuration value can be treated as a bug fix. It's more like an enhancement. Because the previous stable releases all should have the same problem. Then, we can see it's not a regression. Third, it's not something serious like Log4j2 Vulnerability (CVE-2021-44228). It just make some certain cases not work for Admin API and can be fixed by configuration tuning.
   
   In short, IMO, when a release started, we must be very careful and strict on the new PRs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui commented on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1041569902


   > The Pulsar community makes major decisions on the dev mailing list according to the Apache Way. The mailing list is the place to decide whether this change needs a PIP or not. Please participate in the existing dev mailing list discussion: https://lists.apache.org/thread/byg1g081o6mfj0xn8ntryvb5qplmrjyl
   
   For changing the default configuration, need to approve through a PIP, here is some good examples https://github.com/apache/pulsar/issues?q=is%3Aissue+label%3APIP+assignee%3Amerlimat+is%3Aclosed
   
   And please use a separate thread to discuss, let's focus on the 2.10.0 release on the 2.10.0 release thread.
   It's not a breaking change introduced in 2.10.0, not a blocker in 2.10.0


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari edited a comment on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
lhotari edited a comment on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1041533186


   > Change the default configuration need to start with a proposal
   
   This PR is a proposal. I have also made this proposal on the dev mailing list in the discussion. https://lists.apache.org/thread/byg1g081o6mfj0xn8ntryvb5qplmrjyl . What else is needed?
   
   In this case, the previous default for numHttpServerThreads is simply too small and invalid when blocking servlet API is used.
   The value 200 doesn't mean that there will be 200 threads to start with. This is the maximum size for the thread pool. When the value is more than 8, Jetty will start with 8 initial threads and add more threads to the pool when all threads are occupied.
   
   There is no breaking change in increasing the default value to 200. It's just an improvement and fixes "the problem" where Admin API goes unresponsive when all threads are occupied.
   
   We might end up setting the default value to something lower than 200. A value like 50 or 100 might be fine. I just think that 200 is a good default since Jetty also uses that as the default value.
   
   The main overhead of a thread is the amount of memory that the stack of each thread consumes. It's 1MB by default. 200 threads will consume 200MB of RSS memory in the thread stacks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1041590575


   > For changing the default configuration, need to approve through a PIP, here are some good examples https://github.com/apache/pulsar/issues?q=is%3Aissue+label%3APIP+assignee%3Amerlimat+is%3Aclosed
   
   Thanks, I'll create a PIP.
   
   > And please use a separate thread to discuss, let's focus on the 2.10.0 release on the 2.10.0 release thread. It's not a breaking change introduced in 2.10.0, not a blocker in 2.10.0
   
   I'll start a new thread.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1041560954


   > This doesn't seem to solve the root cause of the issue.
   
   Please tell me what is "the issue" that you are referring to? 
   
   > 
   > > This PR is a proposal. I have also made this proposal on the dev mailing list in the discussion. https://lists.apache.org/thread/byg1g081o6mfj0xn8ntryvb5qplmrjyl . What else is needed?
   > 
   > I think we need a PIP to get this change approved.
   
   The Pulsar community makes major decisions on the dev mailing list according to the Apache Way. The mailing list is the place to decide whether this change needs a PIP or not. Please participate in the existing dev mailing list discussion: https://lists.apache.org/thread/byg1g081o6mfj0xn8ntryvb5qplmrjyl
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower edited a comment on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
BewareMyPower edited a comment on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1042545245


   > > We shouldn't add some temporary PRs during the release phase unless they are really important.
   > 
   > @BewareMyPower This PR contains a very important change. The recommended maximum thread count for Jetty thread pool is 50-500. Pulsar current uses an invalid valid. The sync->async changes in the Pulsar Admin API are not needed when a proper value is used.
   
   I think I need to explain more for the **important** term I used. IMO, a PR that could block a release during the release phase must match following rules:
   1. It must be a bug fix.
   2. The bug was introduced from the current release, i.e. it's a regression.
   3. There is no workaround.
   
   It's only my opinion. I think our release document for release manager missed something like this.
   
   Let's look back to this PR. First, I don't think a change to the default configuration value can be treated as a bug fix. It's more like an enhancement. Because the previous stable releases all should have the same problem. Then, we can see it's not a regression. Third, it's not something serious like Log4j2 Vulnerability (CVE-2021-44228). It just make some certain cases not work for Admin API and can be fixed by configuration tuning.
   
   In short, IMO, after a release started, we must be very careful and strict on the new PRs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1041583010


   > @lhotari @eolivelli I suggest that we should consider the system cpu cores, it may be hurtful change for people who run pulsar in a low machine, like one cpu core.
   
   no it doesn't hurt performance. I explained this in my reply above.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] nodece commented on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
nodece commented on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1043130253


   > > When an admin API calls the ZK metadatastore API, it gets the ZK data by call the `CompletableFuture`, note that we did not use the executor to execute the `CompletableFuture#complete()` in [ZKMetadataStore.java#L171](https://github.com/apache/pulsar/blob/master/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/ZKMetadataStore.java#L171). In ZK callback thread, once the caller converts async to sync calls then the ZK callback thread will be blocked, this code so like: `metadata.getAsync().get(30, TimeUnit.SECONDS)`.
   > 
   > The blocked thread in #13666 is a HTTP server thread.
   > 
   > ```
   > "pulsar-web-40-28" #238 prio=5 os_prio=0 tid=0x00007f5a4000d800 nid=0x2bcf waiting on condition [0x00007f5961d3b000]
   >    java.lang.Thread.State: WAITING (parking)
   > 	at sun.misc.Unsafe.park(Native Method)
   > 	- parking to wait for  <0x00000005c5529d40> (a java.util.concurrent.CompletableFuture$Signaller)
   > 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
   > 	at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
   > 	at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
   > 	at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
   > 	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
   > 	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.internalDeleteSubscriptionForNonPartitionedTopic(PersistentTopicsBase.java:1498)
   > ```
   > 
   > I'll clarify what I have been referring to as "sync -> async" changes: changes where the use of the blocking Servlet API is migrated to use Asynchronous Servlet API. I understand that it's necessary to not block in Zookeeper callbacks, but that is a different problem, which isn't related to Servlet API change.
   
   When the ZK callback thread is blocked in the WEB thread, another admin API request the ZK metadata store is not working, because the ZK callback is blocked.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1042801722


   > Instead, you can say that a lot of PRs with[ "make async"](https://github.com/apache/pulsar/pulls?q=is%3Apr+make+async) have been pushed and merged recently.
   
   Yeah, I noticed these PRs recently as well. But are these PRs blockers for 2.10.0 release? IMO, **they should not be blockers as well.** I've thought they are intended to be included in Pulsar 2.11.0.
   
   > I haven't requested to block the release.
   
   Sorry, I might missed some context. I just saw this PR in the 2.10.0 release email list. This PR should focus on the fix itself, but the previous discussion might go far for the release issue.
   
   > [My valid questions were never answered by the contributors of the "make async" changes](https://github.com/apache/pulsar/issues/14013#issuecomment-1033528348).
   
   It's a pity to see the lack of communication. AFAIK, @Technoboy- is also preparing for a PIP to make admin APIs async. I think you should have a discussion about:
   1. Whether this PR solve the root problem?
   2. Based on this PR, is making admin APIs async meaningful?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] eolivelli commented on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
eolivelli commented on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1043118221


   This is not something that can block  release.
   The value is already configurable.
   I believe that there is no hurry in committing this change, and we can discuss about a new value or decide that the default should not be changed.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1041527088


   > What's the reason of setting the default value to 200? If the node just have one core, what will happen? Please send email to dev mail list to discuss.
   
   These are threads. Jetty defaults to 200 maximum threads, to prevent thread pool starvation. This is recommended when using blocking Servlet API. The problem is that Pulsar uses the blocking servlet API and doesn't have a sufficient amount of threads which are needed and recommended.
   
   The value 200 doesn't mean that there will be 200 threads to start with. This is the maximum size for the thread pool. When the value is more than 8, Jetty will start with 8 initial threads and add more threads to the pool when all threads are occupied.
   
   I have already started an email discussion to discuss this topic. Please reply to https://lists.apache.org/thread/byg1g081o6mfj0xn8ntryvb5qplmrjyl .
   
   There is useful background information in https://lists.apache.org/thread/hso8qwsv40ccrk116fj5ggdpt3b9d4g4 . I wrote that reply before I noticed Penghui's response. It contains a link to Jetty's documenation about asynchronous servlets: https://wiki.eclipse.org/Jetty/Feature/Continuations#Why_Asynchronous_Servlets_.3F .
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] Shoothzj commented on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
Shoothzj commented on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1041565611


   @lhotari @eolivelli I suggest that we should consider the system cpu cores, it may be hurtful change for people who run pulsar in a low machine, like one cpu core.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] merlimat commented on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
merlimat commented on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1041985838


   > > We shouldn't add some temporary PRs during the release phase unless they are really important.
   > 
   > @BewareMyPower This PR contains a very important change. The recommended maximum thread count for Jetty thread pool is 50-500. Pulsar current uses an invalid valid. 
   
   @lhotari This is not a new thing and it's something that can easily be changed by users. I don't think we need to rush into a release.
   
   > The sync->async changes in the Pulsar Admin API are not needed when a proper value is used. 
   
   The problem is not just keeping threads busy but the cases in which you have calls that spawn to multiple brokers. In this cases, the HTTP call can come back to same broker and we can have a deadlock, no matters how many threads we have in the pool. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1041533186


   > Change the default configuration need to start with a proposal
   
   This PR is a proposal. I have also make this proposal on the dev mailing list in the discussion. https://lists.apache.org/thread/byg1g081o6mfj0xn8ntryvb5qplmrjyl . What else is needed?
   
   In this case, the previous default for numHttpServerThreads is simply too small and invalid when blocking servlet API is used.
   The value 200 doesn't mean that there will be 200 threads to start with. This is the maximum size for the thread pool. When the value is more than 8, Jetty will start with 8 initial threads and add more threads to the pool when all threads are occupied.
   
   There is no breaking change in increasing the default value to 200. It's just an improvement and fixes "the problem" where Admin API goes unresponsive when all threads are occupied.
   
   We might end up setting the default value to something lower than 200. A value like 50 or 100 might be fine. I just think that 200 is a good default since Jetty also uses that as the default value.
   
   The main overhead of a thread is the amount of memory that the stack of each thread consumes. It's 1MB by default. 200 threads will consume 200MB of RSS memory in the thread stacks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui edited a comment on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
codelipenghui edited a comment on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1041569902


   > The Pulsar community makes major decisions on the dev mailing list according to the Apache Way. The mailing list is the place to decide whether this change needs a PIP or not. Please participate in the existing dev mailing list discussion: https://lists.apache.org/thread/byg1g081o6mfj0xn8ntryvb5qplmrjyl
   
   For changing the default configuration, need to approve through a PIP, here are some good examples https://github.com/apache/pulsar/issues?q=is%3Aissue+label%3APIP+assignee%3Amerlimat+is%3Aclosed
   
   And please use a separate thread to discuss, let's focus on the 2.10.0 release on the 2.10.0 release thread.
   It's not a breaking change introduced in 2.10.0, not a blocker in 2.10.0


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] nodece edited a comment on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
nodece edited a comment on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1043130253


   > > When an admin API calls the ZK metadatastore API, it gets the ZK data by call the `CompletableFuture`, note that we did not use the executor to execute the `CompletableFuture#complete()` in [ZKMetadataStore.java#L171](https://github.com/apache/pulsar/blob/master/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/ZKMetadataStore.java#L171). In ZK callback thread, once the caller converts async to sync calls then the ZK callback thread will be blocked, this code so like: `metadata.getAsync().get(30, TimeUnit.SECONDS)`.
   > 
   > The blocked thread in #13666 is a HTTP server thread.
   > 
   > ```
   > "pulsar-web-40-28" #238 prio=5 os_prio=0 tid=0x00007f5a4000d800 nid=0x2bcf waiting on condition [0x00007f5961d3b000]
   >    java.lang.Thread.State: WAITING (parking)
   > 	at sun.misc.Unsafe.park(Native Method)
   > 	- parking to wait for  <0x00000005c5529d40> (a java.util.concurrent.CompletableFuture$Signaller)
   > 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
   > 	at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
   > 	at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
   > 	at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
   > 	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
   > 	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.internalDeleteSubscriptionForNonPartitionedTopic(PersistentTopicsBase.java:1498)
   > ```
   > 
   > I'll clarify what I have been referring to as "sync -> async" changes: changes where the use of the blocking Servlet API is migrated to use Asynchronous Servlet API. I understand that it's necessary to not block in Zookeeper callbacks, but that is a different problem, which isn't related to Servlet API change.
   
   When the ZK callback thread is blocked in the WEB thread, another admin API request the ZK metadata store is not working, so you see this thread stack.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] nodece commented on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
nodece commented on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1042715597


   The unavailability of the Admin API is not caused by the HTTP server thread, the root cause is that the ZK callback thread is blocked. 
   
   When an admin API calls the ZK metadatastore API, we can return the ZK data to the caller by `CompletableFuture`, Note that we did not use the executor to execute the callback in [ZKMetadataStore.java#L171](https://github.com/apache/pulsar/blob/master/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/ZKMetadataStore.java#L171). In ZK callback thread, once the caller converts async to sync calls, the code so like: `metadata.getAsync().get(30, TimeUnit.SECONDS)`, then the ZK callback thread will be blocked.
   
   How to solve this problem?
   1. Use an executor to execute the callback that passes data to Pulsar in ZK callback
   2. Don't convert async to sync calls
   
   
   How to reproduce the ZK callback thread is blocked:
   ```
   docker run -d -p 2181:2181 --name test-zookeeper zookeeper
   ```
   ```java
   public class Main {
       private static final long CACHE_REFRESH_TIME_MILLIS = TimeUnit.MINUTES.toMillis(5);
   
       public static void printThread(String name) {
           System.out.println(name + " thread name -> " + Thread.currentThread().getName());
       }
   
       public static void main(String[] args) throws Exception {
           ZooKeeper zkc = new ZooKeeper("localhost:2181", 60_000, null);
   
           System.out.println("Check the zk connect");
           CountDownLatch zkLatch = new CountDownLatch(1);
           new Thread(() -> {
               while (true) {
                   if (zkc.getState().isConnected()) {
                       zkLatch.countDown();
                       break;
                   }
               }
           }).start();
           if (!zkLatch.await(5, TimeUnit.SECONDS)) {
               throw new Exception("zk connect failed");
           }
   
           AsyncLoadingCache<String, byte[]> objCache = Caffeine.newBuilder()
                   .refreshAfterWrite(CACHE_REFRESH_TIME_MILLIS, TimeUnit.MILLISECONDS)
                   .buildAsync((key, executor) -> {
                       CompletableFuture<byte[]> future = new CompletableFuture<>();
                       zkc.multi(Lists.newArrayList(Op.getData("/")), (rc, path, ctx, opResults) -> {
                           printThread("zk callback");
                           future.complete(null);
                       }, null);
                       return future;
                   });
   
           CountDownLatch countDownLatch = new CountDownLatch(1);
   
           // Reproduce the ZK callback is blocked
           System.out.println("async get start");
           objCache.get("/").whenComplete((unused, ignored) -> {
               printThread("async get done");
               try {
                   System.out.println("zk thread will blocked after sync get");
                   System.out.println("sync get start");
                   objCache.get("/1").get(5, TimeUnit.SECONDS);
                   printThread("sync get done");
                   countDownLatch.countDown();
               } catch (Exception e) {
                   e.printStackTrace();
               } finally {
                   countDownLatch.countDown();
               }
           });
   
           countDownLatch.await();
       }
   }
   ```
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] codelipenghui commented on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1042480000


   > @codelipenghui https://github.com/apache/pulsar/issues/14329 has been published. Please follow up on the mailing list.
   
   Sure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] nodece edited a comment on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
nodece edited a comment on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1042715597


   The unavailability of the Admin API is not caused by the HTTP server thread, the root cause is that the ZK callback thread is blocked. 
   
   When an admin API calls the ZK metadatastore API, it gets the ZK data by call the `CompletableFuture`, note that we did not use the executor to execute the `CompletableFuture#complete()` in [ZKMetadataStore.java#L171](https://github.com/apache/pulsar/blob/master/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/ZKMetadataStore.java#L171). In ZK callback thread, once the caller converts async to sync calls then the ZK callback thread will be blocked, this code so like: `metadata.getAsync().get(30, TimeUnit.SECONDS)`.
   
   How to solve this problem?
   1. Use an executor to execute the callback that passes data to Pulsar in ZK callback
   2. Don't convert async to sync calls
   
   
   How to reproduce the ZK callback thread is blocked:
   ```
   docker run -d -p 2181:2181 --name test-zookeeper zookeeper
   ```
   ```java
   public class Main {
       private static final long CACHE_REFRESH_TIME_MILLIS = TimeUnit.MINUTES.toMillis(5);
   
       public static void printThread(String name) {
           System.out.println(name + " thread name -> " + Thread.currentThread().getName());
       }
   
       public static void main(String[] args) throws Exception {
           ZooKeeper zkc = new ZooKeeper("localhost:2181", 60_000, null);
   
           System.out.println("Check the zk connect");
           CountDownLatch zkLatch = new CountDownLatch(1);
           new Thread(() -> {
               while (true) {
                   if (zkc.getState().isConnected()) {
                       zkLatch.countDown();
                       break;
                   }
               }
           }).start();
           if (!zkLatch.await(5, TimeUnit.SECONDS)) {
               throw new Exception("zk connect failed");
           }
   
           AsyncLoadingCache<String, byte[]> objCache = Caffeine.newBuilder()
                   .refreshAfterWrite(CACHE_REFRESH_TIME_MILLIS, TimeUnit.MILLISECONDS)
                   .buildAsync((key, executor) -> {
                       CompletableFuture<byte[]> future = new CompletableFuture<>();
                       zkc.multi(Lists.newArrayList(Op.getData("/")), (rc, path, ctx, opResults) -> {
                           printThread("zk callback");
                           future.complete(null);
                       }, null);
                       return future;
                   });
   
           CountDownLatch countDownLatch = new CountDownLatch(1);
   
           // Reproduce the ZK callback is blocked
           System.out.println("async get start");
           objCache.get("/").whenComplete((unused, ignored) -> {
               printThread("async get done");
               try {
                   System.out.println("zk thread will blocked after sync get");
                   System.out.println("sync get start");
                   objCache.get("/1").get(5, TimeUnit.SECONDS);
                   printThread("sync get done");
                   countDownLatch.countDown();
               } catch (Exception e) {
                   e.printStackTrace();
               } finally {
                   countDownLatch.countDown();
               }
           });
   
           countDownLatch.await();
       }
   }
   ```
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1042774086


   > > @BewareMyPower This PR contains a very important change. The recommended maximum thread count for Jetty thread pool is 50-500. Pulsar current uses an invalid valid.
   > 
   > @lhotari This is not a new thing and it's something that can easily be changed by users. I don't think we need to rush into a release.
   
   I agree. Have I been rushing this? Instead, you can say that a lot of PRs with[ "make async"](https://github.com/apache/pulsar/pulls?q=is%3Apr+make+async) have been pushed and merged recently.
   This has introduced several known regressions that have been fixed. We don't know which are regressions that just haven't been found yet.
   
   [My valid questions were never answered by the contributors of the "make async" changes](https://github.com/apache/pulsar/issues/14013#issuecomment-1033528348).
   
   
   ![image](https://user-images.githubusercontent.com/66864/154451547-0e9937fb-ba50-425e-b9f0-5e77cdbd653c.png)
   
   I'm expecting that there are issues or a PIP which is referred to. 
   @merlimat WDYT?
   
   > 
   > > The sync->async changes in the Pulsar Admin API are not needed when a proper value is used.
   > 
   > The problem is not just keeping threads busy but the cases in which you have calls that spawn to multiple brokers. In this cases, the HTTP call can come back to same broker and we can have a deadlock, no matters how many threads we have in the pool.
   
   I'll clarify: what I have been referring to as "sync -> async" changes: changes where the use of the blocking Servlet API is migrated to use Asynchronous Servlet API. That won't solve any problems on it's own. Any problems that it might solve would be solved also by configuring Jetty as it is recommended to be configured when there are blocking calls involved. 
   The recommended maximum thread pool size is 50 to 500 for Jetty. I have been going through some details and there are multiple other things that are not properly configured. I'll be following up with separate PRs.
   
   I'd assume that the reason for deadlocks when thread pool size is properly configured are caused by locks. I like to see an example of a deadlock which couldn't be resolved by continuing to use the blocking servlet api. I'm not against the changes from blocking API to async API, but I think changes need proper justification, especially when the "make async" changes have been initiated without referring any reported issues or a PIP.
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] github-actions[bot] commented on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1073147964


   The pr had no activity for 30 days, mark with Stale label.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] nodece edited a comment on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
nodece edited a comment on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1042715597


   The unavailability of the Admin API is not caused by the HTTP server thread, the root cause is that the ZK callback thread is blocked. 
   
   When an admin API calls the ZK metadatastore API, it gets the ZK data by call the `CompletableFuture`, note that we did not use the executor to execute the `CompletableFuture#complete()` in [ZKMetadataStore.java#L171](https://github.com/apache/pulsar/blob/master/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/ZKMetadataStore.java#L171). In ZK callback thread, once the caller converts async to sync calls then the ZK callback thread will be blocked, this code so like: `metadata.getAsync().get(30, TimeUnit.SECONDS)`.
   
   How to solve this problem?
   1. Use an executor to execute the callback that passes data to Pulsar in ZK callback
   2. Don't convert async to sync calls, so there are make some PR that converts sync to async calls
   
   
   How to reproduce the ZK callback thread is blocked:
   ```
   docker run -d -p 2181:2181 --name test-zookeeper zookeeper
   ```
   ```java
   public class Main {
       private static final long CACHE_REFRESH_TIME_MILLIS = TimeUnit.MINUTES.toMillis(5);
   
       public static void printThread(String name) {
           System.out.println(name + " thread name -> " + Thread.currentThread().getName());
       }
   
       public static void main(String[] args) throws Exception {
           ZooKeeper zkc = new ZooKeeper("localhost:2181", 60_000, null);
   
           System.out.println("Check the zk connect");
           CountDownLatch zkLatch = new CountDownLatch(1);
           new Thread(() -> {
               while (true) {
                   if (zkc.getState().isConnected()) {
                       zkLatch.countDown();
                       break;
                   }
               }
           }).start();
           if (!zkLatch.await(5, TimeUnit.SECONDS)) {
               throw new Exception("zk connect failed");
           }
   
           AsyncLoadingCache<String, byte[]> objCache = Caffeine.newBuilder()
                   .refreshAfterWrite(CACHE_REFRESH_TIME_MILLIS, TimeUnit.MILLISECONDS)
                   .buildAsync((key, executor) -> {
                       CompletableFuture<byte[]> future = new CompletableFuture<>();
                       zkc.multi(Lists.newArrayList(Op.getData("/")), (rc, path, ctx, opResults) -> {
                           printThread("zk callback");
                           future.complete(null);
                       }, null);
                       return future;
                   });
   
           CountDownLatch countDownLatch = new CountDownLatch(1);
   
           // Reproduce the ZK callback is blocked
           System.out.println("async get start");
           objCache.get("/").whenComplete((unused, ignored) -> {
               printThread("async get done");
               try {
                   System.out.println("zk thread will blocked after sync get");
                   System.out.println("sync get start");
                   objCache.get("/1").get(5, TimeUnit.SECONDS);
                   // Unreachable
                   printThread("sync get done");
                   countDownLatch.countDown();
               } catch (Exception e) {
                   e.printStackTrace();
               } finally {
                   countDownLatch.countDown();
               }
           });
   
           countDownLatch.await();
       }
   }
   ```
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1042867195


   > > [My valid questions were never answered by the contributors of the "make async" changes](https://github.com/apache/pulsar/issues/14013#issuecomment-1033528348).
   > 
   > It's a pity to see the lack of communication. AFAIK, @Technoboy- is also preparing for a PIP to make admin APIs async. I think you should have a discussion about:
   > 
   > 1. Whether this PR solve the root problem?
   
   Great point @BewareMyPower . I hope that the problem would first be discussed or reported before a PIP is created. @Technoboy- Would you be able to start some discussion even before the PIP is ready? 
   
   > 2. Based on this PR, is making admin APIs async meaningful?
   
   That's also a valid question to ask. When we work together, we can learn together.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1041917123


   @codelipenghui  [PIP-142](https://github.com/apache/pulsar/issues/14329) has been published. Please follow up on the mailing list.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1042752949


   > When an admin API calls the ZK metadatastore API, it gets the ZK data by call the `CompletableFuture`, note that we did not use the executor to execute the `CompletableFuture#complete()` in [ZKMetadataStore.java#L171](https://github.com/apache/pulsar/blob/master/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/ZKMetadataStore.java#L171). In ZK callback thread, once the caller converts async to sync calls then the ZK callback thread will be blocked, this code so like: `metadata.getAsync().get(30, TimeUnit.SECONDS)`.
   
   The blocked thread in #13666 is a HTTP server thread.
   
   ```
   "pulsar-web-40-28" #238 prio=5 os_prio=0 tid=0x00007f5a4000d800 nid=0x2bcf waiting on condition [0x00007f5961d3b000]
      java.lang.Thread.State: WAITING (parking)
   	at sun.misc.Unsafe.park(Native Method)
   	- parking to wait for  <0x00000005c5529d40> (a java.util.concurrent.CompletableFuture$Signaller)
   	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
   	at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
   	at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
   	at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
   	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
   	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.internalDeleteSubscriptionForNonPartitionedTopic(PersistentTopicsBase.java:1498)
   ```
   
   I'll clarify what I have been referring to as "sync -> async" changes: changes where the use of the blocking Servlet API is migrated to use Asynchronous Servlet API.
   I understand that it's necessary to not block in Zookeeper callbacks, but that is a different problem, which isn't related to Servlet API change.
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] Jason918 commented on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
Jason918 commented on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1041531761


   > These are threads. Jetty defaults to 200 maximum threads,
   
   @lhotari If this is jetty defaults. Can we just leave it blank? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1041668621


   > Thread Pool
   > Configure with goal of limiting memory usage maximum available. Typically this is >50 and <500
   
   It makes sense to me. But I don't think it should be a blocker for 2.10.0 release. I found 2.10.0 release has been delayed for some time. The previous 2.9.0 release is also a bad case.
   
   We shouldn't add some temporary PRs during the release phase unless they are really important.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1042782213


   > I think I need to explain more for the **important** term I used. IMO, a PR that could block a release during the release phase must match following rules:
   
   I haven't requested to block the release.
   
   > Let's look back to this PR. First, I don't think a change to the default configuration value can be treated as a bug fix. It's more like an enhancement. Because the previous stable releases all should have the same problem. Then, we can see it's not a 
   
   The Jetty documentation recommends values 50-500 for the maximum thread pool size. That is a fact, so there cannot be different opinions on this. Since the default configuration value doesn't fall in the recommended value range, my opinion is that this is a bug. For some people a bug is a feature. :) Does it really matter whether we call this a bug or an improvement?
   The fact is that the current value doesn't fall in the recommended value range.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari edited a comment on pull request #14320: [Broker] Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by GitBox <gi...@apache.org>.
lhotari edited a comment on pull request #14320:
URL: https://github.com/apache/pulsar/pull/14320#issuecomment-1042774086


   > > @BewareMyPower This PR contains a very important change. The recommended maximum thread count for Jetty thread pool is 50-500. Pulsar current uses an invalid valid.
   > 
   > @lhotari This is not a new thing and it's something that can easily be changed by users. I don't think we need to rush into a release.
   
   I agree. Have I been rushing this? Instead, you can say that a lot of PRs with[ "make async"](https://github.com/apache/pulsar/pulls?q=is%3Apr+make+async) have been pushed and merged recently.
   This has introduced several known regressions that have been fixed. We don't know which are regressions that just haven't been found yet.
   
   [My valid questions were never answered by the contributors of the "make async" changes](https://github.com/apache/pulsar/issues/14013#issuecomment-1033528348).
   
   
   ![image](https://user-images.githubusercontent.com/66864/154451547-0e9937fb-ba50-425e-b9f0-5e77cdbd653c.png)
   
   I'm expecting that there are issues or a PIP which is referred to. 
   @merlimat WDYT?
   
   > 
   > > The sync->async changes in the Pulsar Admin API are not needed when a proper value is used.
   > 
   > The problem is not just keeping threads busy but the cases in which you have calls that spawn to multiple brokers. In this cases, the HTTP call can come back to same broker and we can have a deadlock, no matters how many threads we have in the pool.
   
   I'll clarify: what I have been referring to as "sync -> async" changes: changes where the use of the blocking Servlet API is migrated to use Asynchronous Servlet API. That won't solve any problems on it's own. Any problems that it might solve would be solved also by configuring Jetty as it is recommended to be configured when there are blocking calls involved. 
   The recommended maximum thread pool size is 50 to 500 for Jetty. I have been going through some details and there are multiple other things that are not properly configured. I'll be following up with separate PRs. (UPDATE: the draft PR which fixes backpressure handling is #14353).
   
   I'd assume that the reason for deadlocks when thread pool size is properly configured are caused by locks. I like to see an example of a deadlock which couldn't be resolved by continuing to use the blocking servlet api. I'm not against the changes from blocking API to async API, but I think changes need proper justification, especially when the "make async" changes have been initiated without referring any reported issues or a PIP.
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org