You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pulsar.apache.org by Lari Hotari <lh...@apache.org> on 2022/02/16 17:32:38 UTC

[DISCUSS] PIP-142 Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

URL: https://github.com/apache/pulsar/issues/14329

Motivation

Since Pulsar Admin API uses the blocking servlet API, all Jetty threads
might be occupied and this causes unavailability of the Pulsar Admin
API. The default value for the maximum number of threads for Jetty is
too low in Pulsar. That is the root cause of many problems where Pulsar
Admin API is unavailable when all threads are in use.

Additional context

-   Examples of previous issues where Jetty threads have been occupied
    and caused problems: #13666 #4756 #10619
-   Mailing list thread about “make async” changes:
    https://lists.apache.org/thread/tn7rt59cd1k724l4ytfcmzx1w2sbtw7l

Implementation

-   Jetty defaults to 200 maximum threads, to prevent thread pool
    starvation. Make Pulsar use the same default value by setting
    numHttpServerThreads=200.
-   Update the documentation for numHttpServerThreads
    -   The PR is already in place:
        https://github.com/apache/pulsar/pull/14320
-   Set Jetty selectors and acceptors parameters to -1 so that Jetty
    automatically chooses optimal values based on available cores. The
    rationale is explained in the Q&A below.
    -   A separate PR will be made for this change.

Q&A

Q: What’s the reason of setting the default value to 200? If the node just have one core, what will happen?

These are threads. Jetty defaults to 200 maximum threads, to prevent
thread pool starvation. This is recommended when using blocking Servlet
API. The problem is that Pulsar uses the blocking servlet API and
doesn’t have a sufficient number of threads which are needed and
recommended.

The value 200 doesn’t mean that there will be 200 threads to start with.
This is the maximum size for the thread pool. When the value is more
than 8, Jetty will start with 8 initial threads and add more threads to
the pool when all threads are occupied.

Q: Do we need to take the number of system cores into consideration for the maximum threads of the thread pool?

No. Jetty is different from Netty in this aspect. In Netty, everything
should be asynchronous and “thou shall never block”. In Jetty, the
maximum number of threads for the thread pool should be set to 50-500
threads and blocking operations are fine.

The recommendation for the thread pool is explained in Jetty
documentation
https://www.eclipse.org/jetty/documentation/jetty-9/index.html#_thread_pool
> Thread Pool > Configure with goal of limiting memory usage maximum
available. Typically this is >50 and <500

However, there are separate settings which should take the number of
available processors (cores) into account in Jetty.

http port acceptor and selector count:
https://github.com/apache/pulsar/blob/b540523b474e4194e30c1acab65dfafdd11d3210/pulsar-broker/src/main/java/org/apache/pulsar/broker/web/WebService.java#L88

https port acceptor and selector count:
https://github.com/apache/pulsar/blob/b540523b474e4194e30c1acab65dfafdd11d3210/pulsar-broker/src/main/java/org/apache/pulsar/broker/web/WebService.java#L125

Jetty documentantion for acceptors: > Acceptors > The standard rule of
thumb for the number of Accepters to configure is one per CPU on a given
machine.

Jetty documentation for selectors: > Selectors > The default number of
selectors is equal to half of the number of processors available to the
JVM, which should allow optimal performance even if all the connections
used are performing significant non-blocking work in the callback tasks.

The settings in jetty are the “acceptor” and “selector” thread count
settings. These have been fixed to 1 in Pulsar. The acceptors and
selectors settings should be both set to -1. Jetty would pick the
recommended count based on cores in that case.

Re: [DISCUSS] PIP-142 Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by Hangda Sun <lw...@apache.org>.
I can't understand how deadlock happens, can you explain it in detail ?

On 2022/02/17 03:00:36 PengHui Li wrote:
> Hi Lari,
> 
> Thanks for raising a proposal and starting the discussion thread here.
> Increasing the thread does not solve the underlying problem.
> but also brings additional resource consumption.
> 
> Thanks, Matteo provided a great explanation,
> and, I totally agree with his explanation.
> What should we try to optimize thread usage,
> reduce block calls, rather than going in the opposite direction.
> 
> For the current REST API implementation,
> I saw 2 main problems
> 
> 1. The API defined we are using the Jetty async mode, but
> is actually a blocking call. I think 100% this is bad code,
> confuse contributors, and bring potential risks
> 2. We don't know the exact return type from the interface definition [1]
> until reading the implementation code here[2]
> 
> So, I think to change the current implementation to purely async
> and avoid passing the `AsyncResponse` to the implementation
> is the right way to improve the REST API and fix the potential problems.
> 
> [1]
> https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/v2/PersistentTopics.java#L1073
> [2]
> https://github.com/apache/pulsar/blob/df9a12dc90316e0d5c1d9effb416ec83fbcf9b5e/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/PersistentTopicsBase.java#L1259
> 
> Thanks,
> Penghui
> 
> On Thu, Feb 17, 2022 at 10:00 AM Matteo Merli <ma...@gmail.com>
> wrote:
> 
> > Hi Lari,
> >
> > thanks for the proposal, though I don't think it would be good to
> > change this default value.
> >
> > Pulsar HTTP API is an administrative service and it is not on the
> > critical data path.
> > If all the threads in the pool are busy, it is good to apply back
> > pressure and slow down the incoming new requests.
> >
> > In the past, we had several cases of deadlocks involving HTTP Jetty
> > threads. In all these cases, the root cause has been a broker that in
> > response to an HTTP request is spawning HTTP requests to other
> > brokers, and some of these requests are going to itself.
> >
> > These kinds of deadlock are unrelated to the number of HTTP threads in
> > the pool and must be dealt with by making the underlying operation
> > asynchronous.
> >
> > Finally each user is free to change the default to something that
> > suits more, though I really don't think it makes for us to default to
> > 200 threads as:
> >   1. It will not improve the Pulsar performance
> >   2. It will not magically solve any deadlock issue
> >   3. It will just consume much more resources for everyone
> >
> >
> > Replying in-line to some of the points.
> >
> > >
> > > Additional context
> > >
> > > -   Examples of previous issues where Jetty threads have been occupied
> > >     and caused problems: #13666 #4756 #10619
> >
> > All these examples are cases of what I have described above: an HTTP
> > handler triggering a call to the same HTTP server resulting in a
> > deadlock.
> >
> > Having had more threads wouldn't have fixed them.
> >
> > > -   Mailing list thread about “make async” changes:
> > >    https://lists.apache.org/thread/tn7rt59cd1k724l4ytfcmzx1w2sbtw7l
> >
> > The rationale for these changes has not much to do with Jetty but
> > rather to make sure that we are not mixing blocking and non-blocking
> > operations when handling these requests, which is prone to cause
> > deadlocks in broker's internal threads (irrespective of any Jetty
> > thread pool size).
> >
> > > Q: What’s the reason of setting the default value to 200? If the node
> > just have one core, what will happen?
> > >
> > > These are threads. Jetty defaults to 200 maximum threads, to prevent
> > > thread pool starvation. This is recommended when using blocking Servlet
> > > API. The problem is that Pulsar uses the blocking servlet API and
> > > doesn’t have a sufficient number of threads which are needed and
> > > recommended.
> >
> > We need to keep in mind that typical usage of Jetty is as a Servlet
> > container, dedicated to that. You have servlets doing disk IO reads
> > and blocking calls on JDBC drivers.
> >
> > Pulsar usage is *very* different from that. Admin API is a tiny
> > portion of what a broker does. Dedicating 200 threads to that would be
> > a massive waste of resources (CPU & memory).
> >
> > > The value 200 doesn’t mean that there will be 200 threads to start with.
> > > This is the maximum size for the thread pool. When the value is more
> > > than 8, Jetty will start with 8 initial threads and add more threads to
> > > the pool when all threads are occupied.
> > >
> > > Q: Do we need to take the number of system cores into consideration for
> > the maximum threads of the thread pool?
> > >
> > > No. Jetty is different from Netty in this aspect. In Netty, everything
> > > should be asynchronous and “thou shall never block”. In Jetty, the
> > > maximum number of threads for the thread pool should be set to 50-500
> > > threads and blocking operations are fine.
> >
> > I disagree. In a container configured with 0.5 CPU cores, it would be
> > very bad to run 200 threads, and it will have no other advantage
> > whatsoever.
> >
> > That is the reason why the default thread pool size is pegged to the
> > number of cores.
> >
> > > The recommendation for the thread pool is explained in Jetty
> > > documentation
> > >
> > https://www.eclipse.org/jetty/documentation/jetty-9/index.html#_thread_pool
> > > > Thread Pool > Configure with goal of limiting memory usage maximum
> > > available. Typically this is >50 and <500
> >
> > The recommendations for Jetty are relative to typical Jetty workloads,
> > something the Pulsar broker is definitely not.
> >
> 

Re: [DISCUSS] PIP-142 Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by PengHui Li <pe...@apache.org>.
Hi Lari,

Thanks for raising a proposal and starting the discussion thread here.
Increasing the thread does not solve the underlying problem.
but also brings additional resource consumption.

Thanks, Matteo provided a great explanation,
and, I totally agree with his explanation.
What should we try to optimize thread usage,
reduce block calls, rather than going in the opposite direction.

For the current REST API implementation,
I saw 2 main problems

1. The API defined we are using the Jetty async mode, but
is actually a blocking call. I think 100% this is bad code,
confuse contributors, and bring potential risks
2. We don't know the exact return type from the interface definition [1]
until reading the implementation code here[2]

So, I think to change the current implementation to purely async
and avoid passing the `AsyncResponse` to the implementation
is the right way to improve the REST API and fix the potential problems.

[1]
https://github.com/apache/pulsar/blob/master/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/v2/PersistentTopics.java#L1073
[2]
https://github.com/apache/pulsar/blob/df9a12dc90316e0d5c1d9effb416ec83fbcf9b5e/pulsar-broker/src/main/java/org/apache/pulsar/broker/admin/impl/PersistentTopicsBase.java#L1259

Thanks,
Penghui

On Thu, Feb 17, 2022 at 10:00 AM Matteo Merli <ma...@gmail.com>
wrote:

> Hi Lari,
>
> thanks for the proposal, though I don't think it would be good to
> change this default value.
>
> Pulsar HTTP API is an administrative service and it is not on the
> critical data path.
> If all the threads in the pool are busy, it is good to apply back
> pressure and slow down the incoming new requests.
>
> In the past, we had several cases of deadlocks involving HTTP Jetty
> threads. In all these cases, the root cause has been a broker that in
> response to an HTTP request is spawning HTTP requests to other
> brokers, and some of these requests are going to itself.
>
> These kinds of deadlock are unrelated to the number of HTTP threads in
> the pool and must be dealt with by making the underlying operation
> asynchronous.
>
> Finally each user is free to change the default to something that
> suits more, though I really don't think it makes for us to default to
> 200 threads as:
>   1. It will not improve the Pulsar performance
>   2. It will not magically solve any deadlock issue
>   3. It will just consume much more resources for everyone
>
>
> Replying in-line to some of the points.
>
> >
> > Additional context
> >
> > -   Examples of previous issues where Jetty threads have been occupied
> >     and caused problems: #13666 #4756 #10619
>
> All these examples are cases of what I have described above: an HTTP
> handler triggering a call to the same HTTP server resulting in a
> deadlock.
>
> Having had more threads wouldn't have fixed them.
>
> > -   Mailing list thread about “make async” changes:
> >    https://lists.apache.org/thread/tn7rt59cd1k724l4ytfcmzx1w2sbtw7l
>
> The rationale for these changes has not much to do with Jetty but
> rather to make sure that we are not mixing blocking and non-blocking
> operations when handling these requests, which is prone to cause
> deadlocks in broker's internal threads (irrespective of any Jetty
> thread pool size).
>
> > Q: What’s the reason of setting the default value to 200? If the node
> just have one core, what will happen?
> >
> > These are threads. Jetty defaults to 200 maximum threads, to prevent
> > thread pool starvation. This is recommended when using blocking Servlet
> > API. The problem is that Pulsar uses the blocking servlet API and
> > doesn’t have a sufficient number of threads which are needed and
> > recommended.
>
> We need to keep in mind that typical usage of Jetty is as a Servlet
> container, dedicated to that. You have servlets doing disk IO reads
> and blocking calls on JDBC drivers.
>
> Pulsar usage is *very* different from that. Admin API is a tiny
> portion of what a broker does. Dedicating 200 threads to that would be
> a massive waste of resources (CPU & memory).
>
> > The value 200 doesn’t mean that there will be 200 threads to start with.
> > This is the maximum size for the thread pool. When the value is more
> > than 8, Jetty will start with 8 initial threads and add more threads to
> > the pool when all threads are occupied.
> >
> > Q: Do we need to take the number of system cores into consideration for
> the maximum threads of the thread pool?
> >
> > No. Jetty is different from Netty in this aspect. In Netty, everything
> > should be asynchronous and “thou shall never block”. In Jetty, the
> > maximum number of threads for the thread pool should be set to 50-500
> > threads and blocking operations are fine.
>
> I disagree. In a container configured with 0.5 CPU cores, it would be
> very bad to run 200 threads, and it will have no other advantage
> whatsoever.
>
> That is the reason why the default thread pool size is pegged to the
> number of cores.
>
> > The recommendation for the thread pool is explained in Jetty
> > documentation
> >
> https://www.eclipse.org/jetty/documentation/jetty-9/index.html#_thread_pool
> > > Thread Pool > Configure with goal of limiting memory usage maximum
> > available. Typically this is >50 and <500
>
> The recommendations for Jetty are relative to typical Jetty workloads,
> something the Pulsar broker is definitely not.
>

Re: [DISCUSS] PIP-142 Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by Matteo Merli <ma...@gmail.com>.
Hi Lari,

thanks for the proposal, though I don't think it would be good to
change this default value.

Pulsar HTTP API is an administrative service and it is not on the
critical data path.
If all the threads in the pool are busy, it is good to apply back
pressure and slow down the incoming new requests.

In the past, we had several cases of deadlocks involving HTTP Jetty
threads. In all these cases, the root cause has been a broker that in
response to an HTTP request is spawning HTTP requests to other
brokers, and some of these requests are going to itself.

These kinds of deadlock are unrelated to the number of HTTP threads in
the pool and must be dealt with by making the underlying operation
asynchronous.

Finally each user is free to change the default to something that
suits more, though I really don't think it makes for us to default to
200 threads as:
  1. It will not improve the Pulsar performance
  2. It will not magically solve any deadlock issue
  3. It will just consume much more resources for everyone


Replying in-line to some of the points.

>
> Additional context
>
> -   Examples of previous issues where Jetty threads have been occupied
>     and caused problems: #13666 #4756 #10619

All these examples are cases of what I have described above: an HTTP
handler triggering a call to the same HTTP server resulting in a
deadlock.

Having had more threads wouldn't have fixed them.

> -   Mailing list thread about “make async” changes:
>    https://lists.apache.org/thread/tn7rt59cd1k724l4ytfcmzx1w2sbtw7l

The rationale for these changes has not much to do with Jetty but
rather to make sure that we are not mixing blocking and non-blocking
operations when handling these requests, which is prone to cause
deadlocks in broker's internal threads (irrespective of any Jetty
thread pool size).

> Q: What’s the reason of setting the default value to 200? If the node just have one core, what will happen?
>
> These are threads. Jetty defaults to 200 maximum threads, to prevent
> thread pool starvation. This is recommended when using blocking Servlet
> API. The problem is that Pulsar uses the blocking servlet API and
> doesn’t have a sufficient number of threads which are needed and
> recommended.

We need to keep in mind that typical usage of Jetty is as a Servlet
container, dedicated to that. You have servlets doing disk IO reads
and blocking calls on JDBC drivers.

Pulsar usage is *very* different from that. Admin API is a tiny
portion of what a broker does. Dedicating 200 threads to that would be
a massive waste of resources (CPU & memory).

> The value 200 doesn’t mean that there will be 200 threads to start with.
> This is the maximum size for the thread pool. When the value is more
> than 8, Jetty will start with 8 initial threads and add more threads to
> the pool when all threads are occupied.
>
> Q: Do we need to take the number of system cores into consideration for the maximum threads of the thread pool?
>
> No. Jetty is different from Netty in this aspect. In Netty, everything
> should be asynchronous and “thou shall never block”. In Jetty, the
> maximum number of threads for the thread pool should be set to 50-500
> threads and blocking operations are fine.

I disagree. In a container configured with 0.5 CPU cores, it would be
very bad to run 200 threads, and it will have no other advantage
whatsoever.

That is the reason why the default thread pool size is pegged to the
number of cores.

> The recommendation for the thread pool is explained in Jetty
> documentation
> https://www.eclipse.org/jetty/documentation/jetty-9/index.html#_thread_pool
> > Thread Pool > Configure with goal of limiting memory usage maximum
> available. Typically this is >50 and <500

The recommendations for Jetty are relative to typical Jetty workloads,
something the Pulsar broker is definitely not.

Re: [DISCUSS] PIP-142 Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by Matteo Merli <ma...@gmail.com>.
On Thu, Feb 17, 2022 at 3:12 AM Yunze Xu <yz...@streamnative.io.invalid> wrote:
> And we should make it clear whether numHttpServerThread=200 could
> solve the existing problem like
> https://github.com/apache/pulsar/pull/13666 <https://github.com/apache/pulsar/pull/13666>

Changing the number of threads won't do anything to improve the above
issue. The deadlock there is completely unrelated to the HTTP threads.

The Jetty thread getting stuck is just a side effect of the deadlock
caused by mixing sync & async operations.

Re: [DISCUSS] PIP-142 Increase default numHttpServerThreads value to 200 to prevent Admin API unavailability

Posted by Yunze Xu <yz...@streamnative.io.INVALID>.
My only concern is the same with Matteo:

> Dedicating 200 threads to that would be
> a massive waste of resources (CPU & memory).

And we should make it clear whether numHttpServerThread=200 could
solve the existing problem like
https://github.com/apache/pulsar/pull/13666 <https://github.com/apache/pulsar/pull/13666> 

Thanks,
Yunze


> 2022年2月17日 上午1:32,Lari Hotari <lh...@apache.org> 写道:
> 
> URL: https://github.com/apache/pulsar/issues/14329
> 
> Motivation
> 
> Since Pulsar Admin API uses the blocking servlet API, all Jetty threads
> might be occupied and this causes unavailability of the Pulsar Admin
> API. The default value for the maximum number of threads for Jetty is
> too low in Pulsar. That is the root cause of many problems where Pulsar
> Admin API is unavailable when all threads are in use.
> 
> Additional context
> 
> -   Examples of previous issues where Jetty threads have been occupied
>    and caused problems: #13666 #4756 #10619
> -   Mailing list thread about “make async” changes:
>    https://lists.apache.org/thread/tn7rt59cd1k724l4ytfcmzx1w2sbtw7l
> 
> Implementation
> 
> -   Jetty defaults to 200 maximum threads, to prevent thread pool
>    starvation. Make Pulsar use the same default value by setting
>    numHttpServerThreads=200.
> -   Update the documentation for numHttpServerThreads
>    -   The PR is already in place:
>        https://github.com/apache/pulsar/pull/14320
> -   Set Jetty selectors and acceptors parameters to -1 so that Jetty
>    automatically chooses optimal values based on available cores. The
>    rationale is explained in the Q&A below.
>    -   A separate PR will be made for this change.
> 
> Q&A
> 
> Q: What’s the reason of setting the default value to 200? If the node just have one core, what will happen?
> 
> These are threads. Jetty defaults to 200 maximum threads, to prevent
> thread pool starvation. This is recommended when using blocking Servlet
> API. The problem is that Pulsar uses the blocking servlet API and
> doesn’t have a sufficient number of threads which are needed and
> recommended.
> 
> The value 200 doesn’t mean that there will be 200 threads to start with.
> This is the maximum size for the thread pool. When the value is more
> than 8, Jetty will start with 8 initial threads and add more threads to
> the pool when all threads are occupied.
> 
> Q: Do we need to take the number of system cores into consideration for the maximum threads of the thread pool?
> 
> No. Jetty is different from Netty in this aspect. In Netty, everything
> should be asynchronous and “thou shall never block”. In Jetty, the
> maximum number of threads for the thread pool should be set to 50-500
> threads and blocking operations are fine.
> 
> The recommendation for the thread pool is explained in Jetty
> documentation
> https://www.eclipse.org/jetty/documentation/jetty-9/index.html#_thread_pool
>> Thread Pool > Configure with goal of limiting memory usage maximum
> available. Typically this is >50 and <500
> 
> However, there are separate settings which should take the number of
> available processors (cores) into account in Jetty.
> 
> http port acceptor and selector count:
> https://github.com/apache/pulsar/blob/b540523b474e4194e30c1acab65dfafdd11d3210/pulsar-broker/src/main/java/org/apache/pulsar/broker/web/WebService.java#L88
> 
> https port acceptor and selector count:
> https://github.com/apache/pulsar/blob/b540523b474e4194e30c1acab65dfafdd11d3210/pulsar-broker/src/main/java/org/apache/pulsar/broker/web/WebService.java#L125
> 
> Jetty documentantion for acceptors: > Acceptors > The standard rule of
> thumb for the number of Accepters to configure is one per CPU on a given
> machine.
> 
> Jetty documentation for selectors: > Selectors > The default number of
> selectors is equal to half of the number of processors available to the
> JVM, which should allow optimal performance even if all the connections
> used are performing significant non-blocking work in the callback tasks.
> 
> The settings in jetty are the “acceptor” and “selector” thread count
> settings. These have been fixed to 1 in Pulsar. The acceptors and
> selectors settings should be both set to -1. Jetty would pick the
> recommended count based on cores in that case.