Posted to dev@tomcat.apache.org by Rémy Maucherat <re...@apache.org> on 2016/06/03 13:36:22 UTC

HTTP/2 optimizations and edge cases

Hi,

With direct connect having been hacked in (err, I mean, "implemented"), it
is (a lot) easier to do meaningful performance tests. h2load is a drop-in
replacement for ab that uses HTTP/2, and it allowed some easy
profiling.

The good news is that the code seems to be well optimized already, with few
visible problems. The only issue is very heavy sync contention on the
socket wrapper object in Http2UpgradeHandler.writeHeaders and
Http2UpgradeHandler.writeBody.

The reason for that is when you do:
h2load -c 1 -n 100 http://127.0.0.1:8080/tomcat.gif
It ends up being translated in Tomcat into: process one hundred concurrent
streams over one connection. Although h2load is not real-world use, that's
something that would need to be solved, as a client can use a lot of
threads.

There are two main issues in HTTP/2 that could be improved:
1) Ideally, there should be a way to limit stream concurrency to some
extent and queue. But then there's a risk of stalling a useful stream (that's
where stream priority comes in, of course). Not easy.
2) All reads/writes are blocking mid-frame. It's not too bad in practice,
but it's a useless risk; that's where async IO can provide an "easy"
solution using a dedicated NIO2 implementation.
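To make 1) concrete, here is a minimal sketch of the queue idea (class and method names are hypothetical, not Tomcat code): a connection-level limiter that executes at most N stream processors at once and queues the rest.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;

// Hypothetical sketch: cap the number of streams a single connection may
// execute concurrently, queueing the rest until a slot frees up.
class StreamConcurrencyLimiter {
    private final Executor executor;
    private final int maxConcurrent;
    private int inFlight = 0;
    private final Queue<Runnable> pending = new ArrayDeque<>();

    StreamConcurrencyLimiter(Executor executor, int maxConcurrent) {
        this.executor = executor;
        this.maxConcurrent = maxConcurrent;
    }

    synchronized void submit(Runnable streamProcessor) {
        if (inFlight < maxConcurrent) {
            inFlight++;
            executor.execute(wrap(streamProcessor));
        } else {
            pending.add(streamProcessor); // stream waits its turn
        }
    }

    private Runnable wrap(Runnable task) {
        return () -> {
            try {
                task.run();
            } finally {
                onComplete();
            }
        };
    }

    private synchronized void onComplete() {
        Runnable next = pending.poll();
        if (next != null) {
            executor.execute(wrap(next)); // hand the slot to a queued stream
        } else {
            inFlight--;
        }
    }
}
```

A priority-aware variant would replace the FIFO queue with a priority queue keyed on the stream's weight/dependency, so a high-priority stream is never stalled behind queued low-priority ones.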

Comments?

Rémy

Re: HTTP/2 optimizations and edge cases

Posted by Mark Thomas <ma...@apache.org>.
On 05/06/2016 18:28, Christopher Schultz wrote:

<snip/>

> Can we use separate monitors for read versus write operations?

In theory, yes. We had that in 8.0.x but it created a lot of complexity
around error handling since you never know what the other thread might
be doing. Maybe there is a cleaner way to handle that with the
refactoring we've done but I haven't looked.

<snip/>

Mark


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@tomcat.apache.org
For additional commands, e-mail: dev-help@tomcat.apache.org


Re: HTTP/2 optimizations and edge cases

Posted by Christopher Schultz <ch...@christopherschultz.net>.
Rémy and Mark,

On 6/3/16 10:11 AM, Mark Thomas wrote:
> On 03/06/2016 14:36, Rémy Maucherat wrote:
>> Hi,
>>
>> With direct connect having been hacked in (err, I mean, "implemented"), it
>> is (a lot) easier to do meaningful performance tests. h2load is a drop-in
>> replacement for ab that uses HTTP/2, and it allowed some easy
>> profiling.
>>
>> The good news is that the code seems to be well optimized already, with few
>> visible problems. The only issue is very heavy sync contention on the
>> socket wrapper object in Http2UpgradeHandler.writeHeaders and
>> Http2UpgradeHandler.writeBody.
> 
> I suspect that is inevitable given the nature of the test. There is only
> one connection and if you have 100 streams all trying to write to the
> one connection at the same time you have to synchronise on something.

Can we use separate monitors for read versus write operations?

>> The reason for that is when you do:
>> h2load -c 1 -n 100 http://127.0.0.1:8080/tomcat.gif
>> It ends up being translated in Tomcat into: process one hundred concurrent
>> streams over one connection. Although h2load is not real-world use, that's
>> something that would need to be solved, as a client can use a lot of
>> threads.
> 
> Hmm. We might be able to do something if we buffer writes on the server
> side (I'm thinking a buffer for streams to write into with a dedicated
> thread to do the writing) but I suspect that the bottleneck will quickly
> switch to the network in that case.

You can't really do any better than filling the network, right?

>> There are two main issues in HTTP/2 that could be improved:
>> 1) Ideally, there should be a way to limit stream concurrency to some
>> extent and queue. But then there's a risk of stalling a useful stream (that's
>> where stream priority comes in, of course). Not easy.
> 
> That should already be supported. Currently the default for concurrent
> streams is unlimited but we can make it whatever we think is reasonable.
> The HTTP/2 spec suggests it should be no lower than 100.

I was thinking about this the other day, too... a single H2 connection
can theoretically use every request-processing thread. That's not much
different from a single client making maxConnections HTTP/1.1
connections to the server, except that once the H2 connection is open,
there's no way to prevent it from monopolizing all the available
threads. With H1, it's theoretically possible to throttle the
connection rate (or count) using mod_security, mod_qos, etc.

>> 2) All reads/writes are blocking mid-frame. It's not too bad in practice,
>> but it's a useless risk; that's where async IO can provide an "easy"
>> solution using a dedicated NIO2 implementation.
> 
> They are blocking mid-frame but given the flow control provided by
> HTTP/2 the risk should be zero unless the client advertises a larger
> window than it can handle which would be the client's problem in my view.

+1

Also, proxies and network gear can interfere with protocol-level
attempts to manage packet sizes, so everything becomes very chaotic very
quickly. I'm not sure there are many opportunities to really tune things
reliably here.

-chris


Re: HTTP/2 optimizations and edge cases

Posted by Rémy Maucherat <re...@apache.org>.
2016-06-15 13:27 GMT+02:00 Mark Thomas <ma...@apache.org>:

> On 13/06/2016 21:05, Rémy Maucherat wrote:
> > My conclusion is that some sort of optional mechanism should be added.
>
> Makes sense to me. How did you implement this for your testing?
>
I'm not happy with it, so I added a BZ to attach patches and discuss the
best strategy.

Rémy

Re: HTTP/2 optimizations and edge cases

Posted by Mark Thomas <ma...@apache.org>.
On 13/06/2016 21:05, Rémy Maucherat wrote:
> 2016-06-03 17:36 GMT+02:00 Mark Thomas <ma...@apache.org>:
> 
>> On 03/06/2016 15:59, Rémy Maucherat wrote:
>>> I am not talking about a limit on concurrent streams where things are
>>> being refused (and this is exposed through the settings), rather about
>>> streams which are effectively being processed concurrently (= for example,
>>> in headersEnd, we put the StreamProcessor in a queue rather than executing
>>> it immediately? unless it's a high-priority stream, right?). h2load allows
>>> comparing with other servers, and JF told me httpd has a lower HTTP/2
>>> performance impact compared to Tomcat. Given the profiling, the problem is
>>> the heavy lock contention (no surprise, this is something that is very
>>> expensive), and we could get better performance by controlling the
>>> contention. JF's original "HTTP/2 torture test" HTML page with 1000 images
>>> probably also runs into this. IMO we will eventually need a better
>>> execution strategy than what is in place at the moment, since all dumb
>>> benchmarks will run into that edge case. But I agree that it's only
>>> partially legitimate; the client has the opportunity to control it.
>>
>> Ah. Got it.
>>
> I added some rudimentary concurrency control for testing; the h2load
> results are immediately up to 30% better when using low concurrency levels
> (8 streams concurrently executed per connection). When allowing a larger
> but still reasonable amount of concurrency, like 32 [that would correspond
> to 32 connections for a single client for HTTP/1.1, so that's a lot], the
> performance is up 20% over the default. As sync contention gets higher,
> the performance degrades fast, and I verified using VM monitoring that most
> of the threads in the pool spend most of their time blocked while the test
> runs. This depends on the thread pool size and the client behavior,
> obviously, so the impact can be arbitrarily large.
> 
> My conclusion is that some sort of optional mechanism should be added.

Makes sense to me. How did you implement this for your testing?

Mark




Re: HTTP/2 optimizations and edge cases

Posted by Rémy Maucherat <re...@apache.org>.
2016-06-03 17:36 GMT+02:00 Mark Thomas <ma...@apache.org>:

> On 03/06/2016 15:59, Rémy Maucherat wrote:
> > I am not talking about a limit on concurrent streams where things are
> > being refused (and this is exposed through the settings), rather about
> > streams which are effectively being processed concurrently (= for example,
> > in headersEnd, we put the StreamProcessor in a queue rather than executing
> > it immediately? unless it's a high-priority stream, right?). h2load allows
> > comparing with other servers, and JF told me httpd has a lower HTTP/2
> > performance impact compared to Tomcat. Given the profiling, the problem is
> > the heavy lock contention (no surprise, this is something that is very
> > expensive), and we could get better performance by controlling the
> > contention. JF's original "HTTP/2 torture test" HTML page with 1000 images
> > probably also runs into this. IMO we will eventually need a better
> > execution strategy than what is in place at the moment, since all dumb
> > benchmarks will run into that edge case. But I agree that it's only
> > partially legitimate; the client has the opportunity to control it.
>
> Ah. Got it.
>
I added some rudimentary concurrency control for testing; the h2load
results are immediately up to 30% better when using low concurrency levels
(8 streams concurrently executed per connection). When allowing a larger
but still reasonable amount of concurrency, like 32 [that would correspond
to 32 connections for a single client for HTTP/1.1, so that's a lot], the
performance is up 20% over the default. As sync contention gets higher,
the performance degrades fast, and I verified using VM monitoring that most
of the threads in the pool spend most of their time blocked while the test
runs. This depends on the thread pool size and the client behavior,
obviously, so the impact can be arbitrarily large.

My conclusion is that some sort of optional mechanism should be added.

Rémy

Re: HTTP/2 optimizations and edge cases

Posted by Mark Thomas <ma...@apache.org>.
On 03/06/2016 15:59, Rémy Maucherat wrote:
> 2016-06-03 16:11 GMT+02:00 Mark Thomas <ma...@apache.org>:
> 
>> On 03/06/2016 14:36, Rémy Maucherat wrote:
>>> Hi,
>>>
>>> With direct connect having been hacked in (err, I mean, "implemented"), it
>>> is (a lot) easier to do meaningful performance tests. h2load is a drop-in
>>> replacement for ab that uses HTTP/2, and it allowed some easy
>>> profiling.
>>>
>>> The good news is that the code seems to be well optimized already, with few
>>> visible problems. The only issue is very heavy sync contention on the
>>> socket wrapper object in Http2UpgradeHandler.writeHeaders and
>>> Http2UpgradeHandler.writeBody.
>>
>> I suspect that is inevitable given the nature of the test. There is only
>> one connection and if you have 100 streams all trying to write to the
>> one connection at the same time you have to synchronise on something.
>>
>>> The reason for that is when you do:
>>> h2load -c 1 -n 100 http://127.0.0.1:8080/tomcat.gif
>>> It ends up being translated in Tomcat into: process one hundred concurrent
>>> streams over one connection. Although h2load is not real-world use, that's
>>> something that would need to be solved, as a client can use a lot of
>>> threads.
>>
>> Hmm. We might be able to do something if we buffer writes on the server
>> side (I'm thinking a buffer for streams to write into with a dedicated
>> thread to do the writing) but I suspect that the bottleneck will quickly
>> switch to the network in that case.
>>
>>> There are two main issues in HTTP/2 that could be improved:
>>> 1) Ideally, there should be a way to limit stream concurrency to some
>>> extent and queue. But then there's a risk of stalling a useful stream (that's
>>> where stream priority comes in, of course). Not easy.
>>
>> That should already be supported. Currently the default for concurrent
>> streams is unlimited but we can make it whatever we think is reasonable.
>> The HTTP/2 spec suggests it should be no lower than 100.
>>
> 
> I am not talking about a limit on concurrent streams where things are being
> refused (and this is exposed through the settings), rather about streams which
> are effectively being processed concurrently (= for example, in headersEnd,
> we put the StreamProcessor in a queue rather than executing it immediately?
> unless it's a high-priority stream, right?). h2load allows comparing
> with other servers, and JF told me httpd has a lower HTTP/2 performance
> impact compared to Tomcat. Given the profiling, the problem is the heavy
> lock contention (no surprise, this is something that is very expensive), and
> we could get better performance by controlling the contention. JF's
> original "HTTP/2 torture test" HTML page with 1000 images probably also
> runs into this. IMO we will eventually need a better execution strategy
> than what is in place at the moment, since all dumb benchmarks will run
> into that edge case. But I agree that it's only partially legitimate; the
> client has the opportunity to control it.

Ah. Got it.

>>> 2) All reads/writes are blocking mid-frame. It's not too bad in practice,
>>> but it's a useless risk; that's where async IO can provide an "easy"
>>> solution using a dedicated NIO2 implementation.
>>
>> They are blocking mid-frame but given the flow control provided by
>> HTTP/2 the risk should be zero unless the client advertises a larger
>> window than it can handle which would be the client's problem in my view.
>>
> 
> I'm only half convinced, since it's not very modern :) We have to experiment
> with our "better"/fancier async tech at some point and see the benefits. A
> "selling" point of the NIO1 connector was its non-blocking HTTP/1.1 header
> reading, and now we no longer have that feature in HTTP/2.
> With async IO reads, the frame-complete check code can be in the
> completion handler, which will then only "complete" when the frame is fully
> read. It's a simple and generic solution to the problem. Writes are simpler
> (I think). The main pitfall in both cases is the buffering and what to do
> with the socket buffer [it's probably better to use it with SSL, and better
> to ignore it when unencrypted]. Of course this won't provide the full
> benefits if the user code is not using Servlet 3.1 IO.

I'm not sure how much benefit there will be for real world use cases but
we won't know unless we try.

Mark




Re: HTTP/2 optimizations and edge cases

Posted by Rémy Maucherat <re...@apache.org>.
2016-06-03 16:11 GMT+02:00 Mark Thomas <ma...@apache.org>:

> On 03/06/2016 14:36, Rémy Maucherat wrote:
> > Hi,
> >
> > With direct connect having been hacked in (err, I mean, "implemented"), it
> > is (a lot) easier to do meaningful performance tests. h2load is a drop-in
> > replacement for ab that uses HTTP/2, and it allowed some easy
> > profiling.
> >
> > The good news is that the code seems to be well optimized already, with few
> > visible problems. The only issue is very heavy sync contention on the
> > socket wrapper object in Http2UpgradeHandler.writeHeaders and
> > Http2UpgradeHandler.writeBody.
>
> I suspect that is inevitable given the nature of the test. There is only
> one connection and if you have 100 streams all trying to write to the
> one connection at the same time you have to synchronise on something.
>
> > The reason for that is when you do:
> > h2load -c 1 -n 100 http://127.0.0.1:8080/tomcat.gif
> > It ends up being translated in Tomcat into: process one hundred concurrent
> > streams over one connection. Although h2load is not real-world use, that's
> > something that would need to be solved, as a client can use a lot of
> > threads.
>
> Hmm. We might be able to do something if we buffer writes on the server
> side (I'm thinking a buffer for streams to write into with a dedicated
> thread to do the writing) but I suspect that the bottleneck will quickly
> switch to the network in that case.
>
> > There are two main issues in HTTP/2 that could be improved:
> > 1) Ideally, there should be a way to limit stream concurrency to some
> > extent and queue. But then there's a risk of stalling a useful stream (that's
> > where stream priority comes in, of course). Not easy.
>
> That should already be supported. Currently the default for concurrent
> streams is unlimited but we can make it whatever we think is reasonable.
> The HTTP/2 spec suggests it should be no lower than 100.
>

I am not talking about a limit on concurrent streams where things are being
refused (and this is exposed through the settings), rather about streams which
are effectively being processed concurrently (= for example, in headersEnd,
we put the StreamProcessor in a queue rather than executing it immediately?
unless it's a high-priority stream, right?). h2load allows comparing
with other servers, and JF told me httpd has a lower HTTP/2 performance
impact compared to Tomcat. Given the profiling, the problem is the heavy
lock contention (no surprise, this is something that is very expensive), and
we could get better performance by controlling the contention. JF's
original "HTTP/2 torture test" HTML page with 1000 images probably also
runs into this. IMO we will eventually need a better execution strategy
than what is in place at the moment, since all dumb benchmarks will run
into that edge case. But I agree that it's only partially legitimate; the
client has the opportunity to control it.

>
> > 2) All reads/writes are blocking mid-frame. It's not too bad in practice,
> > but it's a useless risk; that's where async IO can provide an "easy"
> > solution using a dedicated NIO2 implementation.
>
> They are blocking mid-frame but given the flow control provided by
> HTTP/2 the risk should be zero unless the client advertises a larger
> window than it can handle which would be the client's problem in my view.
>

I'm only half convinced, since it's not very modern :) We have to experiment
with our "better"/fancier async tech at some point and see the benefits. A
"selling" point of the NIO1 connector was its non-blocking HTTP/1.1 header
reading, and now we no longer have that feature in HTTP/2.
With async IO reads, the frame-complete check code can be in the
completion handler, which will then only "complete" when the frame is fully
read. It's a simple and generic solution to the problem. Writes are simpler
(I think). The main pitfall in both cases is the buffering and what to do
with the socket buffer [it's probably better to use it with SSL, and better
to ignore it when unencrypted]. Of course this won't provide the full
benefits if the user code is not using Servlet 3.1 IO.
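To illustrate the completion-handler idea, here is a rough sketch (names are illustrative, not Tomcat code) that keeps re-arming the NIO2 read from inside the handler and only surfaces a callback once the 9-byte frame header and the full payload have arrived:

```java
import java.io.EOFException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;

// Sketch only: re-arm reads from the completion handler until a whole
// HTTP/2 frame (9-byte header + payload) is buffered, so no thread ever
// blocks mid-frame.
class FrameReader {
    static final int HEADER_SIZE = 9;

    interface FrameCallback {
        void onFrame(ByteBuffer frame);
        void onError(Throwable t);
    }

    // The first 3 bytes of the frame header carry the payload length.
    static int payloadLength(ByteBuffer header) {
        return ((header.get(0) & 0xFF) << 16)
                | ((header.get(1) & 0xFF) << 8)
                | (header.get(2) & 0xFF);
    }

    static void readFrame(AsynchronousSocketChannel channel, FrameCallback cb) {
        ByteBuffer header = ByteBuffer.allocate(HEADER_SIZE);
        channel.read(header, header, new CompletionHandler<Integer, ByteBuffer>() {
            @Override public void completed(Integer n, ByteBuffer buf) {
                if (n < 0) { cb.onError(new EOFException()); return; }
                if (buf.hasRemaining()) {      // partial header: re-arm
                    channel.read(buf, buf, this);
                    return;
                }
                buf.flip();
                ByteBuffer frame =
                        ByteBuffer.allocate(HEADER_SIZE + payloadLength(buf));
                frame.put(buf);                // copy header, then read payload
                readPayload(channel, frame, cb);
            }
            @Override public void failed(Throwable t, ByteBuffer buf) {
                cb.onError(t);
            }
        });
    }

    private static void readPayload(AsynchronousSocketChannel channel,
            ByteBuffer frame, FrameCallback cb) {
        if (!frame.hasRemaining()) {
            frame.flip();
            cb.onFrame(frame);                 // only now is the frame "complete"
            return;
        }
        channel.read(frame, frame, new CompletionHandler<Integer, ByteBuffer>() {
            @Override public void completed(Integer n, ByteBuffer buf) {
                if (n < 0) { cb.onError(new EOFException()); return; }
                readPayload(channel, buf, cb); // re-arm until payload complete
            }
            @Override public void failed(Throwable t, ByteBuffer buf) {
                cb.onError(t);
            }
        });
    }
}
```

The buffering question from the paragraph above shows up here as the `allocate` calls: with TLS it would make sense to reuse the socket buffer instead.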

Rémy

Re: HTTP/2 optimizations and edge cases

Posted by Mark Thomas <ma...@apache.org>.
On 03/06/2016 14:36, Rémy Maucherat wrote:
> Hi,
> 
> With direct connect having been hacked in (err, I mean, "implemented"), it
> is (a lot) easier to do meaningful performance tests. h2load is a drop-in
> replacement for ab that uses HTTP/2, and it allowed some easy
> profiling.
> 
> The good news is that the code seems to be well optimized already, with few
> visible problems. The only issue is very heavy sync contention on the
> socket wrapper object in Http2UpgradeHandler.writeHeaders and
> Http2UpgradeHandler.writeBody.

I suspect that is inevitable given the nature of the test. There is only
one connection and if you have 100 streams all trying to write to the
one connection at the same time you have to synchronise on something.

> The reason for that is when you do:
> h2load -c 1 -n 100 http://127.0.0.1:8080/tomcat.gif
> It ends up being translated in Tomcat into: process one hundred concurrent
> streams over one connection. Although h2load is not real-world use, that's
> something that would need to be solved, as a client can use a lot of
> threads.

Hmm. We might be able to do something if we buffer writes on the server
side (I'm thinking a buffer for streams to write into with a dedicated
thread to do the writing) but I suspect that the bottleneck will quickly
switch to the network in that case.
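The buffering idea above could look roughly like this (illustrative names, not Tomcat code): streams enqueue ready-to-send frames, and a single dedicated thread owns the socket write, so stream threads never contend on the socket wrapper lock directly.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of a per-connection writer: streams hand frames to
// the queue; one dedicated thread drains it onto the connection.
class ConnectionWriter implements Runnable {
    private static final byte[] SHUTDOWN = new byte[0]; // poison pill
    private final BlockingQueue<byte[]> frames = new LinkedBlockingQueue<>();
    private final OutputStream socketOut;

    ConnectionWriter(OutputStream socketOut) {
        this.socketOut = socketOut;
    }

    /** Called by any stream thread; never touches the socket itself. */
    void enqueue(byte[] frame) throws InterruptedException {
        frames.put(frame);
    }

    void shutdown() throws InterruptedException {
        frames.put(SHUTDOWN);
    }

    @Override
    public void run() {
        try {
            while (true) {
                byte[] frame = frames.take();
                if (frame == SHUTDOWN) {
                    break;
                }
                socketOut.write(frame); // the only thread writing
            }
            socketOut.flush();
        } catch (IOException | InterruptedException e) {
            // Real code would tear down the connection here
        }
    }
}
```

The trade-off is exactly the one noted above: the queue removes lock contention between streams, but once the single writer saturates the link, the network becomes the bottleneck anyway.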

> There are two main issues in HTTP/2 that could be improved:
> 1) Ideally, there should be a way to limit stream concurrency to some
> extent and queue. But then there's a risk of stalling a useful stream (that's
> where stream priority comes in, of course). Not easy.

That should already be supported. Currently the default for concurrent
streams is unlimited but we can make it whatever we think is reasonable.
The HTTP/2 spec suggests it should be no lower than 100.
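For reference, current Tomcat documentation exposes this kind of limit as attributes on the HTTP/2 upgrade protocol; the attribute names below are taken from later documentation and may postdate this thread, so verify them against the Tomcat version in use:

```xml
<!-- Sketch: cap both the advertised stream limit (SETTINGS) and, in
     later versions, the number of streams executed concurrently. -->
<Connector port="8080" protocol="org.apache.coyote.http11.Http11NioProtocol">
    <UpgradeProtocol className="org.apache.coyote.http2.Http2Protocol"
                     maxConcurrentStreams="100"
                     maxConcurrentStreamExecution="20" />
</Connector>
```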

> 2) All reads/writes are blocking mid-frame. It's not too bad in practice,
> but it's a useless risk; that's where async IO can provide an "easy"
> solution using a dedicated NIO2 implementation.

They are blocking mid-frame but given the flow control provided by
HTTP/2 the risk should be zero unless the client advertises a larger
window than it can handle which would be the client's problem in my view.

Mark

