Posted to dev@hc.apache.org by Oleg Kalnichevski <ol...@apache.org> on 2021/03/30 20:51:54 UTC

connection leak reproducer; was Re: [VOTE] Release HttpClient 5.1 based on RC1

On Tue, 2021-03-30 at 13:35 -0700, Ryan Schmitt wrote:
> Good news, actually: I think I *just* reproduced it now. I ran a hacked up
> benchmark that sends 100,000 HTTPS requests across 50 threads with various
> randomized timeouts and delays, and after everything was done there were
> still two "leased" connections in the thread pool. This is exactly what I
> was looking for. A turnkey repro and a fix might not be far off now.
> 

All connections have a unique ID assigned to them at construction time,
which is also used in the context logs as a correlation id.

If you could dump the ids of the connections still leased from the pool
at the end of a benchmark run, you could look for abnormalities in
message exchanges over those connections.
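
To make the suggestion above concrete, here is a minimal sketch of the
bookkeeping involved. It is a toy stand-in written against the JDK only,
not HttpClient's pool or its API: the class name, the id format and the
lease/release methods are all assumptions made for illustration. Each
connection gets its id when it is created, and whatever is still in the
leased map at the end of a run is the set of ids to search for in the logs.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a connection pool; this is not the HttpClient API. */
class LeasedIdDump {
    private static final AtomicLong COUNTER = new AtomicLong();

    // id -> route, for every connection that is currently leased out
    private final Map<String, String> leased = new ConcurrentHashMap<>();

    /** Assigns the id at creation time and records the lease. */
    String lease(String route) {
        String id = String.format("c-%010d", COUNTER.incrementAndGet());
        leased.put(id, route);
        return id;
    }

    /** Returns the connection to the pool. */
    void release(String id) {
        leased.remove(id);
    }

    /** Sorted ids of connections that were leased but never released. */
    Set<String> leasedIds() {
        return new TreeSet<>(leased.keySet());
    }

    public static void main(String[] args) {
        LeasedIdDump pool = new LeasedIdDump();
        String a = pool.lease("https://localhost:8443");
        String b = pool.lease("https://localhost:8443");
        pool.release(a);
        // b is never released, so it is the only id in the dump
        for (String id : pool.leasedIds()) {
            System.out.println("still leased: " + id);
        }
    }
}
```

The ids printed at the end can then be used as search keys in the
context logs to follow what happened on each leaked connection.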

Oleg

> On Tue, Mar 30, 2021 at 1:24 PM Oleg Kalnichevski <ol...@apache.org>
> wrote:
> 
> > On Tue, 2021-03-30 at 13:08 -0700, Ryan Schmitt wrote:
> > > > Whether the release is blocked or not is up to the other voting PMC
> > > > members.
> > 
> > With just 4 active PMC members -1 is effectively a veto.
> > 
> > I will call off the release.
> > 
> > Oleg
> > 
> > 
> > > > My preference is to at least try to fix the current branch before
> > > > introducing another one we have to maintain (under strict binary
> > > > compatibility). I expect to have something within the next few days,
> > > > and if I don't then we might as well go ahead and release anyway, but
> > > > experience tells me that these bugs are real and they are not going
> > > > away.
> > > 
> > > On Tue, Mar 30, 2021 at 12:42 PM Oleg Kalnichevski <
> > > olegk@apache.org>
> > > wrote:
> > > 
> > > > On Tue, 2021-03-30 at 12:24 -0700, Ryan Schmitt wrote:
> > > > > -1. I can't sign off on a release when multiple show-stopping
> > > > > bugs
> > > > > remain
> > > > > at large in 5.0.
> > > > > 
> > > > 
> > > > Ryan
> > > > 
> > > > > Please confirm your intention to block all releases while those bugs
> > > > > are being investigated.
> > > > > 
> > > > > Also I cannot pull miracles out of my rectum and fix bugs that we
> > > > > collectively are unable to reproduce.
> > > > 
> > > > Oleg
> > > > 
> > > > 
> > > > > On Tue, Mar 30, 2021 at 11:56 AM Oleg Kalnichevski <
> > > > > olegk@apache.org>
> > > > > wrote:
> > > > > 
> > > > > > > Please vote on releasing these packages as HttpClient 5.1.
> > > > > > > The vote is open for at least 72 hours, and only votes from
> > > > > > > HttpComponents PMC members are binding. The vote passes if at
> > > > > > > least three binding +1 votes are cast and there are more +1 than
> > > > > > > -1 votes.
> > > > > > > 
> > > > > > > Release notes:
> > > > > > > https://dist.apache.org/repos/dist/dev/httpcomponents/httpclient-5.1-RC1/RELEASE_NOTES-5.1.x.txt
> > > > > > > 
> > > > > > > Maven artefacts:
> > > > > > > https://repository.apache.org/content/repositories/orgapachehttpcomponents-1130/org/apache/httpcomponents/client5/
> > > > > > > 
> > > > > > > Git Tag: 5.1-RC1
> > > > > > > https://github.com/apache/httpcomponents-client/tree/5.1-RC1
> > > > > > > 
> > > > > > > Packages:
> > > > > > > https://dist.apache.org/repos/dist/dev/httpcomponents/httpclient-5.1-RC1
> > > > > > > revision 46817
> > > > > > 
> > > > > > Hashes:
> > > > > > >  06f4fe645cf75b4a9d5824313ddfb2206b90d94877142214e34093478666522c56ed85c0d3e11f043605fd06b2758f5ea0c13f398cff6b58802b2bbf8b80b633
> > > > > > >  httpcomponents-client-5.1-bin.tar.gz
> > > > > > >  484c5a16a43e711a9a816717077b2251e315f9818862ab0af21d2a7a79bc72d74abaeafd0826542414e24ccef7d031498eae83ed737858b4818f702f3b1d2105
> > > > > > >  httpcomponents-client-5.1-bin.zip
> > > > > > >  00f8714ae85299c826b4f30bbcece0eda4b44aeed93e42d99b4e76365eff15f98ea5a181503d941c56ccc9543a73121f22e41e8dbd3e1195c86ba457f02049d4
> > > > > > >  httpcomponents-client-5.1-src.tar.gz
> > > > > > >  2a2de6861e8c8f816eb8fbf6c840f2d685a7fad5512d92102b017db26d2c09ef8ec380a168c62a2eb1db7cb5741df66005c18df9733d0177c1d5fc30a973a835
> > > > > > >  httpcomponents-client-5.1-src.zip
> > > > > > 
> > > > > > Keys:
> > > > > >  https://www.apache.org/dist/httpcomponents/httpclient/KEYS
> > > > > > 
> > > > > > ---------------------------------------------------------
> > > > > > ----
> > > > > > ----
> > > > > > ---------
> > > > > > Vote: HttpClient 5.1 release
> > > > > > [ ] +1 Release the packages as HttpClient 5.1.
> > > > > > [ ] -1 I am against releasing the packages (must include a
> > > > > > reason).
> > > > > > 
> > > > > > 
> > > > > > ---------------------------------------------------------
> > > > > > ----
> > > > > > ----
> > > > > > ----
> > > > > > To unsubscribe, e-mail: dev-unsubscribe@hc.apache.org
> > > > > > For additional commands, e-mail: dev-help@hc.apache.org
> > > > > > 
> > > > > > 
> > > > 
> > 




Proposed fix; Re: connection leak reproducer;

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Wed, 2021-03-31 at 15:31 +0200, Oleg Kalnichevski wrote:
> On Wed, 2021-03-31 at 13:42 +0200, Oleg Kalnichevski wrote:
> > On Tue, 2021-03-30 at 15:54 -0700, Ryan Schmitt wrote:
> > > Reproducer just dropped:
> > > 
> > > https://github.com/rschmitt/httpclient-benchmark
> > > 
> > > Just run the `benchmark` target as usual, using JDK11. I've tested it on
> > > Linux and macOS; Windows *will not work*. The output should look something
> > > like this:
> > > 
> > 
> > I can reproduce the issue and am trying to find its cause.
> > 
> > Oleg
> > 
> 
> Hi Ryan
> 
> I believe I have found the root cause of the leak. It is a classic race
> condition when a connection request completes and its requester gives
> up on it (request times out) at about the same time.
> 
> I am working on a fix.
> 
Hi Ryan

Here's the change-set that fixes the leak for me. It is probably not
the best solution, but it looked the simplest and most straightforward
to me. I will try to come up with something better.


https://github.com/ok2c/httpcomponents-core/commit/68174a4faea82e3c9559da63438e4bfa9721a688

Oleg

> Did you have any luck reproducing the other defect?
> 
> Oleg
> 
> 
> > 


Re: connection leak reproducer; was Re: [VOTE] Release HttpClient 5.1 based on RC1

Posted by Ryan Schmitt <rs...@apache.org>.
The other defect (IOReactor crashes) is next on my list. I'm less convinced
that that defect is a regression, so I don't plan on blocking the 5.1
release over it.

On Wed, Mar 31, 2021 at 6:32 AM Oleg Kalnichevski <ol...@apache.org> wrote:

> Hi Ryan
>
> I believe I have found the root cause of the leak. It is a classic race
> condition when a connection request completes and its requester gives
> up on it (request times out) at about the same time.
>
> I am working on a fix.
>
> Did you have any luck reproducing the other defect?
>
> Oleg
>

Re: connection leak reproducer; was Re: [VOTE] Release HttpClient 5.1 based on RC1

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Wed, 2021-03-31 at 13:42 +0200, Oleg Kalnichevski wrote:
> On Tue, 2021-03-30 at 15:54 -0700, Ryan Schmitt wrote:
> > Reproducer just dropped:
> > 
> > https://github.com/rschmitt/httpclient-benchmark
> > 
> > Just run the `benchmark` target as usual, using JDK11. I've tested it on
> > Linux and macOS; Windows *will not work*. The output should look something
> > like this:
> > 
> 
> I can reproduce the issue and am trying to find its cause.
> 
> Oleg
> 

Hi Ryan

I believe I have found the root cause of the leak. It is a classic race
condition when a connection request completes and its requester gives
up on it (request times out) at about the same time.
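
The race described above can be illustrated with a small JDK-only
sketch. This is not the change-set from this thread and it does not use
HttpClient or HttpCore classes: the class and method names are
hypothetical, and a CompletableFuture merely stands in for a pending
connection request. The example forces the unlucky ordering (the
requester gives up first, then the pool completes the request) so the
outcome is deterministic. The leaky variant ignores whether the
hand-over succeeded; the safe variant takes the connection back when
nobody is left to receive it.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy pool illustrating the lease/timeout race; all names are hypothetical. */
class LeaseRaceSketch {
    final AtomicInteger leased = new AtomicInteger();

    /** Leaky: hands the connection over without checking the outcome. */
    void fulfilLeaky(CompletableFuture<String> request, String conn) {
        leased.incrementAndGet();
        request.complete(conn); // return value ignored
    }

    /** Safe: if the requester already gave up, take the connection back. */
    void fulfilSafe(CompletableFuture<String> request, String conn) {
        leased.incrementAndGet();
        if (!request.complete(conn)) {
            // complete() returns false when the future was already cancelled,
            // so no caller will ever release this connection
            leased.decrementAndGet();
        }
    }

    public static void main(String[] args) {
        LeaseRaceSketch leaky = new LeaseRaceSketch();
        CompletableFuture<String> r1 = new CompletableFuture<>();
        r1.cancel(true);              // requester times out first
        leaky.fulfilLeaky(r1, "c-1"); // pool completes the request anyway
        System.out.println("leaky leased = " + leaky.leased.get()); // prints 1

        LeaseRaceSketch safe = new LeaseRaceSketch();
        CompletableFuture<String> r2 = new CompletableFuture<>();
        r2.cancel(true);
        safe.fulfilSafe(r2, "c-2");
        System.out.println("safe leased = " + safe.leased.get());   // prints 0
    }
}
```

In a real pool the two steps run on different threads, so the bad
ordering only shows up occasionally, which would be consistent with the
leak needing many requests and tight timeouts to reproduce.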

I am working on a fix.

Did you have any luck reproducing the other defect?

Oleg


> 


Re: connection leak reproducer; was Re: [VOTE] Release HttpClient 5.1 based on RC1

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Tue, 2021-03-30 at 15:54 -0700, Ryan Schmitt wrote:
> Reproducer just dropped:
> 
> https://github.com/rschmitt/httpclient-benchmark
> 
> Just run the `benchmark` target as usual, using JDK11. I've tested it on
> Linux and macOS; Windows *will not work*. The output should look something
> like this:
> 

I can reproduce the issue and am trying to find its cause.

Oleg





Re: connection leak reproducer; was Re: [VOTE] Release HttpClient 5.1 based on RC1

Posted by Ryan Schmitt <rs...@apache.org>.
Reproducer just dropped:

https://github.com/rschmitt/httpclient-benchmark

Just run the `benchmark` target as usual, using JDK11. I've tested it on
Linux and macOS; Windows *will not work*. The output should look something
like this:

> > Task :benchmark
> =================================
> HTTP agent: Apache HttpClient (ver: 5.0)
> =================================
> 12800 GET requests
> ---------------------------------
> No connection leak detected...
> Connection leak detected!
> Connection leak detected
> [leased: 3; pending: 0; available: 0; max: 8]
> Document URI:           http://localhost:8888/rnd?c=2000
> Document Length:        0 bytes
>
> Concurrency level:      64
> Time taken for tests:   0.349 seconds
> Complete requests:      0
> Failed requests:        128
> Content transferred:    0 bytes
> Requests per second:    0.0 [#/sec] (mean)
>
> BUILD SUCCESSFUL in 4s
> 3 actionable tasks: 2 executed, 1 up-to-date


Re: connection leak reproducer; was Re: [VOTE] Release HttpClient 5.1 based on RC1

Posted by Ryan Schmitt <rs...@pobox.com>.
No need to exchange messages. It turns out that you can reproduce this
issue purely with connection timeouts or TLS handshake timeouts. Both
the strict and the lax connection pools appear able to leak connections,
but the leak seems easier to reproduce with the strict one.
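
As a rough illustration of why timeouts alone are enough to exercise
this code path, here is a JDK-only stress sketch. It does not use
HttpClient and is not the benchmark from this thread; the class name,
the delays and the bookkeeping are all assumptions. One executor plays
the pool and "connects" after a random delay, another plays callers that
wait with a random timeout and give up when it expires. Because both
sides check whether they won the hand-over, the leased count should be
back at zero when the run finishes; dropping either check is the kind of
mistake that would leave connections leased.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy stress harness; it does not touch HttpClient and all names are hypothetical. */
class TimeoutStressSketch {

    /** Runs the given number of lease attempts and returns the leased count left over. */
    static int run(int requests, int threads) throws InterruptedException {
        AtomicInteger leased = new AtomicInteger();
        ExecutorService poolSide = Executors.newFixedThreadPool(threads);
        ExecutorService callers = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(requests * 2);

        for (int i = 0; i < requests; i++) {
            final int id = i;
            final CompletableFuture<Integer> request = new CompletableFuture<>();

            // Pool side: "connect" after a random delay, then hand the connection over.
            poolSide.execute(() -> {
                try {
                    Thread.sleep(ThreadLocalRandom.current().nextInt(0, 3));
                    leased.incrementAndGet();
                    if (!request.complete(id)) {
                        leased.decrementAndGet(); // requester gave up: take it back
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });

            // Caller side: wait with a random timeout and give up if it expires.
            callers.execute(() -> {
                try {
                    request.get(ThreadLocalRandom.current().nextInt(1, 4), TimeUnit.MILLISECONDS);
                    leased.decrementAndGet(); // got the connection: use it, then release it
                } catch (TimeoutException e) {
                    if (!request.cancel(true)) {
                        leased.decrementAndGet(); // delivered just in time: release it
                    }
                } catch (InterruptedException | ExecutionException e) {
                    throw new IllegalStateException(e); // not expected in this sketch
                } finally {
                    done.countDown();
                }
            });
        }

        done.await();
        poolSide.shutdown();
        callers.shutdown();
        return leased.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int left = run(2000, 8);
        System.out.println(left == 0
                ? "No connection leak detected"
                : "Connection leak detected: " + left);
    }
}
```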

On Tue, Mar 30, 2021 at 1:52 PM Oleg Kalnichevski <ol...@apache.org> wrote:

> On Tue, 2021-03-30 at 13:35 -0700, Ryan Schmitt wrote:
> > Good news, actually: I think I *just* reproduced it now. I ran a
> > hacked up
> > benchmark that sends 100,000 HTTPS requests across 50 threads with
> > various
> > randomized timeouts and delays, and after everything was done there
> > were
> > still two "leased" connections in the thread pool. This is exactly
> > what I
> > was looking for. A turnkey repro and a fix might not be far off now.
> >
>
> All connections have a unique ID assigned to them at construction time
> which is also used in the context logs as a correlation id.
>
> If you could dump the ids of the connections still leased from the pool
> at the end of a benchmark run, you could look for abnormalities in
> message exchanges over those connections.
>
> Oleg
>