Posted to dev@hc.apache.org by "Idzerda, Edan" <Ed...@PremierInc.com> on 2016/12/12 21:15:23 UTC

How to tell if Async dispatcher thread is busy?

Hello!  Our reverse proxy uses the Async Client pool to handle connections to backend servers.  We've been tracking a problem for a while where we observe the initial TCP connection is made, but no thread is available to handle the SSL setup before a 10 second timeout expires.  We get into trouble because some of our backend servers are very slow, and some of our clients download very slowly.


I'm experimenting with a patch to AbstractMultiworkerIOReactor.addChannel() to determine whether the next dispatcher thread is "busy."  My first try was to look at bufferedSessions from the BaseIOReactor, and go through the list of dispatchers one time to see if I can find a free one.


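        // Start at the next dispatcher in round-robin order
        int i = Math.abs(this.currentWorker++ % this.workerCount);
        // Probe each dispatcher at most once, stopping at the first one with no sessions
        for (int j = 0; j < this.workerCount; j++) {
            if (this.dispatchers[i].getSessionCount() == 0) {
                break;
            }
            i = Math.abs(this.currentWorker++ % this.workerCount);
        }
        // If no dispatcher was idle, i is simply the next round-robin pick
        this.dispatchers[i].addChannel(entry);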

This seems to help us in MOST of the cases we see this issue in production, but there still seem to be a small number of threads which collide.  I'm testing a different version which looks at AbstractIOReactor "sessions" to determine thread busy state, but it never seems to show more than "1" session if I look at the size after piling up slow connections on top of each other.
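For comparison, here is a minimal sketch of an alternative selection rule, not a patch against httpcore-nio. It assumes a hypothetical Dispatcher interface whose getSessionCount() mirrors the accessor used in the patch above (as far as I know it is not a documented httpcore-nio API). If addChannel() can be reached from more than one thread, the unsynchronized currentWorker++ could hand two connections the same index, so the sketch uses an AtomicInteger (the switch mentioned elsewhere in this thread) and falls back to the least-loaded dispatcher when none is idle. Session counts are only a snapshot, so this is a heuristic and does not address why a dispatcher stays busy in the first place.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical view of an I/O dispatcher: getSessionCount() mirrors the accessor
// used in the patch above and is an assumption, not a documented httpcore-nio API.
interface Dispatcher {
    int getSessionCount();
}

// Chooses a dispatcher for a new channel: idle first, otherwise least loaded.
final class IdleFirstSelector {
    private final Dispatcher[] dispatchers;
    // getAndIncrement() is atomic, unlike ++ on a plain int field
    private final AtomicInteger currentWorker = new AtomicInteger();

    IdleFirstSelector(Dispatcher[] dispatchers) {
        this.dispatchers = dispatchers.clone();
    }

    // Returns the index of the dispatcher that should receive the next channel.
    int select() {
        int n = dispatchers.length;
        // floorMod keeps the start index non-negative after the counter wraps around
        int start = Math.floorMod(currentWorker.getAndIncrement(), n);
        int best = start;
        int bestCount = Integer.MAX_VALUE;
        // Visit each dispatcher exactly once, beginning at the round-robin position
        for (int j = 0; j < n; j++) {
            int i = (start + j) % n;
            // Point-in-time snapshot: the count may change before the channel is added
            int count = dispatchers[i].getSessionCount();
            if (count == 0) {
                return i;
            }
            if (count < bestCount) {
                best = i;
                bestCount = count;
            }
        }
        // No idle dispatcher: use the least-loaded one seen during the scan
        return best;
    }
}
```

Unlike the patch, the counter advances once per call rather than once per probe, so where the next connection's scan starts does not depend on how many dispatchers happened to be busy.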

I have two questions:
    Is there a better way to determine whether a thread is busy?
    Would you be willing to accept a patch to make the dispatchers array in AbstractMultiworkerIOReactor "protected" so I can implement my own ConnectingIOReactor that overrides addChannel() with my own thread selection model?

Thanks a lot for your help and for providing such a great library to the community!

- edan




Re: How to tell if Async dispatcher thread is busy?

Posted by Oleg Kalnichevski <ol...@apache.org>.
On Wed, 2016-12-14 at 03:38 +0000, Idzerda, Edan wrote:
> 
> > On Dec 13, 2016, at 4:59 AM, Oleg Kalnichevski <ol...@apache.org> wrote:
> > 
> > Hi Edan
> > 
> > What I do not quite understand is why i/o dispatch threads get blocked
> > for 10 seconds or longer. This sounds awfully suspicious.
> > 
> > I could imagine exposing the list of i/o dispatchers to subclasses of
> > AbstractMultiworkerIOReactor in the 4.4.x branch, but would prefer to
> > keep that as a last resort.
> > 
> > Oleg
> 
> Thanks.  I would prefer not to have to patch httpcore-nio like this if I can work out the root cause.  Since I am still seeing connections fail to complete the SSL handshake within 10 seconds with my first patch (above), I am now trying a new one that uses an AtomicInteger for currentWorker.  We are seeing far fewer connection problems with the patch, but there are still enough apparent thread-selection collisions that some requests fail.
> 
> The only way I have been able to reproduce this problem is by using an artificially rate-limited connection (e.g., curl --limit-rate 1m) and downloading a relatively large file.  If I use a small file, say 50K, the dispatcher threads do not get stuck: I can download more files than I have worker threads, and AbstractIOReactor’s “sessions” set count stays at 0.  With a larger file, like 500K, the sessions size goes to 1, and I can only download as many files at once as I have worker threads.
> 
> Does this make any sense to you?  Is it possible the higher-level proxy library is hanging on to the HttpResponse’s Entity too long?  I see they call HttpEntity.getContent() and create an InputStream out of it…

This is likely to be the cause of your grief. InputStream / OutputStream
interfaces are inherently blocking, and they do not mix well with
event-driven i/o without quite a bit of effort and complex code. By
using blocking i/o to produce requests or consume responses, the
higher-level proxy library likely blocks i/o dispatch threads and
starves other
connections managed by the same dispatcher.
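To illustrate that starvation, here is a small simulation, not httpcore-nio code: a single-threaded ExecutorService stands in for one I/O dispatch thread, one task blocks the way a pull-style InputStream read toward a slow client might, and a second connection's cheap event (for example, the next step of its SSL handshake) has to wait until the first task returns. The class name and timings are made up for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class DispatchStarvationDemo {
    // Returns how many milliseconds connection B's event waited behind a handler
    // for connection A that blocked for blockMillis on the same dispatch thread.
    static long delayOfSecondEventMillis(long blockMillis) {
        // Stand-in for one I/O dispatch thread: tasks run strictly one at a time
        ExecutorService dispatchThread = Executors.newSingleThreadExecutor();
        try {
            long submitted = System.nanoTime();
            // Connection A: the handler blocks instead of returning promptly
            dispatchThread.submit(() -> {
                try {
                    Thread.sleep(blockMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            // Connection B: a cheap event that only records when it finally ran
            Callable<Long> eventB = System::nanoTime;
            Future<Long> ranAt = dispatchThread.submit(eventB);
            return TimeUnit.NANOSECONDS.toMillis(ranAt.get() - submitted);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            dispatchThread.shutdownNow();
        }
    }
}
```

Calling delayOfSecondEventMillis(500) reports a wait of at least about 500 ms, even though connection B's own work is trivial: on a single dispatch thread, one slow handler delays every other session it serves.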

I would recommend rewriting your code on top of the native
HttpAsyncRequestProducer / HttpAsyncResponseConsumer interfaces for
better results.
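The push-style alternative can be sketched as follows: the consumer is handed whatever bytes have arrived and returns right away, and when the downstream client cannot keep up it pauses reading from the backend instead of blocking. This is a self-contained model of the idea, not the HttpAsyncResponseConsumer contract. FlowControl and NonBlockingRelay are hypothetical names; FlowControl stands in for the suspend/resume input controls that, as far as I know, HttpCore NIO exposes as IOControl.suspendInput() and IOControl.requestInput().

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Hypothetical stand-in for the flow-control handle an event-driven client
// passes to its callbacks (modeled on, but not the same as, IOControl).
interface FlowControl {
    void suspendInput();
    void requestInput();
}

// Relays backend content to a slow client without ever blocking the dispatch thread.
final class NonBlockingRelay {
    private final ArrayDeque<byte[]> pending = new ArrayDeque<>();
    private final int capacityBytes;
    private final FlowControl control;
    private int bufferedBytes;
    private boolean suspended;

    NonBlockingRelay(int capacityBytes, FlowControl control) {
        this.capacityBytes = capacityBytes;
        this.control = control;
    }

    // Called on the dispatch thread with newly arrived backend bytes; copies and returns.
    synchronized void onContent(ByteBuffer src) {
        byte[] chunk = new byte[src.remaining()];
        src.get(chunk);
        pending.add(chunk);
        bufferedBytes += chunk.length;
        if (!suspended && bufferedBytes >= capacityBytes) {
            suspended = true;
            control.suspendInput(); // stop reading from the backend instead of blocking
        }
    }

    // Called when the downstream client can accept more data; returns null if nothing is queued.
    synchronized byte[] drain() {
        byte[] chunk = pending.poll();
        if (chunk != null) {
            bufferedBytes -= chunk.length;
            if (suspended && bufferedBytes < capacityBytes) {
                suspended = false;
                control.requestInput(); // room again: resume reading from the backend
            }
        }
        return chunk;
    }

    synchronized boolean isSuspended() {
        return suspended;
    }
}
```

The bounded buffer caps per-connection memory, and the suspend/resume pair moves the waiting out of the dispatch thread and into the selector, which keeps serving other sessions in the meantime.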

Oleg

>  But why would that make a worker thread become non-responsive until it finishes?  I see a note on IOEventDispatch suggesting that “all methods of this interface are executed on the dispatch thread of the I/O reactor … it is important that processing that takes place in the event methods will not block the dispatch thread for too long, as the I/O reactor will be unable to react to other events”
> 
> Is that worth pursuing?  Any suggestions on how to debug this would be appreciated!
> 
> - edan
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@hc.apache.org
> For additional commands, e-mail: dev-help@hc.apache.org




Re: How to tell if Async dispatcher thread is busy?

Posted by "Idzerda, Edan" <Ed...@PremierInc.com>.
> On Dec 13, 2016, at 4:59 AM, Oleg Kalnichevski <ol...@apache.org> wrote:
> 
> Hi Edan
> 
> What I do not quite understand is why i/o dispatch threads get blocked
> for 10 seconds or longer. This sounds awfully suspicious.
> 
> I could imagine exposing the list of i/o dispatchers to subclasses of
> AbstractMultiworkerIOReactor in the 4.4.x branch, but would prefer to
> keep that as a last resort.
> 
> Oleg

Thanks.  I would prefer not to have to patch httpcore-nio like this if I can work out the root cause.  Since I am still seeing connections fail to complete the SSL handshake within 10 seconds with my first patch (above), I am now trying a new one that uses an AtomicInteger for currentWorker.  We are seeing far fewer connection problems with the patch, but there are still enough apparent thread-selection collisions that some requests fail.

The only way I have been able to reproduce this problem is by using an artificially rate-limited connection (e.g., curl --limit-rate 1m) and downloading a relatively large file.  If I use a small file, say 50K, the dispatcher threads do not get stuck: I can download more files than I have worker threads, and AbstractIOReactor’s “sessions” set count stays at 0.  With a larger file, like 500K, the sessions size goes to 1, and I can only download as many files at once as I have worker threads.

Does this make any sense to you?  Is it possible the higher-level proxy library is hanging on to the HttpResponse’s Entity too long?  I see they call HttpEntity.getContent() and create an InputStream out of it… But why would that make a worker thread become non-responsive until it finishes?  I see a note on IOEventDispatch suggesting that “all methods of this interface are executed on the dispatch thread of the I/O reactor … it is important that processing that takes place in the event methods will not block the dispatch thread for too long, as the I/O reactor will be unable to react to other events”

Is that worth pursuing?  Any suggestions on how to debug this would be appreciated!

- edan
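On the debugging question, one low-tech approach (a suggestion, not something proposed in the thread) is to capture stack traces of the dispatch threads while the slow download is being reproduced. `jstack <pid>` does this from the command line; the sketch below does the same from inside the JVM using only Thread.getAllStackTraces(). The name prefix to pass in is an assumption: I believe HttpAsyncClient 4.x names its threads along the lines of "I/O dispatcher N", but check an actual thread dump first. A dispatch thread that keeps showing up inside a blocking InputStream or OutputStream call would support the blocking-stream explanation given in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class DispatcherStackDump {
    // Returns one formatted stack trace per live thread whose name starts with namePrefix.
    static List<String> dump(String namePrefix) {
        List<String> traces = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            if (!t.getName().startsWith(namePrefix)) {
                continue; // not one of the threads we are interested in
            }
            StringBuilder sb = new StringBuilder();
            sb.append('"').append(t.getName()).append("\" ").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            traces.add(sb.toString());
        }
        return traces;
    }
}
```

Logging the result every few seconds during a reproduction makes it easy to see whether the same dispatch thread is parked in the same frames for the whole 10-second window.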



Re: How to tell if Async dispatcher thread is busy?

Posted by Oleg Kalnichevski <ol...@apache.org>.

Hi Edan

What I do not quite understand is why i/o dispatch threads get blocked
for 10 seconds or longer. This sounds awfully suspicious.

I could imagine exposing the list of i/o dispatchers to subclasses of
AbstractMultiworkerIOReactor in the 4.4.x branch, but would prefer to
keep that as a last resort.

Oleg

