You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Justin Erenkrantz <ju...@erenkrantz.com> on 2006/02/27 09:35:56 UTC

httpd-trunk with MSIE (async read broken?) and AJP problems was Re: httpd-trunk sucking CPU on ajax

On Sun, Feb 26, 2006 at 11:55:31PM -0800, Sander Temme wrote:
> Issues.apache.org, which is running trunk or something very close to  
> it to debug the JIRA issues, has some child processes sucking extreme  
> quantities of CPU. For instance:

Also, we're getting reports that IE clients on Windows consistently get
408s (server timeout) immediately from httpd.  Some users with Safari and
Firefox can see this problem occassionally, but MSIE triggers it a good
portion of the time.

Coupled with the EOR brokenness (infinite loop?), this sounds like the
async-read code is severely busted.  Either we should fix it, or we need to
revert it.  =)

There are also still problems with proxy code corrupting data from the AJP
side of things.  See:

https://issues.apache.org/jira/browse/INFRA-697

for the infrastructure issue open for this.  We were at 2.2.0 but the proxy
code is just not working right.  We confirmed network traces that Tomcat is
sending the right bits back, but httpd is corrupting data near the end of
the AJP send_body_chunk responses such that the client sees junk.

My initial thought was that the we could have an issue with the transient
buckets, but when I switched them to be a heap bucket - the corruption
continued.  Only when I did the SSL flush patch did the corruption largely
go away - but some folks have still reported occassional problems since I
put that change live Friday night.

Any and all eyes appreciated, but we've got serious brokenness and its time
to fix up our dogfood...  ;-)  -- justin

Re: httpd-trunk with MSIE (async read broken?) and AJP problems was Re: httpd-trunk sucking CPU on ajax

Posted by Brian Pane <br...@apache.org>.
Justin Erenkrantz wrote:

>On Sun, Feb 26, 2006 at 11:55:31PM -0800, Sander Temme wrote:
>  
>
>>Issues.apache.org, which is running trunk or something very close to  
>>it to debug the JIRA issues, has some child processes sucking extreme  
>>quantities of CPU. For instance:
>>    
>>
>
>Also, we're getting reports that IE clients on Windows consistently get
>408s (server timeout) immediately from httpd.  Some users with Safari and
>Firefox can see this problem occassionally, but MSIE triggers it a good
>portion of the time.
>
>Coupled with the EOR brokenness (infinite loop?), this sounds like the
>async-read code is severely busted.  Either we should fix it, or we need to
>revert it.  =)
>  
>

I recommend reverting it, unless someone else has time to look at it in 
the near future.
I haven't been able to look at the async code in a few weeks due to 
other schedule
commitments, and I expect that will be the case for a few more weeks.

I'll revert the code on the trunk this evening (though if anybody wants 
to do so earlier,
no problem).

Brian


Re: httpd-trunk with MSIE (async read broken?) and AJP problems was Re: httpd-trunk sucking CPU on ajax

Posted by Brian Pane <br...@apache.org>.
On Feb 27, 2006, at 11:42 AM, Ruediger Pluem wrote:
>
> Do we really have the async read code on trunk? Maybe I missed some  
> commits, but
> I thought that we only have the async write stuff in the trunk.

The trunk had the async write implementation plus a refactored version
of the synchronous read code (ported from the async-read-dev branch
but still using blocking reads).  I've just reverted the latter to  
eliminate
one possible source of problems.  The async write completion is still
in place in the trunk.

Brian


Re: httpd-trunk with MSIE (async read broken?) and AJP problems was Re: httpd-trunk sucking CPU on ajax

Posted by Ruediger Pluem <rp...@apache.org>.

On 02/27/2006 10:20 PM, Ruediger Pluem wrote:
> 
> On 02/27/2006 08:58 PM, Justin Erenkrantz wrote:

> 
> 
> Meanwhile I doubt that Tomcat sent the correct response. But maybe it is just me,
> so lets see what I found from the dump Justin sent over to me:
> 
> The following is the start of an AJP send_body_chunk (packet 1902):
> 
> 41 42 1f fc 03 20  00 0a
> 
> 41 42   are the id bytes => OK
> 1f fc   says that the whole message has 1FCFC = 8188 bytes which is correct as we have to
>         substract the 4 bytes at the start (41 42 1f fc). => OK
> 
> 03      says send_body_chunk => OK
> 20 00   is the length of the body chunk in byte 2000 = 8192 bytes => WRONG. The length of the
>         body chunk is only 8188 - 3 (3 bytes for 03 20 00) = 8185 bytes long. Keep in mind that
>         this length is actually used as the length of the transient buffer.
> 
> 0a      First data byte of chunk => OK
> 
> Now the next AJP send_body_chunk in row (Packet 1902 at the end)
> 
> 41 42 00 0c 03 20 00 20
> 
> 41 42   are the id bytes => OK
> 00 0C   says that the whole message has 000C = 12 bytes which is correct as we have to
>         substract the 4 bytes at the start (41 42 1f fc). => OK
> 03      says send_body_chunk => OK
> 20 00   is the length of the body chunk in byte 2000 = 8192 bytes => WRONG. The length of the
>         body chunk is only 12 - 3 (3 bytes for 03 20 00) = 9 bytes long. Keep in mind that
>         this length is actually used as the length of the transient buffer.
> 
> The next AJP send_body_chunk in row (Packet 1903) is similar to the first I analysed:
> 
> Message length 8188, chunk length 8192.
> 

After some chat with Justin on IRC, we had a look in the code of Tomcat 5.5.15.
We believe that the root cause for the problem can found in line 1265 of
AjpAprProcessor.java (thats within the doWrite method).

Instead of

outputBuffer.putShort((short) chunk.getLength());

it should be

outputBuffer.putShort((short) thisTime);


Regards

Rüdiger







Re: httpd-trunk with MSIE (async read broken?) and AJP problems was Re: httpd-trunk sucking CPU on ajax

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On 2/27/06, Ruediger Pluem <rp...@apache.org> wrote:
>
> Meanwhile I doubt that Tomcat sent the correct response. But maybe it is
> just me,
> so lets see what I found from the dump Justin sent over to me:


I'm still digesting this, but some quick questions and answers:

Were the other packets in this stream okay?


> Which Tomcat version are we actually running on Ajax? Maybe I could have a
> look into
> the AJP connector code on Tomcat side.


As far as I can tell, we're using 5.5.15.

BTW: Justins idea to exchange the transient buckets with heap buckets was
> also correct
> as it seems that the transient buckets might get overwritten in some
> situations.
> So Justin could you please commit this to the trunk?


Sure, I will do so tonight.  -- justin

Re: httpd-trunk with MSIE (async read broken?) and AJP problems was Re: httpd-trunk sucking CPU on ajax

Posted by Ruediger Pluem <rp...@apache.org>.

On 02/27/2006 08:58 PM, Justin Erenkrantz wrote:
> 
>>There are also still problems with proxy code corrupting data from the AJP
>>
>>>side of things.  See:
>>>
>>>https://issues.apache.org/jira/browse/INFRA-697
>>>
>>>for the infrastructure issue open for this.  We were at 2.2.0 but the
>>
>>proxy
>>
>>>code is just not working right.  We confirmed network traces that Tomcat
>>
>>is
>>
>>>sending the right bits back, but httpd is corrupting data near the end
>>
>>of
>>
>>>the AJP send_body_chunk responses such that the client sees junk.

Meanwhile I doubt that Tomcat sent the correct response. But maybe it is just me,
so lets see what I found from the dump Justin sent over to me:

The following is the start of an AJP send_body_chunk (packet 1902):

41 42 1f fc 03 20  00 0a

41 42   are the id bytes => OK
1f fc   says that the whole message has 1FCFC = 8188 bytes which is correct as we have to
        substract the 4 bytes at the start (41 42 1f fc). => OK

03      says send_body_chunk => OK
20 00   is the length of the body chunk in byte 2000 = 8192 bytes => WRONG. The length of the
        body chunk is only 8188 - 3 (3 bytes for 03 20 00) = 8185 bytes long. Keep in mind that
        this length is actually used as the length of the transient buffer.

0a      First data byte of chunk => OK

Now the next AJP send_body_chunk in row (Packet 1902 at the end)

41 42 00 0c 03 20 00 20

41 42   are the id bytes => OK
00 0C   says that the whole message has 000C = 12 bytes which is correct as we have to
        substract the 4 bytes at the start (41 42 1f fc). => OK
03      says send_body_chunk => OK
20 00   is the length of the body chunk in byte 2000 = 8192 bytes => WRONG. The length of the
        body chunk is only 12 - 3 (3 bytes for 03 20 00) = 9 bytes long. Keep in mind that
        this length is actually used as the length of the transient buffer.

The next AJP send_body_chunk in row (Packet 1903) is similar to the first I analysed:

Message length 8188, chunk length 8192.

Thoughts on this? Did I see something wrong?

Which Tomcat version are we actually running on Ajax? Maybe I could have a look into
the AJP connector code on Tomcat side.

BTW: Justins idea to exchange the transient buckets with heap buckets was also correct
as it seems that the transient buckets might get overwritten in some situations.
So Justin could you please commit this to the trunk?


Regards

Rüdiger



Re: httpd-trunk with MSIE (async read broken?) and AJP problems was Re: httpd-trunk sucking CPU on ajax

Posted by Justin Erenkrantz <ju...@erenkrantz.com>.
On 2/27/06, Ruediger Pluem <rp...@apache.org> wrote:
>
> > Also, we're getting reports that IE clients on Windows consistently get
> > 408s (server timeout) immediately from httpd.  Some users with Safari
> and
> > Firefox can see this problem occassionally, but MSIE triggers it a good
> > portion of the time.
>
> Is this only with https or also with http?


AFAIK, both https and http.


> > Coupled with the EOR brokenness (infinite loop?), this sounds like the
> > async-read code is severely busted.  Either we should fix it, or we need
> to
> > revert it.  =)
>
> Do we really have the async read code on trunk? Maybe I missed some
> commits, but
> I thought that we only have the async write stuff in the trunk.


I thought it was the opposite, but I could very well be wrong.

I do know that the 408s are happening way down in the core - mod_proxy never
even sees the requests that get a 408.  So, anything that touched the
reading of requests by the core is suspect.  ;-)

> There are also still problems with proxy code corrupting data from the AJP
> > side of things.  See:
> >
> > https://issues.apache.org/jira/browse/INFRA-697
> >
> > for the infrastructure issue open for this.  We were at 2.2.0 but the
> proxy
> > code is just not working right.  We confirmed network traces that Tomcat
> is
> > sending the right bits back, but httpd is corrupting data near the end
> of
> > the AJP send_body_chunk responses such that the client sees junk.
>
> I saw in in jira ticket that you updated to trunk on ajax. Which revision
> is
> running on trunk?


Whatever was there on Friday night.  =)

'svn info' says the last change to trunk was 380878.

> My initial thought was that the we could have an issue with the transient
> > buckets, but when I switched them to be a heap bucket - the corruption
> > continued.  Only when I did the SSL flush patch did the corruption
> largely
>
> I don't think that transient buckets matter on prefork, but I think that
> they become
> a problem with Event MPM and async write. So if you have this patch still
> at hand
> I think it will be a good idea to commit it.


Agreed.

I will note that the only plausible way I can see that this corruption seems
to happen is that AJP message buffer gets trounced some how.  As far as I
can tell, the HTTP protocol (i.e. C-L headers) are correct as-is the SSL -
so the lengths are fine - but the buffer we write to the client is trashed.
I'm not sure how though.  Very weird.  It happens with 2.2.0, so I know that
this isn't a regression in trunk (like the 408s are).  -- justin

Re: httpd-trunk with MSIE (async read broken?) and AJP problems was Re: httpd-trunk sucking CPU on ajax

Posted by Ruediger Pluem <rp...@apache.org>.

On 02/27/2006 09:35 AM, Justin Erenkrantz wrote:
> On Sun, Feb 26, 2006 at 11:55:31PM -0800, Sander Temme wrote:

> 
> 
> Also, we're getting reports that IE clients on Windows consistently get
> 408s (server timeout) immediately from httpd.  Some users with Safari and
> Firefox can see this problem occassionally, but MSIE triggers it a good
> portion of the time.

Is this only with https or also with http?

> 
> Coupled with the EOR brokenness (infinite loop?), this sounds like the
> async-read code is severely busted.  Either we should fix it, or we need to
> revert it.  =)

Do we really have the async read code on trunk? Maybe I missed some commits, but
I thought that we only have the async write stuff in the trunk.

> 
> There are also still problems with proxy code corrupting data from the AJP
> side of things.  See:
> 
> https://issues.apache.org/jira/browse/INFRA-697
> 
> for the infrastructure issue open for this.  We were at 2.2.0 but the proxy
> code is just not working right.  We confirmed network traces that Tomcat is
> sending the right bits back, but httpd is corrupting data near the end of
> the AJP send_body_chunk responses such that the client sees junk.

I saw in in jira ticket that you updated to trunk on ajax. Which revision is
running on trunk?

> 
> My initial thought was that the we could have an issue with the transient
> buckets, but when I switched them to be a heap bucket - the corruption
> continued.  Only when I did the SSL flush patch did the corruption largely

I don't think that transient buckets matter on prefork, but I think that they become
a problem with Event MPM and async write. So if you have this patch still at hand
I think it will be a good idea to commit it.


Regards

Rüdiger