Posted to users@httpd.apache.org by Michael Caplan <mi...@eggplant.ws> on 2008/09/13 16:02:11 UTC

[users@httpd] Chunked + Gzip

Hi,

I have a question about how Transfer-Encoding: chunked works together 
with Content-Encoding: gzip.  Having read the HTTP 1.1 RFC, 
http://en.wikipedia.org/wiki/Chunked_transfer_encoding and other 
discussions on the net that touch on this subject, I'm a little confused 
about how the web server and the browser prepare and read 
the data.

The RFC isn't clear on this point (or at least I'm not finding the right 
information), but what I have gathered is that:

 1. the gzip content encoding happens on the entire body before it is 
chunked.
 2. the ungzipping happens on the entire body after it is dechunked.

If I got this right (which I don't think I do), the web server would 
need to first dechunk data produced from a dynamic source (PHP) before 
it can apply the gzip content encoding.  For example, mod_gzip would not 
apply the content encoding until it dechunked the data 
(http://schroepl.net/projekte/mod_gzip/config.htm) and then delivered it 
to the client.

Likewise, the client would only be able to begin interpreting the 
HTML after receiving the entire chunked payload, dechunking it, and 
then ungzipping it.

But, that seems contrary to what Apache + mod_deflate actually does, as 
well as my browser (Firefox).  For example, I can create a PHP script 
that controls the chunks created by calling the flush() function:

<html>
    <head>
        <title>Hi</title>
        <link rel="stylesheet" href="style.css" type="text/css" media="all" />
    </head>
<?php flush(); sleep(5); ?>
    <body onload="loaded();">
        <h1>Hi</h1>
    </body>
</html>

The complete client server communication looks like this:

GET /samples/flush/test.php HTTP/1.1
Host: ***
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.16) Gecko/20080703 Mandriva/2.0.0.16-1.1mdv2008.1 (2008.1) Firefox/2.0.0.16 FirePHP/0.1.0
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-ca,en;q=0.7,en-us;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

HTTP/1.1 200 OK
Date: Sat, 13 Sep 2008 13:03:26 GMT
Server: Apache
Vary: User-Agent,Accept-Encoding
Content-Encoding: gzip
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html
 
6c
..........D.. .0..{..<.Y 
I...<..A....AI$.../.g.5...g..a.&...x...T...h.b..d...../.\
.='..ns..X$.N.L.;].......
32
....S*...r..Sl.@dj.........`..).n..;......F.L=....
0
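
A minimal sketch of how the chunk framing in the trace above can be read, assuming Python. The helper name dechunk is hypothetical, and the payload bytes are placeholders, since the real gzip data cannot be recovered from the printable dump. Each chunk starts with its size in hexadecimal, so "6c" and "32" announce 108 and 50 bytes, and a size of "0" ends the body.

```python
def dechunk(raw: bytes) -> list[bytes]:
    """Split a chunked body into its chunk payloads (trailers are ignored)."""
    chunks = []
    pos = 0
    while True:
        eol = raw.index(b"\r\n", pos)
        # The size line is hexadecimal; any chunk extension after ';' is dropped.
        size = int(raw[pos:eol].split(b";")[0], 16)
        if size == 0:
            break  # the zero-length chunk terminates the body
        start = eol + 2
        chunks.append(raw[start:start + size])
        pos = start + size + 2  # skip the CRLF that follows the chunk data
    return chunks


# Placeholder payloads with the same sizes as in the trace (0x6c and 0x32).
body = (
    b"6c\r\n" + b"A" * 0x6C + b"\r\n"
    + b"32\r\n" + b"B" * 0x32 + b"\r\n"
    + b"0\r\n\r\n"
)
sizes = [len(chunk) for chunk in dechunk(body)]
print(sizes)  # [108, 50]
```

The framing carries no knowledge of gzip: the chunk boundaries only say how many bytes follow, whatever those bytes encode.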

If I have Apache + mod_deflate configured to gzip the output, the 
chunks created reflect where I flush in the PHP script -- in this case 
two chunks -- one for the head, and another for the body.  If I add an 
artificial delay before the second chunk is delivered (as I did above 
with sleep(5)), I can also see two other interesting things.

 1. The first gzip-compressed chunk is delivered independently of the 
second chunk (which comes 5 seconds later).  This indicates that the 
gzipping is happening chunk by chunk, not on the entire body at once.
 2. The browser receives the first gzip chunk and is able to 
interpret it _before_ it gets the entire payload and dechunks it all.  I 
say this because while the browser is waiting on the second chunk, it 
downloads the referenced CSS file.

This seems to fly in the face of what I've read about how it is supposed 
to work.  Instead, it appears to be working like this:

 1. the gzip content encoding is applied to the body incrementally, chunk by chunk.
 2. the ungzipping is likewise applied incrementally, chunk by chunk.
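
A minimal sketch of why this can work, assuming Python's zlib module and assuming the server flushes the compressor at each flush point (which is what the trace suggests, but is not confirmed here). The body is still one gzip stream; a sync flush just pushes out everything compressed so far, so the receiver can decode the first piece before the second one exists. The variable names and the shortened HTML are illustrative.

```python
import zlib

head = b"<html><head><title>Hi</title></head>\n"
body = b"<body><h1>Hi</h1></body></html>\n"

# wbits=31 selects the gzip container (header and trailer).
comp = zlib.compressobj(wbits=31)

# Z_SYNC_FLUSH emits all pending compressed data without ending the stream.
piece1 = comp.compress(head) + comp.flush(zlib.Z_SYNC_FLUSH)
# The default flush mode (Z_FINISH) ends the stream and writes the gzip trailer.
piece2 = comp.compress(body) + comp.flush()

# Receiver side: one decompressor, fed piece by piece as data arrives.
decomp = zlib.decompressobj(wbits=31)
early = decomp.decompress(piece1)  # decoded before piece2 is available
late = decomp.decompress(piece2)

print(early == head, late == body, decomp.eof)
```

Here piece1 and piece2 stand in for the payloads of the two HTTP chunks; neither is a complete gzip file on its own, only their concatenation is.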


Does this behavior mean I am misinterpreting the HTTP RFC, or is 
the implementation not compliant?  Can anyone shed some light on the 
subject?


Thanks,

Mike

---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org


Re: [users@httpd] Chunked + Gzip

Posted by Michael Caplan <mi...@eggplant.ws>.
Nick Kew wrote:
> On Sat, 13 Sep 2008 15:00:44 -0300
> Michael Caplan <mi...@eggplant.ws> wrote:
>
>   
>> I guess I am hung up on the legacy of mod_gzip, which forced
>> dynamically generated data to be dechunked before gzipped.
>>     
>
> WTF are you talking about?
>
>   
From: http://schroepl.net/projekte/mod_gzip/config.htm

##########################
### transfer encodings ###
##########################

# ---------------------------------------------------------------------
# Allow mod_gzip to eliminate the HTTP header
#    'Transfer-encoding: chunked'
# and join the chunks to one (compressable) packet
  *mod_gzip_dechunk              Yes*
# (this is required for handling several types of dynamically generated
# contents, especially for CGI and SSI pages, but also for pages produced
# by some Java Servlet interpreters.
# ---------------------------------------------------------------------


From: http://schroepl.net/projekte/mod_gzip/status.htm

SEND_AS_IS:DECHUNK_OPTION_IS_OFF

originated by: mod_gzip_sendfile2

meaning:
A Transfer-Encoding: chunked was detectet, but in the configuration 
mod_gzip has not been allowed to remove this encoding (i. e. collect all 
chunks and join them to one packet, whose content would then be 
compressable). The directive mod_gzip_dechunk Yes would have allowed 
mod_gzip to compress this request.



Re: [users@httpd] Chunked + Gzip

Posted by Nick Kew <ni...@webthing.com>.
On Sat, 13 Sep 2008 15:00:44 -0300
Michael Caplan <mi...@eggplant.ws> wrote:

> I guess I am hung up on the legacy of mod_gzip, which forced
> dynamically generated data to be dechunked before gzipped.

WTF are you talking about?

> PS - I agree that PHP doesn't concern itself with chunking,

Exactly.  So why the suggestion that mod_gzip should want
to dechunk anything?

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/



Re: [users@httpd] Chunked + Gzip

Posted by Michael Caplan <mi...@eggplant.ws>.
Hi Nick,

Thanks for the response.  Pipelining of deflate and chunking filters is
exactly what I am seeing Apache perform -- just didn't know what to call
it.  To rephrase the question, does the HTTP 1.1 RFC address how to
handle the layering of a chunked transfer encoding on top of a gzip
content encoding?

I guess I am hung up on the legacy of mod_gzip, which forced dynamically
generated data to be dechunked before it was gzipped.  That was perhaps an
implementation limitation, but I assumed it was a matter of protocol more
than anything else.  Sounds like I am mistaken about that.

Best,

Mike

PS - I agree that PHP doesn't concern itself with chunking, but it can
influence the web server in this regard.  By flush()ing PHP's output
buffers, it "tries to push all the output so far to the user's
browser."  Depending on the web server and its buffering scheme, this
may result (and does in Apache + mod_php) in the web server packaging up a
chunk and firing it off.

Nick Kew wrote:
> On Sat, 13 Sep 2008 15:53:10 +0100
> Nick Kew <ni...@webthing.com> wrote:
>
>   
>> On Sat, 13 Sep 2008 11:02:11 -0300
>> Michael Caplan <mi...@eggplant.ws> wrote:
>>
>>     
>>>  1. the gzip content encoding happens on the entire body before it
>>> is chunked.
>>>  2. the ungzipping happens on the entire body after it is dechunked.
>>>       
>> Exactly.
>>     
>
> Oops, I think I misread your question.  Specifically, those "entire
> body" references.  Nope, both the deflate and chunking filters
> pipeline their data.
>
>
>   





Re: [users@httpd] Chunked + Gzip

Posted by Nick Kew <ni...@webthing.com>.
On Sat, 13 Sep 2008 15:53:10 +0100
Nick Kew <ni...@webthing.com> wrote:

> On Sat, 13 Sep 2008 11:02:11 -0300
> Michael Caplan <mi...@eggplant.ws> wrote:
> 
> >  1. the gzip content encoding happens on the entire body before it
> > is chunked.
> >  2. the ungzipping happens on the entire body after it is dechunked.
> 
> Exactly.

Oops, I think I misread your question.  Specifically, those "entire
body" references.  Nope, both the deflate and chunking filters
pipeline their data.
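
A rough sketch of what pipelining means here, written as Python generators rather than real Apache filters. The names deflate_filter, chunk_filter and source are hypothetical, and flushing after every input block is a simplifying assumption. Each stage hands on its output as soon as it has some, so the first chunk goes out before the application has produced the second block.

```python
import zlib
from typing import Iterable, Iterator


def deflate_filter(blocks: Iterable[bytes]) -> Iterator[bytes]:
    """Gzip a stream of blocks, emitting compressed data after each block."""
    comp = zlib.compressobj(wbits=31)  # wbits=31 selects the gzip container
    for block in blocks:
        out = comp.compress(block) + comp.flush(zlib.Z_SYNC_FLUSH)
        if out:
            yield out
    yield comp.flush()  # end of stream: final data plus the gzip trailer


def chunk_filter(blocks: Iterable[bytes]) -> Iterator[bytes]:
    """Wrap each incoming block in HTTP/1.1 chunked framing."""
    for block in blocks:
        if block:
            yield b"%x\r\n%s\r\n" % (len(block), block)
    yield b"0\r\n\r\n"  # zero-length chunk terminates the body


events = []


def source() -> Iterator[bytes]:
    """Stand-in for the application (e.g. a script that flushes twice)."""
    for part in (b"<head><title>Hi</title></head>", b"<body><h1>Hi</h1></body>"):
        events.append("produced")
        yield part


wire = []
for frame in chunk_filter(deflate_filter(source())):
    events.append("sent")
    wire.append(frame)

print(events)
```

The content encoding sits upstream of the transfer encoding, which matches the order in the original question; only the "entire body" part does not hold.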


-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/



Re: [users@httpd] Chunked + Gzip

Posted by Nick Kew <ni...@webthing.com>.
On Sat, 13 Sep 2008 11:02:11 -0300
Michael Caplan <mi...@eggplant.ws> wrote:

>  1. the gzip content encoding happens on the entire body before it is 
> chunked.
>  2. the ungzipping happens on the entire body after it is dechunked.

Exactly.

> If I got this right (which I don't think I do), the web server would 
> need to first dechunk data produced from a dynamic source (PHP)
> before it can apply the gzip content encoding.

Why would PHP want to concern itself with chunked encoding?  That's
the business of the webserver, not the application.

> Likewise, on the client end, it would only be able to begin
> interpreting HTML following receiving the entire chunked payload,
> dechunk it, and then ungzip it.

You don't need the entire contents to start displaying it.
Unless you rely on something whole-document, like parsing to a DOM.

> But, that seems contrary to what Apache + mod_deflate actually does,

What seems contrary?

Read up on the Apache filter chain.  That's the basis for ordering
different encodings, and indeed other transformations, from content
manipulation like SSI or XSLT through to SSL.

-- 
Nick Kew

Application Development with Apache - the Apache Modules Book
http://www.apachetutor.org/
