Posted to xmlrpc-auto@ws.apache.org by "Balázs Póka (JIRA)" <xm...@ws.apache.org> on 2008/10/24 19:15:46 UTC

[jira] Issue Comment Edited: (XMLRPC-153) content-length header incorrect when using gzip

    [ https://issues.apache.org/jira/browse/XMLRPC-153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642497#action_12642497 ] 

poka edited comment on XMLRPC-153 at 10/24/08 10:14 AM:
---------------------------------------------------------------

After reading http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html over and over again, I think I'm beginning to get an idea of what all of this means.

First of all, a quick recap of the relevant definitions.

In section 14.11, it says: "The Content-Encoding entity-header field is used as a modifier to the media-type. When present, its value indicates what additional content codings have been applied to the entity-body, and thus what decoding mechanisms must be applied in order to obtain the media-type referenced by the Content-Type header field. Content-Encoding is primarily used to allow a document to be compressed without losing the identity of its underlying media type."
-> "Content codings are defined in section 3.5."

Section 3.5 states: "Content coding values indicate an encoding transformation that has been or can be applied to an entity. Content codings are primarily used to allow a document to be compressed or otherwise usefully transformed without losing the identity of its underlying media type and without loss of information. Frequently, the entity is stored in coded form, transmitted directly, and only decoded by the recipient."

Section 14.41: "The Transfer-Encoding general-header field indicates what (if any) type of transformation has been applied to the message body in order to safely transfer it between the sender and the recipient. This differs from the content-coding in that the transfer-coding is a property of the message, not of the entity."
-> "Transfer-codings are defined in section 3.6."

Section 3.6 states: " Transfer-coding values are used to indicate an encoding transformation that has been, can be, or may need to be applied to an entity-body in order to ensure "safe transport" through the network. This differs from a content coding in that the transfer-coding is a property of the message, not of the original entity."

It is declared that the set of supported Content-codings ("identity", "gzip", "compress", "deflate") is actually a subset of available Transfer-codings ("chunked", "identity", "gzip", "compress", "deflate") and that "Transfer-codings are analogous to the Content-Transfer-Encoding values of MIME..."

Let's consider an example where both fields are relevant. Say we have some URI whose content a browser is able to display directly. The browser decides whether it can do that based on the MIME type. That could be an image, a Flash application, a simple HTML or XML document, whatever. Suppose this document is compressible, so it makes sense to compress it. Using the vocabulary of the RFC, we modify its media type, which is specified to be "text/xml", by applying the gzip Content-Encoding. Now the entity the browser downloads from the URI has the property of having been filtered through gzip. But underneath that it's still "text/xml", so the browser can still use it after applying the reverse transformation. Suppose now there is a proxy between the server and the browser. Some proxies don't handle missing Content-Length headers too well. :) In that case, if the proxy is HTTP/1.1 compatible, the "chunked" Transfer-Encoding may be used.
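
To make the content-coding step concrete, here is a minimal Python sketch (Python is used only for illustration; the XML payload and the variable names are made up and are not taken from ws-xmlrpc). It gzips a small text/xml entity and shows that the recipient gets the original bytes back by reversing the coding, so the media type underneath is unchanged.

```python
import gzip

# Hypothetical text/xml entity; the payload is illustrative only.
entity = b"<?xml version='1.0'?><methodCall><methodName>ping</methodName></methodCall>"

# Content-Encoding: gzip is applied to the entity itself.
coded = gzip.compress(entity)

# The recipient reverses the coding to get back the text/xml document.
decoded = gzip.decompress(coded)

headers = {
    "Content-Type": "text/xml",   # media type of the underlying entity
    "Content-Encoding": "gzip",   # modifier: how to get back to that type
}
```

Note that Content-Type still says "text/xml" even though the bytes on the wire are gzip data; Content-Encoding only tells the recipient which decoding to apply first.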

There is a section on message length (4.4), which helps to show that content-encoding is applied in a different layer from transfer-encoding, and that the two are unrelated.

" The transfer-length of a message is the length of the message-body as it appears in the message; that is, after any transfer-codings have been applied. When a message-body is included with a message, the transfer-length of that body is determined by one of the following (in order of precedence):
...
2. If a Transfer-Encoding header field (section 14.41) is present and has any value other than "identity", then the transfer-length is defined by use of the "chunked" transfer-coding (section 3.6), unless the message is terminated by closing the connection. [Meaning that a chunked transfer-encoding implicitly specifies the total message length, which is irrelevant here since we don't use chunked transfers.]

3. If a Content-Length header field (section 14.13) is present, its decimal value in OCTETs represents both the entity-length and the transfer-length. The Content-Length header field MUST NOT be sent if these two lengths are different (i.e., if a Transfer-Encoding header field is present). If a message is received with both a Transfer-Encoding header field and a Content-Length header field, the latter MUST be ignored. [So the Content-Length field should EXACTLY match the number of bytes transferred. Since there is no Transfer-Encoding header (no chunked or anything), this is the key piece of information.]
...
5. By the server closing the connection. (Closing the connection cannot be used to indicate the end of a request body, since that would leave no possibility for the server to send back a response.) [Very important since this is what causes the problem in the first place.]
"
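
The order of precedence quoted above can be sketched roughly as follows. This is a simplified illustration, not code from any HTTP library: the function name and the lower-case header dict are assumptions, only rules 2, 3 and 5 are modelled, and the "unless the message is terminated by closing the connection" exception in rule 2 is left out.

```python
def body_length_source(headers):
    """Return which rule of RFC 2616 section 4.4 decides the body length.

    `headers` is a dict with lower-case header names (an assumption of
    this sketch). Only rules 2, 3 and 5 quoted above are modelled.
    """
    te = headers.get("transfer-encoding", "identity").strip().lower()
    if te != "identity":
        # Rule 2: chunked framing defines the length; Content-Length is ignored.
        return "chunked"
    if "content-length" in headers:
        # Rule 3: the value is the exact number of octets in the body.
        return "content-length"
    # Rule 5: only valid for responses, never for request bodies.
    return "connection-close"
```

For the case in this issue (Content-Encoding: gzip, no Transfer-Encoding), the function returns "content-length". Content-Encoding plays no part in the decision, which is why the Content-Length value has to describe the compressed bytes.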

So, I think that the following two examples of headers are functionally equivalent, but only the first is supported by HTTP/1.0:
1)
Content-Encoding: gzip
Content-Length: 1234

[Content-Length is the exact number of bytes sent over the wire.]

2) Transfer-Encoding: gzip, chunked

[chunked is mandatory whenever a transfer-coding is applied (unless the message is terminated by closing the connection), and no Content-Length is needed since the chunked transfer-coding itself delimits the message. This is why a Content-Length header must be ignored in this case.]
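
As a rough sketch of the two variants, here is how the body and headers could be produced in Python. The helper names and the chunk size are hypothetical and this is not how ws-xmlrpc implements it. The first function computes Content-Length from the compressed bytes; the second frames arbitrary data with the chunked transfer-coding, so that no Content-Length is required.

```python
import gzip

def content_length_headers(entity: bytes):
    """Variant 1: gzip content-coding with an explicit Content-Length."""
    body = gzip.compress(entity)
    headers = {
        "Content-Encoding": "gzip",
        # Length of the bytes actually sent, not of the uncompressed entity.
        "Content-Length": str(len(body)),
    }
    return headers, body

def chunked_body(data: bytes, chunk_size: int = 1024) -> bytes:
    """Variant 2: frame `data` with the chunked transfer-coding."""
    out = bytearray()
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        # Each chunk: size in hex, CRLF, the chunk bytes, CRLF.
        out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # last-chunk with no trailer
    return bytes(out)
```

The bug described in this issue corresponds to computing `len(entity)` instead of `len(body)` in the first function.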

Thanks for reading through this. Hope this helps. :)

  
> content-length header incorrect when using gzip
> -----------------------------------------------
>
>                 Key: XMLRPC-153
>                 URL: https://issues.apache.org/jira/browse/XMLRPC-153
>             Project: XML-RPC
>          Issue Type: Bug
>    Affects Versions: 3.0, 3.1
>         Environment: UNIX (FC3), Sun JDK1.5.0_10
>            Reporter: Andy Meyer
>         Attachments: patch.txt
>
>
> When doing some testing using the ws-xmlrpc client libraries I ran across a bug in its calculation of the content-length HTTP header when using gzip compression but not HTTP chunked transfer. The client incorrectly sets the content-length to the length of the uncompressed data, rather than the compressed data it sends. This happens using both 3.0 and 3.1 client libraries.
> I see some activity on ws-xmlrpc-dev from September 2007 but no mention of any resolution. I did a quick bug search and found nothing - my apologies if this is already being tracked somewhere else and I missed it.
> From the mail thread, a link to the relevant part of the HTTP spec:
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.13

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.