You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@thrift.apache.org by GitBox <gi...@apache.org> on 2021/08/06 17:10:35 UTC

[GitHub] [thrift] EnigmaTriton commented on pull request #2417: Fix specification to match implementation

EnigmaTriton commented on pull request #2417:
URL: https://github.com/apache/thrift/pull/2417#issuecomment-894397040


   (Note: I tried with & without subaddressing but my mails won't get on the mailing list in any case.)
   
   In light of the recent proposal for IETF standardization of Thrift
   protocol [1] and the observation that the Compact specification does
   not match the implementations [2] (all languages are consistent so the
   specification file has to be updated).
   
   I opened this pull request [3] in order to fix that but a wording
   question remains regarding the element-types (types encoded at the
   beginning of lists, sets, and map) by contrast to the field types in
   the structures.
   
   In binary protocol, the same values are used on both sides, which is
   made completely explicit with the following statement:
   "The element-type values are the same as field-types. The full list is
   included in the struct section above."
   
   The current description of Compact protocol indicates that element
   types and field-types are different but the issue is that the values
   are incorrect.
   
   The current description indicates (binary added for reference)
   
   | Value | Binary | Elements | Fields |
   |------:|--------|----------|--------|
   |     0 |        |          |        |
   |     1 |        |          | TRUE   |
   |     2 | BOOL   | BOOL     | FALSE  |
   |     3 | I8     | I8       | I8     |
   |     4 | DOUBLE | DOUBLE   | I16    |
   |     5 |        |          | I32    |
   |     6 | I16    | I16      | I64    |
   |     7 |        |          | DOUBLE |
   |     8 | I32    | I32      | BINARY |
   |     9 |        |          | LIST   |
   |    10 | I64    | I64      | SET    |
   |    11 | STRING | STRING   | MAP    |
   |    12 | STRUCT | STRUCT   | STRUCT |
   |    13 | MAP    | MAP      |        |
   |    14 | SET    | SET      |        |
   |    15 | LIST   | LIST     |        |
   
   As we can see, the description indicates that compact element types
   are identical to binary types.
   Unfortunately, the implementations do not follow this structure (but
   are consistent as shown by the cross-language tests), but the followin
   one:
   
   | Value | Binary | Elements | Fields |
   |------:|--------|----------|--------|
   |     0 |        |          |        |
   |     1 |        |          | TRUE   |
   |     2 | BOOL   | BOOL     | FALSE  |
   |     3 | I8     | I8       | I8     |
   |     4 | DOUBLE | I16      | I16    |
   |     5 |        | I32      | I32    |
   |     6 | I16    | I64      | I64    |
   |     7 |        | DOUBLE   | DOUBLE |
   |     8 | I32    | STRING   | BINARY |
   |     9 |        | LIST     | LIST   |
   |    10 | I64    | SET      | SET    |
   |    11 | STRING | MAP      | MAP    |
   |    12 | STRUCT | STRUCT   | STRUCT |
   |    13 | MAP    |          |        |
   |    14 | SET    |          |        |
   |    15 | LIST   |          |        |
   
   It's clear in this case that, except for boolean where both values are
   encoded as different types, the element-types and field-types are
   consistent within compact protocol and have no link to the binary
   values.
   
   Proposal 1: Wording indicates that element-types and field-types (for
   compact) are using independent values and could diverge more in the
   future.
   Proposal 2: Wording indicates that element-types and field-types (for
   compact) are using similar values and will remain aligned in the
   future (and also that element-type value 1 is invalid).
   
   My vote would go to the 2nd proposal.
   
   As a side note, I propose to align the "BINARY"/"STRING" type to
   "BINARY" on both sides as documentation explicitly indicates that
   "Strings are first encoded to UTF-8, and then send as binary." [5]  
   (This change is not applied on Binary Protocol side yet but I can update this PR.)
   
   Please comment and I will update my PR with the commonly-preferred
   wording (also do not hesitate to propose a specific wording for the
   compact specification with regards to the element-types [6].
   
   Consistently,
   
   Triton.
   
   [1] http://mail-archives.apache.org/mod_mbox/thrift-user/202107.mbox/raw/%3C03439030-2F4C-4A2A-AE21-5DAEAAC2D99B%40contoso.com%3E/1
   [2] https://issues.apache.org/jira/browse/THRIFT-5300
   [3] https://github.com/apache/thrift/pull/2417
   [4] https://github.com/apache/thrift/blame/master/doc/specs/thrift-binary-protocol.md#L214
   [5] https://github.com/apache/thrift/blame/master/doc/specs/thrift-binary-protocol.md#L78
   [6] https://github.com/apache/thrift/pull/2417/files#diff-1dd22d4f1230706bd4cdef9b71f0559886aeae62640292d6fc30c94ca160437fL248-R248


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@thrift.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org