You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@thrift.apache.org by Triton Circonflexe <tr...@kumal.info> on 2021/08/13 19:10:04 UTC

Fwd: Thrift Compact protocol specification

In light of the recent proposal for IETF standardization of Thrift
protocol [1] and the observation that the Compact specification does
not match the implementations [2] (all languages are consistent so the
specification file has to be updated).

I opened a pull request [3] in order to fix that but a wording
question remains regarding the element-types (types encoded at the
beginning of lists, sets, and map) by contrast to the field types in
the structures.

In binary protocol, the same values are used on both sides, which is
made completely explicit with the following statement:
"The element-type values are the same as field-types. The full list is
included in the struct section above."

The current description of Compact protocol indicates that element
types and field-types are different but the issue is that the values
are incorrect.

The current description indicates (binary added for reference)

| Value | Binary | Elements | Fields |
+-------+--------+----------+--------+
|     0 |        |          |        |
|     1 |        |          | TRUE   |
|     2 | BOOL   | BOOL     | FALSE  |
|     3 | I8     | I8       | I8     |
|     4 | DOUBLE | DOUBLE   | I16    |
|     5 |        |          | I32    |
|     6 | I16    | I16      | I64    |
|     7 |        |          | DOUBLE |
|     8 | I32    | I32      | BINARY |
|     9 |        |          | LIST   |
|    10 | I64    | I64      | SET    |
|    11 | STRING | STRING   | MAP    |
|    12 | STRUCT | STRUCT   | STRUCT |
|    13 | MAP    | MAP      |        |
|    14 | SET    | SET      |        |
|    15 | LIST   | LIST     |        |

As we can see, the description indicates that compact element types
are identical to binary types.
Unfortunately, the implementations do not follow this structure (but
are consistent as shown by the cross-language tests), but the followin
one:

| Value | Binary | Elements | Fields |
+-------+--------+----------+--------+
|     0 |        |          |        |
|     1 |        |          | TRUE   |
|     2 | BOOL   | BOOL     | FALSE  |
|     3 | I8     | I8       | I8     |
|     4 | DOUBLE | I16      | I16    |
|     5 |        | I32      | I32    |
|     6 | I16    | I64      | I64    |
|     7 |        | DOUBLE   | DOUBLE |
|     8 | I32    | STRING   | BINARY |
|     9 |        | LIST     | LIST   |
|    10 | I64    | SET      | SET    |
|    11 | STRING | MAP      | MAP    |
|    12 | STRUCT | STRUCT   | STRUCT |
|    13 | MAP    |          |        |
|    14 | SET    |          |        |
|    15 | LIST   |          |        |

It's clear in this case that, except for boolean where both values are
encoded as different types, the element-types and field-types are
consistent within compact protocol and have no link to the binary
values.

Proposal 1: Wording indicates that element-types and field-types (for
compact) are using independent values and could diverge more in the
future.
Proposal 2: Wording indicates that element-types and field-types (for
compact) are using similar values and will remain aligned in the
future (and also that element-type value 1 is invalid).

(My vote would go to the 2nd proposal.)

As a side note, I propose to align the "BINARY"/"STRING" type to
"BINARY" on both sides as documentation explicitly indicates that
"Strings are first encoded to UTF-8, and then send as binary." [5]

Please comment and I will update my PR with the commonly-preferred
wording (also do not hesitate to propose a specific wording for the
compact specification with regards to the element-types [6].

Consistently,

Triton.

[1] http://mail-archives.apache.org/mod_mbox/thrift-user/202107.mbox/raw/%3C03439030-2F4C-4A2A-AE21-5DAEAAC2D99B%40contoso.com%3E/1
[2] https://issues.apache.org/jira/browse/THRIFT-5300
[3] https://github.com/apache/thrift/pull/2417
[4] https://github.com/apache/thrift/blame/master/doc/specs/thrift-binary-protocol.md#L214
[5] https://github.com/apache/thrift/blame/master/doc/specs/thrift-binary-protocol.md#L78
[6] https://github.com/apache/thrift/pull/2417/files#diff-1dd22d4f1230706bd4cdef9b71f0559886aeae62640292d6fc30c94ca160437fL248-R248

Re: Thrift Compact protocol specification

Posted by Triton Circonflexe <tr...@kumal.info>.
> +1 for 1
>
> Sent from mobile device. You know what that means...

As you are the maintainer, it means that if no one else steps up in
the coming days, that will be the final call. :D

Proposal for the wording, then:
"The following element-types are used (see note below):"

and then, after the list between lines 261 and 262:
"*Note*: Although field-types and element-types lists are currently
very similar, there is _no guarantee_ that this will remain true after
new types are added."

This way this is more explicit than the previous wording (with the
addition of "we have seen the similarity" to avoid the "maybe they
forgot to update" reaction) while keeping the parenthesis short.

I'll update the PR as soon as wording is agreed.

Consistently,

-- 
Triton.

Re: Thrift Compact protocol specification

Posted by Jens Geyer <je...@hotmail.com>.
+1 for 1

Sent from mobile device. You know what that means...

________________________________
From: Triton Circonflexe <tr...@kumal.info>
Sent: Friday, August 13, 2021 9:10:04 PM
To: dev@thrift.apache.org <de...@thrift.apache.org>
Subject: Fwd: Thrift Compact protocol specification

In light of the recent proposal for IETF standardization of Thrift
protocol [1] and the observation that the Compact specification does
not match the implementations [2] (all languages are consistent so the
specification file has to be updated).

I opened a pull request [3] in order to fix that but a wording
question remains regarding the element-types (types encoded at the
beginning of lists, sets, and map) by contrast to the field types in
the structures.

In binary protocol, the same values are used on both sides, which is
made completely explicit with the following statement:
"The element-type values are the same as field-types. The full list is
included in the struct section above."

The current description of Compact protocol indicates that element
types and field-types are different but the issue is that the values
are incorrect.

The current description indicates (binary added for reference)

| Value | Binary | Elements | Fields |
+-------+--------+----------+--------+
|     0 |        |          |        |
|     1 |        |          | TRUE   |
|     2 | BOOL   | BOOL     | FALSE  |
|     3 | I8     | I8       | I8     |
|     4 | DOUBLE | DOUBLE   | I16    |
|     5 |        |          | I32    |
|     6 | I16    | I16      | I64    |
|     7 |        |          | DOUBLE |
|     8 | I32    | I32      | BINARY |
|     9 |        |          | LIST   |
|    10 | I64    | I64      | SET    |
|    11 | STRING | STRING   | MAP    |
|    12 | STRUCT | STRUCT   | STRUCT |
|    13 | MAP    | MAP      |        |
|    14 | SET    | SET      |        |
|    15 | LIST   | LIST     |        |

As we can see, the description indicates that compact element types
are identical to binary types.
Unfortunately, the implementations do not follow this structure (but
are consistent as shown by the cross-language tests), but the followin
one:

| Value | Binary | Elements | Fields |
+-------+--------+----------+--------+
|     0 |        |          |        |
|     1 |        |          | TRUE   |
|     2 | BOOL   | BOOL     | FALSE  |
|     3 | I8     | I8       | I8     |
|     4 | DOUBLE | I16      | I16    |
|     5 |        | I32      | I32    |
|     6 | I16    | I64      | I64    |
|     7 |        | DOUBLE   | DOUBLE |
|     8 | I32    | STRING   | BINARY |
|     9 |        | LIST     | LIST   |
|    10 | I64    | SET      | SET    |
|    11 | STRING | MAP      | MAP    |
|    12 | STRUCT | STRUCT   | STRUCT |
|    13 | MAP    |          |        |
|    14 | SET    |          |        |
|    15 | LIST   |          |        |

It's clear in this case that, except for boolean where both values are
encoded as different types, the element-types and field-types are
consistent within compact protocol and have no link to the binary
values.

Proposal 1: Wording indicates that element-types and field-types (for
compact) are using independent values and could diverge more in the
future.
Proposal 2: Wording indicates that element-types and field-types (for
compact) are using similar values and will remain aligned in the
future (and also that element-type value 1 is invalid).

(My vote would go to the 2nd proposal.)

As a side note, I propose to align the "BINARY"/"STRING" type to
"BINARY" on both sides as documentation explicitly indicates that
"Strings are first encoded to UTF-8, and then send as binary." [5]

Please comment and I will update my PR with the commonly-preferred
wording (also do not hesitate to propose a specific wording for the
compact specification with regards to the element-types [6].

Consistently,

Triton.

[1] http://mail-archives.apache.org/mod_mbox/thrift-user/202107.mbox/raw/%3C03439030-2F4C-4A2A-AE21-5DAEAAC2D99B%40contoso.com%3E/1
[2] https://issues.apache.org/jira/browse/THRIFT-5300
[3] https://github.com/apache/thrift/pull/2417
[4] https://github.com/apache/thrift/blame/master/doc/specs/thrift-binary-protocol.md#L214
[5] https://github.com/apache/thrift/blame/master/doc/specs/thrift-binary-protocol.md#L78
[6] https://github.com/apache/thrift/pull/2417/files#diff-1dd22d4f1230706bd4cdef9b71f0559886aeae62640292d6fc30c94ca160437fL248-R248