Posted to user@thrift.apache.org by Juan Cruz Viotti <jv...@jviotti.com> on 2020/10/28 14:01:21 UTC

Understanding the reasoning and theoretical limits of compact encoding field ids

Hey there,

I'm studying the struct encoding section of Thrift's compact protocol
spec [1] and I have some questions that I couldn't answer from the spec
alone.

The spec describes two types of field header encodings:

- A 4-bit unsigned integer field identifier delta followed by a 4-bit
  unsigned integer type id

- A 4-bit unsigned integer type id followed by a 16-bit signed
  Zigzag-encoded integer absolute field identifier (for when the field
  delta exceeds 15)
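
To make the two header forms concrete, here is a minimal Python sketch. It follows the spec linked as [1], where the long form writes the field id as a zigzag-encoded variable-length integer after the type byte. The function names are hypothetical and this is not code from any Thrift library; type id 5 is the compact protocol's id for i32.

```python
def zigzag16(n: int) -> int:
    """Map a signed 16-bit integer to an unsigned one (0, -1, 1, -2 -> 0, 1, 2, 3)."""
    return ((n << 1) ^ (n >> 15)) & 0xFFFF


def varint(u: int) -> bytes:
    """Encode an unsigned integer as a little-endian base-128 varint."""
    out = bytearray()
    while u >= 0x80:
        out.append((u & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        u >>= 7
    out.append(u)
    return bytes(out)


def field_header(field_id: int, last_field_id: int, type_id: int) -> bytes:
    """Return the field header bytes for one struct field (illustrative only)."""
    delta = field_id - last_field_id
    if 0 < delta <= 15:
        # Short form: high nibble is the delta, low nibble is the type id.
        return bytes([(delta << 4) | type_id])
    # Long form: high nibble is zero, then the absolute field id.
    return bytes([type_id]) + varint(zigzag16(field_id))


if __name__ == "__main__":
    print(field_header(1, 0, 5).hex())   # short form, delta 1
    print(field_header(20, 1, 5).hex())  # long form, delta 19 exceeds 15
    print(field_header(-1, 0, 5).hex())  # long form, negative field id
```

In this sketch a negative delta also falls through to the long form, which is why the long form has to be able to carry a signed value.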

My first question is: Why is the longer form using a signed integer? It
doesn't seem like Apache Thrift supports negative field identifiers.

Then, assuming the longer form encodes absolute field identifiers and
not deltas like the shorter form, the largest positive zigzag-encoded
integer that fits into 16 bits is 32767. As a consequence, the
longer-form encoding seems to impose a theoretical limit on the number
of fields that can be included in a struct. On the other hand, the
delta-based shorter form would, in theory, let a struct grow
indefinitely.
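
The 32767 figure can be checked with a short zigzag round trip. This is a sketch with hypothetical helper names, assuming the usual zigzag mapping for a 16-bit signed integer; it is not taken from a Thrift implementation.

```python
def zigzag_encode16(n: int) -> int:
    """Signed 16-bit -> unsigned 16-bit, small magnitudes map to small values."""
    return ((n << 1) ^ (n >> 15)) & 0xFFFF


def zigzag_decode16(u: int) -> int:
    """Inverse of zigzag_encode16."""
    return (u >> 1) ^ -(u & 1)


def varint_length(u: int) -> int:
    """Number of bytes a base-128 varint needs for an unsigned integer."""
    length = 1
    while u >= 0x80:
        u >>= 7
        length += 1
    return length


if __name__ == "__main__":
    for field_id in (1, 63, 64, 32767, -1, -32768):
        encoded = zigzag_encode16(field_id)
        assert zigzag_decode16(encoded) == field_id
        print(field_id, encoded, varint_length(encoded))
```

The two extremes of the signed 16-bit range, 32767 and -32768, map to 65534 and 65535, so every value in that range fits in 16 unsigned bits and needs at most three varint bytes.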

Why does the longer-form encoding abandon the delta-based approach,
which seems to be superior in all respects?

Do implementations impose an upper limit on the number of struct fields
when using the delta-based approach?

Thanks in advance for the clarifications,

[1]: https://github.com/apache/thrift/blob/master/doc/specs/thrift-compact-protocol.md#struct-encoding

-- 
Juan Cruz Viotti
https://www.jviotti.com

Re: Understanding the reasoning and theoretical limits of compact encoding field ids

Posted by Randy Abernethy <ra...@rx-m.com>.
On negative field Ids, quoting "the Programmer's Guide to Apache Thrift":

Field Ids, occasionally called keys, are 16 bit integers which can be
explicit or implicit. The Thrift framework uses the field Id to uniquely
identify fields in many situations. For example, when calling an RPC
function, arguments can be passed in any order. The receiving side will use
the field Ids to match the parameters passed with the correct arguments.

Explicit field Ids must be positive integers. In the TimeStamp example
struct above, the field Ids are 1, 2 and 3. It is almost always
advantageous to define field Ids explicitly. Implicit field identifiers are
assigned by the IDL Compiler when explicit Ids are not provided. Implicit
Ids are negative (beginning at -1 and decrementing). Changing the order of
fields in a struct without explicit Ids will almost always break
compatibility with existing code because the implicit Ids generated for the
new order will likely be different from the previous implicit Ids. For
example, given struct {i16 a, i16 b} the IDL Compiler will generate Ids -1
and -2 for a and b respectively. However, given struct {i16 b, i16 a} the
IDL Compiler will generate Ids -2 and -1 for a and b respectively.
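
The reordering problem described in the quoted passage can be illustrated with a few lines of Python. The function below is hypothetical and only mimics the rule as stated (implicit Ids start at -1 and decrement in declaration order); it is not the IDL Compiler's actual code.

```python
def assign_implicit_ids(field_names):
    """Assign -1, -2, ... to fields in declaration order (illustrative only)."""
    return {name: -(index + 1) for index, name in enumerate(field_names)}


if __name__ == "__main__":
    original = assign_implicit_ids(["a", "b"])   # struct {i16 a, i16 b}
    reordered = assign_implicit_ids(["b", "a"])  # struct {i16 b, i16 a}
    print(original)
    print(reordered)
    # The same field name now carries a different Id, so peers built from
    # the two definitions would disagree about which field is which.
    print(original["a"] != reordered["a"])
```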

The IDL Compiler provides the “-allow-neg-keys” switch which allows
negative Ids to be assigned explicitly. This should only be used to solve
interoperability problems with existing Apache Thrift systems reliant on
predefined negative Ids.

-- 
Randy Abernethy
Managing Partner
RX-M, LLC
randy.abernethy@rx-m.com
o 415-800-2922
c 415-624-6447