Posted to commits@thrift.apache.org by je...@apache.org on 2021/02/14 10:42:38 UTC

[thrift] branch master updated (b04e39a -> 2e7f39f)

This is an automated email from the ASF dual-hosted git repository.

jensg pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/thrift.git.


    from b04e39a  THRIFT-5318: Update PHP thrift_protocol extension for PHP 8 Client: php Patch: Tyler Christensen & Rasmus Lerdorf
     new 47b3d3b  Make it clear that strings are not NUL-delimited Patch: Juan Cruz Viotti
     new 2e7f39f  Clarify Compact Protocol var int encoding definition Patch: Juan Cruz Viotti

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 doc/specs/thrift-compact-protocol.md | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)


[thrift] 01/02: Make it clear that strings are not NUL-delimited Patch: Juan Cruz Viotti

Posted by je...@apache.org.

jensg pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/thrift.git

commit 47b3d3b148c5181c02f4f871444fe93ad4ec65f2
Author: Juan Cruz Viotti <jv...@jviotti.com>
AuthorDate: Thu Jan 21 12:22:47 2021 -0400

    Make it clear that strings are not NUL-delimited
    Patch: Juan Cruz Viotti
    
    This closes #2313
    
    It might not be obvious from the existing description. I had to run some
    experiments to double-check it and this might save some time to the next
    interested reader.
    
    Signed-off-by: Juan Cruz Viotti <jv...@jviotti.com>
---
 doc/specs/thrift-compact-protocol.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/specs/thrift-compact-protocol.md b/doc/specs/thrift-compact-protocol.md
index 6be2a62..001bb12 100644
--- a/doc/specs/thrift-compact-protocol.md
+++ b/doc/specs/thrift-compact-protocol.md
@@ -92,7 +92,8 @@ Where:
 
 ### String encoding
 
-*String*s are first encoded to UTF-8, and then send as binary.
+*String*s are first encoded to UTF-8, and then send as binary. They do not
+include a NUL delimiter.
 
 ### Double encoding
 
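
The clarification in the patch above can be illustrated with a short Python sketch. It assumes the framing that the Compact Protocol spec describes for binary values (a var int byte length followed by the raw bytes), which is not part of this diff. The function names `encode_varint` and `encode_string` are hypothetical and not taken from any Thrift library.

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a var int (Unsigned LEB128)."""
    if n < 0:
        raise ValueError("var ints encode non-negative integers only")
    out = bytearray()
    while True:
        group = n & 0x7F          # lowest 7 bits
        n >>= 7
        if n:
            out.append(group | 0x80)  # more bytes follow
        else:
            out.append(group)         # last byte, high bit clear
            return bytes(out)


def encode_string(s: str) -> bytes:
    """Sketch of a Compact Protocol string: length prefix plus UTF-8 bytes.

    Assumption: the length counts UTF-8 bytes, not characters, and no
    NUL terminator is appended.
    """
    payload = s.encode("utf-8")
    return encode_varint(len(payload)) + payload


if __name__ == "__main__":
    print(encode_string("hi").hex(" "))
```

Because the length is carried in the prefix, a reader never has to scan for a terminator, and a string may itself contain NUL bytes.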


[thrift] 02/02: Clarify Compact Protocol var int encoding definition Patch: Juan Cruz Viotti

Posted by je...@apache.org.

jensg pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/thrift.git

commit 2e7f39f6b69d98fccba714266f3fa92bbce934cd
Author: Juan Cruz Viotti <jv...@jviotti.com>
AuthorDate: Wed Jan 20 17:05:19 2021 -0400

    Clarify Compact Protocol var int encoding definition
    Patch: Juan Cruz Viotti
    
    This closes #2312
    
    I'm having problems following the var int explanation from the Compact
    Protocol spec. Here is an attempt to clarify it with more precise
    encoding steps and with an example.
    
    I'm also mentioning, for completeness, that the formal name of such
    variable-length integer encoding is Unsigned LEB128 (Unsigned Little
    Endian Base-128).
    
    Signed-off-by: Juan Cruz Viotti <jv...@jviotti.com>
---
 doc/specs/thrift-compact-protocol.md | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/doc/specs/thrift-compact-protocol.md b/doc/specs/thrift-compact-protocol.md
index 001bb12..89301eb 100644
--- a/doc/specs/thrift-compact-protocol.md
+++ b/doc/specs/thrift-compact-protocol.md
@@ -61,9 +61,21 @@ def longToZigZag(n: Long): Long = (n << 1) ^ (n >> 63)
 def zigzagToLong(n: Long): Long = (n >>> 1) ^ - (n & 1)
 ```
 
-The zigzag int is then encoded as a *var int*. Var ints take 1 to 5 bytes (int32) or 1 to 10 bytes (int64). The most
-significant bit of each byte indicates if more bytes follow. The concatenation of the least significant 7 bits from each
-byte form the number, where the first byte has the most significant bits (so they are in big endian or network order).
+The zigzag int is then encoded as a *var int*, also known as *Unsigned LEB128*.  Var ints take 1 to 5 bytes (int32) or 
+1 to 10 bytes (int64). The process consists in taking a Big Endian unsigned integer, left-padding the bit-string to 
+make it a multiple of 7 bits, splitting it into 7-bit groups, prefixing the most-significant 7-bit group with the 0 
+bit, prefixing the remaining 7-bit groups with the 1 bit and encoding the resulting bit-string in Little Endian.
+
+For example, the integer 50399 is encoded as follows:
+
+```
+50399 = 1100 0100 1101 1111         (Big Endian representation)
+      = 00000 1100 0100 1101 1111   (Left-padding)
+      = 0000011 0001001 1011111     (7-bit groups)
+      = 00000011 10001001 11011111  (Most-significant bit prefixes)
+      = 11011111 10001001 00000011  (Little Endian representation)
+      = 0xDF 0x89 0x03
+```
 
 Var ints are sometimes used directly inside the compact protocol to represent positive numbers.
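
The encoding steps described in the patch above can be sketched in Python as follows. This is an illustrative sketch rather than code from the Thrift repository: the names `int_to_zigzag`, `encode_varint` and `decode_varint` are hypothetical, and the zigzag helper mirrors the Scala `longToZigZag` shown in the hunk context, masked to 64 bits because Python integers are unbounded.

```python
MASK64 = (1 << 64) - 1


def int_to_zigzag(n: int) -> int:
    """Map a signed 64-bit integer to an unsigned one (zigzag)."""
    return ((n << 1) ^ (n >> 63)) & MASK64


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a var int (Unsigned LEB128).

    The least-significant 7-bit group is emitted first; every byte except
    the last has its most significant bit set.
    """
    if n < 0:
        raise ValueError("var ints encode non-negative integers only")
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(group)         # final (most-significant) group
            return bytes(out)


def decode_varint(data: bytes) -> int:
    """Decode a var int from the start of data."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated var int")


if __name__ == "__main__":
    # Reproduces the worked example from the patch: 0xDF 0x89 0x03
    print(" ".join(f"0x{b:02X}" for b in encode_varint(50399)))
```

Working from the least-significant group upward yields the same bytes as the pad, split, prefix and reverse procedure in the spec text, without materialising the intermediate bit-string.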