You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by ty...@apache.org on 2015/02/14 00:04:44 UTC
cassandra git commit: Add type serialization formats to native protocol spec

Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.0 b1825e6f8 -> 782b0b616


Add type serialization formats to native protocol spec

Patch by Tyler Hobbs; reviewed by Benjamin Lerer for CASSANDRA-8495


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/782b0b61
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/782b0b61
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/782b0b61

Branch: refs/heads/cassandra-2.0
Commit: 782b0b616f871c90ec6a09b2fc27bd1d2d33caa0
Parents: b1825e6
Author: Tyler Hobbs <ty...@datastax.com>
Authored: Fri Feb 13 17:03:54 2015 -0600
Committer: Tyler Hobbs <ty...@datastax.com>
Committed: Fri Feb 13 17:03:54 2015 -0600

----------------------------------------------------------------------
 doc/native_protocol_v1.spec | 136 +++++++++++++++++++++++++++++++++-----
 doc/native_protocol_v2.spec | 137 ++++++++++++++++++++++++++++++++++-----
 2 files changed, 239 insertions(+), 34 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/cassandra/blob/782b0b61/doc/native_protocol_v1.spec
----------------------------------------------------------------------
diff --git a/doc/native_protocol_v1.spec b/doc/native_protocol_v1.spec
index 08cb91e..bc2bb78 100644
--- a/doc/native_protocol_v1.spec
+++ b/doc/native_protocol_v1.spec
@@ -34,7 +34,7 @@ Table of Contents
         4.2.5.5. Schema_change
       4.2.6. EVENT
   5. Compression
-  6. Collection types
+  6. Data Type Serialization Formats
   7. Error codes
 
 
@@ -169,8 +169,8 @@ Table of Contents
   To describe the layout of the frame body for the messages in Section 4, we
   define the following:
 
-    [int]          A 4 bytes integer
-    [short]        A 2 bytes unsigned integer
+    [int]          A 4 byte integer
+    [short]        A 2 byte unsigned integer
     [string]       A [short] n, followed by n bytes representing an UTF-8
                    string.
     [long string]  An [int] n, followed by n bytes representing an UTF-8 string.
@@ -525,22 +525,124 @@ Table of Contents
   flag (see Section 2.2) is set.
 
 
-6. Collection types
+6. Data Type Serialization Formats
 
-  This section describe the serialization format for the collection types:
-  list, map and set. This serialization format is both useful to decode values
-  returned in RESULT messages but also to encode values for EXECUTE ones.
+  This sections describes the serialization formats for all CQL data types
+  supported by Cassandra through the native protocol.  These serialization
+  formats should be used by client drivers to encode values for EXECUTE
+  messages.  Cassandra will use these formats when returning values in
+  RESULT messages.
 
-  The serialization formats are:
-     List: a [short] n indicating the size of the list, followed by n elements.
-           Each element is [short bytes] representing the serialized element
-           value.
-     Map: a [short] n indicating the size of the map, followed by n entries.
-          Each entry is composed of two [short bytes] representing the key and
-          the value of the entry map.
-     Set: a [short] n indicating the size of the set, followed by n elements.
-          Each element is [short bytes] representing the serialized element
-          value.
+  All values are represented as [bytes] in EXECUTE and RESULT messages.
+  The [bytes] format includes an int prefix denoting the length of the value.
+  For that reason, the serialization formats described here will not include
+  a length component.
+
+  For legacy compatibility reasons, note that most non-string types support
+  "empty" values (i.e. a value with zero length).  An empty value is distinct
+  from NULL, which is encoded with a negative length.
+
+  As with the rest of the native protocol, all encodings are big-endian.
+
+6.1. ascii
+
+  A sequence of bytes in the ASCII range [0, 127].  Bytes with values outside of
+  this range will result in a validation error.
+
+6.2 bigint
+
+  An eight-byte two's complement integer.
+
+6.3 blob
+
+  Any sequence of bytes.
+
+6.4 boolean
+
+  A single byte.  A value of 0 denotes "false"; any other value denotes "true".
+  (However, it is recommended that a value of 1 be used to represent "true".)
+
+6.5 decimal
+
+  The decimal format represents an arbitrary-precision number.  It contains an
+  [int] "scale" component followed by a varint encoding (see section 6.17)
+  of the unscaled value.  The encoded value represents "<unscaled>E<-scale>".
+  In other words, "<unscaled> * 10 ^ (-1 * <scale>)".
+
+6.6 double
+
+  An eight-byte floating point number in the IEEE 754 binary64 format.
+
+6.7 float
+
+  An four-byte floating point number in the IEEE 754 binary32 format.
+
+6.8 inet
+
+  A 4 byte or 16 byte sequence denoting an IPv4 or IPv6 address, respectively.
+
+6.9 int
+
+  A four-byte two's complement integer.
+
+6.10 list
+
+  A [short] n indicating the number of elements in the list, followed by n
+  elements.  Each element is [short bytes] representing the serialized value.
+
+6.11 map
+
+  A [short] n indicating the number of key/value pairs in the map, followed by
+  n entries.  Each entry is composed of two [short bytes] representing the key
+  and value.
+
+6.12 set
+
+  A [short] n indicating the number of elements in the set, followed by n
+  elements.  Each element is [short bytes] representing the serialized value.
+
+6.13 text
+
+  A sequence of bytes conforming to the UTF-8 specifications.
+
+6.14 timestamp
+
+  An eight-byte two's complement integer representing a millisecond-precision
+  offset from the unix epoch (00:00:00, January 1st, 1970).  Negative values
+  represent a negative offset from the epoch.
+
+6.15 uuid
+
+  A 16 byte sequence representing any valid UUID as defined by RFC 4122.
+
+6.16 varchar
+
+  An alias of the "text" type.
+
+6.17 varint
+
+  A variable-length two's complement encoding of a signed integer.
+
+  The following examples may help implementors of this spec:
+
+  Value | Encoding
+  ------|---------
+      0 |     0x00
+      1 |     0x01
+    127 |     0x7F
+    128 |   0x0080
+     -1 |     0xFF
+   -128 |     0x80
+   -129 |   0xFF7F
+
+  Note that positive numbers must use a most-significant byte with a value
+  less than 0x80, because a most-significant bit of 1 indicates a negative
+  value.  Implementors should pad positive values that have a MSB >= 0x80
+  with a leading 0x00 byte.
+
+6.18 timeuuid
+
+  A 16 byte sequence representing a version 1 UUID as defined by RFC 4122.
 
 
 7. Error codes

http://git-wip-us.apache.org/repos/asf/cassandra/blob/782b0b61/doc/native_protocol_v2.spec
----------------------------------------------------------------------
diff --git a/doc/native_protocol_v2.spec b/doc/native_protocol_v2.spec
index 11d380f..ef54099 100644
--- a/doc/native_protocol_v2.spec
+++ b/doc/native_protocol_v2.spec
@@ -37,7 +37,7 @@ Table of Contents
       4.2.7. AUTH_CHALLENGE
       4.2.8. AUTH_SUCCESS
   5. Compression
-  6. Collection types
+  6. Data Type Serialization Formats
   7. Result paging
   8. Error codes
   9. Changes from v1
@@ -186,8 +186,8 @@ Table of Contents
   To describe the layout of the frame body for the messages in Section 4, we
   define the following:
 
-    [int]          A 4 bytes integer
-    [short]        A 2 bytes unsigned integer
+    [int]          A 4 byte integer
+    [short]        A 2 byte unsigned integer
     [string]       A [short] n, followed by n bytes representing an UTF-8
                    string.
     [long string]  An [int] n, followed by n bytes representing an UTF-8 string.
@@ -673,22 +673,125 @@ Table of Contents
       avaivable on some installation.
 
 
-6. Collection types
+6. Data Type Serialization Formats
 
-  This section describe the serialization format for the collection types:
-  list, map and set. This serialization format is both useful to decode values
-  returned in RESULT messages but also to encode values for EXECUTE ones.
+  This sections describes the serialization formats for all CQL data types
+  supported by Cassandra through the native protocol.  These serialization
+  formats should be used by client drivers to encode values for EXECUTE
+  messages.  Cassandra will use these formats when returning values in
+  RESULT messages.
 
-  The serialization formats are:
-     List: a [short] n indicating the size of the list, followed by n elements.
-           Each element is [short bytes] representing the serialized element
-           value.
-     Map: a [short] n indicating the size of the map, followed by n entries.
-          Each entry is composed of two [short bytes] representing the key and
-          the value of the entry map.
-     Set: a [short] n indicating the size of the set, followed by n elements.
-          Each element is [short bytes] representing the serialized element
-          value.
+  All values are represented as [bytes] in EXECUTE and RESULT messages.
+  The [bytes] format includes an int prefix denoting the length of the value.
+  For that reason, the serialization formats described here will not include
+  a length component.
+
+  For legacy compatibility reasons, note that most non-string types support
+  "empty" values (i.e. a value with zero length).  An empty value is distinct
+  from NULL, which is encoded with a negative length.
+
+  As with the rest of the native protocol, all encodings are big-endian.
+
+6.1. ascii
+
+  A sequence of bytes in the ASCII range [0, 127].  Bytes with values outside of
+  this range will result in a validation error.
+
+6.2 bigint
+
+  An eight-byte two's complement integer.
+
+6.3 blob
+
+  Any sequence of bytes.
+
+6.4 boolean
+
+  A single byte.  A value of 0 denotes "false"; any other value denotes "true".
+  (However, it is recommended that a value of 1 be used to represent "true".)
+
+6.5 decimal
+
+  The decimal format represents an arbitrary-precision number.  It contains an
+  [int] "scale" component followed by a varint encoding (see section 6.17)
+  of the unscaled value.  The encoded value represents "<unscaled>E<-scale>".
+  In other words, "<unscaled> * 10 ^ (-1 * <scale>)".
+
+6.6 double
+
+  An eight-byte floating point number in the IEEE 754 binary64 format.
+
+6.7 float
+
+  An four-byte floating point number in the IEEE 754 binary32 format.
+
+6.8 inet
+
+  A 4 byte or 16 byte sequence denoting an IPv4 or IPv6 address, respectively.
+
+6.9 int
+
+  A four-byte two's complement integer.
+
+6.10 list
+
+  A [short] n indicating the number of elements in the list, followed by n
+  elements.  Each element is [short bytes] representing the serialized value.
+
+6.11 map
+
+  A [short] n indicating the number of key/value pairs in the map, followed by
+  n entries.  Each entry is composed of two [short bytes] representing the key
+  and value.
+
+6.12 set
+
+  A [short] n indicating the number of elements in the set, followed by n
+  elements.  Each element is [short bytes] representing the serialized value.
+
+6.13 text
+
+  A sequence of bytes conforming to the UTF-8 specifications.
+
+6.14 timestamp
+
+  An eight-byte two's complement integer representing a millisecond-precision
+  offset from the unix epoch (00:00:00, January 1st, 1970).  Negative values
+  represent a negative offset from the epoch.
+
+6.15 uuid
+
+  A 16 byte sequence representing any valid UUID as defined by RFC 4122.
+
+6.16 varchar
+
+  An alias of the "text" type.
+
+6.17 varint
+
+  A variable-length two's complement encoding of a signed integer.
+
+  The following examples may help implementors of this spec:
+
+  Value | Encoding
+  ------|---------
+      0 |     0x00
+      1 |     0x01
+    127 |     0x7F
+    128 |   0x0080
+    129 |   0x0081
+     -1 |     0xFF
+   -128 |     0x80
+   -129 |   0xFF7F
+
+  Note that positive numbers must use a most-significant byte with a value
+  less than 0x80, because a most-significant bit of 1 indicates a negative
+  value.  Implementors should pad positive values that have a MSB >= 0x80
+  with a leading 0x00 byte.
+
+6.18 timeuuid
+
+  A 16 byte sequence representing a version 1 UUID as defined by RFC 4122.
 
 
 7. Result paging