Posted to commits@cassandra.apache.org by ty...@apache.org on 2015/02/14 00:04:44 UTC
cassandra git commit: Add type serialization formats to native protocol spec
Repository: cassandra
Updated Branches:
refs/heads/cassandra-2.0 b1825e6f8 -> 782b0b616
Add type serialization formats to native protocol spec
Patch by Tyler Hobbs; reviewed by Benjamin Lerer for CASSANDRA-8495
Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/782b0b61
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/782b0b61
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/782b0b61
Branch: refs/heads/cassandra-2.0
Commit: 782b0b616f871c90ec6a09b2fc27bd1d2d33caa0
Parents: b1825e6
Author: Tyler Hobbs <ty...@datastax.com>
Authored: Fri Feb 13 17:03:54 2015 -0600
Committer: Tyler Hobbs <ty...@datastax.com>
Committed: Fri Feb 13 17:03:54 2015 -0600
----------------------------------------------------------------------
doc/native_protocol_v1.spec | 136 +++++++++++++++++++++++++++++++++-----
doc/native_protocol_v2.spec | 137 ++++++++++++++++++++++++++++++++++-----
2 files changed, 239 insertions(+), 34 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/cassandra/blob/782b0b61/doc/native_protocol_v1.spec
----------------------------------------------------------------------
diff --git a/doc/native_protocol_v1.spec b/doc/native_protocol_v1.spec
index 08cb91e..bc2bb78 100644
--- a/doc/native_protocol_v1.spec
+++ b/doc/native_protocol_v1.spec
@@ -34,7 +34,7 @@ Table of Contents
4.2.5.5. Schema_change
4.2.6. EVENT
5. Compression
- 6. Collection types
+ 6. Data Type Serialization Formats
7. Error codes
@@ -169,8 +169,8 @@ Table of Contents
To describe the layout of the frame body for the messages in Section 4, we
define the following:
- [int] A 4 bytes integer
- [short] A 2 bytes unsigned integer
+ [int] A 4 byte integer
+ [short] A 2 byte unsigned integer
[string] A [short] n, followed by n bytes representing an UTF-8
string.
[long string] An [int] n, followed by n bytes representing an UTF-8 string.
@@ -525,22 +525,124 @@ Table of Contents
flag (see Section 2.2) is set.
-6. Collection types
+6. Data Type Serialization Formats
- This section describe the serialization format for the collection types:
- list, map and set. This serialization format is both useful to decode values
- returned in RESULT messages but also to encode values for EXECUTE ones.
+ This section describes the serialization formats for all CQL data types
+ supported by Cassandra through the native protocol. These serialization
+ formats should be used by client drivers to encode values for EXECUTE
+ messages. Cassandra will use these formats when returning values in
+ RESULT messages.
- The serialization formats are:
- List: a [short] n indicating the size of the list, followed by n elements.
- Each element is [short bytes] representing the serialized element
- value.
- Map: a [short] n indicating the size of the map, followed by n entries.
- Each entry is composed of two [short bytes] representing the key and
- the value of the entry map.
- Set: a [short] n indicating the size of the set, followed by n elements.
- Each element is [short bytes] representing the serialized element
- value.
+ All values are represented as [bytes] in EXECUTE and RESULT messages.
+ The [bytes] format includes an int prefix denoting the length of the value.
+ For that reason, the serialization formats described here will not include
+ a length component.
+
+ For legacy compatibility reasons, note that most non-string types support
+ "empty" values (i.e. a value with zero length). An empty value is distinct
+ from NULL, which is encoded with a negative length.
+
+ As with the rest of the native protocol, all encodings are big-endian.
+
+6.1. ascii
+
+ A sequence of bytes in the ASCII range [0, 127]. Bytes with values outside of
+ this range will result in a validation error.
+
+6.2 bigint
+
+ An eight-byte two's complement integer.
+
+6.3 blob
+
+ Any sequence of bytes.
+
+6.4 boolean
+
+ A single byte. A value of 0 denotes "false"; any other value denotes "true".
+ (However, it is recommended that a value of 1 be used to represent "true".)
+
+6.5 decimal
+
+ The decimal format represents an arbitrary-precision number. It contains an
+ [int] "scale" component followed by a varint encoding (see section 6.17)
+ of the unscaled value. The encoded value represents "<unscaled>E<-scale>".
+ In other words, "<unscaled> * 10 ^ (-1 * <scale>)".
+
+6.6 double
+
+ An eight-byte floating point number in the IEEE 754 binary64 format.
+
+6.7 float
+
+ A four-byte floating point number in the IEEE 754 binary32 format.
+
+6.8 inet
+
+ A 4 byte or 16 byte sequence denoting an IPv4 or IPv6 address, respectively.
+
+6.9 int
+
+ A four-byte two's complement integer.
+
+6.10 list
+
+ A [short] n indicating the number of elements in the list, followed by n
+ elements. Each element is [short bytes] representing the serialized value.
+
+6.11 map
+
+ A [short] n indicating the number of key/value pairs in the map, followed by
+ n entries. Each entry is composed of two [short bytes] representing the key
+ and value.
+
+6.12 set
+
+ A [short] n indicating the number of elements in the set, followed by n
+ elements. Each element is [short bytes] representing the serialized value.
+
+6.13 text
+
+ A sequence of bytes conforming to the UTF-8 specifications.
+
+6.14 timestamp
+
+ An eight-byte two's complement integer representing a millisecond-precision
+ offset from the unix epoch (00:00:00, January 1st, 1970). Negative values
+ represent a negative offset from the epoch.
+
+6.15 uuid
+
+ A 16 byte sequence representing any valid UUID as defined by RFC 4122.
+
+6.16 varchar
+
+ An alias of the "text" type.
+
+6.17 varint
+
+ A variable-length two's complement encoding of a signed integer.
+
+ The following examples may help implementors of this spec:
+
+ Value | Encoding
+ ------|---------
+ 0 | 0x00
+ 1 | 0x01
+ 127 | 0x7F
+ 128 | 0x0080
+ -1 | 0xFF
+ -128 | 0x80
+ -129 | 0xFF7F
+
+ Note that positive numbers must use a most-significant byte with a value
+ less than 0x80, because a most-significant bit of 1 indicates a negative
+ value. Implementors should pad positive values that have a MSB >= 0x80
+ with a leading 0x00 byte.
+
+6.18 timeuuid
+
+ A 16 byte sequence representing a version 1 UUID as defined by RFC 4122.
7. Error codes
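
As an illustration of sections 6.5 and 6.17 in the hunk above (not part of the patch), the varint and decimal formats could be sketched in Python roughly as follows. The names encode_varint, decode_varint and encode_decimal are hypothetical, and the sketch assumes that the [int] scale is a signed 4 byte big-endian integer and that only finite decimals are encoded.

```python
import struct
from decimal import Decimal


def encode_varint(value: int) -> bytes:
    """Shortest big-endian two's complement encoding of a signed integer."""
    length = 1
    # Grow until the signed range of `length` bytes contains the value.
    # This yields the leading 0x00 pad for positives such as 128.
    while not -(1 << (8 * length - 1)) <= value < (1 << (8 * length - 1)):
        length += 1
    return value.to_bytes(length, byteorder="big", signed=True)


def decode_varint(data: bytes) -> int:
    """Inverse of encode_varint; the sign comes from the most-significant bit."""
    return int.from_bytes(data, byteorder="big", signed=True)


def encode_decimal(value: Decimal) -> bytes:
    """[int] scale followed by the varint encoding of the unscaled value."""
    sign, digits, exponent = value.as_tuple()
    if not isinstance(exponent, int):
        # NaN and Infinity report a non-integer exponent; assumed unsupported here.
        raise ValueError("only finite decimals can be encoded")
    unscaled = int("".join(map(str, digits)))
    if sign:
        unscaled = -unscaled
    # Python's exponent is the negated scale: 1.23 is 123 * 10^-2, so scale = 2.
    return struct.pack(">i", -exponent) + encode_varint(unscaled)
```

With these definitions, encode_varint reproduces each row of the example table in section 6.17, and encode_decimal(Decimal("1.23")) yields a scale of 2 followed by the single byte 0x7B.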
http://git-wip-us.apache.org/repos/asf/cassandra/blob/782b0b61/doc/native_protocol_v2.spec
----------------------------------------------------------------------
diff --git a/doc/native_protocol_v2.spec b/doc/native_protocol_v2.spec
index 11d380f..ef54099 100644
--- a/doc/native_protocol_v2.spec
+++ b/doc/native_protocol_v2.spec
@@ -37,7 +37,7 @@ Table of Contents
4.2.7. AUTH_CHALLENGE
4.2.8. AUTH_SUCCESS
5. Compression
- 6. Collection types
+ 6. Data Type Serialization Formats
7. Result paging
8. Error codes
9. Changes from v1
@@ -186,8 +186,8 @@ Table of Contents
To describe the layout of the frame body for the messages in Section 4, we
define the following:
- [int] A 4 bytes integer
- [short] A 2 bytes unsigned integer
+ [int] A 4 byte integer
+ [short] A 2 byte unsigned integer
[string] A [short] n, followed by n bytes representing an UTF-8
string.
[long string] An [int] n, followed by n bytes representing an UTF-8 string.
@@ -673,22 +673,125 @@ Table of Contents
available on some installation.
-6. Collection types
+6. Data Type Serialization Formats
- This section describe the serialization format for the collection types:
- list, map and set. This serialization format is both useful to decode values
- returned in RESULT messages but also to encode values for EXECUTE ones.
+ This section describes the serialization formats for all CQL data types
+ supported by Cassandra through the native protocol. These serialization
+ formats should be used by client drivers to encode values for EXECUTE
+ messages. Cassandra will use these formats when returning values in
+ RESULT messages.
- The serialization formats are:
- List: a [short] n indicating the size of the list, followed by n elements.
- Each element is [short bytes] representing the serialized element
- value.
- Map: a [short] n indicating the size of the map, followed by n entries.
- Each entry is composed of two [short bytes] representing the key and
- the value of the entry map.
- Set: a [short] n indicating the size of the set, followed by n elements.
- Each element is [short bytes] representing the serialized element
- value.
+ All values are represented as [bytes] in EXECUTE and RESULT messages.
+ The [bytes] format includes an int prefix denoting the length of the value.
+ For that reason, the serialization formats described here will not include
+ a length component.
+
+ For legacy compatibility reasons, note that most non-string types support
+ "empty" values (i.e. a value with zero length). An empty value is distinct
+ from NULL, which is encoded with a negative length.
+
+ As with the rest of the native protocol, all encodings are big-endian.
+
+6.1. ascii
+
+ A sequence of bytes in the ASCII range [0, 127]. Bytes with values outside of
+ this range will result in a validation error.
+
+6.2 bigint
+
+ An eight-byte two's complement integer.
+
+6.3 blob
+
+ Any sequence of bytes.
+
+6.4 boolean
+
+ A single byte. A value of 0 denotes "false"; any other value denotes "true".
+ (However, it is recommended that a value of 1 be used to represent "true".)
+
+6.5 decimal
+
+ The decimal format represents an arbitrary-precision number. It contains an
+ [int] "scale" component followed by a varint encoding (see section 6.17)
+ of the unscaled value. The encoded value represents "<unscaled>E<-scale>".
+ In other words, "<unscaled> * 10 ^ (-1 * <scale>)".
+
+6.6 double
+
+ An eight-byte floating point number in the IEEE 754 binary64 format.
+
+6.7 float
+
+ A four-byte floating point number in the IEEE 754 binary32 format.
+
+6.8 inet
+
+ A 4 byte or 16 byte sequence denoting an IPv4 or IPv6 address, respectively.
+
+6.9 int
+
+ A four-byte two's complement integer.
+
+6.10 list
+
+ A [short] n indicating the number of elements in the list, followed by n
+ elements. Each element is [short bytes] representing the serialized value.
+
+6.11 map
+
+ A [short] n indicating the number of key/value pairs in the map, followed by
+ n entries. Each entry is composed of two [short bytes] representing the key
+ and value.
+
+6.12 set
+
+ A [short] n indicating the number of elements in the set, followed by n
+ elements. Each element is [short bytes] representing the serialized value.
+
+6.13 text
+
+ A sequence of bytes conforming to the UTF-8 specifications.
+
+6.14 timestamp
+
+ An eight-byte two's complement integer representing a millisecond-precision
+ offset from the unix epoch (00:00:00, January 1st, 1970). Negative values
+ represent a negative offset from the epoch.
+
+6.15 uuid
+
+ A 16 byte sequence representing any valid UUID as defined by RFC 4122.
+
+6.16 varchar
+
+ An alias of the "text" type.
+
+6.17 varint
+
+ A variable-length two's complement encoding of a signed integer.
+
+ The following examples may help implementors of this spec:
+
+ Value | Encoding
+ ------|---------
+ 0 | 0x00
+ 1 | 0x01
+ 127 | 0x7F
+ 128 | 0x0080
+ 129 | 0x0081
+ -1 | 0xFF
+ -128 | 0x80
+ -129 | 0xFF7F
+
+ Note that positive numbers must use a most-significant byte with a value
+ less than 0x80, because a most-significant bit of 1 indicates a negative
+ value. Implementors should pad positive values that have a MSB >= 0x80
+ with a leading 0x00 byte.
+
+6.18 timeuuid
+
+ A 16 byte sequence representing a version 1 UUID as defined by RFC 4122.
7. Result paging
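
As an illustration of the collection formats in sections 6.10 to 6.12 above (not part of the patch), a client might build list, set and map payloads along these lines. All function names are hypothetical, and the sketch assumes that every element is non-null and already serialized, with [short bytes] taken to be a 2 byte unsigned length followed by that many bytes.

```python
import struct


def short_bytes(payload: bytes) -> bytes:
    """[short bytes]: a 2 byte unsigned big-endian length, then the payload."""
    return struct.pack(">H", len(payload)) + payload


def encode_list(elements) -> bytes:
    """list (and set): [short] n followed by n [short bytes] elements."""
    return struct.pack(">H", len(elements)) + b"".join(
        short_bytes(element) for element in elements
    )


def encode_map(pairs) -> bytes:
    """map: [short] n followed by n key/value pairs, each side as [short bytes]."""
    out = struct.pack(">H", len(pairs))
    for key, value in pairs:
        out += short_bytes(key) + short_bytes(value)
    return out


def encode_int(value: int) -> bytes:
    """int: a four-byte big-endian two's complement integer."""
    return struct.pack(">i", value)


def encode_text(value: str) -> bytes:
    """text/varchar: the UTF-8 bytes of the string."""
    return value.encode("utf-8")
```

For example, encode_list([encode_int(1), encode_int(2)]) produces a two-byte count of 2 followed by two length-prefixed four-byte integers. A set can reuse encode_list because both formats share the same layout.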