You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by sl...@apache.org on 2013/04/11 15:57:10 UTC
git commit: Fork spec of the native protocol for v2

Updated Branches:
  refs/heads/trunk be78b3a5b -> 3fdd46476


Fork spec of the native protocol for v2


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/3fdd4647
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/3fdd4647
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/3fdd4647

Branch: refs/heads/trunk
Commit: 3fdd464766a0d86e3d88a2d210cf9031c1c15876
Parents: be78b3a
Author: Sylvain Lebresne <sy...@datastax.com>
Authored: Thu Apr 11 15:57:01 2013 +0200
Committer: Sylvain Lebresne <sy...@datastax.com>
Committed: Thu Apr 11 15:57:01 2013 +0200

----------------------------------------------------------------------
 doc/native_protocol.spec    |  635 -------------------------------------
 doc/native_protocol_v1.spec |  635 +++++++++++++++++++++++++++++++++++++
 doc/native_protocol_v2.spec |  640 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 1275 insertions(+), 635 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/cassandra/blob/3fdd4647/doc/native_protocol.spec
----------------------------------------------------------------------
diff --git a/doc/native_protocol.spec b/doc/native_protocol.spec
deleted file mode 100644
index fb709e3..0000000
--- a/doc/native_protocol.spec
+++ /dev/null
@@ -1,635 +0,0 @@
-
-                             CQL BINARY PROTOCOL v1
-
-
-Table of Contents
-
-  1. Overview
-  2. Frame header
-    2.1. version
-    2.2. flags
-    2.3. stream
-    2.4. opcode
-    2.5. length
-  3. Notations
-  4. Messages
-    4.1. Requests
-      4.1.1. STARTUP
-      4.1.2. CREDENTIALS
-      4.1.3. OPTIONS
-      4.1.4. QUERY
-      4.1.5. PREPARE
-      4.1.6. EXECUTE
-      4.1.7. REGISTER
-    4.2. Responses
-      4.2.1. ERROR
-      4.2.2. READY
-      4.2.3. AUTHENTICATE
-      4.2.4. SUPPORTED
-      4.2.5. RESULT
-        4.2.5.1. Void
-        4.2.5.2. Rows
-        4.2.5.3. Set_keyspace
-        4.2.5.4. Prepared
-        4.2.5.5. Schema_change
-      4.2.6. EVENT
-  5. Compression
-  6. Collection types
-  7. Error codes
-
-
-1. Overview
-
-  The CQL binary protocol is a frame based protocol. Frames are defined as:
-
-      0         8        16        24        32
-      +---------+---------+---------+---------+
-      | version |  flags  | stream  | opcode  |
-      +---------+---------+---------+---------+
-      |                length                 |
-      +---------+---------+---------+---------+
-      |                                       |
-      .            ...  body ...              .
-      .                                       .
-      .                                       .
-      +----------------------------------------
-
-  The protocol is big-endian (network byte order).
-
-  Each frame contains a fixed size header (8 bytes) followed by a variable size
-  body. The header is described in Section 2. The content of the body depends
-  on the header opcode value (the body can in particular be empty for some
-  opcode values). The list of allowed opcode is defined Section 2.3 and the
-  details of each corresponding message is described Section 4.
-
-  The protocol distinguishes 2 types of frames: requests and responses. Requests
-  are those frame sent by the clients to the server, response are the ones sent
-  by the server. Note however that while communication are initiated by the
-  client with the server responding to request, the protocol may likely add
-  server pushes in the future, so responses does not obligatory come right after
-  a client request.
-
-  Note to client implementors: clients library should always assume that the
-  body of a given frame may contain more data than what is described in this
-  document. It will however always be safe to ignore the remaining of the frame
-  body in such cases. The reason is that this may allow to sometimes extend the
-  protocol with optional features without needing to change the protocol
-  version.
-
-
-2. Frame header
-
-2.1. version
-
-  The version is a single byte that indicate both the direction of the message
-  (request or response) and the version of the protocol in use. The up-most bit
-  of version is used to define the direction of the message: 0 indicates a
-  request, 1 indicates a responses. This can be useful for protocol analyzers to
-  distinguish the nature of the packet from the direction which it is moving.
-  The rest of that byte is the protocol version (1 for the protocol defined in
-  this document). In other words, for this version of the protocol, version will
-  have one of:
-    0x01    Request frame for this protocol version
-    0x81    Response frame for this protocol version
-
-
-2.2. flags
-
-  Flags applying to this frame. The flags have the following meaning (described
-  by the mask that allow to select them):
-    0x01: Compression flag. If set, the frame body is compressed. The actual
-          compression to use should have been set up beforehand through the
-          Startup message (which thus cannot be compressed; Section 4.1.1).
-    0x02: Tracing flag. For a request frame, this indicate the client requires
-          tracing of the request. Note that not all requests support tracing.
-          Currently, only QUERY, PREPARE and EXECUTE queries support tracing.
-          Other requests will simply ignore the tracing flag if set. If a
-          request support tracing and the tracing flag was set, the response to
-          this request will have the tracing flag set and contain tracing
-          information.
-          If a response frame has the tracing flag set, its body contains
-          a tracing ID. The tracing ID is a [uuid] and is the first thing in
-          the frame body. The rest of the body will then be the usual body
-          corresponding to the response opcode.
-
-  The rest of the flags is currently unused and ignored.
-
-2.3. stream
-
-  A frame has a stream id (one signed byte). When sending request messages, this
-  stream id must be set by the client to a positive byte (negative stream id
-  are reserved for streams initiated by the server; currently all EVENT messages
-  (section 4.2.6) have a streamId of -1). If a client sends a request message
-  with the stream id X, it is guaranteed that the stream id of the response to
-  that message will be X.
-
-  This allow to deal with the asynchronous nature of the protocol. If a client
-  sends multiple messages simultaneously (without waiting for responses), there
-  is no guarantee on the order of the responses. For instance, if the client
-  writes REQ_1, REQ_2, REQ_3 on the wire (in that order), the server might
-  respond to REQ_3 (or REQ_2) first. Assigning different stream id to these 3
-  requests allows the client to distinguish to which request an received answer
-  respond to. As there can only be 128 different simultaneous stream, it is up
-  to the client to reuse stream id.
-
-  Note that clients are free to use the protocol synchronously (i.e. wait for
-  the response to REQ_N before sending REQ_N+1). In that case, the stream id
-  can be safely set to 0. Clients should also feel free to use only a subset of
-  the 128 maximum possible stream ids if it is simpler for those
-  implementation.
-
-2.4. opcode
-
-  An integer byte that distinguish the actual message:
-    0x00    ERROR
-    0x01    STARTUP
-    0x02    READY
-    0x03    AUTHENTICATE
-    0x04    CREDENTIALS
-    0x05    OPTIONS
-    0x06    SUPPORTED
-    0x07    QUERY
-    0x08    RESULT
-    0x09    PREPARE
-    0x0A    EXECUTE
-    0x0B    REGISTER
-    0x0C    EVENT
-
-  Messages are described in Section 4.
-
-
-2.5. length
-
-  A 4 byte integer representing the length of the body of the frame (note:
-  currently a frame is limited to 256MB in length).
-
-
-3. Notations
-
-  To describe the layout of the frame body for the messages in Section 4, we
-  define the following:
-
-    [int]          A 4 bytes integer
-    [short]        A 2 bytes unsigned integer
-    [string]       A [short] n, followed by n bytes representing an UTF-8
-                   string.
-    [long string]  An [int] n, followed by n bytes representing an UTF-8 string.
-    [uuid]         A 16 bytes long uuid.
-    [string list]  A [short] n, followed by n [string].
-    [bytes]        A [int] n, followed by n bytes if n >= 0. If n < 0,
-                   no byte should follow and the value represented is `null`.
-    [short bytes]  A [short] n, followed by n bytes if n >= 0.
-
-    [option]       A pair of <id><value> where <id> is a [short] representing
-                   the option id and <value> depends on that option (and can be
-                   of size 0). The supported id (and the corresponding <value>)
-                   will be described when this is used.
-    [option list]  A [short] n, followed by n [option].
-    [inet]         An address (ip and port) to a node. It consists of one
-                   [byte] n, that represents the address size, followed by n
-                   [byte] representing the IP address (in practice n can only be
-                   either 4 (IPv4) or 16 (IPv6)), following by one [int]
-                   representing the port.
-    [consistency]  A consistency level specification. This is a [short]
-                   representing a consistency level with the following
-                   correspondance:
-                     0x0000    ANY
-                     0x0001    ONE
-                     0x0002    TWO
-                     0x0003    THREE
-                     0x0004    QUORUM
-                     0x0005    ALL
-                     0x0006    LOCAL_QUORUM
-                     0x0007    EACH_QUORUM
-
-    [string map]      A [short] n, followed by n pair <k><v> where <k> and <v>
-                      are [string].
-    [string multimap] A [short] n, followed by n pair <k><v> where <k> is a
-                      [string] and <v> is a [string list].
-
-
-4. Messages
-
-4.1. Requests
-
-  Note that outside of their normal responses (described below), all requests
-  can get an ERROR message (Section 4.2.1) as response.
-
-4.1.1. STARTUP
-
-  Initialize the connection. The server will respond by either a READY message
-  (in which case the connection is ready for queries) or an AUTHENTICATE message
-  (in which case credentials will need to be provided using CREDENTIALS).
-
-  This must be the first message of the connection, except for OPTIONS that can
-  be sent before to find out the options supported by the server. Once the
-  connection has been initialized, a client should not send any more STARTUP
-  message.
-
-  The body is a [string map] of options. Possible options are:
-    - "CQL_VERSION": the version of CQL to use. This option is mandatory and
-      currenty, the only version supported is "3.0.0". Note that this is
-      different from the protocol version.
-    - "COMPRESSION": the compression algorithm to use for frames (See section 5).
-      This is optional, if not specified no compression will be used.
-
-
-4.1.2. CREDENTIALS
-
-  Provides credentials information for the purpose of identification. This
-  message comes as a response to an AUTHENTICATE message from the server, but
-  can be use later in the communication to change the authentication
-  information.
-
-  The body is a list of key/value informations. It is a [short] n, followed by n
-  pair of [string]. These key/value pairs are passed as is to the Cassandra
-  IAuthenticator and thus the detail of which informations is needed depends on
-  that authenticator.
-
-  The response to a CREDENTIALS is a READY message (or an ERROR message).
-
-
-4.1.3. OPTIONS
-
-  Asks the server to return what STARTUP options are supported. The body of an
-  OPTIONS message should be empty and the server will respond with a SUPPORTED
-  message.
-
-
-4.1.4. QUERY
-
-  Performs a CQL query. The body of the message consists of a CQL query as a [long
-  string] followed by the [consistency] for the operation.
-
-  Note that the consistency is ignored by some queries (USE, CREATE, ALTER,
-  TRUNCATE, ...).
-
-  The server will respond to a QUERY message with a RESULT message, the content
-  of which depends on the query.
-
-
-4.1.5. PREPARE
-
-  Prepare a query for later execution (through EXECUTE). The body consists of
-  the CQL query to prepare as a [long string].
-
-  The server will respond with a RESULT message with a `prepared` kind (0x00003,
-  see Section 4.2.5).
-
-
-4.1.6. EXECUTE
-
-  Executes a prepared query. The body of the message must be:
-    <id><n><value_1>....<value_n><consistency>
-  where:
-    - <id> is the prepared query ID. It's the [short bytes] returned as a
-      response to a PREPARE message.
-    - <n> is a [short] indicating the number of following values.
-    - <value_1>...<value_n> are the [bytes] to use for bound variables in the
-      prepared query.
-    - <consistency> is the [consistency] level for the operation.
-
-  Note that the consistency is ignored by some (prepared) queries (USE, CREATE,
-  ALTER, TRUNCATE, ...).
-
-  The response from the server will be a RESULT message.
-
-
-4.1.7. REGISTER
-
-  Register this connection to receive some type of events. The body of the
-  message is a [string list] representing the event types to register to. See
-  section 4.2.6 for the list of valid event types.
-
-  The response to a REGISTER message will be a READY message.
-
-  Please note that if a client driver maintains multiple connections to a
-  Cassandra node and/or connections to multiple nodes, it is advised to
-  dedicate a handful of connections to receive events, but to *not* register
-  for events on all connections, as this would only result in receiving
-  multiple times the same event messages, wasting bandwidth.
-
-
-4.2. Responses
-
-  This section describes the content of the frame body for the different
-  responses. Please note that to make room for future evolution, clients should
-  support extra informations (that they should simply discard) to the one
-  described in this document at the end of the frame body.
-
-4.2.1. ERROR
-
-  Indicates an error processing a request. The body of the message will be an
-  error code ([int]) followed by a [string] error message. Then, depending on
-  the exception, more content may follow. The error codes are defined in
-  Section 7, along with their additional content if any.
-
-
-4.2.2. READY
-
-  Indicates that the server is ready to process queries. This message will be
-  sent by the server either after a STARTUP message if no authentication is
-  required, or after a successful CREDENTIALS message.
-
-  The body of a READY message is empty.
-
-
-4.2.3. AUTHENTICATE
-
-  Indicates that the server require authentication. This will be sent following
-  a STARTUP message and must be answered by a CREDENTIALS message from the
-  client to provide authentication informations.
-
-  The body consists of a single [string] indicating the full class name of the
-  IAuthenticator in use.
-
-
-4.2.4. SUPPORTED
-
-  Indicates which startup options are supported by the server. This message
-  comes as a response to an OPTIONS message.
-
-  The body of a SUPPORTED message is a [string multimap]. This multimap gives
-  for each of the supported STARTUP options, the list of supported values.
-
-
-4.2.5. RESULT
-
-  The result to a query (QUERY, PREPARE or EXECUTE messages).
-
-  The first element of the body of a RESULT message is an [int] representing the
-  `kind` of result. The rest of the body depends on the kind. The kind can be
-  one of:
-    0x0001    Void: for results carrying no information.
-    0x0002    Rows: for results to select queries, returning a set of rows.
-    0x0003    Set_keyspace: the result to a `use` query.
-    0x0004    Prepared: result to a PREPARE message.
-    0x0005    Schema_change: the result to a schema altering query.
-
-  The body for each kind (after the [int] kind) is defined below.
-
-
-4.2.5.1. Void
-
-  The rest of the body for a Void result is empty. It indicates that a query was
-  successful without providing more information.
-
-
-4.2.5.2. Rows
-
-  Indicates a set of rows. The rest of body of a Rows result is:
-    <metadata><rows_count><rows_content>
-  where:
-    - <metadata> is composed of:
-        <flags><columns_count><global_table_spec>?<col_spec_1>...<col_spec_n>
-      where:
-        - <flags> is an [int]. The bits of <flags> provides information on the
-          formatting of the remaining informations. A flag is set if the bit
-          corresponding to its `mask` is set. Supported flags are, given there
-          mask:
-            0x0001    Global_tables_spec: if set, only one table spec (keyspace
-                      and table name) is provided as <global_table_spec>. If not
-                      set, <global_table_spec> is not present.
-        - <columns_count> is an [int] representing the number of columns selected
-          by the query this result is of. It defines the number of <col_spec_i>
-          elements in and the number of element for each row in <rows_content>.
-        - <global_table_spec> is present if the Global_tables_spec is set in
-          <flags>. If present, it is composed of two [string] representing the
-          (unique) keyspace name and table name the columns return are of.
-        - <col_spec_i> specifies the columns returned in the query. There is
-          <column_count> such column specification that are composed of:
-            (<ksname><tablename>)?<column_name><type>
-          The initial <ksname> and <tablename> are two [string] are only present
-          if the Global_tables_spec flag is not set. The <column_name> is a
-          [string] and <type> is an [option] that correspond to the column name
-          and type. The option for <type> is either a native type (see below),
-          in which case the option has no value, or a 'custom' type, in which
-          case the value is a [string] representing the full qualified class
-          name of the type represented. Valid option ids are:
-            0x0000    Custom: the value is a [string], see above.
-            0x0001    Ascii
-            0x0002    Bigint
-            0x0003    Blob
-            0x0004    Boolean
-            0x0005    Counter
-            0x0006    Decimal
-            0x0007    Double
-            0x0008    Float
-            0x0009    Int
-            0x000A    Text
-            0x000B    Timestamp
-            0x000C    Uuid
-            0x000D    Varchar
-            0x000E    Varint
-            0x000F    Timeuuid
-            0x0010    Inet
-            0x0020    List: the value is an [option], representing the type
-                            of the elements of the list.
-            0x0021    Map: the value is two [option], representing the types of the
-                           keys and values of the map
-            0x0022    Set: the value is an [option], representing the type
-                            of the elements of the set
-    - <rows_count> is an [int] representing the number of rows present in this
-      result. Those rows are serialized in the <rows_content> part.
-    - <rows_content> is composed of <row_1>...<row_m> where m is <rows_count>.
-      Each <row_i> is composed of <value_1>...<value_n> where n is
-      <columns_count> and where <value_j> is a [bytes] representing the value
-      returned for the jth column of the ith row. In other words, <rows_content>
-      is composed of (<rows_count> * <columns_count>) [bytes].
-
-
-4.2.5.3. Set_keyspace
-
-  The result to a `use` query. The body (after the kind [int]) is a single
-  [string] indicating the name of the keyspace that has been set.
-
-
-4.2.5.4. Prepared
-
-  The result to a PREPARE message. The rest of the body of a Prepared result is:
-    <id><metadata>
-  where:
-    - <id> is [short bytes] representing the prepared query ID.
-    - <metadata> is defined exactly as for a Rows RESULT (See section 4.2.5.2).
-
-  Note that prepared query ID return is global to the node on which the query
-  has been prepared. It can be used on any connection to that node and this
-  until the node is restarted (after which the query must be reprepared).
-
-4.2.5.5. Schema_change
-
-  The result to a schema altering query (creation/update/drop of a
-  keyspace/table/index). The body (after the kind [int]) is composed of 3
-  [string]:
-    <change><keyspace><table>
-  where:
-    - <change> describe the type of change that has occured. It can be one of
-      "CREATED", "UPDATED" or "DROPPED".
-    - <keyspace> is the name of the affected keyspace or the keyspace of the
-      affected table.
-    - <table> is the name of the affected table. <table> will be empty (i.e.
-      the empty string "") if the change was affecting a keyspace and not a
-      table.
-
-  Note that queries to create and drop an index are considered as change
-  updating the table the index is on.
-
-
-4.2.6. EVENT
-
-  And event pushed by the server. A client will only receive events for the
-  type it has REGISTER to. The body of an EVENT message will start by a
-  [string] representing the event type. The rest of the message depends on the
-  event type. The valid event types are:
-    - "TOPOLOGY_CHANGE": events related to change in the cluster topology.
-      Currently, events are sent when new nodes are added to the cluster, and
-      when nodes are removed. The body of the message (after the event type)
-      consists of a [string] and an [inet], corresponding respectively to the
-      type of change ("NEW_NODE" or "REMOVED_NODE") followed by the address of
-      the new/removed node.
-    - "STATUS_CHANGE": events related to change of node status. Currently,
-      up/down events are sent. The body of the message (after the event type)
-      consists of a [string] and an [inet], corresponding respectively to the
-      type of status change ("UP" or "DOWN") followed by the address of the
-      concerned node.
-    - "SCHEMA_CHANGE": events related to schema change. The body of the message
-      (after the event type) consists of 3 [string] corresponding respectively
-      to the type of schema change ("CREATED", "UPDATED" or "DROPPED"),
-      followed by the name of the affected keyspace and the name of the
-      affected table within that keyspace. For changes that affect a keyspace
-      directly, the table name will be empty (i.e. the empty string "").
-
-  All EVENT message have a streamId of -1 (Section 2.3).
-
-  Please note that "NEW_NODE" and "UP" events are sent based on internal Gossip
-  communication and as such may be sent a short delay before the binary
-  protocol server on the newly up node is fully started. Clients are thus
-  advise to wait a short time before trying to connect to the node (1 seconds
-  should be enough), otherwise they may experience a connection refusal at
-  first.
-
-
-5. Compression
-
-  Frame compression is supported by the protocol, but then only the frame body
-  is compressed (the frame header should never be compressed).
-
-  Before being used, client and server must agree on a compression algorithm to
-  use, which is done in the STARTUP message. As a consequence, a STARTUP message
-  must never be compressed.  However, once the STARTUP frame has been received
-  by the server can be compressed (including the response to the STARTUP
-  request). Frame do not have to be compressed however, even if compression has
-  been agreed upon (a server may only compress frame above a certain size at its
-  discretion). A frame body should be compressed if and only if the compressed
-  flag (see Section 2.2) is set.
-
-
-6. Collection types
-
-  This section describe the serialization format for the collection types:
-  list, map and set. This serialization format is both useful to decode values
-  returned in RESULT messages but also to encode values for EXECUTE ones.
-
-  The serialization formats are:
-     List: a [short] n indicating the size of the list, followed by n elements.
-           Each element is [short bytes] representing the serialized element
-           value.
-     Map: a [short] n indicating the size of the map, followed by n entries.
-          Each entry is composed of two [short bytes] representing the key and
-          the value of the entry map.
-     Set: a [short] n indicating the size of the set, followed by n elements.
-          Each element is [short bytes] representing the serialized element
-          value.
-
-
-7. Error codes
-
-  The supported error codes are described below:
-    0x0000    Server error: something unexpected happened. This indicates a
-              server-side bug.
-    0x000A    Protocol error: some client message triggered a protocol
-              violation (for instance a QUERY message is sent before a STARTUP
-              one has been sent)
-    0x0100    Bad credentials: CREDENTIALS request failed because Cassandra
-              did not accept the provided credentials.
-
-    0x1000    Unavailable exception. The rest of the ERROR message body will be
-                <cl><required><alive>
-              where:
-                <cl> is the [consistency] level of the query having triggered
-                     the exception.
-                <required> is an [int] representing the number of node that
-                           should be alive to respect <cl>
-                <alive> is an [int] representing the number of replica that
-                        were known to be alive when the request has been
-                        processed (since an unavailable exception has been
-                        triggered, there will be <alive> < <required>)
-    0x1001    Overloaded: the request cannot be processed because the
-              coordinator node is overloaded
-    0x1002    Is_bootstrapping: the request was a read request but the
-              coordinator node is bootstrapping
-    0x1003    Truncate_error: error during a truncation error.
-    0x1100    Write_timeout: Timeout exception during a write request. The rest
-              of the ERROR message body will be
-                <cl><received><blockfor><writeType>
-              where:
-                <cl> is the [consistency] level of the query having triggered
-                     the exception.
-                <received> is an [int] representing the number of nodes having
-                           acknowledged the request.
-                <blockfor> is the number of replica whose acknowledgement is
-                           required to achieve <cl>.
-                <writeType> is a [string] that describe the type of the write
-                            that timeouted. The value of that string can be one
-                            of:
-                             - "SIMPLE": the write was a non-batched
-                               non-counter write.
-                             - "BATCH": the write was a (logged) batch write.
-                               If this type is received, it means the batch log
-                               has been successfully written (otherwise a
-                               "BATCH_LOG" type would have been send instead).
-                             - "UNLOGGED_BATCH": the write was an unlogged
-                               batch. Not batch log write has been attempted.
-                             - "COUNTER": the write was a counter write
-                               (batched or not).
-                             - "BATCH_LOG": the timeout occured during the
-                               write to the batch log when a (logged) batch
-                               write was requested.
-    0x1200    Read_timeout: Timeout exception during a read request. The rest
-              of the ERROR message body will be
-                <cl><received><blockfor><data_present>
-              where:
-                <cl> is the [consistency] level of the query having triggered
-                     the exception.
-                <received> is an [int] representing the number of nodes having
-                           answered the request.
-                <blockfor> is the number of replica whose response is
-                           required to achieve <cl>. Please note that it is
-                           possible to have <received> >= <blockfor> if
-                           <data_present> is false. And also in the (unlikely)
-                           case were <cl> is achieved but the coordinator node
-                           timeout while waiting for read-repair
-                           acknowledgement.
-                <data_present> is a single byte. If its value is 0, it means
-                               the replica that was asked for data has not
-                               responded. Otherwise, the value is != 0.
-
-    0x2000    Syntax_error: The submitted query has a syntax error.
-    0x2100    Unauthorized: The logged user doesn't have the right to perform
-              the query.
-    0x2200    Invalid: The query is syntactically correct but invalid.
-    0x2300    Config_error: The query is invalid because of some configuration issue
-    0x2400    Already_exists: The query attempted to create a keyspace or a
-              table that was already existing. The rest of the ERROR message
-              body will be <ks><table> where:
-                <ks> is a [string] representing either the keyspace that
-                     already exists, or the keyspace in which the table that
-                     already exists is.
-                <table> is a [string] representing the name of the table that
-                        already exists. If the query was attempting to create a
-                        keyspace, <table> will be present but will be the empty
-                        string.
-    0x2500    Unprepared: Can be thrown while a prepared statement tries to be
-              executed if the provide prepared statement ID is not known by
-              this host. The rest of the ERROR message body will be [short
-              bytes] representing the unknown ID.

http://git-wip-us.apache.org/repos/asf/cassandra/blob/3fdd4647/doc/native_protocol_v1.spec
----------------------------------------------------------------------
diff --git a/doc/native_protocol_v1.spec b/doc/native_protocol_v1.spec
new file mode 100644
index 0000000..fb709e3
--- /dev/null
+++ b/doc/native_protocol_v1.spec
@@ -0,0 +1,635 @@
+
+                             CQL BINARY PROTOCOL v1
+
+
+Table of Contents
+
+  1. Overview
+  2. Frame header
+    2.1. version
+    2.2. flags
+    2.3. stream
+    2.4. opcode
+    2.5. length
+  3. Notations
+  4. Messages
+    4.1. Requests
+      4.1.1. STARTUP
+      4.1.2. CREDENTIALS
+      4.1.3. OPTIONS
+      4.1.4. QUERY
+      4.1.5. PREPARE
+      4.1.6. EXECUTE
+      4.1.7. REGISTER
+    4.2. Responses
+      4.2.1. ERROR
+      4.2.2. READY
+      4.2.3. AUTHENTICATE
+      4.2.4. SUPPORTED
+      4.2.5. RESULT
+        4.2.5.1. Void
+        4.2.5.2. Rows
+        4.2.5.3. Set_keyspace
+        4.2.5.4. Prepared
+        4.2.5.5. Schema_change
+      4.2.6. EVENT
+  5. Compression
+  6. Collection types
+  7. Error codes
+
+
+1. Overview
+
+  The CQL binary protocol is a frame based protocol. Frames are defined as:
+
+      0         8        16        24        32
+      +---------+---------+---------+---------+
+      | version |  flags  | stream  | opcode  |
+      +---------+---------+---------+---------+
+      |                length                 |
+      +---------+---------+---------+---------+
+      |                                       |
+      .            ...  body ...              .
+      .                                       .
+      .                                       .
+      +----------------------------------------
+
+  The protocol is big-endian (network byte order).
+
+  Each frame contains a fixed size header (8 bytes) followed by a variable size
+  body. The header is described in Section 2. The content of the body depends
+  on the header opcode value (the body can in particular be empty for some
+  opcode values). The list of allowed opcode is defined Section 2.3 and the
+  details of each corresponding message is described Section 4.
+
+  The protocol distinguishes 2 types of frames: requests and responses. Requests
+  are those frame sent by the clients to the server, response are the ones sent
+  by the server. Note however that while communication are initiated by the
+  client with the server responding to request, the protocol may likely add
+  server pushes in the future, so responses does not obligatory come right after
+  a client request.
+
+  Note to client implementors: clients library should always assume that the
+  body of a given frame may contain more data than what is described in this
+  document. It will however always be safe to ignore the remaining of the frame
+  body in such cases. The reason is that this may allow to sometimes extend the
+  protocol with optional features without needing to change the protocol
+  version.
+
+
+2. Frame header
+
+2.1. version
+
+  The version is a single byte that indicate both the direction of the message
+  (request or response) and the version of the protocol in use. The up-most bit
+  of version is used to define the direction of the message: 0 indicates a
+  request, 1 indicates a responses. This can be useful for protocol analyzers to
+  distinguish the nature of the packet from the direction which it is moving.
+  The rest of that byte is the protocol version (1 for the protocol defined in
+  this document). In other words, for this version of the protocol, version will
+  have one of:
+    0x01    Request frame for this protocol version
+    0x81    Response frame for this protocol version
+
+
+2.2. flags
+
+  Flags applying to this frame. The flags have the following meaning (described
+  by the mask that allow to select them):
+    0x01: Compression flag. If set, the frame body is compressed. The actual
+          compression to use should have been set up beforehand through the
+          Startup message (which thus cannot be compressed; Section 4.1.1).
+    0x02: Tracing flag. For a request frame, this indicate the client requires
+          tracing of the request. Note that not all requests support tracing.
+          Currently, only QUERY, PREPARE and EXECUTE queries support tracing.
+          Other requests will simply ignore the tracing flag if set. If a
+          request support tracing and the tracing flag was set, the response to
+          this request will have the tracing flag set and contain tracing
+          information.
+          If a response frame has the tracing flag set, its body contains
+          a tracing ID. The tracing ID is a [uuid] and is the first thing in
+          the frame body. The rest of the body will then be the usual body
+          corresponding to the response opcode.
+
+  The rest of the flags is currently unused and ignored.
+
+2.3. stream
+
+  A frame has a stream id (one signed byte). When sending request messages, this
+  stream id must be set by the client to a positive byte (negative stream id
+  are reserved for streams initiated by the server; currently all EVENT messages
+  (section 4.2.6) have a streamId of -1). If a client sends a request message
+  with the stream id X, it is guaranteed that the stream id of the response to
+  that message will be X.
+
+  This allow to deal with the asynchronous nature of the protocol. If a client
+  sends multiple messages simultaneously (without waiting for responses), there
+  is no guarantee on the order of the responses. For instance, if the client
+  writes REQ_1, REQ_2, REQ_3 on the wire (in that order), the server might
+  respond to REQ_3 (or REQ_2) first. Assigning different stream id to these 3
+  requests allows the client to distinguish to which request an received answer
+  respond to. As there can only be 128 different simultaneous stream, it is up
+  to the client to reuse stream id.
+
+  Note that clients are free to use the protocol synchronously (i.e. wait for
+  the response to REQ_N before sending REQ_N+1). In that case, the stream id
+  can be safely set to 0. Clients should also feel free to use only a subset of
+  the 128 maximum possible stream ids if it is simpler for those
+  implementation.
+
+2.4. opcode
+
+  An integer byte that distinguish the actual message:
+    0x00    ERROR
+    0x01    STARTUP
+    0x02    READY
+    0x03    AUTHENTICATE
+    0x04    CREDENTIALS
+    0x05    OPTIONS
+    0x06    SUPPORTED
+    0x07    QUERY
+    0x08    RESULT
+    0x09    PREPARE
+    0x0A    EXECUTE
+    0x0B    REGISTER
+    0x0C    EVENT
+
+  Messages are described in Section 4.
+
+
+2.5. length
+
+  A 4 byte integer representing the length of the body of the frame (note:
+  currently a frame is limited to 256MB in length).
+
+
+3. Notations
+
+  To describe the layout of the frame body for the messages in Section 4, we
+  define the following:
+
+    [int]          A 4 bytes integer
+    [short]        A 2 bytes unsigned integer
+    [string]       A [short] n, followed by n bytes representing an UTF-8
+                   string.
+    [long string]  An [int] n, followed by n bytes representing an UTF-8 string.
+    [uuid]         A 16 bytes long uuid.
+    [string list]  A [short] n, followed by n [string].
+    [bytes]        A [int] n, followed by n bytes if n >= 0. If n < 0,
+                   no byte should follow and the value represented is `null`.
+    [short bytes]  A [short] n, followed by n bytes if n >= 0.
+
+    [option]       A pair of <id><value> where <id> is a [short] representing
+                   the option id and <value> depends on that option (and can be
+                   of size 0). The supported id (and the corresponding <value>)
+                   will be described when this is used.
+    [option list]  A [short] n, followed by n [option].
+    [inet]         An address (ip and port) to a node. It consists of one
+                   [byte] n, that represents the address size, followed by n
+                   [byte] representing the IP address (in practice n can only be
+                   either 4 (IPv4) or 16 (IPv6)), following by one [int]
+                   representing the port.
+    [consistency]  A consistency level specification. This is a [short]
+                   representing a consistency level with the following
+                   correspondance:
+                     0x0000    ANY
+                     0x0001    ONE
+                     0x0002    TWO
+                     0x0003    THREE
+                     0x0004    QUORUM
+                     0x0005    ALL
+                     0x0006    LOCAL_QUORUM
+                     0x0007    EACH_QUORUM
+
+    [string map]      A [short] n, followed by n pair <k><v> where <k> and <v>
+                      are [string].
+    [string multimap] A [short] n, followed by n pair <k><v> where <k> is a
+                      [string] and <v> is a [string list].
+
+
+4. Messages
+
+4.1. Requests
+
+  Note that outside of their normal responses (described below), all requests
+  can get an ERROR message (Section 4.2.1) as response.
+
+4.1.1. STARTUP
+
+  Initialize the connection. The server will respond by either a READY message
+  (in which case the connection is ready for queries) or an AUTHENTICATE message
+  (in which case credentials will need to be provided using CREDENTIALS).
+
+  This must be the first message of the connection, except for OPTIONS that can
+  be sent before to find out the options supported by the server. Once the
+  connection has been initialized, a client should not send any more STARTUP
+  message.
+
+  The body is a [string map] of options. Possible options are:
+    - "CQL_VERSION": the version of CQL to use. This option is mandatory and
+      currenty, the only version supported is "3.0.0". Note that this is
+      different from the protocol version.
+    - "COMPRESSION": the compression algorithm to use for frames (See section 5).
+      This is optional, if not specified no compression will be used.
+
+
+4.1.2. CREDENTIALS
+
+  Provides credentials information for the purpose of identification. This
+  message comes as a response to an AUTHENTICATE message from the server, but
+  can be use later in the communication to change the authentication
+  information.
+
+  The body is a list of key/value informations. It is a [short] n, followed by n
+  pair of [string]. These key/value pairs are passed as is to the Cassandra
+  IAuthenticator and thus the detail of which informations is needed depends on
+  that authenticator.
+
+  The response to a CREDENTIALS is a READY message (or an ERROR message).
+
+
+4.1.3. OPTIONS
+
+  Asks the server to return what STARTUP options are supported. The body of an
+  OPTIONS message should be empty and the server will respond with a SUPPORTED
+  message.
+
+
+4.1.4. QUERY
+
+  Performs a CQL query. The body of the message consists of a CQL query as a [long
+  string] followed by the [consistency] for the operation.
+
+  Note that the consistency is ignored by some queries (USE, CREATE, ALTER,
+  TRUNCATE, ...).
+
+  The server will respond to a QUERY message with a RESULT message, the content
+  of which depends on the query.
+
+
+4.1.5. PREPARE
+
+  Prepare a query for later execution (through EXECUTE). The body consists of
+  the CQL query to prepare as a [long string].
+
+  The server will respond with a RESULT message with a `prepared` kind (0x00003,
+  see Section 4.2.5).
+
+
+4.1.6. EXECUTE
+
+  Executes a prepared query. The body of the message must be:
+    <id><n><value_1>....<value_n><consistency>
+  where:
+    - <id> is the prepared query ID. It's the [short bytes] returned as a
+      response to a PREPARE message.
+    - <n> is a [short] indicating the number of following values.
+    - <value_1>...<value_n> are the [bytes] to use for bound variables in the
+      prepared query.
+    - <consistency> is the [consistency] level for the operation.
+
+  Note that the consistency is ignored by some (prepared) queries (USE, CREATE,
+  ALTER, TRUNCATE, ...).
+
+  The response from the server will be a RESULT message.
+
+
+4.1.7. REGISTER
+
+  Register this connection to receive some type of events. The body of the
+  message is a [string list] representing the event types to register to. See
+  section 4.2.6 for the list of valid event types.
+
+  The response to a REGISTER message will be a READY message.
+
+  Please note that if a client driver maintains multiple connections to a
+  Cassandra node and/or connections to multiple nodes, it is advised to
+  dedicate a handful of connections to receive events, but to *not* register
+  for events on all connections, as this would only result in receiving
+  multiple times the same event messages, wasting bandwidth.
+
+
+4.2. Responses
+
+  This section describes the content of the frame body for the different
+  responses. Please note that to make room for future evolution, clients should
+  support extra informations (that they should simply discard) to the one
+  described in this document at the end of the frame body.
+
+4.2.1. ERROR
+
+  Indicates an error processing a request. The body of the message will be an
+  error code ([int]) followed by a [string] error message. Then, depending on
+  the exception, more content may follow. The error codes are defined in
+  Section 7, along with their additional content if any.
+
+
+4.2.2. READY
+
+  Indicates that the server is ready to process queries. This message will be
+  sent by the server either after a STARTUP message if no authentication is
+  required, or after a successful CREDENTIALS message.
+
+  The body of a READY message is empty.
+
+
+4.2.3. AUTHENTICATE
+
+  Indicates that the server require authentication. This will be sent following
+  a STARTUP message and must be answered by a CREDENTIALS message from the
+  client to provide authentication informations.
+
+  The body consists of a single [string] indicating the full class name of the
+  IAuthenticator in use.
+
+
+4.2.4. SUPPORTED
+
+  Indicates which startup options are supported by the server. This message
+  comes as a response to an OPTIONS message.
+
+  The body of a SUPPORTED message is a [string multimap]. This multimap gives
+  for each of the supported STARTUP options, the list of supported values.
+
+
+4.2.5. RESULT
+
+  The result to a query (QUERY, PREPARE or EXECUTE messages).
+
+  The first element of the body of a RESULT message is an [int] representing the
+  `kind` of result. The rest of the body depends on the kind. The kind can be
+  one of:
+    0x0001    Void: for results carrying no information.
+    0x0002    Rows: for results to select queries, returning a set of rows.
+    0x0003    Set_keyspace: the result to a `use` query.
+    0x0004    Prepared: result to a PREPARE message.
+    0x0005    Schema_change: the result to a schema altering query.
+
+  The body for each kind (after the [int] kind) is defined below.
+
+
+4.2.5.1. Void
+
+  The rest of the body for a Void result is empty. It indicates that a query was
+  successful without providing more information.
+
+
+4.2.5.2. Rows
+
+  Indicates a set of rows. The rest of body of a Rows result is:
+    <metadata><rows_count><rows_content>
+  where:
+    - <metadata> is composed of:
+        <flags><columns_count><global_table_spec>?<col_spec_1>...<col_spec_n>
+      where:
+        - <flags> is an [int]. The bits of <flags> provides information on the
+          formatting of the remaining informations. A flag is set if the bit
+          corresponding to its `mask` is set. Supported flags are, given there
+          mask:
+            0x0001    Global_tables_spec: if set, only one table spec (keyspace
+                      and table name) is provided as <global_table_spec>. If not
+                      set, <global_table_spec> is not present.
+        - <columns_count> is an [int] representing the number of columns selected
+          by the query this result is of. It defines the number of <col_spec_i>
+          elements in and the number of element for each row in <rows_content>.
+        - <global_table_spec> is present if the Global_tables_spec is set in
+          <flags>. If present, it is composed of two [string] representing the
+          (unique) keyspace name and table name the columns return are of.
+        - <col_spec_i> specifies the columns returned in the query. There is
+          <column_count> such column specification that are composed of:
+            (<ksname><tablename>)?<column_name><type>
+          The initial <ksname> and <tablename> are two [string] are only present
+          if the Global_tables_spec flag is not set. The <column_name> is a
+          [string] and <type> is an [option] that correspond to the column name
+          and type. The option for <type> is either a native type (see below),
+          in which case the option has no value, or a 'custom' type, in which
+          case the value is a [string] representing the full qualified class
+          name of the type represented. Valid option ids are:
+            0x0000    Custom: the value is a [string], see above.
+            0x0001    Ascii
+            0x0002    Bigint
+            0x0003    Blob
+            0x0004    Boolean
+            0x0005    Counter
+            0x0006    Decimal
+            0x0007    Double
+            0x0008    Float
+            0x0009    Int
+            0x000A    Text
+            0x000B    Timestamp
+            0x000C    Uuid
+            0x000D    Varchar
+            0x000E    Varint
+            0x000F    Timeuuid
+            0x0010    Inet
+            0x0020    List: the value is an [option], representing the type
+                            of the elements of the list.
+            0x0021    Map: the value is two [option], representing the types of the
+                           keys and values of the map
+            0x0022    Set: the value is an [option], representing the type
+                            of the elements of the set
+    - <rows_count> is an [int] representing the number of rows present in this
+      result. Those rows are serialized in the <rows_content> part.
+    - <rows_content> is composed of <row_1>...<row_m> where m is <rows_count>.
+      Each <row_i> is composed of <value_1>...<value_n> where n is
+      <columns_count> and where <value_j> is a [bytes] representing the value
+      returned for the jth column of the ith row. In other words, <rows_content>
+      is composed of (<rows_count> * <columns_count>) [bytes].
+
+
+4.2.5.3. Set_keyspace
+
+  The result to a `use` query. The body (after the kind [int]) is a single
+  [string] indicating the name of the keyspace that has been set.
+
+
+4.2.5.4. Prepared
+
+  The result to a PREPARE message. The rest of the body of a Prepared result is:
+    <id><metadata>
+  where:
+    - <id> is [short bytes] representing the prepared query ID.
+    - <metadata> is defined exactly as for a Rows RESULT (See section 4.2.5.2).
+
+  Note that prepared query ID return is global to the node on which the query
+  has been prepared. It can be used on any connection to that node and this
+  until the node is restarted (after which the query must be reprepared).
+
+4.2.5.5. Schema_change
+
+  The result to a schema altering query (creation/update/drop of a
+  keyspace/table/index). The body (after the kind [int]) is composed of 3
+  [string]:
+    <change><keyspace><table>
+  where:
+    - <change> describe the type of change that has occured. It can be one of
+      "CREATED", "UPDATED" or "DROPPED".
+    - <keyspace> is the name of the affected keyspace or the keyspace of the
+      affected table.
+    - <table> is the name of the affected table. <table> will be empty (i.e.
+      the empty string "") if the change was affecting a keyspace and not a
+      table.
+
+  Note that queries to create and drop an index are considered as change
+  updating the table the index is on.
+
+
+4.2.6. EVENT
+
+  And event pushed by the server. A client will only receive events for the
+  type it has REGISTER to. The body of an EVENT message will start by a
+  [string] representing the event type. The rest of the message depends on the
+  event type. The valid event types are:
+    - "TOPOLOGY_CHANGE": events related to change in the cluster topology.
+      Currently, events are sent when new nodes are added to the cluster, and
+      when nodes are removed. The body of the message (after the event type)
+      consists of a [string] and an [inet], corresponding respectively to the
+      type of change ("NEW_NODE" or "REMOVED_NODE") followed by the address of
+      the new/removed node.
+    - "STATUS_CHANGE": events related to change of node status. Currently,
+      up/down events are sent. The body of the message (after the event type)
+      consists of a [string] and an [inet], corresponding respectively to the
+      type of status change ("UP" or "DOWN") followed by the address of the
+      concerned node.
+    - "SCHEMA_CHANGE": events related to schema change. The body of the message
+      (after the event type) consists of 3 [string] corresponding respectively
+      to the type of schema change ("CREATED", "UPDATED" or "DROPPED"),
+      followed by the name of the affected keyspace and the name of the
+      affected table within that keyspace. For changes that affect a keyspace
+      directly, the table name will be empty (i.e. the empty string "").
+
+  All EVENT message have a streamId of -1 (Section 2.3).
+
+  Please note that "NEW_NODE" and "UP" events are sent based on internal Gossip
+  communication and as such may be sent a short delay before the binary
+  protocol server on the newly up node is fully started. Clients are thus
+  advise to wait a short time before trying to connect to the node (1 seconds
+  should be enough), otherwise they may experience a connection refusal at
+  first.
+
+
+5. Compression
+
+  Frame compression is supported by the protocol, but then only the frame body
+  is compressed (the frame header should never be compressed).
+
+  Before being used, client and server must agree on a compression algorithm to
+  use, which is done in the STARTUP message. As a consequence, a STARTUP message
+  must never be compressed.  However, once the STARTUP frame has been received
+  by the server can be compressed (including the response to the STARTUP
+  request). Frame do not have to be compressed however, even if compression has
+  been agreed upon (a server may only compress frame above a certain size at its
+  discretion). A frame body should be compressed if and only if the compressed
+  flag (see Section 2.2) is set.
+
+
+6. Collection types
+
+  This section describe the serialization format for the collection types:
+  list, map and set. This serialization format is both useful to decode values
+  returned in RESULT messages but also to encode values for EXECUTE ones.
+
+  The serialization formats are:
+     List: a [short] n indicating the size of the list, followed by n elements.
+           Each element is [short bytes] representing the serialized element
+           value.
+     Map: a [short] n indicating the size of the map, followed by n entries.
+          Each entry is composed of two [short bytes] representing the key and
+          the value of the entry map.
+     Set: a [short] n indicating the size of the set, followed by n elements.
+          Each element is [short bytes] representing the serialized element
+          value.
+
+
+7. Error codes
+
+  The supported error codes are described below:
+    0x0000    Server error: something unexpected happened. This indicates a
+              server-side bug.
+    0x000A    Protocol error: some client message triggered a protocol
+              violation (for instance a QUERY message is sent before a STARTUP
+              one has been sent)
+    0x0100    Bad credentials: CREDENTIALS request failed because Cassandra
+              did not accept the provided credentials.
+
+    0x1000    Unavailable exception. The rest of the ERROR message body will be
+                <cl><required><alive>
+              where:
+                <cl> is the [consistency] level of the query having triggered
+                     the exception.
+                <required> is an [int] representing the number of node that
+                           should be alive to respect <cl>
+                <alive> is an [int] representing the number of replica that
+                        were known to be alive when the request has been
+                        processed (since an unavailable exception has been
+                        triggered, there will be <alive> < <required>)
+    0x1001    Overloaded: the request cannot be processed because the
+              coordinator node is overloaded
+    0x1002    Is_bootstrapping: the request was a read request but the
+              coordinator node is bootstrapping
+    0x1003    Truncate_error: error during a truncation error.
+    0x1100    Write_timeout: Timeout exception during a write request. The rest
+              of the ERROR message body will be
+                <cl><received><blockfor><writeType>
+              where:
+                <cl> is the [consistency] level of the query having triggered
+                     the exception.
+                <received> is an [int] representing the number of nodes having
+                           acknowledged the request.
+                <blockfor> is the number of replica whose acknowledgement is
+                           required to achieve <cl>.
+                <writeType> is a [string] that describe the type of the write
+                            that timeouted. The value of that string can be one
+                            of:
+                             - "SIMPLE": the write was a non-batched
+                               non-counter write.
+                             - "BATCH": the write was a (logged) batch write.
+                               If this type is received, it means the batch log
+                               has been successfully written (otherwise a
+                               "BATCH_LOG" type would have been send instead).
+                             - "UNLOGGED_BATCH": the write was an unlogged
+                               batch. Not batch log write has been attempted.
+                             - "COUNTER": the write was a counter write
+                               (batched or not).
+                             - "BATCH_LOG": the timeout occured during the
+                               write to the batch log when a (logged) batch
+                               write was requested.
+    0x1200    Read_timeout: Timeout exception during a read request. The rest
+              of the ERROR message body will be
+                <cl><received><blockfor><data_present>
+              where:
+                <cl> is the [consistency] level of the query having triggered
+                     the exception.
+                <received> is an [int] representing the number of nodes having
+                           answered the request.
+                <blockfor> is the number of replica whose response is
+                           required to achieve <cl>. Please note that it is
+                           possible to have <received> >= <blockfor> if
+                           <data_present> is false. And also in the (unlikely)
+                           case were <cl> is achieved but the coordinator node
+                           timeout while waiting for read-repair
+                           acknowledgement.
+                <data_present> is a single byte. If its value is 0, it means
+                               the replica that was asked for data has not
+                               responded. Otherwise, the value is != 0.
+
+    0x2000    Syntax_error: The submitted query has a syntax error.
+    0x2100    Unauthorized: The logged user doesn't have the right to perform
+              the query.
+    0x2200    Invalid: The query is syntactically correct but invalid.
+    0x2300    Config_error: The query is invalid because of some configuration issue
+    0x2400    Already_exists: The query attempted to create a keyspace or a
+              table that was already existing. The rest of the ERROR message
+              body will be <ks><table> where:
+                <ks> is a [string] representing either the keyspace that
+                     already exists, or the keyspace in which the table that
+                     already exists is.
+                <table> is a [string] representing the name of the table that
+                        already exists. If the query was attempting to create a
+                        keyspace, <table> will be present but will be the empty
+                        string.
+    0x2500    Unprepared: Can be thrown while a prepared statement tries to be
+              executed if the provide prepared statement ID is not known by
+              this host. The rest of the ERROR message body will be [short
+              bytes] representing the unknown ID.

http://git-wip-us.apache.org/repos/asf/cassandra/blob/3fdd4647/doc/native_protocol_v2.spec
----------------------------------------------------------------------
diff --git a/doc/native_protocol_v2.spec b/doc/native_protocol_v2.spec
new file mode 100644
index 0000000..ad14929
--- /dev/null
+++ b/doc/native_protocol_v2.spec
@@ -0,0 +1,640 @@
+
+                             CQL BINARY PROTOCOL v2
+
+
+Table of Contents
+
+  1. Overview
+  2. Frame header
+    2.1. version
+    2.2. flags
+    2.3. stream
+    2.4. opcode
+    2.5. length
+  3. Notations
+  4. Messages
+    4.1. Requests
+      4.1.1. STARTUP
+      4.1.2. CREDENTIALS
+      4.1.3. OPTIONS
+      4.1.4. QUERY
+      4.1.5. PREPARE
+      4.1.6. EXECUTE
+      4.1.7. REGISTER
+    4.2. Responses
+      4.2.1. ERROR
+      4.2.2. READY
+      4.2.3. AUTHENTICATE
+      4.2.4. SUPPORTED
+      4.2.5. RESULT
+        4.2.5.1. Void
+        4.2.5.2. Rows
+        4.2.5.3. Set_keyspace
+        4.2.5.4. Prepared
+        4.2.5.5. Schema_change
+      4.2.6. EVENT
+  5. Compression
+  6. Collection types
+  7. Error codes
+  8. Changes from v1
+
+
+1. Overview
+
+  The CQL binary protocol is a frame based protocol. Frames are defined as:
+
+      0         8        16        24        32
+      +---------+---------+---------+---------+
+      | version |  flags  | stream  | opcode  |
+      +---------+---------+---------+---------+
+      |                length                 |
+      +---------+---------+---------+---------+
+      |                                       |
+      .            ...  body ...              .
+      .                                       .
+      .                                       .
+      +----------------------------------------
+
+  The protocol is big-endian (network byte order).
+
+  Each frame contains a fixed size header (8 bytes) followed by a variable size
+  body. The header is described in Section 2. The content of the body depends
+  on the header opcode value (the body can in particular be empty for some
+  opcode values). The list of allowed opcode is defined Section 2.3 and the
+  details of each corresponding message is described Section 4.
+
+  The protocol distinguishes 2 types of frames: requests and responses. Requests
+  are those frame sent by the clients to the server, response are the ones sent
+  by the server. Note however that while communication are initiated by the
+  client with the server responding to request, the protocol may likely add
+  server pushes in the future, so responses does not obligatory come right after
+  a client request.
+
+  Note to client implementors: clients library should always assume that the
+  body of a given frame may contain more data than what is described in this
+  document. It will however always be safe to ignore the remaining of the frame
+  body in such cases. The reason is that this may allow to sometimes extend the
+  protocol with optional features without needing to change the protocol
+  version.
+
+
+
+2. Frame header
+
+2.1. version
+
+  The version is a single byte that indicate both the direction of the message
+  (request or response) and the version of the protocol in use. The up-most bit
+  of version is used to define the direction of the message: 0 indicates a
+  request, 1 indicates a responses. This can be useful for protocol analyzers to
+  distinguish the nature of the packet from the direction which it is moving.
+  The rest of that byte is the protocol version (1 for the protocol defined in
+  this document). In other words, for this version of the protocol, version will
+  have one of:
+    0x02    Request frame for this protocol version
+    0x82    Response frame for this protocol version
+
+  This document describe the version 2 of the protocol. For the changes made since
+  version 1, see Section 8.
+
+
+2.2. flags
+
+  Flags applying to this frame. The flags have the following meaning (described
+  by the mask that allow to select them):
+    0x01: Compression flag. If set, the frame body is compressed. The actual
+          compression to use should have been set up beforehand through the
+          Startup message (which thus cannot be compressed; Section 4.1.1).
+    0x02: Tracing flag. For a request frame, this indicate the client requires
+          tracing of the request. Note that not all requests support tracing.
+          Currently, only QUERY, PREPARE and EXECUTE queries support tracing.
+          Other requests will simply ignore the tracing flag if set. If a
+          request support tracing and the tracing flag was set, the response to
+          this request will have the tracing flag set and contain tracing
+          information.
+          If a response frame has the tracing flag set, its body contains
+          a tracing ID. The tracing ID is a [uuid] and is the first thing in
+          the frame body. The rest of the body will then be the usual body
+          corresponding to the response opcode.
+
+  The rest of the flags is currently unused and ignored.
+
+2.3. stream
+
+  A frame has a stream id (one signed byte). When sending request messages, this
+  stream id must be set by the client to a positive byte (negative stream id
+  are reserved for streams initiated by the server; currently all EVENT messages
+  (section 4.2.6) have a streamId of -1). If a client sends a request message
+  with the stream id X, it is guaranteed that the stream id of the response to
+  that message will be X.
+
+  This allow to deal with the asynchronous nature of the protocol. If a client
+  sends multiple messages simultaneously (without waiting for responses), there
+  is no guarantee on the order of the responses. For instance, if the client
+  writes REQ_1, REQ_2, REQ_3 on the wire (in that order), the server might
+  respond to REQ_3 (or REQ_2) first. Assigning different stream id to these 3
+  requests allows the client to distinguish to which request an received answer
+  respond to. As there can only be 128 different simultaneous stream, it is up
+  to the client to reuse stream id.
+
+  Note that clients are free to use the protocol synchronously (i.e. wait for
+  the response to REQ_N before sending REQ_N+1). In that case, the stream id
+  can be safely set to 0. Clients should also feel free to use only a subset of
+  the 128 maximum possible stream ids if it is simpler for those
+  implementation.
+
+2.4. opcode
+
+  An integer byte that distinguish the actual message:
+    0x00    ERROR
+    0x01    STARTUP
+    0x02    READY
+    0x03    AUTHENTICATE
+    0x04    CREDENTIALS
+    0x05    OPTIONS
+    0x06    SUPPORTED
+    0x07    QUERY
+    0x08    RESULT
+    0x09    PREPARE
+    0x0A    EXECUTE
+    0x0B    REGISTER
+    0x0C    EVENT
+
+  Messages are described in Section 4.
+
+
+2.5. length
+
+  A 4 byte integer representing the length of the body of the frame (note:
+  currently a frame is limited to 256MB in length).
+
+
+3. Notations
+
+  To describe the layout of the frame body for the messages in Section 4, we
+  define the following:
+
+    [int]          A 4 bytes integer
+    [short]        A 2 bytes unsigned integer
+    [string]       A [short] n, followed by n bytes representing an UTF-8
+                   string.
+    [long string]  An [int] n, followed by n bytes representing an UTF-8 string.
+    [uuid]         A 16 bytes long uuid.
+    [string list]  A [short] n, followed by n [string].
+    [bytes]        A [int] n, followed by n bytes if n >= 0. If n < 0,
+                   no byte should follow and the value represented is `null`.
+    [short bytes]  A [short] n, followed by n bytes if n >= 0.
+
+    [option]       A pair of <id><value> where <id> is a [short] representing
+                   the option id and <value> depends on that option (and can be
+                   of size 0). The supported id (and the corresponding <value>)
+                   will be described when this is used.
+    [option list]  A [short] n, followed by n [option].
+    [inet]         An address (ip and port) to a node. It consists of one
+                   [byte] n, that represents the address size, followed by n
+                   [byte] representing the IP address (in practice n can only be
+                   either 4 (IPv4) or 16 (IPv6)), following by one [int]
+                   representing the port.
+    [consistency]  A consistency level specification. This is a [short]
+                   representing a consistency level with the following
+                   correspondance:
+                     0x0000    ANY
+                     0x0001    ONE
+                     0x0002    TWO
+                     0x0003    THREE
+                     0x0004    QUORUM
+                     0x0005    ALL
+                     0x0006    LOCAL_QUORUM
+                     0x0007    EACH_QUORUM
+
+    [string map]      A [short] n, followed by n pair <k><v> where <k> and <v>
+                      are [string].
+    [string multimap] A [short] n, followed by n pair <k><v> where <k> is a
+                      [string] and <v> is a [string list].
+
+
+4. Messages
+
+4.1. Requests
+
+  Note that outside of their normal responses (described below), all requests
+  can get an ERROR message (Section 4.2.1) as response.
+
+4.1.1. STARTUP
+
+  Initialize the connection. The server will respond by either a READY message
+  (in which case the connection is ready for queries) or an AUTHENTICATE message
+  (in which case credentials will need to be provided using CREDENTIALS).
+
+  This must be the first message of the connection, except for OPTIONS that can
+  be sent before to find out the options supported by the server. Once the
+  connection has been initialized, a client should not send any more STARTUP
+  message.
+
+  The body is a [string map] of options. Possible options are:
+    - "CQL_VERSION": the version of CQL to use. This option is mandatory and
+      currenty, the only version supported is "3.0.0". Note that this is
+      different from the protocol version.
+    - "COMPRESSION": the compression algorithm to use for frames (See section 5).
+      This is optional, if not specified no compression will be used.
+
+
+4.1.2. CREDENTIALS
+
+  Provides credentials information for the purpose of identification. This
+  message comes as a response to an AUTHENTICATE message from the server, but
+  can be use later in the communication to change the authentication
+  information.
+
+  The body is a list of key/value informations. It is a [short] n, followed by n
+  pair of [string]. These key/value pairs are passed as is to the Cassandra
+  IAuthenticator and thus the detail of which informations is needed depends on
+  that authenticator.
+
+  The response to a CREDENTIALS is a READY message (or an ERROR message).
+
+
+4.1.3. OPTIONS
+
+  Asks the server to return what STARTUP options are supported. The body of an
+  OPTIONS message should be empty and the server will respond with a SUPPORTED
+  message.
+
+
+4.1.4. QUERY
+
+  Performs a CQL query. The body of the message consists of a CQL query as a [long
+  string] followed by the [consistency] for the operation.
+
+  Note that the consistency is ignored by some queries (USE, CREATE, ALTER,
+  TRUNCATE, ...).
+
+  The server will respond to a QUERY message with a RESULT message, the content
+  of which depends on the query.
+
+
+4.1.5. PREPARE
+
+  Prepare a query for later execution (through EXECUTE). The body consists of
+  the CQL query to prepare as a [long string].
+
+  The server will respond with a RESULT message with a `prepared` kind (0x00003,
+  see Section 4.2.5).
+
+
+4.1.6. EXECUTE
+
+  Executes a prepared query. The body of the message must be:
+    <id><n><value_1>....<value_n><consistency>
+  where:
+    - <id> is the prepared query ID. It's the [short bytes] returned as a
+      response to a PREPARE message.
+    - <n> is a [short] indicating the number of following values.
+    - <value_1>...<value_n> are the [bytes] to use for bound variables in the
+      prepared query.
+    - <consistency> is the [consistency] level for the operation.
+
+  Note that the consistency is ignored by some (prepared) queries (USE, CREATE,
+  ALTER, TRUNCATE, ...).
+
+  The response from the server will be a RESULT message.
+
+
+4.1.7. REGISTER
+
+  Register this connection to receive some type of events. The body of the
+  message is a [string list] representing the event types to register to. See
+  section 4.2.6 for the list of valid event types.
+
+  The response to a REGISTER message will be a READY message.
+
+  Please note that if a client driver maintains multiple connections to a
+  Cassandra node and/or connections to multiple nodes, it is advised to
+  dedicate a handful of connections to receive events, but to *not* register
+  for events on all connections, as this would only result in receiving
+  multiple times the same event messages, wasting bandwidth.
+
+
+4.2. Responses
+
+  This section describes the content of the frame body for the different
+  responses. Please note that to make room for future evolution, clients should
+  support extra informations (that they should simply discard) to the one
+  described in this document at the end of the frame body.
+
+4.2.1. ERROR
+
+  Indicates an error processing a request. The body of the message will be an
+  error code ([int]) followed by a [string] error message. Then, depending on
+  the exception, more content may follow. The error codes are defined in
+  Section 7, along with their additional content if any.
+
+
+4.2.2. READY
+
+  Indicates that the server is ready to process queries. This message will be
+  sent by the server either after a STARTUP message if no authentication is
+  required, or after a successful CREDENTIALS message.
+
+  The body of a READY message is empty.
+
+
+4.2.3. AUTHENTICATE
+
+  Indicates that the server require authentication. This will be sent following
+  a STARTUP message and must be answered by a CREDENTIALS message from the
+  client to provide authentication informations.
+
+  The body consists of a single [string] indicating the full class name of the
+  IAuthenticator in use.
+
+
+4.2.4. SUPPORTED
+
+  Indicates which startup options are supported by the server. This message
+  comes as a response to an OPTIONS message.
+
+  The body of a SUPPORTED message is a [string multimap]. This multimap gives
+  for each of the supported STARTUP options, the list of supported values.
+
+
+4.2.5. RESULT
+
+  The result to a query (QUERY, PREPARE or EXECUTE messages).
+
+  The first element of the body of a RESULT message is an [int] representing the
+  `kind` of result. The rest of the body depends on the kind. The kind can be
+  one of:
+    0x0001    Void: for results carrying no information.
+    0x0002    Rows: for results to select queries, returning a set of rows.
+    0x0003    Set_keyspace: the result to a `use` query.
+    0x0004    Prepared: result to a PREPARE message.
+    0x0005    Schema_change: the result to a schema altering query.
+
+  The body for each kind (after the [int] kind) is defined below.
+
+
+4.2.5.1. Void
+
+  The rest of the body for a Void result is empty. It indicates that a query was
+  successful without providing more information.
+
+
+4.2.5.2. Rows
+
+  Indicates a set of rows. The rest of body of a Rows result is:
+    <metadata><rows_count><rows_content>
+  where:
+    - <metadata> is composed of:
+        <flags><columns_count><global_table_spec>?<col_spec_1>...<col_spec_n>
+      where:
+        - <flags> is an [int]. The bits of <flags> provides information on the
+          formatting of the remaining informations. A flag is set if the bit
+          corresponding to its `mask` is set. Supported flags are, given there
+          mask:
+            0x0001    Global_tables_spec: if set, only one table spec (keyspace
+                      and table name) is provided as <global_table_spec>. If not
+                      set, <global_table_spec> is not present.
+        - <columns_count> is an [int] representing the number of columns selected
+          by the query this result is of. It defines the number of <col_spec_i>
+          elements in and the number of element for each row in <rows_content>.
+        - <global_table_spec> is present if the Global_tables_spec is set in
+          <flags>. If present, it is composed of two [string] representing the
+          (unique) keyspace name and table name the columns return are of.
+        - <col_spec_i> specifies the columns returned in the query. There is
+          <column_count> such column specification that are composed of:
+            (<ksname><tablename>)?<column_name><type>
+          The initial <ksname> and <tablename> are two [string] are only present
+          if the Global_tables_spec flag is not set. The <column_name> is a
+          [string] and <type> is an [option] that correspond to the column name
+          and type. The option for <type> is either a native type (see below),
+          in which case the option has no value, or a 'custom' type, in which
+          case the value is a [string] representing the full qualified class
+          name of the type represented. Valid option ids are:
+            0x0000    Custom: the value is a [string], see above.
+            0x0001    Ascii
+            0x0002    Bigint
+            0x0003    Blob
+            0x0004    Boolean
+            0x0005    Counter
+            0x0006    Decimal
+            0x0007    Double
+            0x0008    Float
+            0x0009    Int
+            0x000A    Text
+            0x000B    Timestamp
+            0x000C    Uuid
+            0x000D    Varchar
+            0x000E    Varint
+            0x000F    Timeuuid
+            0x0010    Inet
+            0x0020    List: the value is an [option], representing the type
+                            of the elements of the list.
+            0x0021    Map: the value is two [option], representing the types of the
+                           keys and values of the map
+            0x0022    Set: the value is an [option], representing the type
+                            of the elements of the set
+    - <rows_count> is an [int] representing the number of rows present in this
+      result. Those rows are serialized in the <rows_content> part.
+    - <rows_content> is composed of <row_1>...<row_m> where m is <rows_count>.
+      Each <row_i> is composed of <value_1>...<value_n> where n is
+      <columns_count> and where <value_j> is a [bytes] representing the value
+      returned for the jth column of the ith row. In other words, <rows_content>
+      is composed of (<rows_count> * <columns_count>) [bytes].
+
+
+4.2.5.3. Set_keyspace
+
+  The result to a `use` query. The body (after the kind [int]) is a single
+  [string] indicating the name of the keyspace that has been set.
+
+
+4.2.5.4. Prepared
+
+  The result to a PREPARE message. The rest of the body of a Prepared result is:
+    <id><metadata>
+  where:
+    - <id> is [short bytes] representing the prepared query ID.
+    - <metadata> is defined exactly as for a Rows RESULT (See section 4.2.5.2).
+
+  Note that prepared query ID return is global to the node on which the query
+  has been prepared. It can be used on any connection to that node and this
+  until the node is restarted (after which the query must be reprepared).
+
+4.2.5.5. Schema_change
+
+  The result to a schema altering query (creation/update/drop of a
+  keyspace/table/index). The body (after the kind [int]) is composed of 3
+  [string]:
+    <change><keyspace><table>
+  where:
+    - <change> describe the type of change that has occured. It can be one of
+      "CREATED", "UPDATED" or "DROPPED".
+    - <keyspace> is the name of the affected keyspace or the keyspace of the
+      affected table.
+    - <table> is the name of the affected table. <table> will be empty (i.e.
+      the empty string "") if the change was affecting a keyspace and not a
+      table.
+
+  Note that queries to create and drop an index are considered as change
+  updating the table the index is on.
+
+
+4.2.6. EVENT
+
+  And event pushed by the server. A client will only receive events for the
+  type it has REGISTER to. The body of an EVENT message will start by a
+  [string] representing the event type. The rest of the message depends on the
+  event type. The valid event types are:
+    - "TOPOLOGY_CHANGE": events related to change in the cluster topology.
+      Currently, events are sent when new nodes are added to the cluster, and
+      when nodes are removed. The body of the message (after the event type)
+      consists of a [string] and an [inet], corresponding respectively to the
+      type of change ("NEW_NODE" or "REMOVED_NODE") followed by the address of
+      the new/removed node.
+    - "STATUS_CHANGE": events related to change of node status. Currently,
+      up/down events are sent. The body of the message (after the event type)
+      consists of a [string] and an [inet], corresponding respectively to the
+      type of status change ("UP" or "DOWN") followed by the address of the
+      concerned node.
+    - "SCHEMA_CHANGE": events related to schema change. The body of the message
+      (after the event type) consists of 3 [string] corresponding respectively
+      to the type of schema change ("CREATED", "UPDATED" or "DROPPED"),
+      followed by the name of the affected keyspace and the name of the
+      affected table within that keyspace. For changes that affect a keyspace
+      directly, the table name will be empty (i.e. the empty string "").
+
+  All EVENT message have a streamId of -1 (Section 2.3).
+
+  Please note that "NEW_NODE" and "UP" events are sent based on internal Gossip
+  communication and as such may be sent a short delay before the binary
+  protocol server on the newly up node is fully started. Clients are thus
+  advise to wait a short time before trying to connect to the node (1 seconds
+  should be enough), otherwise they may experience a connection refusal at
+  first.
+
+
+5. Compression
+
+  Frame compression is supported by the protocol, but then only the frame body
+  is compressed (the frame header should never be compressed).
+
+  Before being used, client and server must agree on a compression algorithm to
+  use, which is done in the STARTUP message. As a consequence, a STARTUP message
+  must never be compressed.  However, once the STARTUP frame has been received
+  by the server can be compressed (including the response to the STARTUP
+  request). Frame do not have to be compressed however, even if compression has
+  been agreed upon (a server may only compress frame above a certain size at its
+  discretion). A frame body should be compressed if and only if the compressed
+  flag (see Section 2.2) is set.
+
+
+6. Collection types
+
+  This section describe the serialization format for the collection types:
+  list, map and set. This serialization format is both useful to decode values
+  returned in RESULT messages but also to encode values for EXECUTE ones.
+
+  The serialization formats are:
+     List: a [short] n indicating the size of the list, followed by n elements.
+           Each element is [short bytes] representing the serialized element
+           value.
+     Map: a [short] n indicating the size of the map, followed by n entries.
+          Each entry is composed of two [short bytes] representing the key and
+          the value of the entry map.
+     Set: a [short] n indicating the size of the set, followed by n elements.
+          Each element is [short bytes] representing the serialized element
+          value.
+
+
+7. Error codes
+
+  The supported error codes are described below:
+    0x0000    Server error: something unexpected happened. This indicates a
+              server-side bug.
+    0x000A    Protocol error: some client message triggered a protocol
+              violation (for instance a QUERY message is sent before a STARTUP
+              one has been sent)
+    0x0100    Bad credentials: CREDENTIALS request failed because Cassandra
+              did not accept the provided credentials.
+
+    0x1000    Unavailable exception. The rest of the ERROR message body will be
+                <cl><required><alive>
+              where:
+                <cl> is the [consistency] level of the query having triggered
+                     the exception.
+                <required> is an [int] representing the number of node that
+                           should be alive to respect <cl>
+                <alive> is an [int] representing the number of replica that
+                        were known to be alive when the request has been
+                        processed (since an unavailable exception has been
+                        triggered, there will be <alive> < <required>)
+    0x1001    Overloaded: the request cannot be processed because the
+              coordinator node is overloaded
+    0x1002    Is_bootstrapping: the request was a read request but the
+              coordinator node is bootstrapping
+    0x1003    Truncate_error: error during a truncation error.
+    0x1100    Write_timeout: Timeout exception during a write request. The rest
+              of the ERROR message body will be
+                <cl><received><blockfor><writeType>
+              where:
+                <cl> is the [consistency] level of the query having triggered
+                     the exception.
+                <received> is an [int] representing the number of nodes having
+                           acknowledged the request.
+                <blockfor> is the number of replica whose acknowledgement is
+                           required to achieve <cl>.
+                <writeType> is a [string] that describe the type of the write
+                            that timeouted. The value of that string can be one
+                            of:
+                             - "SIMPLE": the write was a non-batched
+                               non-counter write.
+                             - "BATCH": the write was a (logged) batch write.
+                               If this type is received, it means the batch log
+                               has been successfully written (otherwise a
+                               "BATCH_LOG" type would have been send instead).
+                             - "UNLOGGED_BATCH": the write was an unlogged
+                               batch. Not batch log write has been attempted.
+                             - "COUNTER": the write was a counter write
+                               (batched or not).
+                             - "BATCH_LOG": the timeout occured during the
+                               write to the batch log when a (logged) batch
+                               write was requested.
+    0x1200    Read_timeout: Timeout exception during a read request. The rest
+              of the ERROR message body will be
+                <cl><received><blockfor><data_present>
+              where:
+                <cl> is the [consistency] level of the query having triggered
+                     the exception.
+                <received> is an [int] representing the number of nodes having
+                           answered the request.
+                <blockfor> is the number of replica whose response is
+                           required to achieve <cl>. Please note that it is
+                           possible to have <received> >= <blockfor> if
+                           <data_present> is false. And also in the (unlikely)
+                           case were <cl> is achieved but the coordinator node
+                           timeout while waiting for read-repair
+                           acknowledgement.
+                <data_present> is a single byte. If its value is 0, it means
+                               the replica that was asked for data has not
+                               responded. Otherwise, the value is != 0.
+
+    0x2000    Syntax_error: The submitted query has a syntax error.
+    0x2100    Unauthorized: The logged user doesn't have the right to perform
+              the query.
+    0x2200    Invalid: The query is syntactically correct but invalid.
+    0x2300    Config_error: The query is invalid because of some configuration issue
+    0x2400    Already_exists: The query attempted to create a keyspace or a
+              table that was already existing. The rest of the ERROR message
+              body will be <ks><table> where:
+                <ks> is a [string] representing either the keyspace that
+                     already exists, or the keyspace in which the table that
+                     already exists is.
+                <table> is a [string] representing the name of the table that
+                        already exists. If the query was attempting to create a
+                        keyspace, <table> will be present but will be the empty
+                        string.
+    0x2500    Unprepared: Can be thrown while a prepared statement tries to be
+              executed if the provide prepared statement ID is not known by
+              this host. The rest of the ERROR message body will be [short
+              bytes] representing the unknown ID.