
Posted to dev@cassandra.apache.org by Michael Alan Dorman <md...@ironicdesign.com> on 2013/02/18 21:16:10 UTC

Best place to discuss CQL Binary Protocol spec?

Hey, all,

I've been working on a greenfield Perl client for the CQL Binary
Protocol.  Since this is a client-in-progress, and my question
is actually about the protocol, I guessed dev@ would be the better
list, but please let me know if I should relocate to client-dev@.

As always happens when working from a spec, I have ended up with a quick
clarification request and a more involved question, and I would like to
know how best to contribute to the document.

* 4.1.2. CREDENTIALS

My quick clarification is from this bit of text:

  The body is a list of key/value informations. It is a [short] n,
  followed by n pair of [string].  These key/value pairs [...]

Is this just a string map, and the text just isn't using consistent
terminology?

* 4.2.5.2. Rows

My more involved question is about this text describing the column
contents:

  - <rows_content> is composed of <row_1>...<row_m> where m is <rows_count>.
    Each <row_i> is composed of <value_1>...<value_n> where n is
    <columns_count> and where <value_j> is a [bytes] representing the value
    returned for the jth column of the ith row. In other words, <rows_content>
    is composed of (<rows_count> * <columns_count>) [bytes].

I read this and thought, "Oh, sure, I'll need to figure out the width of
the Java types for the different columns, tedious but easily doable",
and then noticed some of the options are things like Blob or Varchar,
both of which I would assume to be variable width.  So how should one
determine how many bytes to read for different types?

I'm guessing the actual information about how much space the different
values take up is located somewhere else.  At the very least it seems
like that should be mentioned, though ideally, it seems to me that
all of that information should be called out in the spec itself.

* Updating the docs

Which kind of brings me to my final question: what would be the best way
to contribute cleanups, etc. for the document, and how far could I take
it?

At the very least, there are a lot of typos I'd be happy to fix.  I also
think the text could be tightened up in various ways.  And I think some
things could be moved around to make the spec more accessible to
implementors.

But most importantly, I think it needs to be in some format that can
produce a hyperlinked document, because right now having to scroll back
and forth through everything is tedious indeed.  But it seems improbable
to me that this is the native format for the document (did someone
really do that TOC by hand?).  So is there a source document where it
would be best to actually work on edits?  And if not, could I contribute
by converting it to Textile (which already seems to be in use in the tree)
or perhaps Markdown?

Mike.

Re: Best place to discuss CQL Binary Protocol spec?

Posted by paul cannon <pc...@gmail.com>.

On Mon, Feb 18, 2013 at 6:48 PM, Michael Alan Dorman <mdorman@ironicdesign.com> wrote:

> paul cannon <pc...@gmail.com> writes:
>
> > As the doc says, each <value_j> is a [bytes], which means it's represented
> > on the wire as an [int] x followed by x bytes.
>
> Thank you for pointing out what I had succeeded in reading repeatedly
> without actually processing. ;)
>
> At the same time, that seems to gloss over the structure of the
> content---it's not all encoded as string values, or is it?
>

No, the values are serialized according to whatever data type definitions
they have. The data types and serializations are technically details of
Cassandra usage in general (not specific to the native protocol), and the
types aren't limited to the ones which are assigned type IDs in the native
protocol, so it is arguably appropriate to leave out type serialization
details in the native protocol document (I could see it either way).

If you do need details, the builtin cassandra types and serialization
formats are defined in the various org.apache.cassandra.db.marshal.*Type
classes. Or read deserialization code from the other C* libraries.
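As a rough illustration of what "serialized according to the data type" means for a client, here is a minimal Python sketch (Python chosen only for illustration) that turns the payload of one [bytes] value into a native value. The type-name keys, DECODERS, and decode_value are hypothetical; the layouts follow my understanding of a few org.apache.cassandra.db.marshal classes (Int32Type, LongType, DoubleType, BooleanType, UTF8Type, AsciiType, BytesType) and should be checked against those classes. Types such as varint, decimal, uuid, and the collections need more work than shown here.

```python
import struct

# Hypothetical dispatch table from a CQL type name to a decoder for that
# type's serialized bytes. Layouts are assumed from the marshal classes
# named in the comments; all multi-byte numbers are big-endian.
DECODERS = {
    "int": lambda raw: struct.unpack(">i", raw)[0],     # Int32Type: 4-byte signed
    "bigint": lambda raw: struct.unpack(">q", raw)[0],  # LongType: 8-byte signed
    "double": lambda raw: struct.unpack(">d", raw)[0],  # DoubleType: IEEE 754 double
    "boolean": lambda raw: raw[0] != 0,                 # BooleanType: one byte, nonzero = true
    "varchar": lambda raw: raw.decode("utf-8"),         # UTF8Type
    "text": lambda raw: raw.decode("utf-8"),            # UTF8Type
    "ascii": lambda raw: raw.decode("ascii"),           # AsciiType
    "blob": lambda raw: raw,                            # BytesType: raw bytes as-is
}


def decode_value(cql_type, raw):
    """Decode one column value (the payload of a [bytes]) given its type name.

    raw is None when the [bytes] carried a null value.
    """
    if raw is None:
        return None
    try:
        decoder = DECODERS[cql_type]
    except KeyError:
        raise NotImplementedError(f"no decoder for type {cql_type!r}")
    return decoder(raw)
```

The point of the table is that the [bytes] length prefix already tells a client how much to read; the column's type (from the result metadata) only decides how to interpret those bytes afterwards.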

p

Re: Best place to discuss CQL Binary Protocol spec?

Posted by Michael Alan Dorman <md...@ironicdesign.com>.

paul cannon <pc...@gmail.com> writes:
> It has the same structure as a string map, but might not necessarily *be* a
> string map. I would guess that this phrasing is used because it may be
> possible to have multiple identical "keys" in this structure, which would
> not make sense in a [string map]. (Although I don't think it's explicitly
> stated, it seems safe to imply that [string map] is intended to be a plain
> lookup table, not a set of arbitrary pairs.)

OK, so it is distinct.  Thanks for the clarification.

> As the doc says, each <value_j> is a [bytes], which means it's represented
> on the wire as an [int] x followed by x bytes.

Thank you for pointing out what I had succeeded in reading repeatedly
without actually processing. ;)

At the same time, that seems to gloss over the structure of the
content---it's not all encoded as string values, or is it?

Mike.

Re: Best place to discuss CQL Binary Protocol spec?

Posted by paul cannon <pc...@gmail.com>.

I can't usefully speak to your other questions, but the answers to the
technical questions are below.

On Mon, Feb 18, 2013 at 1:16 PM, Michael Alan Dorman <mdorman@ironicdesign.com> wrote:

> * 4.1.2. CREDENTIALS
>
> My quick clarification is from this bit of text:
>
>   The body is a list of key/value informations. It is a [short] n,
>   followed by n pair of [string].  These key/value pairs [...]
>
> Is this just a string map, and the text just isn't using consistent
> terminology?
>

It has the same structure as a string map, but might not necessarily *be* a
string map. I would guess that this phrasing is used because it may be
possible to have multiple identical "keys" in this structure, which would
not make sense in a [string map]. (Although I don't think it's explicitly
stated, it seems safe to imply that [string map] is intended to be a plain
lookup table, not a set of arbitrary pairs.)
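To make the "same structure, possibly not a map" point concrete, here is a minimal Python sketch of decoding a CREDENTIALS body. It assumes [short] is a 2-byte unsigned big-endian integer and [string] is a [short] n followed by n bytes of UTF-8, which is my reading of the spec's notation section. The function names are hypothetical. The body is returned as an ordered list of (key, value) pairs rather than a dict, so repeated keys are not silently dropped.

```python
import struct


def read_short(buf, pos):
    """[short]: 2-byte unsigned big-endian integer. Returns (value, new_pos)."""
    (n,) = struct.unpack_from(">H", buf, pos)
    return n, pos + 2


def read_string(buf, pos):
    """[string]: a [short] length n followed by n bytes of UTF-8."""
    n, pos = read_short(buf, pos)
    raw = buf[pos:pos + n]
    if len(raw) != n:
        raise ValueError("truncated [string]")
    return raw.decode("utf-8"), pos + n


def read_credentials_body(buf):
    """Decode a CREDENTIALS body as an ordered list of (key, value) pairs.

    A list (not a dict) keeps duplicate keys, since the text does not
    promise that keys are unique.
    """
    count, pos = read_short(buf, 0)
    pairs = []
    for _ in range(count):
        key, pos = read_string(buf, pos)
        value, pos = read_string(buf, pos)
        pairs.append((key, value))
    return pairs
```

A client that only ever expects unique keys can still build a dict from the result afterwards; decoding into pairs first simply avoids baking that assumption into the wire parser.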

> * 4.2.5.2. Rows
>
> My more involved question is about this text describing the column
> contents:
>
>   - <rows_content> is composed of <row_1>...<row_m> where m is <rows_count>.
>     Each <row_i> is composed of <value_1>...<value_n> where n is
>     <columns_count> and where <value_j> is a [bytes] representing the value
>     returned for the jth column of the ith row. In other words, <rows_content>
>     is composed of (<rows_count> * <columns_count>) [bytes].
>
> I read this and thought, "Oh, sure, I'll need to figure out the width of
> the Java types for the different columns, tedious but easily doable",
> and then noticed some of the options are things like Blob or Varchar,
> both of which I would assume to be variable width.  So how should one
> determine how many bytes to read for different types?
>

As the doc says, each <value_j> is a [bytes], which means it's represented
on the wire as an [int] x followed by x bytes.
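Because every value carries its own length prefix, a client can split <rows_content> into cells without knowing any column types. Below is a minimal Python sketch of that, assuming [int] is a 4-byte signed big-endian integer and that a negative length means a null value with no bytes following (my reading of the spec's [bytes] definition; worth double-checking). read_bytes and read_rows_content are hypothetical names, and rows_count and columns_count are taken as already parsed from the result's header and metadata.

```python
import struct


def read_bytes(buf, pos):
    """[bytes]: a 4-byte signed big-endian [int] n, then n bytes.

    A negative n is treated as a null value with no bytes following.
    Returns (value_or_None, new_pos).
    """
    (n,) = struct.unpack_from(">i", buf, pos)
    pos += 4
    if n < 0:
        return None, pos
    raw = bytes(buf[pos:pos + n])
    if len(raw) != n:
        raise ValueError("truncated [bytes]")
    return raw, pos + n


def read_rows_content(buf, pos, rows_count, columns_count):
    """Split <rows_content> into rows of raw column values.

    Reads rows_count * columns_count consecutive [bytes] values. Each value
    is left in its type's serialized form; decoding it is a separate step.
    """
    rows = []
    for _ in range(rows_count):
        row = []
        for _ in range(columns_count):
            value, pos = read_bytes(buf, pos)
            row.append(value)
        rows.append(row)
    return rows, pos
```

For example, two rows of (int, varchar) where the second row's varchar is null would come back as [[b"\x00\x00\x00\x01", b"alice"], [b"\x00\x00\x00\x02", None]], and interpreting those byte strings is then a per-type question.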

p