Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2013/12/02 12:12:38 UTC

[jira] [Commented] (CASSANDRA-6428) Inconsistency in CQL native protocol

    [ https://issues.apache.org/jira/browse/CASSANDRA-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13836425#comment-13836425 ] 

Sylvain Lebresne commented on CASSANDRA-6428:
---------------------------------------------


The "inconsistency" part will be somewhat "solved" by CASSANDRA-5428 (and please see the comments there that probably answer some of this).

But the real answer to this ticket is that you should, indeed, refactor your code to not use large collections. Collections are not meant to denormalize large amounts of data, and they are not meant to be the CQL equivalent of thrift "wide rows". The equivalent of a thrift CF with wide rows is a CQL table with one or more clustering columns.

In particular, collections are always read in their entirety, which makes them poorly suited to being large. A CQL table with clustering columns, on the other hand, allows querying only part of the items, and results are paged automatically if you use v2 of the binary protocol, etc.
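
For illustration, a minimal sketch using the DataStax Java driver (the {{ks.user_items}} table and its columns are invented for the example):
{code:java}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;

public class WideRowExample {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect();

        // Instead of CREATE TABLE users (id text PRIMARY KEY, items set<text>),
        // make each item its own CQL row with a clustering column:
        session.execute("CREATE TABLE IF NOT EXISTS ks.user_items ("
                + "user_id text, item text, PRIMARY KEY (user_id, item))");

        // Items can now be read in slices, and with protocol v2 the driver
        // pages through large result sets transparently, fetchSize at a time.
        SimpleStatement select = new SimpleStatement(
                "SELECT item FROM ks.user_items WHERE user_id = 'u1'");
        select.setFetchSize(1000);
        for (Row row : session.execute(select)) {
            System.out.println(row.getString("item"));
        }
        cluster.close();
    }
}
{code}
With the clustering-column layout you can also select a range of items or delete a single one, none of which is possible when the whole collection lives in a single cell.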

Let me add that it would be absolutely fair to say that the proper use of collections is currently poorly documented. We'll start fixing that as part of CASSANDRA-5428 and will generally try to communicate about it better.


> Inconsistency in CQL native protocol
> ------------------------------------
>
>                 Key: CASSANDRA-6428
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6428
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jan Chochol
>
> We are trying to use Cassandra CQL3 collections (sets and maps) to denormalize data.
> The problem arises when the size of these collections goes above a certain limit. We found that the current limit is 64k - 1 (65535) items per collection.
> We found that there is an inconsistency in the CQL binary protocol (all currently available versions).
> In the protocol, a set is encoded with these fields:
> {noformat}
> [value size: int] [items count: short] [items] ...
> {noformat}
> An example from our case (a collection with 65536 elements):
> {noformat}
> 00 21 ff ee 00 00 00 20 30 30 30 30 35 63 38 69 65 33 67 37 73 61 ...
> {noformat}
> So the decoded {{value size}} is 2228206 bytes (0x0021ffee) and the {{items count}} is 0, because 65536 wraps around to 0 in an unsigned short.
> This is wrong - you cannot have a collection with 0 items occupying more than 2 MB.
> I understand that an unsigned short cannot hold more than 65535, but I do not understand why the protocol has such a limitation when all the data is currently sent anyway.
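> To make the wrap-around concrete, a small self-contained sketch (plain Java, not Cassandra code) that decodes the leading bytes of the dump above:
> {code:java}
> import java.nio.ByteBuffer;
>
> public class CollectionHeader {
>     public static void main(String[] args) {
>         // [value size: int][items count: short], taken from the dump above
>         ByteBuffer buf = ByteBuffer.wrap(new byte[] {
>                 0x00, 0x21, (byte) 0xff, (byte) 0xee, // value size
>                 0x00, 0x00                            // items count
>         });
>         System.out.println(buf.getInt());             // 2228206 bytes
>         System.out.println(buf.getShort() & 0xFFFF);  // 0 items
>
>         // Writing 65536 into an unsigned 16-bit field wraps to 0:
>         System.out.println((short) 65536);            // prints 0
>     }
> }
> {code}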
> In this case we have several possibilities:
> * ignore the {{items count}} field and read all bytes specified in {{value size}} (see the sketch after this list)
> ** the problem is that we cannot be sure this behaviour will be kept in future versions of Cassandra, as it is quite strange
> * refactor our code to use only small collections (this seems quite odd, as Cassandra has no problem with wide rows)
> * do not use collections, and fall back to plain wide rows
> * wait for a change in the protocol that removes this unnecessary limitation
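> For the first option, a possible decoder sketch (our own helper, not driver code; it assumes each element's own length still fits in an unsigned short):
> {code:java}
> import java.nio.ByteBuffer;
> import java.util.ArrayList;
> import java.util.List;
>
> public class LenientSetDecoder {
>     // Read a serialized set by trusting {value size}: consume elements
>     // until the buffer is exhausted, ignoring the wrapped {items count}.
>     public static List<byte[]> decode(ByteBuffer value) {
>         value.getShort();                         // skip items count
>         List<byte[]> items = new ArrayList<byte[]>();
>         while (value.hasRemaining()) {
>             int len = value.getShort() & 0xFFFF;  // element length
>             byte[] element = new byte[len];
>             value.get(element);
>             items.add(element);
>         }
>         return items;
>     }
> }
> {code}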



--
This message was sent by Atlassian JIRA
(v6.1#6144)