Posted to commits@cassandra.apache.org by "Ondřej Černoš (JIRA)" <ji...@apache.org> on 2013/12/03 17:34:36 UTC

[jira] [Commented] (CASSANDRA-6428) Use 4 bytes to encode collection size in next native protocol version

    [ https://issues.apache.org/jira/browse/CASSANDRA-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837864#comment-13837864 ] 

Ondřej Černoš commented on CASSANDRA-6428:
------------------------------------------

Thanks a lot, Sylvain, for your time and answers. It is really appreciated.

I think the whole thing boils down to two issues:

* the size of a collection in the native protocol, which can be worked around for now by simply ignoring the count field in the protocol (all the data is currently fetched from storage; only the value of the field is incorrect when the size exceeds 64k) - see the sketch after this list
* the usage of collections for mixed CQL3 rows (mixing static and dynamic content, i.e. mixing narrow-row and wide-row in underlying storage terminology).
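
To make the first point concrete, here is a minimal sketch of the workaround, assuming the collection layout quoted in the issue below ([items count: short] followed by [element size: short][element bytes] pairs, big-endian); the class and method names are made up for illustration:

{noformat}
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical client-side decoder: ignore the 16-bit items count and
// keep reading elements until the value buffer (whose true length is
// known from the int-sized [value size] field) is exhausted.
public final class SetValueDecoder {

    public static List<String> decodeTextSet(ByteBuffer value) {
        value.getShort(); // [items count: short] - wraps at 65536, so ignore it
        List<String> items = new ArrayList<String>();
        while (value.remaining() > 0) {
            int len = value.getShort() & 0xffff; // [element size: short]
            byte[] element = new byte[len];
            value.get(element);                  // [element bytes]
            items.add(new String(element, StandardCharsets.UTF_8));
        }
        return items;
    }
}
{noformat}

As the issue notes, this only works as long as future versions keep sending all the data and only the count field overflows.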

We will probably need to split the table described above (which has 20 or so static columns plus a set with hundreds of thousands of elements) into two tables, one for the static columns and the other for the wide row. So instead of using:

{noformat}
CREATE TABLE test (
  id text PRIMARY KEY,
  val1 text,
  val2 int,
  val3 timestamp,
  valN text,
  some_set set<text>
)
{noformat}

we will have to have two tables:

{noformat}
CREATE TABLE test_narrow (
  id text PRIMARY KEY,
  val1 text,
  val2 int,
  val3 timestamp,
  valN text
)

CREATE TABLE test_wide (
  id text,
  val text,
  PRIMARY KEY (id, val)
)
{noformat}
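
For illustration, reads against the split tables might then look like this (just a sketch assuming the DataStax Java driver; the contact point and keyspace name are made up):

{noformat}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class SplitTableReads {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("test_ks"); // hypothetical keyspace

        // Static columns: a single-partition lookup that no longer drags
        // the huge set through the read path.
        Row row = session.execute(
            "SELECT val1, val2 FROM test_narrow WHERE id = 'some_key'").one();
        System.out.println(row.getString("val1"));

        // The former set<text> is now a clustering column, so the driver
        // can iterate over it instead of materialising all elements at once.
        for (Row r : session.execute(
                "SELECT val FROM test_wide WHERE id = 'some_key'")) {
            System.out.println(r.getString("val"));
        }

        cluster.close();
    }
}
{noformat}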

The reason is not a modelling one (the first approach is much more comfortable and more in line with the _denormalize everything_ philosophy) but a performance one. The problem is that Cassandra always performs a range query over all the columns of the underlying row if the table is not created with compact storage. So a query like {{select val1, val2 from test where id='some_key'}} performs poorly if the {{set}} in the table is big, even though the select never fetches the set: we see a ~400 ms primary key lookup on a table of roughly 150k records, against a row whose set also holds roughly 150k elements, on a 2 CPU machine with enough memory and the DB fully mapped into RAM - no disk ops involved.
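
The measurement itself is nothing more than a stopwatch around the select (again only a sketch, same made-up setup as above):

{noformat}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

public class StaticColumnLookupTiming {
    public static void main(String[] args) {
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("test_ks"); // hypothetical keyspace

        // Time a primary key lookup that selects only the static columns.
        // On our data set this is ~400 ms when the row also carries a
        // ~150k element set, even though the set is never selected.
        long start = System.nanoTime();
        session.execute("SELECT val1, val2 FROM test WHERE id = 'some_key'");
        long micros = (System.nanoTime() - start) / 1000;
        System.out.println("lookup took " + micros + " us");

        cluster.close();
    }
}
{noformat}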

The question is: is this behaviour by design, and is it the reason behind the recommendation not to use big collections?

I know and agree this is not the best place for modelling questions, but again - maybe it is useful for you, as the designer of the feature, to see how it is perceived by users and what issues we run into (by the way, we are new Cassandra users and we started with CQL3 from scratch - we are not Thrift old-timers). I can take this whole topic to the user list if you wish.

> Use 4 bytes to encode collection size in next native protocol version
> ---------------------------------------------------------------------
>
>                 Key: CASSANDRA-6428
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6428
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jan Chochol
>
> We are trying to use Cassandra CQL3 collections (sets and maps) for denormalizing data.
> The problem arises when the size of these collections goes above a certain limit. We found that the current limit is 64k - 1 (65535) items in a collection.
> We found that there is an inconsistency in the CQL binary protocol (all currently available versions).
> In the protocol, a set is encoded with these fields:
> {noformat}
> [value size: int] [items count: short] [items] ...
> {noformat}
> One example in our case (collection with 65536 elements):
> {noformat}
> 00 21 ff ee 00 00 00 20 30 30 30 30 35 63 38 69 65 33 67 37 73 61 ...
> {noformat}
> So the decoded {{value size}} is 2228206 bytes (0x0021ffee) and the {{items count}} is 0.
> This is wrong - you can not have a collection with 0 items occupying more than 2MB.
> I understand that an unsigned short can not hold more than 65535, but I do not understand why there is such a limitation in the protocol when all the data is currently sent anyway.
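> The arithmetic can be checked with a few lines of Java (a standalone sketch, not Cassandra code):
> {noformat}
> import java.nio.ByteBuffer;
>
> public class DecodeDump {
>     public static void main(String[] args) {
>         // First six bytes of the dump above: [value size: int][items count: short]
>         byte[] header = {0x00, 0x21, (byte) 0xff, (byte) 0xee, 0x00, 0x00};
>         ByteBuffer buf = ByteBuffer.wrap(header);
>         System.out.println(buf.getInt());            // 2228206 (0x0021ffee)
>         System.out.println(buf.getShort() & 0xffff); // 0 (65536 wrapped around)
>     }
> }
> {noformat}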
> In this case we have several possibilities:
> * ignore the {{items count}} field and read all bytes specified in {{value size}}
> ** the problem is that we can not be sure this behaviour will be kept in future versions of Cassandra, as it is quite strange
> * refactor our code to use only small collections (this seems quite odd, as Cassandra has no problems with wide rows)
> * do not use collections, and fall back to plain wide rows
> * wait for a change in the protocol removing the unnecessary limitation



--
This message was sent by Atlassian JIRA
(v6.1#6144)