Posted to user@cassandra.apache.org by Les Hazlewood <lh...@apache.org> on 2013/08/29 21:04:33 UTC

CQL3 wide row and slow inserts - is there a single insert alternative?

Hi all,

We're using a Cassandra table to store search results in a
table/column family that looks like this:

+--------+---------+---------+---------+----
|        | 0       | 1       | 2       | ...
+--------+---------+---------+---------+----
| row_id | text... | text... | text... | ...

The column name is the index (an integer) of the result's location
in the overall result set.  The value is the result at that
particular index.  This is great because pagination becomes a simple
slice query on the column name.
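
For example, fetching a page is just a range query on that index (a
sketch against the table defined below; the key values are made up):

select list_index, result
from query_results
where row_id = 'search-abc'
  and shard_num = 0
  and list_index >= 100
limit 25;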

Large result sets are split into multiple rows - we're limiting row
size on disk to be around 6 or 7 MB.  For our particular result
entries, this means we can get around 50,000 columns in a single row.
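
(At 6 or 7 MB per row, that works out to roughly 130-140 bytes per
result entry.  For illustration only, a simple mapping from a global
result index to a shard would be shard_num = index / 50000 and
list_index = index % 50000.)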

When we create the rows, we already have all of the data available
in the application at insert time.

Using CQL3, an initial implementation had one INSERT statement per
column.  This was killing performance (not to mention the # of
tombstones it created).
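
Roughly, the per-column pattern looked like this (values are
illustrative):

insert into query_results (row_id, shard_num, list_index, result)
values ('search-abc', 0, 0, '...');
-- ...repeated ~50,000 times for a single logical row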

Here's the CQL3 table definition:

create table query_results (
    row_id text,
    shard_num int,
    list_index int,
    result text,
    primary key ((row_id, shard_num), list_index)
)
with compact storage;

(The partition key is (row_id, shard_num); list_index is the
clustering column.  Each (row_id, shard_num) partition is one wide
row from the diagram above, with list_index playing the role of the
column name.)

I don't want to execute 50,000 INSERT statements for a single row.  We
have all of the data up front - I want to execute a single INSERT.

Is this possible?
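
The closest workaround I can see is a single unlogged batch - still
one INSERT per column inside it, but at least a single round trip
(sketch; values are illustrative):

begin unlogged batch
  insert into query_results (row_id, shard_num, list_index, result)
  values ('search-abc', 0, 0, '...');
  insert into query_results (row_id, shard_num, list_index, result)
  values ('search-abc', 0, 1, '...');
  -- ...one insert per result entry
apply batch;

That still means building and shipping 50,000 statements, though,
just in one request.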

We're using the Datastax Java Driver.

Thanks for any help!

Les

Re: CQL3 wide row and slow inserts - is there a single insert alternative?

Posted by Les Hazlewood <lh...@apache.org>.
Well, it appears that this just isn't possible.  I created CASSANDRA-5959
as a result.  (Backstory + performance testing results are described in the
issue):

https://issues.apache.org/jira/browse/CASSANDRA-5959

--
Les Hazlewood | @lhazlewood
CTO, Stormpath | http://stormpath.com | @goStormpath | 888.391.5282
