Posted to commits@cassandra.apache.org by "Les Hazlewood (JIRA)" <ji...@apache.org> on 2013/08/30 20:42:52 UTC
[jira] [Updated] (CASSANDRA-5959) CQL3 support for multi-column
insert in a single operation (Batch Insert / Batch Mutate)
[ https://issues.apache.org/jira/browse/CASSANDRA-5959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Les Hazlewood updated CASSANDRA-5959:
-------------------------------------
Description:
h3. Impetus for this Request
(from the original [question on StackOverflow|http://stackoverflow.com/questions/18522191/using-cassandra-and-cql3-how-do-you-insert-an-entire-wide-row-in-a-single-reque]):
I want to insert a single row with 50,000 columns into Cassandra 1.2.9. Before inserting, I have all the data for the entire row ready to go (in memory):
{code}
+---------+------+------+------+------+-------+
|         |  0   |  1   |  2   |  ... | 49999 |
| row_id  +------+------+------+------+-------+
|         | text | text | text |  ... | text  |
+---------+------+------+------+------+-------+
{code}
The column names are integers, allowing slicing for pagination. Each column value is the text stored at that particular index.
CQL3 table definition:
{code}
create table results (
    row_id text,
    index int,
    value text,
    primary key (row_id, index)
)
with compact storage;
{code}
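(Illustrative query, not from the original post: assuming the schema above, one page of columns can be read with a range slice on the clustering column, e.g. the first 100 entries of a row with a hypothetical key {{'my_row_id'}}:)
{code}
SELECT index, value
FROM results
WHERE row_id = 'my_row_id' AND index >= 0 AND index < 100;
{code}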
As I already have the row_id and all 50,000 name/value pairs in memory, I just want to insert a single row into Cassandra in a single request/operation so it is as fast as possible.
The only thing I can seem to find is to execute the following statement 50,000 times:
{code}
INSERT INTO results (row_id, index, value) VALUES (my_row_id, ?, ?);
{code}
where the first {{?}} is an index counter ({{i}}) and the second {{?}} is the text value to store at location {{i}}.
With the Datastax Java Driver client and C* server on the same development machine, this took a full minute to execute.
Oddly enough, the same 50,000 insert statements in a [Datastax Java Driver Batch|http://www.datastax.com/drivers/java/apidocs/com/datastax/driver/core/querybuilder/QueryBuilder.html#batch(com.datastax.driver.core.Statement...)] on the same machine took 7.5 minutes. I thought batches were supposed to be _faster_ than individual inserts?
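(For context, the driver's {{batch()}} builder wraps the statements in CQL's existing {{BATCH}} construct, which is a logged batch by default; the batchlog overhead may partly explain the slowdown. The equivalent hand-written form, sketched for the schema above with a hypothetical key, would be:)
{code}
BEGIN BATCH
  INSERT INTO results (row_id, index, value) VALUES ('my_row_id', 0, 'text0');
  INSERT INTO results (row_id, index, value) VALUES ('my_row_id', 1, 'text1');
  -- ... one statement per column, 50,000 in total
  INSERT INTO results (row_id, index, value) VALUES ('my_row_id', 49999, 'text49999');
APPLY BATCH;
{code}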
We tried instead with a Thrift client (Astyanax) and the same insert via a [MutationBatch|http://netflix.github.io/astyanax/javadoc/com/netflix/astyanax/MutationBatch.html]. This took _235 milliseconds_.
h3. Feature Request
As a result of this performance testing, this issue requests that CQL3 support batch mutation as a single operation (statement), providing the same speed/performance benefits as existing Thrift clients.
Example suggested syntax (based on the above example table/column family):
{code}
insert into results (row_id, (index,value)) values
((0,text0), (1,text1), (2,text2), ..., (N,textN));
{code}
Each value in the {{values}} clause is a tuple: the first tuple element is the column name, the second is the column value. This seems to be the simplest and most accurate representation of what happens during a batch insert/mutate.
Not having this CQL feature forced us to remove the Datastax Java Driver (which we liked) in favor of Astyanax because Astyanax supports this behavior. We desire feature/performance parity between Thrift and CQL3/Datastax Java Driver, so we hope this request improves both CQL3 and the Driver.
> CQL3 support for multi-column insert in a single operation (Batch Insert / Batch Mutate)
> ----------------------------------------------------------------------------------------
>
> Key: CASSANDRA-5959
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5959
> Project: Cassandra
> Issue Type: New Feature
> Components: Core, Drivers
> Reporter: Les Hazlewood
> Labels: CQL
>
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira