Posted to user@cassandra.apache.org by Mark Farnan <de...@petrolink.com> on 2014/05/25 15:36:13 UTC

Possible to Add multiple columns in one query ?

I'm sure this is a CQL 101 question, but:

Is it possible to add multiple rows/columns to a single partition in a single CQL 3 query/call?

Need:

I'm trying to find the most efficient way to add multiple time series events to a table in a single call.

Whilst most time series data comes in sequentially, we have a case where it is often loaded in bulk, say 100,000 points for 50 channels/tags sent in one go (sometimes more), and this needs to be loaded as quickly and efficiently as possible.

Fairly standard time-series schema (this is for testing purposes only at this point, and doesn't represent final schemas):

CREATE TABLE tag (
  tagid int,
  idx timestamp,
  value double,
  PRIMARY KEY (channelid, idx)
) WITH CLUSTERING ORDER BY (idx DESC);

Currently I'm using batch statements, but even that is not fast enough.

Note: At this point I'm testing on a single-node cluster on a laptop, to compare different versions.

We are using the DataStax C# 2.0 (beta) client and Cassandra 2.0.7.

Regards,
Mark.


Re: Possible to Add multiple columns in one query ?

Posted by Colin <co...@gmail.com>.
Also, make sure you're using prepared statements.

--
Colin
320-221-9531
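
As an illustration of the prepared-statement advice, here is a minimal sketch, not code from the thread. The thread uses the C# driver; this sketch uses Python and assumes a session object exposing prepare(query) and execute(statement, parameters), which are the call shapes of the DataStax Python driver. The helper name insert_points is hypothetical, and the column list follows the tag table from the original post.

```python
INSERT_CQL = "INSERT INTO tag (tagid, idx, value) VALUES (?, ?, ?)"


def insert_points(session, points):
    """Prepare the INSERT once, then bind and execute it for every point.

    Assumptions: `session` offers prepare(query) and
    execute(statement, parameters); `points` is an iterable of
    (tagid, idx, value) tuples.
    """
    # The statement text is parsed once; later calls reuse the result.
    prepared = session.prepare(INSERT_CQL)
    count = 0
    for tagid, idx, value in points:
        # Each call passes the prepared statement plus the bound values.
        session.execute(prepared, (tagid, idx, value))
        count += 1
    return count
```

The point of the design is that prepare is called once outside the loop; preparing inside the loop would throw away the benefit.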


> On May 25, 2014, at 1:56 PM, "Jack Krupansky" <ja...@basetechnology.com> wrote:
> 
> Typo: I presume “channelid” should be “tagid” for the partition key for your table.
>  
> Yes, BATCH statements are the way to go, but be careful not to make your batches too large, otherwise you could lose performance when Cassandra is relatively idle while the batch is slowly streaming in to the coordinator node over the network. Better to break up a large batch into multiple moderate size batches (exact size and number will vary and need testing to deduce) that will transmit quicker and can be executed in parallel.
>  
> I’m not sure Cassandra on a laptop would be the best measure of performance for a real cluster, especially compared to a server with more CPU cores than your laptop.
>  
> And for a real cluster, rows with different partition keys can be sent to a coordinator node that owns that partition key, which could be multiple nodes for RF>1.
>  
> -- Jack Krupansky

Re: Possible to Add multiple columns in one query ?

Posted by Colin <co...@gmail.com>.
Try async updates: collect the futures after every 1,000 writes, and tune from there.

Also, in the real world, you'd want to use load-balancing and token-aware policies when connecting to the cluster. With token-aware routing, each write goes straight to a node that owns the partition, so that node acts as coordinator and the extra network hop is avoided.

I will post a link to my GitHub with an example when I get off the road.

--
Colin
320-221-9531
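
The "collect the futures at 1,000" suggestion can be sketched roughly as follows. This is an assumption-laden illustration, not the example Colin refers to: a thread pool stands in for the driver's asynchronous execute call, write_fn is a hypothetical callable that writes one row, and the window of 1,000 is only the starting point mentioned above.

```python
from concurrent.futures import ThreadPoolExecutor


def write_in_windows(rows, write_fn, window=1000, workers=8):
    """Submit write_fn(row) asynchronously, collecting results every `window` rows.

    `write_fn` is a hypothetical stand-in for an async driver write.
    Returns the number of rows whose writes completed.
    """
    written = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = []
        for row in rows:
            futures.append(pool.submit(write_fn, row))
            if len(futures) >= window:
                # Block until the whole window has finished;
                # result() re-raises any exception from a failed write.
                for future in futures:
                    future.result()
                written += len(futures)
                futures = []
        # Collect whatever is left in the final, partial window.
        for future in futures:
            future.result()
        written += len(futures)
    return written
```

Collecting per window keeps the number of outstanding requests bounded, so a large load cannot queue an unlimited number of pending writes on the client.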



Re: Possible to Add multiple columns in one query ?

Posted by Jack Krupansky <ja...@basetechnology.com>.
Typo: I presume “channelid” should be “tagid” for the partition key for your table.

Yes, BATCH statements are the way to go, but be careful not to make your batches too large, otherwise you could lose performance when Cassandra is relatively idle while the batch is slowly streaming in to the coordinator node over the network. Better to break up a large batch into multiple moderate size batches (exact size and number will vary and need testing to deduce) that will transmit quicker and can be executed in parallel.

I’m not sure Cassandra on a laptop would be the best measure of performance for a real cluster, especially compared to a server with more CPU cores than your laptop.

And for a real cluster, rows with different partition keys can be sent to a coordinator node that owns that partition key, which could be multiple nodes for RF>1.

-- Jack Krupansky
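
The advice above about moderately sized batches might look something like the sketch below. It is an illustration under stated assumptions, not a tested recipe: the helper names partition_batches and batch_cql are hypothetical, the default batch_size of 100 is arbitrary (the right size needs testing, as noted above), and UNLOGGED is an assumed batch type that the thread does not specify. Column names follow the tag table from the original post.

```python
from collections import defaultdict


def partition_batches(points, batch_size=100):
    """Group (tagid, idx, value) points by tagid, then split each group
    into batches of at most batch_size rows, so that every batch
    targets a single partition."""
    by_tag = defaultdict(list)
    for point in points:
        by_tag[point[0]].append(point)
    batches = []
    for tagid in sorted(by_tag):
        rows = by_tag[tagid]
        for start in range(0, len(rows), batch_size):
            batches.append(rows[start:start + batch_size])
    return batches


def batch_cql(row_count):
    """Render a CQL batch holding row_count parameterised INSERTs."""
    insert = "  INSERT INTO tag (tagid, idx, value) VALUES (?, ?, ?);"
    lines = ["BEGIN UNLOGGED BATCH"] + [insert] * row_count + ["APPLY BATCH;"]
    return "\n".join(lines)
```

Each list returned by partition_batches can then be sent as one batch, and several batches can be in flight at once, which is the parallelism the reply describes.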
