Posted to user@cassandra.apache.org by Drew Kutcharian <dr...@venarc.com> on 2014/08/30 00:48:16 UTC

Data partitioning and composite partition key

Hey Guys,

AFAIK, Cassandra currently partitions (thrift) rows by the row key: it uses hash(row_key) to decide which node a row is stored on. Now there are times when there is a need to shard a wide row, say when storing events per sensor, so you’d use a sensorId-datetime row key to avoid ending up with very large rows. Is there a way to have the partitioner use only the “sensorId” part of the row key for the hash? That way we would be able to store all the data relating to a sensor on one node.
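
In CQL3 terms, the sharded layout I have in mind might look something like this sketch (table and column names are hypothetical). Note that both components of the composite partition key are hashed together, which is exactly the behavior I’d like to change:

CREATE TABLE sensor_events (
  sensorId   timeuuid,
  day        text,        // datetime bucket, e.g. '2014-08-29'
  eventTime  timestamp,
  value      double,      // hypothetical payload column
  PRIMARY KEY ((sensorId, day), eventTime)
);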

Another use case of this would be multi-tenancy:

Say we have accounts and accounts have users. So we would have the following tables:

CREATE TABLE account (
  id       timeuuid PRIMARY KEY,
  company  text
);

CREATE TABLE user (
  id         timeuuid PRIMARY KEY,
  accountId  timeuuid,
  email      text,
  password   text
);

// Get users by account
CREATE TABLE user_account_index (
  accountId  timeuuid,
  userId     timeuuid,
  PRIMARY KEY (accountId, userId)
);

Say I want to get all the users that belong to an account. I would first have to get the user IDs from user_account_index and then use a multi-get (WHERE IN) to fetch the records from the user table. This multi-get could potentially query a lot of different nodes in the cluster. It’d be great if there were a way to limit the storage of an account’s users to a single node, so the multi-get would only need to query a single node.
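
The two-step lookup as CQL, with bind markers standing in for actual values:

SELECT userId FROM user_account_index WHERE accountId = ?;
// ...then, with the returned ids (one marker per id):
SELECT * FROM user WHERE id IN (?, ?, ?);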

Note that the problem cannot simply be fixed by using (accountId, id) as the primary key for the user table, since that would just recreate the problem of very large (thrift) rows in the user table.
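
For reference, this is that rejected alternative: with accountId as the partition key the multi-get becomes single-node, but every account turns into one ever-growing (thrift) row:

CREATE TABLE user (
  accountId  timeuuid,   // partition key
  id         timeuuid,   // clustering column
  email      text,
  password   text,
  PRIMARY KEY (accountId, id)
);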

I did look through the code and JIRA, and I couldn’t really find a solution. The closest I got was a custom partitioner, but you can’t have a partitioner per keyspace, and based on the following JIRA that’s not something that’ll be implemented in the future:
https://issues.apache.org/jira/browse/CASSANDRA-295

Any ideas are much appreciated.

Best,

Drew

Re: Data partitioning and composite partition key

Posted by Drew Kutcharian <dr...@venarc.com>.
Hi Rob,

I agree that one should not mess around with the default partitioner. But there might be value in making the Murmur3 partitioner “composite aware”. Since we can now have composites in row keys, why not be able to use only a part of the row key for partitioning? Does that make sense?

I just opened this JIRA https://issues.apache.org/jira/browse/CASSANDRA-7850

- Drew


On Aug 29, 2014, at 4:36 PM, Robert Coli <rc...@eventbrite.com> wrote:

> As a general statement, if you believe you need to create a custom Partitioner in order to handle your use case, you are almost certainly wrong or Doing It Wrong.
> 
> =Rob


Re: Data partitioning and composite partition key

Posted by Robert Coli <rc...@eventbrite.com>.
On Fri, Aug 29, 2014 at 3:48 PM, Drew Kutcharian <dr...@venarc.com> wrote:

> Is there a way to have the partitioner use only the “sensorId” part of the row key for the hash? That way we would be able to store all the data relating to a sensor on one node.

As a general statement, if you believe you need to create a custom Partitioner in order to handle your use case, you are almost certainly wrong or Doing It Wrong.

=Rob

Re: Data partitioning and composite partition key

Posted by Jack Krupansky <ja...@basetechnology.com>.
But you already said that you have “very wide rows”, so pulling massive amounts of data off a single node is very likely to completely dwarf the connect time. Again, doing the gets in parallel across multiple nodes would be so much more performant. How many nodes are we talking about?

One of the secrets of Cassandra is to use more, smaller requests in parallel, rather than massive requests to a single coordinator node.
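
To sketch that in query form: instead of one big SELECT ... WHERE id IN (...) funneled through a single coordinator, issue one small SELECT per key and execute them in parallel from the client:

SELECT * FROM user WHERE id = ?;   // executed once per userId, asynchronously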

-- Jack Krupansky

From: Drew Kutcharian 
Sent: Friday, August 29, 2014 8:28 PM
To: user@cassandra.apache.org 
Subject: Re: Data partitioning and composite partition key

Mainly lower latency and network overhead in multi-get requests (WHERE IN (…)). The coordinator needs to connect to only one node vs. potentially all the nodes in the cluster.


Re: Data partitioning and composite partition key

Posted by Drew Kutcharian <dr...@venarc.com>.
Mainly lower latency and network overhead in multi-get requests (WHERE IN (…)). The coordinator needs to connect to only one node vs. potentially all the nodes in the cluster.


On Aug 29, 2014, at 5:23 PM, Jack Krupansky <ja...@basetechnology.com> wrote:

> Okay, but what benefit do you think you get from having the partitions on the same node, since they would be separate partitions anyway? What exactly do you think you’re going to do with them that wouldn’t be a whole lot more performant by processing the data in parallel from separate nodes? The whole point of Cassandra is scalability and distributed processing, right?
>  
> -- Jack Krupansky


Re: Data partitioning and composite partition key

Posted by Jack Krupansky <ja...@basetechnology.com>.
Okay, but what benefit do you think you get from having the partitions on the same node, since they would be separate partitions anyway? What exactly do you think you’re going to do with them that wouldn’t be a whole lot more performant by processing the data in parallel from separate nodes? The whole point of Cassandra is scalability and distributed processing, right?

-- Jack Krupansky

From: Drew Kutcharian 
Sent: Friday, August 29, 2014 7:31 PM
To: user@cassandra.apache.org 
Subject: Re: Data partitioning and composite partition key

Hi Jack, 

I think you missed the point of my email, which was trying to avoid the problem of having very wide rows :)  In the notation of sensorId-datetime, the datetime is a datetime bucket, say a day. The CQL rows would still be keyed by the actual time of the event. So you’d end up having SensorId -> Datetime Bucket (day/week/month) -> actual event. What I wanted was to be able to colocate all the events related to a sensor id on a single node (token).

See “High Throughput Timelines” at http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra

- Drew


Re: Data partitioning and composite partition key

Posted by Drew Kutcharian <dr...@venarc.com>.
Hi Jack,

I think you missed the point of my email, which was trying to avoid the problem of having very wide rows :)  In the notation of sensorId-datetime, the datetime is a datetime bucket, say a day. The CQL rows would still be keyed by the actual time of the event. So you’d end up having SensorId -> Datetime Bucket (day/week/month) -> actual event. What I wanted was to be able to colocate all the events related to a sensor id on a single node (token).

See “High Throughput Timelines” at http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra

- Drew


On Aug 29, 2014, at 3:58 PM, Jack Krupansky <ja...@basetechnology.com> wrote:

> With CQL3, you, the developer, get to decide whether to place a primary key column in the partition key or make it a clustering column. So, make sensorID the partition key and datetime a clustering column.
>  
> -- Jack Krupansky
>  


Re: Data partitioning and composite partition key

Posted by Jack Krupansky <ja...@basetechnology.com>.
With CQL3, you, the developer, get to decide whether to place a primary key column in the partition key or make it a clustering column. So, make sensorID the partition key and datetime a clustering column.
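
A minimal sketch of that layout (hypothetical table and column names): with sensorID alone as the partition key, every event for a sensor hashes to the same token, and datetime orders the events within the partition:

CREATE TABLE sensor_events (
  sensorId   timeuuid,    // partition key: all events for a sensor on one node
  eventTime  timestamp,   // clustering column: events ordered by time
  value      double,      // hypothetical payload column
  PRIMARY KEY (sensorId, eventTime)
);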

-- Jack Krupansky
