Posted to user@cassandra.apache.org by onlinespending <on...@gmail.com> on 2013/12/03 06:09:13 UTC

Exactly one wide row per node for a given CF?

Subject says it all. I want to be able to randomly distribute a large set of records but keep them clustered in one wide row per node.

As an example, let’s say I’ve got a collection of about 1 million records, each with a unique id. If I just go ahead and set the primary key (and therefore the partition key) as the unique id, I’ll get very good random distribution across my server cluster. However, each record will be its own row. I’d like to have each record belong to one large wide row (per server node) so I can have them sorted or clustered on some other column.

If, say, I have 5 nodes in my cluster, I could randomly assign a value of 1–5 at creation time and use that value as the partition key. But this becomes troublesome if I add or remove nodes. What I effectively want is to partition on the unique id of the record modulo N (id % N, where N is the number of nodes).

I have to imagine there’s a mechanism in Cassandra to simply randomize the partitioning without even using a key (and then clustering on some column).

Thanks for any help.

Re: Exactly one wide row per node for a given CF?

Posted by "Laing, Michael" <mi...@nytimes.com>.
You could shard your rows like the following.

You would need over 100 shards, possibly... so testing is in order :)

Michael

-- put this in <file> and run using 'cqlsh -f <file>'

DROP KEYSPACE robert_test;

CREATE KEYSPACE robert_test WITH replication = {
    'class': 'SimpleStrategy',
    'replication_factor' : 1
};

USE robert_test;

CREATE TABLE bdn_index_pub (
    tree int,
    shard int,
    pord int,
    hpath text,
    PRIMARY KEY ((tree, shard), pord)
);

-- shard is calculated as pord % 12

COPY bdn_index_pub (tree, shard, pord, hpath) FROM STDIN;
1, 1, 1, "Chicago"
5, 3, 15, "New York"
1, 5, 5, "Melbourne"
3, 2, 2, "San Francisco"
1, 3, 3, "Palo Alto"
\.

SELECT * FROM bdn_index_pub
WHERE shard IN (0,1,2,3,4,5,6,7,8,9,10,11)
    AND tree =  1
    AND pord < 4
    AND pord > 0
ORDER BY pord desc
;

-- returns:

-- tree | shard | pord | hpath
--------+-------+------+--------------
--    1 |     3 |    3 |  "Palo Alto"
--    1 |     1 |    1 |    "Chicago"



On Tue, Dec 10, 2013 at 8:41 AM, Robert Wille <rw...@fold3.com> wrote:

> I have a question about this statement:
>
> When rows get above a few 10’s  of MB things can slow down, when they get
> above 50 MB they can be a pain, when they get above 100MB it’s a warning
> sign. And when they get above 1GB, well you don’t want to know what
> happens then.
>
> I tested a data model that I created. Here’s the schema for the table in
> question:
>
> CREATE TABLE bdn_index_pub (
>
> tree INT,
>
> pord INT,
>
> hpath VARCHAR,
>
> PRIMARY KEY (tree, pord)
>
> );
>
> As a test, I inserted 100 million records. tree had the same value for
> every record, and I had 100 million values for pord. hpath averaged about
> 50 characters in length. My understanding is that all 100 million strings
> would have been stored in a single row, since they all had the same value
> for the first component of the primary key. I didn’t look at the size of
> the table, but it had to be several gigs (uncompressed). Contrary to what
> Aaron says, I do want to know what happens, because I didn’t experience any
> issues with this table during my test. Inserting was fast. The last batch
> of records inserted in approximately the same amount of time as the first
> batch. Querying the table was fast. What I didn’t do was test the table
> under load, nor did I try this in a multi-node cluster.
>
> If this is bad, can somebody suggest a better pattern? This table was
> designed to support a query like this: select hpath from bdn_index_pub
> where tree = :tree and pord >= :start and pord <= :end. In my application,
> most trees will have less than a million records. A handful will have 10’s
> of millions, and one of them will have 100 million.
>
> If I need to break up my rows, my first instinct would be to divide each
> tree into blocks of say 10,000 and change tree to a string that contains
> the tree and the block number. Something like this:
>
> 17:0, 0, ‘/’
> …
> 17:0, 9999, ’/a/b/c’
> 17:1,10000, ‘/a/b/d’
> …
>
> I’d then need to issue an extra query for ranges that crossed block
> boundaries.
>
> Any suggestions on a better pattern?
>
> Thanks
>
> Robert
>
> From: Aaron Morton <aa...@thelastpickle.com>
> Reply-To: <us...@cassandra.apache.org>
> Date: Tuesday, December 10, 2013 at 12:33 AM
> To: Cassandra User <us...@cassandra.apache.org>
> Subject: Re: Exactly one wide row per node for a given CF?
>
> But this becomes troublesome if I add or remove nodes. What effectively I
>> want is to partition on the unique id of the record modulus N (id % N;
>> where N is the number of nodes).
>
> This is exactly the problem consistent hashing (used by cassandra) is
> designed to solve. If you hash the key and modulo the number of nodes,
> adding and removing nodes requires a lot of data to move.
>
> I want to be able to randomly distribute a large set of records but keep
>> them clustered in one wide row per node.
>
> Sounds like you should revisit your data modelling, this is a pretty well
> known anti pattern.
>
> When rows get above a few 10’s  of MB things can slow down, when they get
> above 50 MB they can be a pain, when they get above 100MB it’s a warning
> sign. And when they get above 1GB, well you don’t want to know what
> happens then.
>
> It’s a bad idea and you should take another look at the data model. If you
> have to do it, you can try the ByteOrderedPartitioner which uses the row
> key as a token, giving you total control of the row placement.
>
> Cheers
>
>
> -----------------
> Aaron Morton
> New Zealand
> @aaronmorton
>
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On 4/12/2013, at 8:32 pm, Vivek Mishra <mi...@gmail.com> wrote:
>
> So Basically you want to create a cluster of multiple unique keys, but
> data which belongs to one unique should be colocated. correct?
>
> -Vivek
>
>
> On Tue, Dec 3, 2013 at 10:39 AM, onlinespending <on...@gmail.com> wrote:
>
>> Subject says it all. I want to be able to randomly distribute a large set
>> of records but keep them clustered in one wide row per node.
>>
>> As an example, lets say I’ve got a collection of about 1 million records
>> each with a unique id. If I just go ahead and set the primary key (and
>> therefore the partition key) as the unique id, I’ll get very good random
>> distribution across my server cluster. However, each record will be its own
>> row. I’d like to have each record belong to one large wide row (per server
>> node) so I can have them sorted or clustered on some other column.
>>
>> If I say have 5 nodes in my cluster, I could randomly assign a value of 1
>> - 5 at the time of creation and have the partition key set to this value.
>> But this becomes troublesome if I add or remove nodes. What effectively I
>> want is to partition on the unique id of the record modulus N (id % N;
>> where N is the number of nodes).
>>
>> I have to imagine there’s a mechanism in Cassandra to simply randomize
>> the partitioning without even using a key (and then clustering on some
>> column).
>>
>> Thanks for any help.
>
>
>
>

Re: Exactly one wide row per node for a given CF?

Posted by onlinespending <on...@gmail.com>.
Comments below.

On Dec 9, 2013, at 11:33 PM, Aaron Morton <aa...@thelastpickle.com> wrote:

>> But this becomes troublesome if I add or remove nodes. What effectively I want is to partition on the unique id of the record modulus N (id % N; where N is the number of nodes).
> This is exactly the problem consistent hashing (used by cassandra) is designed to solve. If you hash the key and modulo the number of nodes, adding and removing nodes requires a lot of data to move. 
> 
>> I want to be able to randomly distribute a large set of records but keep them clustered in one wide row per node.
> Sounds like you should revisit your data modelling, this is a pretty well known anti pattern. 
> 
> When rows get above a few 10’s of MB things can slow down, when they get above 50 MB they can be a pain, when they get above 100MB it’s a warning sign. And when they get above 1GB, well you don’t want to know what happens then.
> 
> It’s a bad idea and you should take another look at the data model. If you have to do it, you can try the ByteOrderedPartitioner which uses the row key as a token, giving you total control of the row placement.


You should re-read my last paragraph as an example that would clearly benefit from such an approach. If you understand how paging works, you’ll see how you’d benefit from grouping probabilistically similar data within each node while also splitting data across nodes to reduce hot spotting.

Regardless, I no longer think it’s necessary to have a single wide row per node. Several wide rows per node are just as good, since for all practical purposes paging in the first N key/values from each of M rows on a node is the same as reading in the first N*M key/values from a single row.

So I’m going to do what I alluded to before. Treat the LSB of a record’s id as the partition key, and then cluster on something meaningful (such as geohash) and the prefix of the id.

create table test (
    id_prefix int,
    id_suffix int,
    geohash text,
    value text,
    primary key (id_suffix, geohash, id_prefix)
);

So if this was for a collection of users, they would be randomly distributed across nodes to increase parallelism and reduce hotspots, but within each wide row they'd be meaningfully clustered by geographic region, so as to increase the probability that paged-in data contains more of the working set.
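
For example, reading one geographic slice of a single bucket would look roughly like this (a sketch only; the bucket value and geohash bounds are made up):

SELECT geohash, id_prefix, value
FROM test
WHERE id_suffix = 137
  AND geohash >= '9q8y'
  AND geohash < '9q8z';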



> 
> Cheers
> 
> 
> -----------------
> Aaron Morton
> New Zealand
> @aaronmorton
> 
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
> 
> On 4/12/2013, at 8:32 pm, Vivek Mishra <mi...@gmail.com> wrote:
> 
>> So Basically you want to create a cluster of multiple unique keys, but data which belongs to one unique should be colocated. correct?
>> 
>> -Vivek
>> 
>> 
>> On Tue, Dec 3, 2013 at 10:39 AM, onlinespending <on...@gmail.com> wrote:
>> Subject says it all. I want to be able to randomly distribute a large set of records but keep them clustered in one wide row per node.
>> 
>> As an example, lets say I’ve got a collection of about 1 million records each with a unique id. If I just go ahead and set the primary key (and therefore the partition key) as the unique id, I’ll get very good random distribution across my server cluster. However, each record will be its own row. I’d like to have each record belong to one large wide row (per server node) so I can have them sorted or clustered on some other column.
>> 
>> If I say have 5 nodes in my cluster, I could randomly assign a value of 1 - 5 at the time of creation and have the partition key set to this value. But this becomes troublesome if I add or remove nodes. What effectively I want is to partition on the unique id of the record modulus N (id % N; where N is the number of nodes).
>> 
>> I have to imagine there’s a mechanism in Cassandra to simply randomize the partitioning without even using a key (and then clustering on some column).
>> 
>> Thanks for any help.
>> 
> 


Re: Exactly one wide row per node for a given CF?

Posted by Aaron Morton <aa...@thelastpickle.com>.
>  Querying the table was fast. What I didn’t do was test the table under load, nor did I try this in a multi-node cluster.
As the number of columns in a row increases, so does the size of the column index, which is read as part of the read path.

For background and latency comparisons see http://thelastpickle.com/blog/2011/07/04/Cassandra-Query-Plans.html or my talk on performance at the SF summit last year, http://thelastpickle.com/speaking/2012/08/08/Cassandra-Summit-SF.html
While the column index has been lifted to the -Index.db component, AFAIK it must still be fully loaded.

Larger rows take longer to go through compaction, tend to cause more JVM GC, and have issues during repair. See the in_memory_compaction_limit_in_mb comments in the yaml file. During repair we detect differences in ranges of rows and stream them between the nodes. If you have wide rows and a single column is out of sync we will create a new copy of that row on the node, which must then be compacted. I’ve seen the load on nodes with very wide rows go down by 150GB just by reducing the compaction settings.

IMHO, all things being equal, rows in the few 10’s of MB work better.

Cheers

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 11/12/2013, at 2:41 am, Robert Wille <rw...@fold3.com> wrote:

> I have a question about this statement:
> 
> When rows get above a few 10’s of MB things can slow down, when they get above 50 MB they can be a pain, when they get above 100MB it’s a warning sign. And when they get above 1GB, well you don’t want to know what happens then.
> 
> I tested a data model that I created. Here’s the schema for the table in question:
> 
> CREATE TABLE bdn_index_pub (
> 	tree INT,
> 	pord INT,
> 	hpath VARCHAR,
> 	PRIMARY KEY (tree, pord)
> );
> 
> As a test, I inserted 100 million records. tree had the same value for every record, and I had 100 million values for pord. hpath averaged about 50 characters in length. My understanding is that all 100 million strings would have been stored in a single row, since they all had the same value for the first component of the primary key. I didn’t look at the size of the table, but it had to be several gigs (uncompressed). Contrary to what Aaron says, I do want to know what happens, because I didn’t experience any issues with this table during my test. Inserting was fast. The last batch of records inserted in approximately the same amount of time as the first batch. Querying the table was fast. What I didn’t do was test the table under load, nor did I try this in a multi-node cluster.
> 
> If this is bad, can somebody suggest a better pattern? This table was designed to support a query like this: select hpath from bdn_index_pub where tree = :tree and pord >= :start and pord <= :end. In my application, most trees will have less than a million records. A handful will have 10’s of millions, and one of them will have 100 million.
> 
> If I need to break up my rows, my first instinct would be to divide each tree into blocks of say 10,000 and change tree to a string that contains the tree and the block number. Something like this:
> 
> 17:0, 0, ‘/’
> …
> 17:0, 9999, ’/a/b/c’
> 17:1,10000, ‘/a/b/d’
> …
> 
> I’d then need to issue an extra query for ranges that crossed block boundaries.
> 
> Any suggestions on a better pattern?
> 
> Thanks
> 
> Robert
> 
> From: Aaron Morton <aa...@thelastpickle.com>
> Reply-To: <us...@cassandra.apache.org>
> Date: Tuesday, December 10, 2013 at 12:33 AM
> To: Cassandra User <us...@cassandra.apache.org>
> Subject: Re: Exactly one wide row per node for a given CF?
> 
>>> But this becomes troublesome if I add or remove nodes. What effectively I want is to partition on the unique id of the record modulus N (id % N; where N is the number of nodes).
> This is exactly the problem consistent hashing (used by cassandra) is designed to solve. If you hash the key and modulo the number of nodes, adding and removing nodes requires a lot of data to move. 
> 
>>> I want to be able to randomly distribute a large set of records but keep them clustered in one wide row per node.
> Sounds like you should revisit your data modelling, this is a pretty well known anti pattern. 
> 
> When rows get above a few 10’s of MB things can slow down, when they get above 50 MB they can be a pain, when they get above 100MB it’s a warning sign. And when they get above 1GB, well you don’t want to know what happens then.
> 
> It’s a bad idea and you should take another look at the data model. If you have to do it, you can try the ByteOrderedPartitioner which uses the row key as a token, giving you total control of the row placement.
> 
> Cheers
> 
> 
> -----------------
> Aaron Morton
> New Zealand
> @aaronmorton
> 
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
> 
> On 4/12/2013, at 8:32 pm, Vivek Mishra <mi...@gmail.com> wrote:
> 
>> So Basically you want to create a cluster of multiple unique keys, but data which belongs to one unique should be colocated. correct?
>> 
>> -Vivek
>> 
>> 
>> On Tue, Dec 3, 2013 at 10:39 AM, onlinespending <on...@gmail.com> wrote:
>>> Subject says it all. I want to be able to randomly distribute a large set of records but keep them clustered in one wide row per node.
>>> 
>>> As an example, lets say I’ve got a collection of about 1 million records each with a unique id. If I just go ahead and set the primary key (and therefore the partition key) as the unique id, I’ll get very good random distribution across my server cluster. However, each record will be its own row. I’d like to have each record belong to one large wide row (per server node) so I can have them sorted or clustered on some other column.
>>> 
>>> If I say have 5 nodes in my cluster, I could randomly assign a value of 1 - 5 at the time of creation and have the partition key set to this value. But this becomes troublesome if I add or remove nodes. What effectively I want is to partition on the unique id of the record modulus N (id % N; where N is the number of nodes).
>>> 
>>> I have to imagine there’s a mechanism in Cassandra to simply randomize the partitioning without even using a key (and then clustering on some column).
>>> 
>>> Thanks for any help.
>> 
> 


Re: Exactly one wide row per node for a given CF?

Posted by onlinespending <on...@gmail.com>.
Where you’ll run into trouble is with compaction.  It looks as if pord is some sequentially increasing value.  Try your experiment again with a clustering key which is a bit more random at the time of insertion.
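
For instance (a sketch, not a schema from this thread), the same table could be given an extra clustering column whose values are effectively random at insert time, such as an application-computed hash bucket:

CREATE TABLE bdn_index_pub_random (
    tree int,
    hbucket int,   -- hypothetical: e.g. abs(hash(hpath)) % 1000, computed by the client
    pord int,
    hpath varchar,
    PRIMARY KEY (tree, hbucket, pord)
);

Rows within a tree then arrive out of clustering order, unlike the sequential pord insertions in the original test.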


On Dec 10, 2013, at 5:41 AM, Robert Wille <rw...@fold3.com> wrote:

> I have a question about this statement:
> 
> When rows get above a few 10’s of MB things can slow down, when they get above 50 MB they can be a pain, when they get above 100MB it’s a warning sign. And when they get above 1GB, well you don’t want to know what happens then.
> 
> I tested a data model that I created. Here’s the schema for the table in question:
> 
> CREATE TABLE bdn_index_pub (
> 	tree INT,
> 	pord INT,
> 	hpath VARCHAR,
> 	PRIMARY KEY (tree, pord)
> );
> 
> As a test, I inserted 100 million records. tree had the same value for every record, and I had 100 million values for pord. hpath averaged about 50 characters in length. My understanding is that all 100 million strings would have been stored in a single row, since they all had the same value for the first component of the primary key. I didn’t look at the size of the table, but it had to be several gigs (uncompressed). Contrary to what Aaron says, I do want to know what happens, because I didn’t experience any issues with this table during my test. Inserting was fast. The last batch of records inserted in approximately the same amount of time as the first batch. Querying the table was fast. What I didn’t do was test the table under load, nor did I try this in a multi-node cluster.
> 
> If this is bad, can somebody suggest a better pattern? This table was designed to support a query like this: select hpath from bdn_index_pub where tree = :tree and pord >= :start and pord <= :end. In my application, most trees will have less than a million records. A handful will have 10’s of millions, and one of them will have 100 million.
> 
> If I need to break up my rows, my first instinct would be to divide each tree into blocks of say 10,000 and change tree to a string that contains the tree and the block number. Something like this:
> 
> 17:0, 0, ‘/’
> …
> 17:0, 9999, ’/a/b/c’
> 17:1,10000, ‘/a/b/d’
> …
> 
> I’d then need to issue an extra query for ranges that crossed block boundaries.
> 
> Any suggestions on a better pattern?
> 
> Thanks
> 
> Robert
> 
> From: Aaron Morton <aa...@thelastpickle.com>
> Reply-To: <us...@cassandra.apache.org>
> Date: Tuesday, December 10, 2013 at 12:33 AM
> To: Cassandra User <us...@cassandra.apache.org>
> Subject: Re: Exactly one wide row per node for a given CF?
> 
>>> But this becomes troublesome if I add or remove nodes. What effectively I want is to partition on the unique id of the record modulus N (id % N; where N is the number of nodes).
> This is exactly the problem consistent hashing (used by cassandra) is designed to solve. If you hash the key and modulo the number of nodes, adding and removing nodes requires a lot of data to move. 
> 
>>> I want to be able to randomly distribute a large set of records but keep them clustered in one wide row per node.
> Sounds like you should revisit your data modelling, this is a pretty well known anti pattern. 
> 
> When rows get above a few 10’s of MB things can slow down, when they get above 50 MB they can be a pain, when they get above 100MB it’s a warning sign. And when they get above 1GB, well you don’t want to know what happens then.
> 
> It’s a bad idea and you should take another look at the data model. If you have to do it, you can try the ByteOrderedPartitioner which uses the row key as a token, giving you total control of the row placement.
> 
> Cheers
> 
> 
> -----------------
> Aaron Morton
> New Zealand
> @aaronmorton
> 
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
> 
> On 4/12/2013, at 8:32 pm, Vivek Mishra <mi...@gmail.com> wrote:
> 
>> So Basically you want to create a cluster of multiple unique keys, but data which belongs to one unique should be colocated. correct?
>> 
>> -Vivek
>> 
>> 
>> On Tue, Dec 3, 2013 at 10:39 AM, onlinespending <on...@gmail.com> wrote:
>>> Subject says it all. I want to be able to randomly distribute a large set of records but keep them clustered in one wide row per node.
>>> 
>>> As an example, lets say I’ve got a collection of about 1 million records each with a unique id. If I just go ahead and set the primary key (and therefore the partition key) as the unique id, I’ll get very good random distribution across my server cluster. However, each record will be its own row. I’d like to have each record belong to one large wide row (per server node) so I can have them sorted or clustered on some other column.
>>> 
>>> If I say have 5 nodes in my cluster, I could randomly assign a value of 1 - 5 at the time of creation and have the partition key set to this value. But this becomes troublesome if I add or remove nodes. What effectively I want is to partition on the unique id of the record modulus N (id % N; where N is the number of nodes).
>>> 
>>> I have to imagine there’s a mechanism in Cassandra to simply randomize the partitioning without even using a key (and then clustering on some column).
>>> 
>>> Thanks for any help.
>> 
> 


Re: Exactly one wide row per node for a given CF?

Posted by Robert Wille <rw...@fold3.com>.
I have a question about this statement:

When rows get above a few 10’s of MB things can slow down, when they get
above 50 MB they can be a pain, when they get above 100MB it’s a warning
sign. And when they get above 1GB, well you don’t want to know what
happens then.

I tested a data model that I created. Here’s the schema for the table in
question:

CREATE TABLE bdn_index_pub (

tree INT,

pord INT,

hpath VARCHAR,

PRIMARY KEY (tree, pord)

);


As a test, I inserted 100 million records. tree had the same value for every
record, and I had 100 million values for pord. hpath averaged about 50
characters in length. My understanding is that all 100 million strings would
have been stored in a single row, since they all had the same value for the
first component of the primary key. I didn’t look at the size of the table,
but it had to be several gigs (uncompressed). Contrary to what Aaron says, I
do want to know what happens, because I didn’t experience any issues with
this table during my test. Inserting was fast. The last batch of records
inserted in approximately the same amount of time as the first batch.
Querying the table was fast. What I didn’t do was test the table under load,
nor did I try this in a multi-node cluster.

If this is bad, can somebody suggest a better pattern? This table was
designed to support a query like this: select hpath from bdn_index_pub where
tree = :tree and pord >= :start and pord <= :end. In my application, most
trees will have less than a million records. A handful will have 10’s of
millions, and one of them will have 100 million.

If I need to break up my rows, my first instinct would be to divide each
tree into blocks of say 10,000 and change tree to a string that contains the
tree and the block number. Something like this:

17:0, 0, ‘/’
…
17:0, 9999, ‘/a/b/c’
17:1, 10000, ‘/a/b/d’
…
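
In CQL terms, that idea would look something like this (a sketch only, replacing the integer tree column with the combined string key):

CREATE TABLE bdn_index_pub (
    tree_block varchar,   -- e.g. '17:0' = tree 17, block 0 (pord / 10000)
    pord int,
    hpath varchar,
    PRIMARY KEY (tree_block, pord)
);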

I’d then need to issue an extra query for ranges that crossed block
boundaries.

Any suggestions on a better pattern?

Thanks

Robert

From:  Aaron Morton <aa...@thelastpickle.com>
Reply-To:  <us...@cassandra.apache.org>
Date:  Tuesday, December 10, 2013 at 12:33 AM
To:  Cassandra User <us...@cassandra.apache.org>
Subject:  Re: Exactly one wide row per node for a given CF?

>> But this becomes troublesome if I add or remove nodes. What effectively I
>> want is to partition on the unique id of the record modulus N (id % N; where
>> N is the number of nodes).
This is exactly the problem consistent hashing (used by cassandra) is
designed to solve. If you hash the key and modulo the number of nodes,
adding and removing nodes requires a lot of data to move.

>> I want to be able to randomly distribute a large set of records but keep them
>> clustered in one wide row per node.
Sounds like you should revisit your data modelling, this is a pretty well
known anti pattern.

When rows get above a few 10’s of MB things can slow down, when they get
above 50 MB they can be a pain, when they get above 100MB it’s a warning
sign. And when they get above 1GB, well you don’t want to know what
happens then.

It’s a bad idea and you should take another look at the data model. If you
have to do it, you can try the ByteOrderedPartitioner which uses the row key
as a token, giving you total control of the row placement.

Cheers


-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 4/12/2013, at 8:32 pm, Vivek Mishra <mi...@gmail.com> wrote:

> So Basically you want to create a cluster of multiple unique keys, but data
> which belongs to one unique should be colocated. correct?
> 
> -Vivek
> 
> 
> On Tue, Dec 3, 2013 at 10:39 AM, onlinespending <on...@gmail.com>
> wrote:
>> Subject says it all. I want to be able to randomly distribute a large set of
>> records but keep them clustered in one wide row per node.
>> 
>> As an example, lets say I’ve got a collection of about 1 million records each
>> with a unique id. If I just go ahead and set the primary key (and therefore
>> the partition key) as the unique id, I’ll get very good random distribution
>> across my server cluster. However, each record will be its own row. I’d like
>> to have each record belong to one large wide row (per server node) so I can
>> have them sorted or clustered on some other column.
>> 
>> If I say have 5 nodes in my cluster, I could randomly assign a value of 1 - 5
>> at the time of creation and have the partition key set to this value. But
>> this becomes troublesome if I add or remove nodes. What effectively I want is
>> to partition on the unique id of the record modulus N (id % N; where N is the
>> number of nodes).
>> 
>> I have to imagine there’s a mechanism in Cassandra to simply randomize the
>> partitioning without even using a key (and then clustering on some column).
>> 
>> Thanks for any help.
> 




Re: Exactly one wide row per node for a given CF?

Posted by Aaron Morton <aa...@thelastpickle.com>.
> But this becomes troublesome if I add or remove nodes. What effectively I want is to partition on the unique id of the record modulus N (id % N; where N is the number of nodes).
This is exactly the problem consistent hashing (used by cassandra) is designed to solve. If you hash the key and modulo the number of nodes, adding and removing nodes requires a lot of data to move. 

> I want to be able to randomly distribute a large set of records but keep them clustered in one wide row per node.
Sounds like you should revisit your data modelling, this is a pretty well known anti pattern. 

When rows get above a few 10’s of MB things can slow down, when they get above 50 MB they can be a pain, when they get above 100MB it’s a warning sign. And when they get above 1GB, well you don’t want to know what happens then.

It’s a bad idea and you should take another look at the data model. If you have to do it, you can try the ByteOrderedPartitioner which uses the row key as a token, giving you total control of the row placement.

Cheers


-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 4/12/2013, at 8:32 pm, Vivek Mishra <mi...@gmail.com> wrote:

> So Basically you want to create a cluster of multiple unique keys, but data which belongs to one unique should be colocated. correct?
> 
> -Vivek
> 
> 
> On Tue, Dec 3, 2013 at 10:39 AM, onlinespending <on...@gmail.com> wrote:
> Subject says it all. I want to be able to randomly distribute a large set of records but keep them clustered in one wide row per node.
> 
> As an example, lets say I’ve got a collection of about 1 million records each with a unique id. If I just go ahead and set the primary key (and therefore the partition key) as the unique id, I’ll get very good random distribution across my server cluster. However, each record will be its own row. I’d like to have each record belong to one large wide row (per server node) so I can have them sorted or clustered on some other column.
> 
> If I say have 5 nodes in my cluster, I could randomly assign a value of 1 - 5 at the time of creation and have the partition key set to this value. But this becomes troublesome if I add or remove nodes. What effectively I want is to partition on the unique id of the record modulus N (id % N; where N is the number of nodes).
> 
> I have to imagine there’s a mechanism in Cassandra to simply randomize the partitioning without even using a key (and then clustering on some column).
> 
> Thanks for any help.
> 


Re: Exactly one wide row per node for a given CF?

Posted by Aaron Morton <aa...@thelastpickle.com>.
> Basically this desire all stems from wanting efficient use of memory. 
Do you have any real latency numbers you are trying to tune?

Otherwise this sounds a little like premature optimisation.

Cheers

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 5/12/2013, at 6:16 am, onlinespending <on...@gmail.com> wrote:

> Pretty much yes. Although I think it’d be nice if Cassandra handled such a case, I’ve resigned to the fact that it cannot at the moment. The workaround will be to partition on the LSB portion of the id (giving 256 rows spread amongst my nodes) which allows room for scaling, and then cluster each row on geohash or something else.
> 
> Basically this desire all stems from wanting efficient use of memory. Frequently accessed keys’ values are kept in RAM through the OS page cache. But the page size is 4KB. This is a problem if you are accessing several small records of data (say 200 bytes), since each record only occupies a small % of a page. This is why it’s important to increase the probability that neighboring data on the disk is relevant. Worst case would be to read in a full 4KB page into RAM, of which you’re only accessing one record that’s a couple hundred bytes. All of the other unused data of the page is wastefully occupying RAM. Now project this problem to a collection of millions of small records all indiscriminately and randomly scattered on the disk, and you can easily see how inefficient your memory usage will become.
> 
> That’s why it’s best to cluster data in some meaningful way, all in an effort to increasing the probability that when one record is accessed in that 4KB block that its neighboring records will also be accessed. This brings me back to the question of this thread. I want to randomly distribute the data amongst the nodes to avoid hot spotting, but within each node I want to cluster the data meaningfully such that the probability that neighboring data is relevant is increased.
> 
> An example of this would be having a huge collection of small records that store basic user information. If you partition on the unique user id, then you’ll get nice random distribution but with no ability to cluster (each record would occupy its own row). You could partition on say geographical region, but then you’ll end up with hot spotting when one region is more active than another. So ideally you’d like to randomly assign a node to each record to increase parallelism, but then cluster all records on a node by say geohash since it is more likely (however small that may be) that when one user from a geographical region is accessed other users from the same region will also need to be accessed. It’s certainly better than having some random user record next to the one you are accessing at the moment.
> 
> 
> 
> 
> On Dec 3, 2013, at 11:32 PM, Vivek Mishra <mi...@gmail.com> wrote:
> 
>> So Basically you want to create a cluster of multiple unique keys, but data which belongs to one unique should be colocated. correct?
>> 
>> -Vivek
>> 
>> 
>> On Tue, Dec 3, 2013 at 10:39 AM, onlinespending <on...@gmail.com> wrote:
>> Subject says it all. I want to be able to randomly distribute a large set of records but keep them clustered in one wide row per node.
>> 
>> As an example, lets say I’ve got a collection of about 1 million records each with a unique id. If I just go ahead and set the primary key (and therefore the partition key) as the unique id, I’ll get very good random distribution across my server cluster. However, each record will be its own row. I’d like to have each record belong to one large wide row (per server node) so I can have them sorted or clustered on some other column.
>> 
>> If I say have 5 nodes in my cluster, I could randomly assign a value of 1 - 5 at the time of creation and have the partition key set to this value. But this becomes troublesome if I add or remove nodes. What effectively I want is to partition on the unique id of the record modulus N (id % N; where N is the number of nodes).
>> 
>> I have to imagine there’s a mechanism in Cassandra to simply randomize the partitioning without even using a key (and then clustering on some column).
>> 
>> Thanks for any help.
>> 
> 


Re: Exactly one wide row per node for a given CF?

Posted by onlinespending <on...@gmail.com>.
Pretty much yes. Although I think it’d be nice if Cassandra handled such a case, I’ve resigned to the fact that it cannot at the moment. The workaround will be to partition on the LSB portion of the id (giving 256 rows spread amongst my nodes) which allows room for scaling, and then cluster each row on geohash or something else.

Basically this desire all stems from wanting efficient use of memory. Frequently accessed keys’ values are kept in RAM through the OS page cache. But the page size is 4KB. This is a problem if you are accessing several small records of data (say 200 bytes), since each record only occupies a small % of a page. This is why it’s important to increase the probability that neighboring data on the disk is relevant. Worst case would be to read in a full 4KB page into RAM, of which you’re only accessing one record that’s a couple hundred bytes. All of the other unused data of the page is wastefully occupying RAM. Now project this problem to a collection of millions of small records all indiscriminately and randomly scattered on the disk, and you can easily see how inefficient your memory usage will become.

That’s why it’s best to cluster data in some meaningful way, all in an effort to increase the probability that when one record in that 4KB block is accessed, its neighboring records will also be accessed. This brings me back to the question of this thread. I want to randomly distribute the data amongst the nodes to avoid hot spotting, but within each node I want to cluster the data meaningfully such that the probability that neighboring data is relevant is increased.

An example of this would be having a huge collection of small records that store basic user information. If you partition on the unique user id, then you’ll get nice random distribution but with no ability to cluster (each record would occupy its own row). You could partition on say geographical region, but then you’ll end up with hot spotting when one region is more active than another. So ideally you’d like to randomly assign a node to each record to increase parallelism, but then cluster all records on a node by say geohash since it is more likely (however small that may be) that when one user from a geographical region is accessed other users from the same region will also need to be accessed. It’s certainly better than having some random user record next to the one you are accessing at the moment.
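
A sketch of that user-records layout (table and column names are made up for illustration):

CREATE TABLE users (
    id_bucket int,     -- e.g. id % 256, computed by the application
    geohash text,
    id bigint,
    name text,
    PRIMARY KEY (id_bucket, geohash, id)
);

Users land in one of 256 partitions spread across the nodes, and within each partition they are stored in geohash order, so users from the same region sit next to each other on disk.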




On Dec 3, 2013, at 11:32 PM, Vivek Mishra <mi...@gmail.com> wrote:

> So Basically you want to create a cluster of multiple unique keys, but data which belongs to one unique should be colocated. correct?
> 
> -Vivek
> 
> 
> On Tue, Dec 3, 2013 at 10:39 AM, onlinespending <on...@gmail.com> wrote:
> Subject says it all. I want to be able to randomly distribute a large set of records but keep them clustered in one wide row per node.
> 
> As an example, lets say I’ve got a collection of about 1 million records each with a unique id. If I just go ahead and set the primary key (and therefore the partition key) as the unique id, I’ll get very good random distribution across my server cluster. However, each record will be its own row. I’d like to have each record belong to one large wide row (per server node) so I can have them sorted or clustered on some other column.
> 
> If I say have 5 nodes in my cluster, I could randomly assign a value of 1 - 5 at the time of creation and have the partition key set to this value. But this becomes troublesome if I add or remove nodes. What effectively I want is to partition on the unique id of the record modulus N (id % N; where N is the number of nodes).
> 
> I have to imagine there’s a mechanism in Cassandra to simply randomize the partitioning without even using a key (and then clustering on some column).
> 
> Thanks for any help.
> 


Re: Exactly one wide row per node for a given CF?

Posted by Vivek Mishra <mi...@gmail.com>.
So basically you want to create a cluster of multiple unique keys, but data
which belongs to one unique key should be colocated. Correct?

-Vivek


On Tue, Dec 3, 2013 at 10:39 AM, onlinespending <on...@gmail.com> wrote:

> Subject says it all. I want to be able to randomly distribute a large set
> of records but keep them clustered in one wide row per node.
>
> As an example, lets say I’ve got a collection of about 1 million records
> each with a unique id. If I just go ahead and set the primary key (and
> therefore the partition key) as the unique id, I’ll get very good random
> distribution across my server cluster. However, each record will be its own
> row. I’d like to have each record belong to one large wide row (per server
> node) so I can have them sorted or clustered on some other column.
>
> If I say have 5 nodes in my cluster, I could randomly assign a value of 1
> - 5 at the time of creation and have the partition key set to this value.
> But this becomes troublesome if I add or remove nodes. What effectively I
> want is to partition on the unique id of the record modulus N (id % N;
> where N is the number of nodes).
>
> I have to imagine there’s a mechanism in Cassandra to simply randomize the
> partitioning without even using a key (and then clustering on some column).
>
> Thanks for any help.