Posted to user@cassandra.apache.org by "David G. Boney" <db...@semanticartifacts.com> on 2010/12/24 20:07:15 UTC

Partitions

I am using the Hadoop interface with Cassandra. Is it possible to line up the partitions or splits of two different column families so that they are on the same node? I am doing this for data locality reasons. I want to read all the data from a split of column family A and a split of column family B into memory to do some processing.

Here is an example. Column family A has 1,000,000 rows and column family B has 50,000,000 rows. Let's say column family A has a split every 10,000 rows and column family B has a split every 500,000 rows. I want the first split of A and the first split of B on the same node, the second split of A and the second split of B on the next node, and so on.
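As a rough illustration of the arithmetic in this example, the Python sketch below counts the splits in each column family and pairs them by index. The function names are hypothetical and are not part of the Cassandra or Hadoop API. The round-robin wrap in node_for_split is an assumption for the case where there are fewer nodes than splits.

```python
def split_count(total_rows, rows_per_split):
    """Number of splits needed to cover total_rows (ceiling division)."""
    return -(-total_rows // rows_per_split)


def node_for_split(split_index, node_count):
    """Hypothetical placement: split i goes to node i, wrapping around
    when there are more splits than nodes (an assumption)."""
    return split_index % node_count


splits_a = split_count(1_000_000, 10_000)     # column family A
splits_b = split_count(50_000_000, 500_000)   # column family B

# Pair the i-th split of A with the i-th split of B.
pairs = list(zip(range(splits_a), range(splits_b)))
```

With these numbers both column families end up with the same number of splits, so every split of A has exactly one partner split in B.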

A second scenario is that the two column families use the same key. Let's assume the key is an integer in the range of 1 to 1,000,000. The two column families have a different number of rows. I would like the splits to occur at certain multiples of the key value, say every 10,000. The first split would have keys in the range of 1 to 9,999. The second split would have keys in the range of 10,000 to 19,999, and so on. I still want the first split of column family A and the first split of column family B to be on the first node, and so on. It is possible in this scenario that a split could be empty or very small; that is OK.
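The key-range scheme in this second scenario can be sketched as a simple mapping from key to split index. This is only an illustration in Python under the stated assumptions (integer keys starting at 1, a split boundary at every multiple of 10,000). The names SPLIT_WIDTH, split_index, and split_bounds are hypothetical and do not come from Cassandra or Hadoop.

```python
SPLIT_WIDTH = 10_000  # assumed boundary: every multiple of 10,000


def split_index(key, width=SPLIT_WIDTH):
    """Zero-based index of the split that holds this key."""
    return key // width


def split_bounds(index, width=SPLIT_WIDTH, min_key=1):
    """Inclusive (low, high) key range covered by a split."""
    low = max(index * width, min_key)
    high = (index + 1) * width - 1
    return low, high


# Rows from both column families with the same key land in the same
# split index, so they could be co-located by that index.
same_split = split_index(12_345) == split_index(19_999)
```

Under these assumptions the key 1,000,000 falls into a final split of its own, which is one instance of the very small split the scenario allows for.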
-------------
Sincerely,
David G. Boney
dboney1@semanticartifacts.com
http://www.semanticartifacts.com