Posted to user@cassandra.apache.org by Kevin Burton <bu...@spinn3r.com> on 2014/06/07 20:24:21 UTC

A list of all potential problems when using byte ordered partitioner?

I believe I'm aware of the problems that can arise due to the byte ordered
partitioner.

Is there a full list of ALL the problems?  I want to make sure I'm not
missing anything.

The main problems I'm aware of are:

... "natural" inserts where the key is something like a username will tend
to have radically uneven value distribution.
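
To make the skew concrete, here is a minimal sketch of byte-ordered
placement. It is not Cassandra code: the 4-node cluster, the equal split of
the first-byte space, and the names NODE_COUNT and node_for_key are all
assumptions made for illustration.

```python
from collections import Counter

# Hypothetical 4-node cluster. Each node owns an equal slice of the
# first-byte space: 0x00-0x3f, 0x40-0x7f, 0x80-0xbf, 0xc0-0xff.
NODE_COUNT = 4


def node_for_key(key: bytes) -> int:
    """Pick a node from the key's first byte, mimicking byte-ordered placement."""
    return key[0] * NODE_COUNT // 256


usernames = ["alice", "bob", "carol", "dave", "erin", "frank", "grace", "heidi"]

# Count how many keys each node receives.
load = Counter(node_for_key(name.encode("ascii")) for name in usernames)

for node in range(NODE_COUNT):
    print(f"node {node}: {load[node]} keys")
```

Lowercase ASCII letters all fall in 0x61-0x7a, so every username above lands
in the same slice and one node takes all of the rows.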

Another problem is that bulk loading data will sequentially overload one of
the nodes, followed by the next.  This will create a bottleneck in the
cluster, and your write throughput will only be as good as a single node's.
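
The same kind of toy model shows the bulk-load effect. This is only a sketch
under assumed conditions: a hypothetical 4-node cluster with equal byte
ranges, input that arrives sorted by key, and made-up names (NODE_COUNT,
node_for_key, BATCH_SIZE).

```python
# Hypothetical 4-node cluster with equal slices of the first-byte space.
NODE_COUNT = 4
BATCH_SIZE = 256


def node_for_key(key: bytes) -> int:
    """Pick a node from the key's first byte, mimicking byte-ordered placement."""
    return key[0] * NODE_COUNT // 256


# 1024 two-byte keys spread evenly over the key space, already in sorted
# order, as a bulk load of pre-sorted data would present them.
keys = [i.to_bytes(2, "big") for i in range(0, 65536, 64)]

# For each batch of consecutive writes, record which nodes receive them.
nodes_per_batch = [
    sorted({node_for_key(k) for k in keys[start:start + BATCH_SIZE]})
    for start in range(0, len(keys), BATCH_SIZE)
]

for number, nodes in enumerate(nodes_per_batch):
    print(f"batch {number} -> nodes {nodes}")
```

Every batch goes to exactly one node, and the load only moves on to the next
node once the previous range is exhausted, so the other nodes sit idle.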

I believe my design would benefit from the byte ordered partitioner.

I'm going to use the MD5 hash of a unique identifier for the primary
key.  (MD5 probably, as it doesn't need to be secure and is faster than
SHA1, but perhaps SHA1 anyway just to avoid any potential collisions.)
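
A rough sketch of the hashed-key idea, using Python's standard hashlib. The
identifier format ("user-N"), the 4-node cluster, and the helper names
row_key and node_for_key are assumptions for illustration, not part of any
Cassandra API.

```python
import hashlib
from collections import Counter

NODE_COUNT = 4  # hypothetical cluster size


def row_key(identifier: str) -> bytes:
    """Derive a 16-byte row key from the MD5 digest of a unique identifier."""
    return hashlib.md5(identifier.encode("utf-8")).digest()


def node_for_key(key: bytes) -> int:
    """Pick a node from the key's first byte, mimicking byte-ordered placement."""
    return key[0] * NODE_COUNT // 256


# Hash 10,000 made-up identifiers and count the rows each node receives.
load = Counter(node_for_key(row_key(f"user-{i}")) for i in range(10_000))

for node in range(NODE_COUNT):
    print(f"node {node}: {load[node]} keys")
```

Because the digest bytes are close to uniformly distributed, each node ends
up with roughly a quarter of the rows, even though the identifiers share a
common prefix.  Swapping hashlib.md5 for hashlib.sha1 would give 20-byte keys
with the same property.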

This way I can use the cluster in both modes: since I'm going to be using
a hashcode as my primary key for some tables, those tables will behave as
they would under RandomPartitioner.

And for certain tables which are append-only, I can get range queries
across the cluster if I design my primary key correctly.
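
One possible reading of "design my primary key correctly", sketched under my
own assumptions: prefix the key with a big-endian timestamp so that byte
order matches time order, and append the MD5 digest to keep keys unique.  The
layout and the names time_ordered_key and range_bounds are hypothetical, and
a sorted Python list stands in for the ordered key space.

```python
import bisect
import hashlib
import struct


def time_ordered_key(timestamp_ms: int, identifier: str) -> bytes:
    """8-byte big-endian timestamp followed by a 16-byte MD5 digest."""
    prefix = struct.pack(">Q", timestamp_ms)
    suffix = hashlib.md5(identifier.encode("utf-8")).digest()
    return prefix + suffix


def range_bounds(start_ms: int, end_ms: int) -> tuple:
    """Start and end keys for the half-open time range [start_ms, end_ms)."""
    return struct.pack(">Q", start_ms), struct.pack(">Q", end_ms)


# Keys sort by timestamp because the big-endian prefix compares byte-wise.
keys = sorted(
    time_ordered_key(ts, f"event-{ts}") for ts in (1000, 2500, 2000, 3000, 1500)
)

# A range query becomes a scan between two boundary keys.
low, high = range_bounds(1500, 3000)
hits = keys[bisect.bisect_left(keys, low):bisect.bisect_left(keys, high)]
hit_times = [struct.unpack(">Q", key[:8])[0] for key in hits]
print(hit_times)
```

Note that keys laid out like this are written in sorted order, so new rows
all land in the same range at any given moment, which is the sequential
overload described above for bulk loads.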

Kevin

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
Skype: *burtonator*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>
War is peace. Freedom is slavery. Ignorance is strength. Corporations are
people.