Posted to user@cassandra.apache.org by "David G. Boney" <db...@semanticartifacts.com> on 2011/02/17 18:27:10 UTC

Simple Compression Scheme

Below is a link to a simple client-side compression scheme. I thought it might be of interest to some members of the list.

Column values and column names are easy to handle on the client side, the latter with a custom column name comparator. Row keys are harder: because there is only one row partitioner for all column families, compressing row keys gets complicated if the column families use different data types for their keys. Using properties of Unicode, the scheme below can differentiate between uncompressed Unicode strings, compressed Unicode strings, uncompressed UUIDs, and a pass-through code for no compression, for a one-byte penalty. For my project I only use Unicode strings and UUIDs for my row keys, so this works well for me. The actual compression algorithm can work with both short strings, using a static probability table for arithmetic coding, and long strings, using adaptive arithmetic coding. Your mileage may vary. I will have code for this design in a month or two.
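
To give a rough idea of the kind of tagging involved, here is a minimal Python sketch. It is not the design from the linked page. It assumes the "properties of Unicode" in question are that the bytes 0xF5 through 0xFF never occur in valid UTF-8, so one of them at the start of a key cannot be mistaken for a plain string. The three marker values and the function names are hypothetical, and zlib stands in for the arithmetic coder purely so the example runs.

```python
import uuid
import zlib

# Hypothetical marker bytes (an assumption, not taken from the linked design).
# 0xF5-0xFF never occur in valid UTF-8, so a key that starts with one of
# them cannot be an uncompressed UTF-8 string.
TAG_COMPRESSED_TEXT = 0xFF
TAG_UUID = 0xFE
TAG_PASS_THROUGH = 0xFD


def encode_key(value):
    """Turn a str, uuid.UUID, or bytes value into row key bytes."""
    if isinstance(value, uuid.UUID):
        # One marker byte plus the 16 raw UUID bytes.
        return bytes([TAG_UUID]) + value.bytes
    if isinstance(value, bytes):
        # Opaque data: stored as-is behind a one-byte marker.
        return bytes([TAG_PASS_THROUGH]) + value
    raw = value.encode("utf-8")
    # zlib is only a stand-in for the arithmetic coder described above.
    packed = bytes([TAG_COMPRESSED_TEXT]) + zlib.compress(raw)
    # Keep the compressed form only when it is actually shorter.
    return packed if len(packed) < len(raw) else raw


def decode_key(key):
    """Invert encode_key by looking at the first byte."""
    if not key:
        return ""
    tag, body = key[0], key[1:]
    if tag == TAG_COMPRESSED_TEXT:
        return zlib.decompress(body).decode("utf-8")
    if tag == TAG_UUID:
        return uuid.UUID(bytes=body)
    if tag == TAG_PASS_THROUGH:
        return body
    # No marker: the key is an uncompressed UTF-8 string.
    return key.decode("utf-8")
```

In this sketch a short string such as "user:42" is stored unchanged with no marker, while a long repetitive string gets the compressed-text marker, and decode_key returns the original value in both cases.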

http://www.semanticartifacts.com/compression/compression.html

-------------
Sincerely,
David G. Boney
dboney1@semanticartifacts.com
http://www.semanticartifacts.com