Posted to mapreduce-user@hadoop.apache.org by Andrew Martin <an...@billforward.net> on 2014/06/03 18:53:57 UTC

Output to Cassandra table with column of type "map"

I've written a number of MapReduce jobs using the CQL3 driver that allows input/output from/to Cassandra column families.


The output from the Reducer has always been a Map<String, ByteBuffer> for the primary key(s) and a List<ByteBuffer> for the values. This works fine for all data types that can be converted easily to a ByteBuffer with "org.apache.cassandra.utils.ByteBufferUtil.bytes()", namely double, float, int, String, etc.
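
For reference, a minimal sketch of that output shape. To keep it runnable without Cassandra on the classpath, the bytes() helpers below are stdlib stand-ins for ByteBufferUtil.bytes() (I'm assuming UTF-8 for strings and big-endian fixed-width encoding for numbers). The column name "word" and the two value columns are hypothetical, and I'm assuming the values are bound in order to the "?" markers of the output CQL statement.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReducerOutputSketch {

    // Stand-in for ByteBufferUtil.bytes(String): UTF-8 encode the text (assumption).
    static ByteBuffer bytes(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    // Stand-in for ByteBufferUtil.bytes(int): 4 bytes, big-endian (assumption).
    static ByteBuffer bytes(int i) {
        ByteBuffer b = ByteBuffer.allocate(4);
        b.putInt(i);
        b.flip(); // make the written bytes readable from position 0
        return b;
    }

    // Stand-in for ByteBufferUtil.bytes(double): 8 bytes, big-endian (assumption).
    static ByteBuffer bytes(double d) {
        ByteBuffer b = ByteBuffer.allocate(8);
        b.putDouble(d);
        b.flip();
        return b;
    }

    // Primary key columns, keyed by column name ("word" is a hypothetical column).
    static Map<String, ByteBuffer> keysFor(String word) {
        Map<String, ByteBuffer> keys = new LinkedHashMap<>();
        keys.put("word", bytes(word));
        return keys;
    }

    // Non-key values, in the order of the bind markers in the output statement.
    static List<ByteBuffer> valuesFor(int count, double avgSentenceLength) {
        List<ByteBuffer> values = new ArrayList<>();
        values.add(bytes(count));
        values.add(bytes(avgSentenceLength));
        return values;
    }

    public static void main(String[] args) {
        Map<String, ByteBuffer> keys = keysFor("hadoop");
        List<ByteBuffer> values = valuesFor(42, 17.5);
        // In a real Reducer this pair would be passed to context.write(keys, values).
        System.out.println(keys.keySet() + " -> " + values.size() + " bound values");
    }
}
```

Every value here is a scalar with an obvious byte encoding, which is why this approach has worked so far.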


Now I'd like to output data to a column in Cassandra that has the datatype "map", but I'm not sure whether I should still pass it as an item in the List of ByteBuffers and, if so, how I'd correctly serialize it to bytes.


My problem is like the traditional WordCount problem, only I need to output more than one piece of data about each word (imagine I were storing, for each word, the number of times it appeared in the text, the average length of the sentences it appears in, and the date of publication of the oldest text it appears in). I can conceive of a solution with more than one column family, but Cassandra appears to provide the map datatype to avoid this.


Is there a way to output to a Cassandra column of datatype Map, or a way to avoid having to do so?


Cheers,


Andrew