Posted to user@cassandra.apache.org by James Campbell <ja...@breachintelligence.com> on 2014/04/22 15:43:48 UTC

BulkOutputFormat and CQL3

Hi Cassandra Users-

I have a Hadoop job that uses the pattern in Cassandra 2.0.6's hadoop_cql3_word_count example to load data from HDFS into Cassandra.  I have read that BulkOutputFormat can significantly increase write throughput from Hadoop to Cassandra, so I am considering testing that approach (http://www.datastax.com/dev/blog/improved-hadoop-output, http://shareitexploreit.blogspot.com/2012/03/bulkloadto-cassandra-with-hadoop.html ).

Is it possible/supported/recommended to use the BulkOutputFormat to load data from Hadoop to a CQL3 table in Cassandra?

I see several examples of building composite keys using Hector (e.g. http://www.datastax.com/dev/blog/introduction-to-composite-columns-part-1, http://brianoneill.blogspot.com/2012/09/composite-keys-connecting-dots-between.html ), but the changes to support CQL3 have left a lot of documentation out there for different versions, so it's not clear to me what the "proper" way is to build the requisite ByteBuffer, List<Mutation> pairs that ColumnFamilyOutputFormat (and so BulkOutputFormat) needs.
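
For context, here is a minimal sketch of the byte layout in question. It assumes the usual CompositeType encoding, where each component is written as a 2-byte length, the value bytes, and a 1-byte end-of-component marker. The class name, helper names, and sample values (a clustering value "2014-04-22" and a CQL3 column called "count") are hypothetical. It uses only java.nio and does not cover building the Mutation objects themselves.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Cassandra or Hector.
class CompositeSketch {

    // Packs each component as:
    //   [2-byte big-endian length][value bytes][1-byte end-of-component = 0]
    static ByteBuffer composite(ByteBuffer... components) {
        int size = 0;
        for (ByteBuffer c : components) {
            size += 2 + c.remaining() + 1;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        for (ByteBuffer c : components) {
            out.putShort((short) c.remaining());
            out.put(c.duplicate()); // duplicate() leaves the caller's position untouched
            out.put((byte) 0);
        }
        out.flip(); // make the buffer readable from the start
        return out;
    }

    static ByteBuffer utf8(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Assumed shape of a CQL3 cell name: clustering value(s), then the column name.
        ByteBuffer cellName = composite(utf8("2014-04-22"), utf8("count"));
        System.out.println("cell name length in bytes: " + cellName.remaining());
    }
}
```

Whether ColumnFamilyOutputFormat and BulkOutputFormat accept names built this way for a given CQL3 table is exactly the open question; the sketch only shows the encoding being asked about.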

James