Posted to dev@drill.apache.org by Brian O'Neill <bo...@alumni.brown.edu> on 2013/01/21 05:40:40 UTC

Getting plugged in... (Cassandra and Drill?)

(Sorry for the cross-list post; I didn't know which list was appropriate for this question.)

Last week, Brad Anderson came up and presented at the PhillyDB meetup.
http://www.slideshare.net/boorad/phillydb-talk-beyond-batch

He gave us an overview of Drill, and I'm curious...

Presently, we heavily use Storm + Cassandra.
http://brianoneill.blogspot.com/2012/08/a-big-data-trifecta-storm-kafka-and.html

We treat CRUD operations as events, and within Storm we calculate aggregate counts of the entities flowing through the system along various dimensions. That works well, but we still need an ad hoc reporting capability and a way to report on historical data that is no longer active in the system.
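
As an illustration only (this is not the actual Storm topology), the counting step can be sketched in a few lines of Python. The event fields (op, entity, state) and the count_by helper are hypothetical names chosen for the example, and a real Storm bolt would update its counts incrementally as tuples arrive rather than scanning a list.

```python
from collections import Counter

# Hypothetical CRUD events; the field names are assumptions for illustration.
events = [
    {"op": "create", "entity": "provider", "state": "PA"},
    {"op": "update", "entity": "provider", "state": "PA"},
    {"op": "create", "entity": "facility", "state": "NJ"},
    {"op": "delete", "entity": "provider", "state": "NJ"},
]


def count_by(events, dimensions):
    """Count events grouped by the values of the given dimension fields."""
    counts = Counter()
    for event in events:
        # The grouping key is the tuple of this event's values for each dimension.
        key = tuple(event[d] for d in dimensions)
        counts[key] += 1
    return counts


by_op = count_by(events, ["op"])
by_entity_state = count_by(events, ["entity", "state"])
```

Counts like these answer questions that are known ahead of time; the ad hoc and historical reporting described above is the part they do not cover.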

Would it be possible to use the Drill engine against a Cassandra backend?
If so, what would that involve? (Implementing some API?)

I assume performance would be terrible unless the data were stored in the columnar format described in the Dremel paper. Is that accurate? Does anyone know whether someone has attempted to translate that format to Cassandra?

Regardless, I'm very interested in getting involved and am no stranger to getting my hands dirty. Let me know if you can provide any direction. (Our entities are currently stored as JSON in Cassandra.)

-brian

-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/