Posted to user@cassandra.apache.org by Brian O'Neill <bo...@alumni.brown.edu> on 2012/01/20 19:45:11 UTC

Ad Hoc Queries

Interesting articles... (changing the subject line to broaden the scope)
http://codemonkeyism.com/dark-side-nosql/
http://www.reportsanywhere.com/pebble/2010/04/16/1271437740000.html

These articulate the exact challenge we're trying to overcome.

-brian



On Fri, Jan 20, 2012 at 12:57 PM, Brian O'Neill <bo...@alumni.brown.edu> wrote:

> Not terribly large....
> ~50 million rows, each row has ~100-300 columns.
>
> But big enough that a map/reduce job takes longer than users would like.
>
> Actually maybe that is another question...
> Does anyone have any benchmarks running map/reduce against Cassandra?
> (even a simple count / or copy CF benchmark would be helpful)
>
> -brian
>
> On Fri, Jan 20, 2012 at 12:41 PM, Zach Richardson <j.zach.richardson@gmail.com> wrote:
>
>> How much data do you think you will need ad hoc query ability for?
>>
>>
>> On Fri, Jan 20, 2012 at 11:28 AM, Brian O'Neill <bo...@alumni.brown.edu> wrote:
>>
>>>
>>> I can't remember if I asked this question before, but....
>>>
>>> We're using Cassandra as our transactional system, and building up quite
>>> a library of map/reduce jobs that perform data quality analysis,
>>> statistics, etc.
>>> (> 100 jobs now)
>>>
>>> But... we are still struggling to provide an "ad-hoc" query mechanism
>>> for our users.
>>>
>>> To fill that gap, I believe we still need to materialize our data in an
>>> RDBMS.
>>>
>>> Anyone have any ideas?  Better ways to support ad-hoc queries?
>>>
>>> Effectively, our users want to be able to select count(distinct Y) from
>>> X group by Z.
>>> Where Y and Z are arbitrary columns of rows in X.
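For illustration, the query shape above can be sketched as a single map/reduce-style pass in plain Python. This is a minimal sketch under stated assumptions, not the poster's implementation: rows are assumed to be dicts of column name to value, the function name `count_distinct_by` and the sample data are hypothetical, and rows missing either column are simply skipped.

```python
from collections import defaultdict


def count_distinct_by(rows, y, z):
    """Rough equivalent of: select count(distinct y) from rows group by z."""
    groups = defaultdict(set)
    for row in rows:
        # Rows are sparse: skip any row that lacks either column.
        if y in row and z in row:
            groups[row[z]].add(row[y])
    # Reduce step: the size of each set is the distinct count for that group.
    return {key: len(values) for key, values in groups.items()}


# Hypothetical sample data standing in for rows of X.
rows = [
    {"state": "PA", "specialty": "cardiology", "name": "a"},
    {"state": "PA", "specialty": "oncology"},
    {"state": "PA", "specialty": "cardiology"},
    {"state": "NJ", "specialty": "oncology"},
    {"name": "row with neither column"},
]

result = count_distinct_by(rows, y="specialty", z="state")
```

Because Y and Z are only known at query time, a scan like this has to touch every row, which is why it tends to land in a batch job rather than an interactive query.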
>>>
>>> We believe we can create column families with different key structures
>>> (using Y and Z as row keys), but some column names we don't know / can't
>>> predict ahead of time.
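One way to picture the "different key structures" idea is an index maintained at write time: one row per value of Z, with the distinct values of Y stored as that row's column names, so that count(distinct Y) becomes a column count. The sketch below is an in-memory stand-in only; a dict of sets plays the role of the column family, and the class name `DistinctIndex` and its methods are hypothetical. It only works for (Y, Z) pairs chosen ahead of time, which is exactly the limitation raised above.

```python
from collections import defaultdict


class DistinctIndex:
    """In-memory stand-in for a column family keyed by the value of Z."""

    def __init__(self, y, z):
        self.y = y
        self.z = z
        # row key (value of Z) -> set of column names (distinct values of Y)
        self.rows = defaultdict(set)

    def on_write(self, row):
        # Called for every write to X; rows lacking either column are ignored.
        if self.y in row and self.z in row:
            self.rows[row[self.z]].add(row[self.y])

    def count_distinct(self, z_value):
        # .get avoids creating an empty row for keys that were never written.
        return len(self.rows.get(z_value, ()))


# Hypothetical usage with made-up column names.
index = DistinctIndex(y="specialty", z="state")
for row in [
    {"state": "PA", "specialty": "cardiology"},
    {"state": "PA", "specialty": "oncology"},
    {"state": "PA", "specialty": "cardiology"},
    {"state": "NJ", "specialty": "oncology"},
]:
    index.on_write(row)
```

This sketch ignores updates and deletes; keeping such an index correct when a row's Y or Z value changes would need extra bookkeeping.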
>>>
>>> Are people doing bulk exports?
>>> Anyone trying to keep an RDBMS in synch in real-time?
>>>
>>> -brian
>>>
>>> --
>>> Brian ONeill
>>> Lead Architect, Health Market Science (http://healthmarketscience.com)
>>> mobile:215.588.6024
>>> blog: http://weblogs.java.net/blog/boneill42/
>>> blog: http://brianoneill.blogspot.com/
>>>
>>>
>>
>


-- 
Brian ONeill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/