Posted to dev@cassandra.apache.org by Jonathan Ellis <jb...@gmail.com> on 2009/11/20 22:17:50 UTC

Cassandra users survey

Hi all,

I'd love to get a better feel for who is using Cassandra and what kind
of applications it is seeing.  If you are using Cassandra, could you
share what you're using it for and what stage you are at with it
(evaluation / testing / production)? Also, what alternatives you
evaluated/are evaluating would be useful.  Finally, feel free to throw
in "I'd love to use Cassandra if only it did X" wishes. :)

I can start: Rackspace is using Cassandra for stats collection
(testing, almost production) and as a backend for the Mail & Apps
division (early testing).  We evaluated HBase, Hypertable, dynomite,
and Voldemort as well.

Thanks,

-Jonathan

(If you're in stealth mode or don't want to say anything in public,
feel free to reply to me privately and I will keep it off the record.)

Re: Cassandra users survey

Posted by Phillip Michalak <ph...@digitalreasoning.com>.
We're using Cassandra in development to store custom index information  
on large document sets. Also considered HBase and Voldemort.  
Cassandra's data model and performance tradeoffs seemed to best fit  
our needs.

Features that we're looking forward to seeing:
* map/reduce integration
* built-in counters with incr/decr
* more automated load balancing

Cheers,
Phil

On Nov 20, 2009, at 3:17 PM, Jonathan Ellis wrote:

> Hi all,
>
> I'd love to get a better feel for who is using Cassandra and what kind
> of applications it is seeing.  If you are using Cassandra, could you
> share what you're using it for and what stage you are at with it
> (evaluation / testing / production)? Also, what alternatives you
> evaluated/are evaluating would be useful.  Finally, feel free to throw
> in "I'd love to use Cassandra if only it did X" wishes. :)
>
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing).  We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.
>
> Thanks,
>
> -Jonathan
>
> (If you're in stealth mode or don't want to say anything in public,
> feel free to reply to me privately and I will keep it off the record.)


Re: Cassandra users survey

Posted by Ryan King <ry...@twitter.com>.
At twitter we're working on using Cassandra to replace our current
storage for all tweets. We have a cluster in production that's being
populated outside the user-critical path (i.e., the cassandra
writing is async).

Additionally, we're testing and evaluating for basically everything
else in our stack.

We evaluated a lot of things: a custom mysql impl, voldemort, hbase,
mongodb, memcachedb, hypertable, and others.

-ryan

On Fri, Nov 20, 2009 at 1:17 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> Hi all,
>
> I'd love to get a better feel for who is using Cassandra and what kind
> of applications it is seeing.  If you are using Cassandra, could you
> share what you're using it for and what stage you are at with it
> (evaluation / testing / production)? Also, what alternatives you
> evaluated/are evaluating would be useful.  Finally, feel free to throw
> in "I'd love to use Cassandra if only it did X" wishes. :)
>
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing).  We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.
>
> Thanks,
>
> -Jonathan
>
> (If you're in stealth mode or don't want to say anything in public,
> feel free to reply to me privately and I will keep it off the record.)
>

Re: Cassandra users survey

Posted by Vitaly Kushner <vi...@gmail.com>.
At Astrails we are using Cassandra in a project for one of our
clients. The performance requirements
are such that they would require database sharding from the beginning if we
were to use an SQL solution.
We think Cassandra's horizontal scaling allows us to concentrate more
on the application and less on the infrastructure.
The project is still in the early development stage.

--
Vitaly Kushner
http://twitter.com/vkushner
Founder, Astrails Ltd. http://astrails.com/
Check out our blog: http://blog.astrails.com/

On Fri, Nov 20, 2009 at 11:17 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> Hi all,
>
> I'd love to get a better feel for who is using Cassandra and what kind
> of applications it is seeing.  If you are using Cassandra, could you
> share what you're using it for and what stage you are at with it
> (evaluation / testing / production)? Also, what alternatives you
> evaluated/are evaluating would be useful.  Finally, feel free to throw
> in "I'd love to use Cassandra if only it did X" wishes. :)
>
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing).  We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.
>
> Thanks,
>
> -Jonathan
>
> (If you're in stealth mode or don't want to say anything in public,
> feel free to reply to me privately and I will keep it off the record.)
>

Re: Cassandra users survey

Posted by Richard Grossman <ri...@gmail.com>.
Hello, I'm working at bee.tv. We are currently building a large application
related to smart TV and movie recommendations.
We had developed an application which was based only on Oracle + a Java
server side, but it was not a scalable enough solution. Of course Oracle has
the capability to scale, but you need to pay a very heavy price.

Cassandra 0.4.2 is currently the base of the new recommendation engine. We
index each day all the TV shows in every state + all the new VOD sources
like hulu, itunes, amazon etc. That makes a lot of data. We expect a very
high level of requests, and we hope that Cassandra will match these
requirements.

Request:
It would really help if we could develop a data manager: some tool that can
help to see what is in the DB and make simple queries, something like PgAdmin
or the MySQL tool.

Add delete with range keys.

Thanks

Re: Cassandra users survey

Posted by Mark Robson <ma...@gmail.com>.
We are keeping an eye on Cassandra with a view to using it in a large-scale
audit data application. Currently I don't think it does quite what we want
but I'm still very impressed with what it does do.

We're not yet at the stage of really properly evaluating it for production
use, but I have had a play but only with up to 4 nodes on VMs with little
data. To evaluate it properly I'd need to try it with a lot more nodes on
real hardware with a lot of data; this would require considerable motivation
from the business to loan me the kit (and spare my time from other
activities, of course)

I'd like to see

* More ideas / solutions for the load balancing problem (all my data goes
into a few nodes) - I understand this quite well but find it very difficult
to explain to others
* Bulk delete operations such as the proposed remove_range - or a method of
supplying a timestamp to delete rows after. This is essential for efficient
data purging.

But rather a lot of the things that I wanted from 0.3 have been done
already, thanks.

Mark

Re: Cassandra users survey

Posted by Eric Lubow <er...@gmail.com>.
At ShermansTravel, we already tested Cassandra and Tokyo Cabinet/Tokyo
Tyrant.  We are starting the implementation phase this week.  The issue with
Tokyo Cabinet for us was that writes were too slow.  The issue with Cassandra
was that reads were a little slower than we'd like, but we expect to address
that with some caching (either eventually being built into Cassandra or with
memcache in front of it).

We do a lot of mailings and we want to track who received what mailing and
what kind of mailing, so that when we segment the lists for mailings (in
MySQL), we can check that the user hasn't received too many mailings of a
certain type over a certain period (get_slice).

We are also evaluating a social network usage similar to the way Facebook
and Digg use the Like concept.  We want to make recommendations on our site
based on what a user's network likes based on location.  We believe
(although we haven't started evaluating yet) that Cassandra will be the
method of storing/accessing this data as well.

As I mentioned earlier some caching would be nice and also (I believe this
was suggested earlier in the thread), a remove_by_slice method.  Also (and I
will likely end up writing some of these myself), some Nagios plugins to
check the health of Cassandra.

Thanks for the excellent product.

-e

On Fri, Nov 20, 2009 at 4:17 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> Hi all,
>
> I'd love to get a better feel for who is using Cassandra and what kind
> of applications it is seeing.  If you are using Cassandra, could you
> share what you're using it for and what stage you are at with it
> (evaluation / testing / production)? Also, what alternatives you
> evaluated/are evaluating would be useful.  Finally, feel free to throw
> in "I'd love to use Cassandra if only it did X" wishes. :)
>
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing).  We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.
>
> Thanks,
>
> -Jonathan
>
> (If you're in stealth mode or don't want to say anything in public,
> feel free to reply to me privately and I will keep it off the record.)
>

Re: Cassandra users survey

Posted by scott w <sc...@gmail.com>.
For a project I am working on now at Onespot we are just beginning to move
off RDBMS and onto Cassandra for a subset of our data store. We evaluated
against several other solutions including Tokyo, Voldemort and Riak and
Cassandra seemed the clear winner for our requirements. We have also done
stress testing and been happy with the results.

Wishlist:
- rebalancing
- Hadoop integration
- node replacement that doesn't depend on having the same ip/hostname
- multi-insert (so can insert against multiple keys in one request)

cheers,
Scott

Re: Cassandra users survey

Posted by Erich Nachbar <er...@nachbar.biz>.
Hi,

I'm using Cassandra 0.4.2 at my current client to persist URL graphs
for Spam detection.
The crawling and page classification is done in Hadoop/Bixo/Cascading,
which persists URL classification results into Cassandra.
The incoming production traffic is using Cassandra for the real-time
spam score lookup to determine the spammyness of a URL.

It started out as a prototype and is currently in production with 4
Cassandra nodes (for the last >3 weeks).
Sometimes Cassandra is a little rough around the edges, but in general it works.

Wishes:
- data rebalancing
- proper MapReduce support (ideally supporting the same API HBase
uses, so one could use the same eco-system)
- node decommissioning

On Fri, Nov 20, 2009 at 1:17 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> Hi all,
>
> I'd love to get a better feel for who is using Cassandra and what kind
> of applications it is seeing.  If you are using Cassandra, could you
> share what you're using it for and what stage you are at with it
> (evaluation / testing / production)? Also, what alternatives you
> evaluated/are evaluating would be useful.  Finally, feel free to throw
> in "I'd love to use Cassandra if only it did X" wishes. :)
>
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing).  We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.
>
> Thanks,
>
> -Jonathan
>
> (If you're in stealth mode or don't want to say anything in public,
> feel free to reply to me privately and I will keep it off the record.)
>

Re: Cassandra users survey

Posted by Ian Holsman <ia...@holsman.net>.
We're looking at it to be part of a near real time Web analytics engine, which sounds similar to Ooyala.
At the moment I'm pushing to get the thing open sourced if possible.

We're looking at combining Cassandra + Esper, but we are still in the very early stages.

On Nov 21, 2009, at 8:17 AM, Jonathan Ellis wrote:

> Hi all,
> 
> I'd love to get a better feel for who is using Cassandra and what kind
> of applications it is seeing.  If you are using Cassandra, could you
> share what you're using it for and what stage you are at with it
> (evaluation / testing / production)? Also, what alternatives you
> evaluated/are evaluating would be useful.  Finally, feel free to throw
> in "I'd love to use Cassandra if only it did X" wishes. :)
> 
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing).  We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.
> 
> Thanks,
> 
> -Jonathan
> 
> (If you're in stealth mode or don't want to say anything in public,
> feel free to reply to me privately and I will keep it off the record.)

--
Ian Holsman
Ian@Holsman.net




Re: Cassandra users survey

Posted by Jonathan Ellis <jb...@gmail.com>.
Thanks for the replies, everyone.

A couple people have suggested that we put this on the
wiki to replace the old, never-updated PoweredBy page (maybe as
UsersSurvey09).  I'll pull the _public_ responses only into a wiki
page later this week; if you replied to the list but don't want to be
on the page, let me know in the next couple days and I'll leave you
off.  And of course since it is a publicly editable wiki, you can
always remove yourself later too.

-Jonathan

Re: Cassandra users survey

Posted by Chris Were <ch...@gmail.com>.
Hi Jonathan,

Firstly, thanks for all your help on this list. Without lots of your
solutions / tips etc I probably wouldn't be using Cassandra.

I've built a real-time search engine based around all the links that appear
on twitter (http://www.mozzler.com/). Cassandra is my data store for all the
comments associated with a link, mapping short URLs to endpoint URLs, etc.
Cassandra is also used for session and user data for the web front end, in
conjunction with memcached to speed up the writes.

If anyone wants a simple django session manager that uses lazyboy to talk to
cassandra, let me know.

Cheers,
Chris

On Fri, Nov 20, 2009 at 1:17 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> Hi all,
>
> I'd love to get a better feel for who is using Cassandra and what kind
> of applications it is seeing.  If you are using Cassandra, could you
> share what you're using it for and what stage you are at with it
> (evaluation / testing / production)? Also, what alternatives you
> evaluated/are evaluating would be useful.  Finally, feel free to throw
> in "I'd love to use Cassandra if only it did X" wishes. :)
>
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing).  We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.
>
> Thanks,
>
> -Jonathan
>
> (If you're in stealth mode or don't want to say anything in public,
> feel free to reply to me privately and I will keep it off the record.)
>

Re: Cassandra users survey

Posted by Joe Stump <jo...@joestump.net>.
SimpleGeo is using Cassandra as the backend of our real-time location  
infrastructure. We needed something that was distributed, could scale,  
could handle lots of writes, etc.

We looked into all the usual suspects, but went with Cassandra because  
it was written in Java (we have two guys who know Java internally), it  
was small enough that we could become heavily involved early on, I  
personally knew a few of the committers, and it's multi-master.

The only thing I think would be super interesting would be increment/ 
decrement.

--Joe


On Nov 20, 2009, at 1:17 PM, Jonathan Ellis wrote:

> Hi all,
>
> I'd love to get a better feel for who is using Cassandra and what kind
> of applications it is seeing.  If you are using Cassandra, could you
> share what you're using it for and what stage you are at with it
> (evaluation / testing / production)? Also, what alternatives you
> evaluated/are evaluating would be useful.  Finally, feel free to throw
> in "I'd love to use Cassandra if only it did X" wishes. :)
>
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing).  We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.
>
> Thanks,
>
> -Jonathan
>
> (If you're in stealth mode or don't want to say anything in public,
> feel free to reply to me privately and I will keep it off the record.)


Re: Cassandra users survey

Posted by Edmond Lau <ed...@ooyala.com>.
At Ooyala, we're in the process of testing and productionizing
Cassandra to store and serve our near real-time video analytics data.
Ooyala provides a comprehensive platform for professional video
publishers and enterprise companies looking to build up their online
video presence, and analytics/monetization is a key part of the
platform.

We researched a variety of systems to replace our current MySQL
solution, including HBase, Cassandra, Voldemort, and some others.  Of
those, we seriously considered HBase and Cassandra as satisfying our
needs b/c of HA, scaling, and the more fully featured data schema,
which is a better fit for our high dimensional data.  For both HBase
and Cassandra, we designed data schemas, built functional prototypes
of our application, conducted a fairly thorough performance
evaluation, tested the two systems for various failure scenarios, and
also evaluated how easy each system was to maintain and run.

What I'd like to see in Cassandra:
- More comments in the source code, esp. high-level descriptions of
code organization.  Design docs for various functionality would also
be helpful in getting other folks to contribute.  This was one area
where HBase was significantly better.
- Better bootstrapping and load balancing support (bootstrapping
seemed broken in 0.4.2), but I've seen a lot of work done in these two
areas for 0.5.

Edmond

On Fri, Nov 20, 2009 at 3:02 PM, Tim Underwood <ti...@gmail.com> wrote:
> My company runs a niche comparison shopping site where we take in all sorts
> of raw product data from various sources (retailers, manufacturers,
> distributors, etc...).  We then have to take all that raw data and collapse
> it down across the data sources (e.g. product FOO from source A matches
> product BAR from source B) and eventually end up with a final product that
> gets surfaced to our website.
> Cassandra's data model works great for the raw data where columns are
> sparsely populated and updated.  The SuperColumnFamily model works great for
> my collapsed data where I need to track which bits of information came from
> which raw data.
> I'm currently in testing (almost production).  For this use case I'll only
> be using Cassandra on the backend and then indexing the final data into
> Apache Solr to power the frontend.  My data is small enough to fit on a
> single node so I don't have much use for the partitioning at this point.  If
> anything I'd be more interested in a fully replicated setup where the
> ReplicationFactor is equal to the number of nodes.
> I looked at most of the other nosql solutions (couchdb, mongodb, hbase,
> hypertable, dynomite, voldemort).
> One thing I'd love to see improved:
> - Reading through all the data (or a specific key prefix) in a ColumnFamily
> seems slow.  Cassandra is the bottleneck when I try to index data into Solr
> and it looks like Cassandra's CPU usage is 2-3 times that of Solr's during
> the process.
> I look forward to playing around with 0.5!
> -Tim
> On Fri, Nov 20, 2009 at 1:17 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I'd love to get a better feel for who is using Cassandra and what kind
>> of applications it is seeing.  If you are using Cassandra, could you
>> share what you're using it for and what stage you are at with it
>> (evaluation / testing / production)? Also, what alternatives you
>> evaluated/are evaluating would be useful.  Finally, feel free to throw
>> in "I'd love to use Cassandra if only it did X" wishes. :)
>>
>> I can start: Rackspace is using Cassandra for stats collection
>> (testing, almost production) and as a backend for the Mail & Apps
>> division (early testing).  We evaluated HBase, Hypertable, dynomite,
>> and Voldemort as well.
>>
>> Thanks,
>>
>> -Jonathan
>>
>> (If you're in stealth mode or don't want to say anything in public,
>> feel free to reply to me privately and I will keep it off the record.)
>
>

Re: Cassandra users survey

Posted by Tim Underwood <ti...@gmail.com>.
My company runs a niche comparison shopping site where we take in all sorts
of raw product data from various sources (retailers, manufacturers,
distributors, etc...).  We then have to take all that raw data and collapse
it down across the data sources (e.g. product FOO from source A matches
product BAR from source B) and eventually end up with a final product that
gets surfaced to our website.

Cassandra's data model works great for the raw data where columns are
sparsely populated and updated.  The SuperColumnFamily model works great for
my collapsed data where I need to track which bits of information came from
which raw data.

I'm currently in testing (almost production).  For this use case I'll only
be using Cassandra on the backend and then indexing the final data into
Apache Solr to power the frontend.  My data is small enough to fit on a
single node so I don't have much use for the partitioning at this point.  If
anything I'd be more interested in a fully replicated setup where the
ReplicationFactor is equal to the number of nodes.

I looked at most of the other nosql solutions (couchdb, mongodb, hbase,
hypertable, dynomite, voldemort).

One thing I'd love to see improved:

- Reading through all the data (or a specific key prefix) in a ColumnFamily
seems slow.  Cassandra is the bottleneck when I try to index data into Solr
and it looks like Cassandra's CPU usage is 2-3 times that of Solr's during
the process.

I look forward to playing around with 0.5!

-Tim

On Fri, Nov 20, 2009 at 1:17 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> Hi all,
>
> I'd love to get a better feel for who is using Cassandra and what kind
> of applications it is seeing.  If you are using Cassandra, could you
> share what you're using it for and what stage you are at with it
> (evaluation / testing / production)? Also, what alternatives you
> evaluated/are evaluating would be useful.  Finally, feel free to throw
> in "I'd love to use Cassandra if only it did X" wishes. :)
>
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing).  We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.
>
> Thanks,
>
> -Jonathan
>
> (If you're in stealth mode or don't want to say anything in public,
> feel free to reply to me privately and I will keep it off the record.)
>

Re: Cassandra users survey

Posted by Jonathan Ellis <jb...@gmail.com>.
On Wed, Nov 25, 2009 at 5:38 PM, time <ti...@digg.com> wrote:
> We don't have any such thing. The deployment at Digg is just as alpha as the
> deployment anywhere else. The database team is still trying to figure out
> how to tune, monitor/alert on, and deploy the cluster. So far it's chaotic.

Yeah, I hear the MySQL DBAs are having trouble with the transition.

It's true that the support ecosystem is a lot less mature than for
relational databases.  We'll get there.

Thanks for the suggestions,

-Jonathan

Re: Cassandra users survey

Posted by matthew hawthorne <mh...@gmail.com>.
On Wed, Nov 25, 2009 at 6:38 PM, time <ti...@digg.com> wrote:
> If I query a key range M..N, what nodes would likely answer?

I also would enjoy this feature.  For my evaluation I was attempting
to verify that data was being replicated to the proper nodes.  I was
able to do this superficially by hacking my own EndpointSnitch and
then modifying the log4j config to place the output of StorageProxy
into a separate file.

This way I could see all of the nodes that were being read + written
for each action, and verify that they were the nodes I expected.

-matt

Re: Cassandra users survey

Posted by time <ti...@digg.com>.
>> 2) a practical/situational view of managing a cassandra cluster
>> ...
>> it would be nice to have a more comprehensive deployment guide.
>>     
> You're right.  Maybe we can get Digg to share theirs. :)
>   
We don't have any such thing. The deployment at Digg is just as alpha as 
the deployment anywhere else. The database team is still trying to 
figure out how to tune, monitor/alert on, and deploy the cluster. So far 
it's chaotic.

We have no experience with what to do when a node fails, a rack fails, 
or a datacentre fails.

Our experience with data corruption has been answered with "lose that 
data, hope the bug was fixed, redeploy next version up."

Our answer to "Cassandra performance has degraded in an unusual fashion" 
has been to shut Cassandra down and work on an upgrade path.

If anything, I might advise an entity undertaking a Cassandra deployment 
to "have developers on staff that can help you administer the cluster by 
way of hacking the source code" because, honestly, that's how we've done 
it thus far.

I expect once Cassandra features, architecture, and bugginess stabilise 
(I understand we're on the cusp of that now), the database team at Digg 
will take nearly 100% responsibility for the cluster, and at that point 
we will write extensive documentation about administering the cluster. 
My estimate is 3-9 months from now.

I guess since this is the users survey thread, I should list what I wish 
I had. I would love to have a CLI that can tell me:

         1. What's the keyspace?
         2. What column families exist?
         3. What supercolumns exist?
         4. What columns are part of a particular supercolumn?
         5. What is the key range for a given column family?
         6. What are the last N rows in this column family?
         7. What are the first N rows?
         8. If I query a key range M..N, what nodes would likely answer?
         9. For a given structure I can see, what is the underlying
            directory, file, memory, structure? What SStables make up
            this column family? Which are compacted? What are their
            sizes? How many tombstones are in each? Etc.

I would want this all from the point of view of a CLI. I would not want 
to have to login to any particular node via a shell to ask these 
questions (so "Just look at the XML config file!" is not the proper answer).

Think of a "shell" client of Cassandra that allows exploration and 
navigation by way of Cassandra-specific ls, cd, ps, cat, head, tail.
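
A minimal sketch of what such an exploration shell could feel like, using Python's standard cmd module. Everything here is an assumption made for illustration: the SCHEMA dictionary is a hypothetical in-memory stand-in for cluster metadata, and ExploreShell, run, and the command names are invented. It does not talk to Cassandra at all.

```python
import cmd
import io

# Hypothetical stand-in for cluster metadata. A real client would have to
# ask the cluster for this; no such API is assumed or shown here.
SCHEMA = {
    "Keyspace1": {
        "Standard1": ["alice", "bob", "carol", "dave"],
        "Super1": ["routerA", "routerB"],
    }
}


class ExploreShell(cmd.Cmd):
    """Toy shell offering cd, ls, head and tail over SCHEMA-like data."""

    prompt = "cassandra> "

    def __init__(self, schema, stdout=None):
        super().__init__(stdout=stdout)
        self.schema = schema
        self.keyspace = None

    def _say(self, text):
        print(text, file=self.stdout)

    def do_cd(self, arg):
        """cd KEYSPACE: select a keyspace."""
        if arg in self.schema:
            self.keyspace = arg
        else:
            self._say("no such keyspace: " + arg)

    def do_ls(self, arg):
        """ls: list keyspaces, or column families once inside a keyspace."""
        names = self.schema if self.keyspace is None else self.schema[self.keyspace]
        for name in sorted(names):
            self._say(name)

    def _slice(self, arg, from_end):
        # Print the first or last N row keys of a column family.
        family, _, count = arg.partition(" ")
        keys = self.schema.get(self.keyspace, {}).get(family)
        if keys is None:
            self._say("no such column family: " + family)
            return
        n = int(count or 10)
        start = max(len(keys) - n, 0)
        for key in (keys[start:] if from_end else keys[:n]):
            self._say(key)

    def do_head(self, arg):
        """head FAMILY [N]: first N row keys."""
        self._slice(arg, from_end=False)

    def do_tail(self, arg):
        """tail FAMILY [N]: last N row keys."""
        self._slice(arg, from_end=True)


def run(commands):
    """Feed commands to a fresh shell and return the printed lines."""
    out = io.StringIO()
    shell = ExploreShell(SCHEMA, stdout=out)
    for line in commands:
        shell.onecmd(line)
    return out.getvalue().splitlines()
```

Calling ExploreShell(SCHEMA).cmdloop() would give an interactive prompt; run() is only there so the commands can be exercised without a terminal.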

--
timeless


Re: Cassandra users survey

Posted by Jonathan Ellis <jb...@gmail.com>.
On Tue, Nov 24, 2009 at 8:17 PM, matthew hawthorne <mh...@gmail.com> wrote:
> A few desirables for cassandra:
>
> 1) I'm not a huge fan of thrift.  it would be nice if the client jar
> came packaged with cassandra  (I had to manually build it from the
> thrift-generated java).

It's there, it's just not split out in a separate jar.
org.apache.cassandra.service.Cassandra, etc.  (Probably for someone
better at ant than I, creating a separate cassandra-client.jar for
this would be easy.)

> also, the lack of streaming support is troubling.  a lot of our
> internal services are http, and I'd like to be able to connect a
> column's input stream to the output stream of an http response,
> instead of loading it all into memory.

I've mentioned this to Thrift developers, too, but I'm not holding my breath. :)

> 2) a practical/situational view of managing a cassandra cluster
> ...
> it would be nice to have a more comprehensive deployment guide.

You're right.  Maybe we can get Digg to share theirs. :)

> you fellows at Rackspace should consider offering Cassandra support.
> I know that the ability to have some paid professionals come in and
> train our ops team on how to monitor + manage a cassandra cluster
> would have made a huge difference for us.

My impression is that Rackspace is more comfortable with the "we'll
manage your cluster for you" model than the "we'll train you to do it
yourselves" one, but maybe there is room for both.

-Jonathan

Re: Cassandra users survey

Posted by matthew hawthorne <mh...@gmail.com>.
I work for Comcast, and we have tons of data that we are migrating
into non-relational storage.

we recently evaluated cassandra, riak, voldemort, and hdfs.  I focused
on cassandra, which is why you may have seen me asking dumb questions
over IRC :-)

A few desirables for cassandra:

1) I'm not a huge fan of thrift.  it would be nice if the client jar
came packaged with cassandra  (I had to manually build it from the
thrift-generated java).

also, the lack of streaming support is troubling.  a lot of our
internal services are http, and I'd like to be able to connect a
column's input stream to the output stream of an http response,
instead of loading it all into memory.

2) a practical/situational view of managing a cassandra cluster
("deployment guide", maybe) would be nice.  for my evaluation, I was
seeking answers to questions like:

- how do I add capacity?

- how do I remove capacity? (I believe you're calling it "decommissioning")

- what files should I back up?

- how can I mitigate the risk of lost writes during a power failure?

- how can I ensure that my writes go to multiple data centers?

I think overall the docs are good (I found answers to most of my
questions), but since a lot of groups are analyzing cassandra in this
fashion, and needing to make a sales pitch to management, ops, etc. --
it would be nice to have a more comprehensive deployment guide.

you fellows at Rackspace should consider offering Cassandra support.
I know that the ability to have some paid professionals come in and
train our ops team on how to monitor + manage a cassandra cluster
would have made a huge difference for us.

thanks!

-matt


On Fri, Nov 20, 2009 at 4:17 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> Hi all,
>
> I'd love to get a better feel for who is using Cassandra and what kind
> of applications it is seeing.  If you are using Cassandra, could you
> share what you're using it for and what stage you are at with it
> (evaluation / testing / production)? Also, what alternatives you
> evaluated/are evaluating would be useful.  Finally, feel free to throw
> in "I'd love to use Cassandra if only it did X" wishes. :)
>
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing).  We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.
>
> Thanks,
>
> -Jonathan
>
> (If you're in stealth mode or don't want to say anything in public,
> feel free to reply to me privately and I will keep it off the record.)
>

Re: Cassandra users survey

Posted by Alexander Staubo <ma...@gmail.com>.
On Fri, Nov 20, 2009 at 10:17 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing).  We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.

I would love some perspective about why the other technologies were
not satisfactory, and why Cassandra was.

A.

Re: Cassandra users survey

Posted by Ian Holsman <ia...@holsman.net>.


---
Sent from my phone
Ian Holsman - 703 879-3128

On 21/11/2009, at 12:38 PM, Dan Di Spaltro <da...@gmail.com> wrote:

> At Cloudkick we are using Cassandra to store monitoring statistics and
> running analytics over the data.  I would love to share some ideas
> about how we set up our data-model, if anyone is interested.  This
> isn't the right thread to do it in, but I think it would be useful to
> show how we store billions of points of data in Cassandra (and maybe
> get some feedback).
>
> Wishlist
> -remove_slice_range
> -auto loadbalancing
> -inc/dec
>
> On Fri, Nov 20, 2009 at 1:17 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>> Hi all,
>>
>> I'd love to get a better feel for who is using Cassandra and what kind
>> of applications it is seeing.  If you are using Cassandra, could you
>> share what you're using it for and what stage you are at with it
>> (evaluation / testing / production)? Also, what alternatives you
>> evaluated/are evaluating would be useful.  Finally, feel free to throw
>> in "I'd love to use Cassandra if only it did X" wishes. :)
>>
>> I can start: Rackspace is using Cassandra for stats collection
>> (testing, almost production) and as a backend for the Mail & Apps
>> division (early testing).  We evaluated HBase, Hypertable, dynomite,
>> and Voldemort as well.
>>
>> Thanks,
>>
>> -Jonathan
>>
>> (If you're in stealth mode or don't want to say anything in public,
>> feel free to reply to me privately and I will keep it off the record.)
>>
>
>
>
> -- 
> Dan Di Spaltro

Re: Cassandra users survey

Posted by Ted Zlatanov <tz...@lifelogs.com>.
On Mon, 23 Nov 2009 23:30:51 -0500 Matt Revelle <mr...@gmail.com> wrote: 

MR> Are you both using timestamps as row keys?  Would be great to hear
MR> more details.

I'm using super column keys in a super column.

So let's say your resource is "routerA."

Your data will be:

Row "routerA"
 SuperColumn "Status"
  SuperColumn key T0 (this morning)
   Columns { status: connected, location: USA, ... }
  SuperColumn key T1 (T0 + 10 seconds for example)
   Columns { status: disconnected, location: Europe, ... }
  SuperColumn key T2 (T1 + 10 seconds for example)
   Columns { status: connected, ... } // no location specified

Then you can say "give me the latest super column key" (limit = 1, 
order = reversed, start == end == 0) and you'll get T2.
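
A minimal sketch of that lookup, assuming the layout above is modelled as plain nested dictionaries. This is not the Thrift API; the function name latest_super_columns and the numeric timestamps are hypothetical and only illustrate "reversed order, limit 1".

```python
# Hypothetical in-memory model of the layout above; not the Thrift API.
# Row -> super column family -> super column key (timestamp) -> columns.
T0 = 1258980000  # assumed epoch seconds standing in for "this morning"
T1 = T0 + 10
T2 = T1 + 10

rows = {
    "routerA": {
        "Status": {
            T0: {"status": "connected", "location": "USA"},
            T1: {"status": "disconnected", "location": "Europe"},
            T2: {"status": "connected"},  # no location specified
        }
    }
}


def latest_super_columns(rows, row_key, family, limit=1):
    """Return up to `limit` (key, columns) pairs, newest key first."""
    super_columns = rows[row_key][family]
    newest_first = sorted(super_columns, reverse=True)  # order = reversed
    return [(key, super_columns[key]) for key in newest_first[:limit]]


latest = latest_super_columns(rows, "routerA", "Status")
```

Because the super column keys sort by time, reversing the order and taking one entry yields the most recent status without scanning the whole row.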

Ted


Re: Cassandra users survey

Posted by Matt Revelle <mr...@gmail.com>.
On Nov 23, 2009, at 12:27, Ted Zlatanov <tz...@lifelogs.com> wrote:

> On Fri, 20 Nov 2009 17:38:39 -0800 Dan Di Spaltro <dan.dispaltro@gmail.com> wrote:
>
> DDS> At Cloudkick we are using Cassandra to store monitoring statistics and
> DDS> running analytics over the data.  I would love to share some ideas
> DDS> about how we set up our data-model, if anyone is interested.  This
> DDS> isn't the right thread to do it in, but I think it would be useful to
> DDS> show how we store billions of points of data in Cassandra (and maybe
> DDS> get some feedback).
>
> I'd like to see that.  My Cassandra use is also for monitoring and so
> far it has been great.  I store status updates in a SuperColumn indexed
> by date and each row represents a unique resource.  It's really simple
> compared to your setup, I'm sure.
>
> Ted

Hi Dan and Ted,

Are you both using timestamps as row keys?  Would be great to hear  
more details.

-Matt

Re: Cassandra users survey

Posted by Ted Zlatanov <tz...@lifelogs.com>.
On Fri, 20 Nov 2009 17:38:39 -0800 Dan Di Spaltro <da...@gmail.com> wrote: 

DDS> At Cloudkick we are using Cassandra to store monitoring statistics and
DDS> running analytics over the data.  I would love to share some ideas
DDS> about how we set up our data-model, if anyone is interested.  This
DDS> isn't the right thread to do it in, but I think it would be useful to
DDS> show how we store billions of points of data in Cassandra (and maybe
DDS> get some feedback).

I'd like to see that.  My Cassandra use is also for monitoring and so
far it has been great.  I store status updates in a SuperColumn indexed
by date and each row represents a unique resource.  It's really simple
compared to your setup, I'm sure.

Ted


Re: Cassandra users survey

Posted by James Golick <ja...@gmail.com>.
I would love to see that post about your data model.

J.

Sent from my iPhone.

On 2009-11-20, at 5:38 PM, Dan Di Spaltro <da...@gmail.com> wrote:

> At Cloudkick we are using Cassandra to store monitoring statistics and
> running analytics over the data.  I would love to share some ideas
> about how we set up our data-model, if anyone is interested.  This
> isn't the right thread to do it in, but I think it would be useful to
> show how we store billions of points of data in Cassandra (and maybe
> get some feedback).
>
> Wishlist
> -remove_slice_range
> -auto loadbalancing
> -inc/dec
>
> On Fri, Nov 20, 2009 at 1:17 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>> Hi all,
>>
>> I'd love to get a better feel for who is using Cassandra and what kind
>> of applications it is seeing.  If you are using Cassandra, could you
>> share what you're using it for and what stage you are at with it
>> (evaluation / testing / production)? Also, what alternatives you
>> evaluated/are evaluating would be useful.  Finally, feel free to throw
>> in "I'd love to use Cassandra if only it did X" wishes. :)
>>
>> I can start: Rackspace is using Cassandra for stats collection
>> (testing, almost production) and as a backend for the Mail & Apps
>> division (early testing).  We evaluated HBase, Hypertable, dynomite,
>> and Voldemort as well.
>>
>> Thanks,
>>
>> -Jonathan
>>
>> (If you're in stealth mode or don't want to say anything in public,
>> feel free to reply to me privately and I will keep it off the record.)
>>
>
>
>
> -- 
> Dan Di Spaltro

Re: Cassandra users survey

Posted by Dan Di Spaltro <da...@gmail.com>.
At Cloudkick we are using Cassandra to store monitoring statistics and
running analytics over the data.  I would love to share some ideas
about how we set up our data-model, if anyone is interested.  This
isn't the right thread to do it in, but I think it would be useful to
show how we store billions of points of data in Cassandra (and maybe
get some feedback).

Wishlist
-remove_slice_range
-auto loadbalancing
-inc/dec

On Fri, Nov 20, 2009 at 1:17 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> Hi all,
>
> I'd love to get a better feel for who is using Cassandra and what kind
> of applications it is seeing.  If you are using Cassandra, could you
> share what you're using it for and what stage you are at with it
> (evaluation / testing / production)? Also, what alternatives you
> evaluated/are evaluating would be useful.  Finally, feel free to throw
> in "I'd love to use Cassandra if only it did X" wishes. :)
>
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing).  We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.
>
> Thanks,
>
> -Jonathan
>
> (If you're in stealth mode or don't want to say anything in public,
> feel free to reply to me privately and I will keep it off the record.)
>



-- 
Dan Di Spaltro

Re: Cassandra users survey

Posted by Chris Were <ch...@gmail.com>.
Hi Jonathan,

Firstly, thanks for all your help on this list. Without lots of your
solutions / tips etc I probably wouldn't be using Cassandra.

I've built a real-time search engine based around all the links that appear
on twitter (http://www.mozzler.com/). Cassandra is my data store for all the
comments associated with a link, mapping short URLs to endpoint URLs, etc.
Cassandra is also used for session and user data for the web front end, in
conjunction with memcached to speed up the writes.

If anyone wants a simple django session manager that uses lazyboy to talk to
cassandra, let me know.

Cheers,
Chris

On Fri, Nov 20, 2009 at 1:17 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> Hi all,
>
> I'd love to get a better feel for who is using Cassandra and what kind
> of applications it is seeing.  If you are using Cassandra, could you
> share what you're using it for and what stage you are at with it
> (evaluation / testing / production)? Also, what alternatives you
> evaluated/are evaluating would be useful.  Finally, feel free to throw
> in "I'd love to use Cassandra if only it did X" wishes. :)
>
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing).  We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.
>
> Thanks,
>
> -Jonathan
>
> (If you're in stealth mode or don't want to say anything in public,
> feel free to reply to me privately and I will keep it off the record.)
>

Re: Cassandra users survey

Posted by Michael Pearson <mj...@gmail.com>.
Hi, I've been waiting for something like Cassandra for a while now for
a personal project.  The data model seems ideally suited to building a
mashup engine, or any arbitrary data user app for that matter.  I'm
still at an early stage conceptually having come from an rdbms
background, and mostly trying to wrap my head around Thrift api and
building a crud/factory in php (Pandra on github) and a keyspace
administrator for fun.  I wanted to jump in early with Cassandra
(started watching from 0.3) with a view to the future as a production
level solution once some administrative niceties have been ironed out
(data migration, node decommissioning, more robust query api etc).
What would be awesome and make me love Cassandra forever would be a
way to group columns together across keys, similar to the way
supercolumns work but by key range (depth) rather than column (width).

.michael.

On Sat, Nov 21, 2009 at 7:17 AM, Jonathan Ellis <jb...@gmail.com> wrote:
> Hi all,
>
> I'd love to get a better feel for who is using Cassandra and what kind
> of applications it is seeing.  If you are using Cassandra, could you
> share what you're using it for and what stage you are at with it
> (evaluation / testing / production)? Also, what alternatives you
> evaluated/are evaluating would be useful.  Finally, feel free to throw
> in "I'd love to use Cassandra if only it did X" wishes. :)
>
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing).  We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.
>
> Thanks,
>
> -Jonathan
>
> (If you're in stealth mode or don't want to say anything in public,
> feel free to reply to me privately and I will keep it off the record.)
>

Re: Cassandra users survey

Posted by Joe Stump <jo...@joestump.net>.
SimpleGeo is using Cassandra as the backend of our real-time location  
infrastructure. We needed something that was distributed, could scale,  
could handle lots of writes, etc.

We looked into all the usual suspects, but went with Cassandra because  
it was written in Java (we have two guys who know Java internally), it  
was small enough that we could become heavily involved early on, I  
personally knew a few of the committers, and it's multi-master.

The only thing I think would be super interesting would be increment/ 
decrement.

--Joe


On Nov 20, 2009, at 1:17 PM, Jonathan Ellis wrote:

> Hi all,
>
> I'd love to get a better feel for who is using Cassandra and what kind
> of applications it is seeing.  If you are using Cassandra, could you
> share what you're using it for and what stage you are at with it
> (evaluation / testing / production)? Also, what alternatives you
> evaluated/are evaluating would be useful.  Finally, feel free to throw
> in "I'd love to use Cassandra if only it did X" wishes. :)
>
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing).  We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.
>
> Thanks,
>
> -Jonathan
>
> (If you're in stealth mode or don't want to say anything in public,
> feel free to reply to me privately and I will keep it off the record.)


Re: Cassandra users survey

Posted by Ray Slakinski <ra...@mahalo.com>.
Hello!

Mahalo is currently testing Cassandra as an alternative data store to MySQL
for certain pieces of our data. We are close to production which would see a
move of 15 million rows from one table into a cluster which we hope will
increase speed and allow us to scale the data as it grows more easily than
on MySQL.

We also have a short URL product and would like to move its stats-collection
data to Cassandra as well. To do this we would love an incr/decr feature
like memcache/memcachedb has. I'm not sure whether this is outside the scope
of Cassandra, and I noticed it has been requested on this list before, but I
thought I'd reiterate it in the hope that it gets onto the feature list.

Ray Slakinski

On Fri, Nov 20, 2009 at 4:17 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> Hi all,
>
> I'd love to get a better feel for who is using Cassandra and what kind
> of applications it is seeing.  If you are using Cassandra, could you
> share what you're using it for and what stage you are at with it
> (evaluation / testing / production)? Also, what alternatives you
> evaluated/are evaluating would be useful.  Finally, feel free to throw
> in "I'd love to use Cassandra if only it did X" wishes. :)
>
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing).  We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.
>
> Thanks,
>
> -Jonathan
>
> (If you're in stealth mode or don't want to say anything in public,
> feel free to reply to me privately and I will keep it off the record.)
>

Re: Cassandra users survey

Posted by Santal Li <sa...@gmail.com>.
At Webex, we are using Cassandra to store User Feed & User Activity data
in near real time, to support user interaction on the web. We are currently
in early testing. We evaluated Voldemort, MemcacheDB, and Dynomite.

What I want most:
1. support for building secondary indexes on values or columns.
2. support for some kind of optimistic lock, based on row or columnfamily.
3. support for increment/decrement.

2009/11/21 Jonathan Ellis <jb...@gmail.com>

> Hi all,
>
> I'd love to get a better feel for who is using Cassandra and what kind
> of applications it is seeing.  If you are using Cassandra, could you
> share what you're using it for and what stage you are at with it
> (evaluation / testing / production)? Also, what alternatives you
> evaluated/are evaluating would be useful.  Finally, feel free to throw
> in "I'd love to use Cassandra if only it did X" wishes. :)
>
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing).  We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.
>
> Thanks,
>
> -Jonathan
>
> (If you're in stealth mode or don't want to say anything in public,
> feel free to reply to me privately and I will keep it off the record.)
>

Re: Cassandra users survey

Posted by Simon Smith <si...@gmail.com>.
The company I'm with is still small and in the early stages, but we're
planning on using Cassandra for user profile information (in
development right now), and possibly other uses later on.  We
evaluated CouchDB and Voldemort, and both of those were great as well
- for CouchDB, I really liked Futon but had some stability issues and
didn't like the manual replication.  Voldemort may be great, but I
couldn't figure out the API (which probably says more about me than
Voldemort).

One of the reasons we chose Cassandra is because we feel like it is
being used in other situations which required scaling.  I'm looking
forward to v0.5 because of load-balancing and for better support for
the situation where a node is lost permanently.

I'm very pleased with the high level of support for Cassandra, both on
this mailing list and on IRC.

Simon

On Fri, Nov 20, 2009 at 4:17 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> Hi all,
>
> I'd love to get a better feel for who is using Cassandra and what kind
> of applications it is seeing.  If you are using Cassandra, could you
> share what you're using it for and what stage you are at with it
> (evaluation / testing / production)? Also, what alternatives you
> evaluated/are evaluating would be useful.  Finally, feel free to throw
> in "I'd love to use Cassandra if only it did X" wishes. :)
>
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing).  We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.
>
> Thanks,
>
> -Jonathan
>
> (If you're in stealth mode or don't want to say anything in public,
> feel free to reply to me privately and I will keep it off the record.)
>

Re: Cassandra users survey

Posted by scott w <sc...@gmail.com>.
For a project I am working on now at Onespot we are just beginning to move
off RDBMS and onto Cassandra for a subset of our data store. We evaluated
against several other solutions including Tokyo, Voldemort and Riak and
Cassandra seemed the clear winner for our requirements. We have also done
stress testing and been happy with the results.

Wishlist:
- rebalancing
- Hadoop integration
- node replacement that doesn't depend on having the same ip/hostname
- multi-insert (so can insert against multiple keys in one request)

cheers,
Scott

Re: Cassandra users survey

Posted by Tim Underwood <ti...@gmail.com>.
My company runs a niche comparison shopping site where we take in all sorts
of raw product data from various sources (retailers, manufacturers,
distributors, etc...).  We then have to take all that raw data and collapse
it down across the data sources (e.g. product FOO from source A matches
product BAR from source B) and eventually end up with a final product that
gets surfaced to our website.

Cassandra's data model works great for the raw data where columns are
sparsely populated and updated.  The SuperColumnFamily model works great for
my collapsed data where I need to track which bits of information came from
which raw data.
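
A rough sketch of that collapsed layout, assuming it is modelled as nested dictionaries (row per final product, one super column per raw source record). The record names, field names, and the helpers collapse and sources_for are all hypothetical and not taken from the actual system.

```python
# Hypothetical raw records keyed by "source:product"; values are sparse.
raw = {
    "sourceA:FOO": {"title": "Widget", "price": "9.99"},
    "sourceB:BAR": {"title": "Widget Deluxe", "upc": "012345"},
}

# Hypothetical match result: which raw records describe the same product.
matches = {"product-1": ["sourceA:FOO", "sourceB:BAR"]}


def collapse(raw, matches):
    """Build row -> super column (raw record key) -> columns."""
    return {
        product: {source: dict(raw[source]) for source in sources}
        for product, sources in matches.items()
    }


def sources_for(collapsed, product, column):
    """List the raw records that supplied a given column for a product."""
    return sorted(
        source
        for source, columns in collapsed[product].items()
        if column in columns
    )


collapsed = collapse(raw, matches)
```

Keeping one super column per raw record means the origin of every field stays visible after the records are merged.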

I'm currently in testing (almost production).  For this use case I'll only
be using Cassandra on the backend and then indexing the final data into
Apache Solr to power the frontend.  My data is small enough to fit on a
single node so I don't have much use for the partitioning at this point.  If
anything I'd be more interested in a fully replicated setup where the
ReplicationFactor is equal to the number of nodes.

I looked at most of the other nosql solutions (couchdb, mongodb, hbase,
hypertable, dynomite, voldemort).

One thing I'd love to see improved:

- Reading through all the data (or a specific key prefix) in a ColumnFamily
seems slow.  Cassandra is the bottleneck when I try to index data into Solr
and it looks like Cassandra's CPU usage is 2-3 times that of Solr's during
the process.

I look forward to playing around with 0.5!

-Tim

On Fri, Nov 20, 2009 at 1:17 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> Hi all,
>
> I'd love to get a better feel for who is using Cassandra and what kind
> of applications it is seeing.  If you are using Cassandra, could you
> share what you're using it for and what stage you are at with it
> (evaluation / testing / production)? Also, what alternatives you
> evaluated/are evaluating would be useful.  Finally, feel free to throw
> in "I'd love to use Cassandra if only it did X" wishes. :)
>
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing).  We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.
>
> Thanks,
>
> -Jonathan
>
> (If you're in stealth mode or don't want to say anything in public,
> feel free to reply to me privately and I will keep it off the record.)
>

Re: Cassandra users survey

Posted by Jake Luciani <ja...@gmail.com>.
I'm about to release a twitter search engine built on top of cassandra. If
you are interested in beta testing it let me know.

I would like to see cassandra support increment/decrement.

-Jake

On Fri, Nov 20, 2009 at 4:17 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> Hi all,
>
> I'd love to get a better feel for who is using Cassandra and what kind
> of applications it is seeing.  If you are using Cassandra, could you
> share what you're using it for and what stage you are at with it
> (evaluation / testing / production)? Also, what alternatives you
> evaluated/are evaluating would be useful.  Finally, feel free to throw
> in "I'd love to use Cassandra if only it did X" wishes. :)
>
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing).  We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.
>
> Thanks,
>
> -Jonathan
>
> (If you're in stealth mode or don't want to say anything in public,
> feel free to reply to me privately and I will keep it off the record.)
>

Re: Cassandra users survey

Posted by Ramzi Rabah <rr...@playdom.com>.
We are currently evaluating Cassandra, and using it for a small
feature in production. We are only using the basic insert/get/remove
from the API, with a standard column family. So far, I like a lot of
what Cassandra offers, though I had some tough times with it.

* Version 0.4.2 seems very broken. Besides CASSANDRA-507, which is not
fixed in the v4 version, it seems that when you do a significant number
of deletes and then try to restart the server, compaction fails most of
the time in our environment.
* Version 0.5 seems to be better in terms of stability from what I
observed so far. Some things that would definitely be very helpful for
us going forward:
- Easier way to replace a node that dies.
- Disk is not infinite, so a way to specify, when you insert an entry into
Cassandra, how long it should remain available before Cassandra deletes
it.
- Better monitoring tools. It's very hard to tell how heavily loaded a
node and the whole system are right now.
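
To make the expiry wish above concrete, here is a small sketch of the requested semantics as a toy in-memory store. It is an illustration only: ExpiringStore and its methods are hypothetical names, the clock is passed in explicitly to keep the example deterministic, and nothing here reflects how Cassandra would implement it.

```python
class ExpiringStore:
    """Toy key/value store where each entry carries its own lifetime."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def insert(self, key, value, ttl_seconds, now):
        # Record when the entry should stop being visible.
        self._data[key] = (value, now + ttl_seconds)

    def get(self, key, now):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            # Expired: drop it so the space can be reclaimed.
            del self._data[key]
            return None
        return value


store = ExpiringStore()
store.insert("session:42", "payload", ttl_seconds=60, now=1000)
```

A read at time 1059 would still see the entry, while a read at 1060 or later would not.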

Re: Cassandra users survey

Posted by "B. Todd Burruss" <bb...@real.com>.
I am evaluating "NoSQL" alternatives to your typical hard to scale
RDBMS, specifically Key/Value stores.  I'm not looking for query
capabilities.  I want very very very high availability with very very
large amounts of data.

I have narrowed my list down to Cassandra, Voldemort, Riak, and CouchDB.
Voldemort doesn't seem far enough along to properly evaluate so it is on
the back burner.  Couch is used in a lot of places, but without the
"lounge" it doesn't scale, nor does it have any sort of HA story (and the
lounge is difficult at best to get installed and working).  I should mention
Oracle is in use today.

That leaves Riak and Cassandra.  I like Cassandra because of the Rack
and DC awareness hooks.  This is a nice feature for those wanting 5 9's
of availability.

I haven't gotten to performance testing yet.  Just trying to verify that
the products do what they are supposed to, and understand the nuances
with each one.

What I'd like to see in Cassandra:

- flexible conflict resolution mechanism.  Not just "last write wins".
Give the client the ability to "merge" conflicting values.
- A nice web interface to cluster statistics and management.  Something
an operations team could lean on to examine the entire cluster.
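
A small sketch of the difference between "last write wins" and a client-supplied merge, as asked for in the first wish above. The version format (timestamp, value) and the function names are assumptions made for illustration, not an existing Cassandra interface.

```python
def last_write_wins(a, b):
    """Each version is (timestamp, value); keep the newer one."""
    return a if a[0] >= b[0] else b


def merge_sets(a, b):
    """Client-supplied merge: union the values, keep the newer timestamp."""
    return (max(a[0], b[0]), a[1] | b[1])


def resolve(a, b, resolver=last_write_wins):
    """Reconcile two conflicting versions with a pluggable resolver."""
    return resolver(a, b)


# Two replicas accepted different writes to the same shopping list.
v1 = (100, frozenset({"milk"}))
v2 = (105, frozenset({"eggs"}))

lww_result = resolve(v1, v2)
merged_result = resolve(v1, v2, resolver=merge_sets)
```

With last write wins the older item is silently dropped; with the merge function both items survive, which is the behaviour the wish describes.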

thx!





On Fri, 2009-11-20 at 15:17 -0600, Jonathan Ellis wrote:
> Hi all,
> 
> I'd love to get a better feel for who is using Cassandra and what kind
> of applications it is seeing.  If you are using Cassandra, could you
> share what you're using it for and what stage you are at with it
> (evaluation / testing / production)? Also, what alternatives you
> evaluated/are evaluating would be useful.  Finally, feel free to throw
> in "I'd love to use Cassandra if only it did X" wishes. :)
> 
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing).  We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.
> 
> Thanks,
> 
> -Jonathan
> 
> (If you're in stealth mode or don't want to say anything in public,
> feel free to reply to me privately and I will keep it off the record.)



Re: Cassandra users survey

Posted by Erich Nachbar <er...@nachbar.biz>.
Hi,

I'm using Cassandra 0.4.2 at my current client to persist URL graphs
for spam detection.
The crawling and page classification are done in Hadoop/Bixo/Cascading,
which persists URL classification results into Cassandra.
The incoming production traffic uses Cassandra for the real-time
spam score lookup to determine the spamminess of a URL.

It started out as a prototype and has now been in production with 4
Cassandra nodes for more than 3 weeks.
Sometimes Cassandra is a little rough around the edges, but in general it works.

Wishes:
- data rebalancing
- proper MapReduce support (ideally supporting the same API HBase
uses, so one could use the same eco-system)
- node decommissioning

On Fri, Nov 20, 2009 at 1:17 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> Hi all,
>
> I'd love to get a better feel for who is using Cassandra and what kind
> of applications it is seeing.  If you are using Cassandra, could you
> share what you're using it for and what stage you are at with it
> (evaluation / testing / production)? Also, what alternatives you
> evaluated/are evaluating would be useful.  Finally, feel free to throw
> in "I'd love to use Cassandra if only it did X" wishes. :)
>
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing).  We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.
>
> Thanks,
>
> -Jonathan
>
> (If you're in stealth mode or don't want to say anything in public,
> feel free to reply to me privately and I will keep it off the record.)
>

Re: Cassandra users survey

Posted by Jonathan Ellis <jb...@gmail.com>.
Thanks for the replies, everyone.

A couple of people have suggested that we put this on the
wiki to replace the old, never-updated PoweredBy page (maybe as
UsersSurvey09).  I'll pull the _public_ responses only into a wiki
page later this week; if you replied to the list but don't want to be
on the page, let me know in the next couple days and I'll leave you
off.  And of course since it is a publicly editable wiki, you can
always remove yourself later too.

-Jonathan

Re: Cassandra users survey

Posted by Jake Luciani <ja...@gmail.com>.
I'm about to release a Twitter search engine built on top of Cassandra. If
you are interested in beta testing it, let me know.

I would like to see Cassandra support increment/decrement.
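
As a rough illustration of why a built-in increment matters, the sketch below uses a toy in-memory store (ToyStore, get, put and incr are all made-up names, not Cassandra calls). It shows a client-side read-modify-write losing an update when two clients interleave, and a hypothetical server-side incr that does not.

```python
import threading


class ToyStore:
    """In-memory stand-in for a key/value store (hypothetical, not a Cassandra API)."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        return self._data.get(key, 0)

    def put(self, key, value):
        self._data[key] = value

    def incr(self, key, delta=1):
        # Hypothetical server-side increment: read and write happen as one step.
        with self._lock:
            self._data[key] = self._data.get(key, 0) + delta
            return self._data[key]


def lost_update_demo():
    # Two clients each try to add 1 using get followed by put.
    store = ToyStore()
    seen_by_a = store.get("hits")      # client A reads 0
    seen_by_b = store.get("hits")      # client B reads 0 before A writes
    store.put("hits", seen_by_a + 1)   # A writes 1
    store.put("hits", seen_by_b + 1)   # B also writes 1, overwriting A
    return store.get("hits")


def atomic_demo():
    # The same two increments done through the store's own incr.
    store = ToyStore()
    store.incr("hits")
    store.incr("hits")
    return store.get("hits")
```

The interleaving in lost_update_demo is written out by hand so the outcome is deterministic; in a real deployment the same loss would only show up under concurrent load.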

-Jake

On Fri, Nov 20, 2009 at 4:17 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> Hi all,
>
> I'd love to get a better feel for who is using Cassandra and what kind
> of applications it is seeing.  If you are using Cassandra, could you
> share what you're using it for and what stage you are at with it
> (evaluation / testing / production)? Also, what alternatives you
> evaluated/are evaluating would be useful.  Finally, feel free to throw
> in "I'd love to use Cassandra if only it did X" wishes. :)
>
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing).  We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.
>
> Thanks,
>
> -Jonathan
>
> (If you're in stealth mode or don't want to say anything in public,
> feel free to reply to me privately and I will keep it off the record.)
>

Re: Cassandra users survey

Posted by Ryan King <ry...@twitter.com>.
At Twitter we're working on using Cassandra to replace our current
storage for all tweets. We have a cluster in production that's being
populated outside the user-critical path (i.e., the Cassandra
writing is async).
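
For readers unfamiliar with the pattern, "outside the user-critical path" can be sketched as a background writer fed by a queue. This is only an assumed shape, not a description of Twitter's actual setup; AsyncWriter and store_write are hypothetical names, and store_write stands in for whatever call really writes to the datastore.

```python
import queue
import threading


class AsyncWriter:
    """Queues writes and applies them on a background thread (illustrative only)."""

    def __init__(self, store_write):
        # store_write(key, value) is a placeholder for the real datastore call.
        self._store_write = store_write
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def submit(self, key, value):
        # Called on the request path: enqueue and return immediately.
        self._queue.put((key, value))

    def _drain(self):
        # Runs off the request path. Error handling and retries are left out
        # of this sketch; a failing store_write would stop the worker.
        while True:
            item = self._queue.get()
            try:
                if item is None:   # sentinel: stop the worker
                    return
                key, value = item
                self._store_write(key, value)
            finally:
                self._queue.task_done()

    def close(self):
        # Flush everything queued so far, then stop the worker.
        self._queue.put(None)
        self._queue.join()
        self._worker.join()


# Usage: a dict plays the role of the datastore.
backing_store = {}
writer = AsyncWriter(backing_store.__setitem__)
writer.submit("tweet:1", "hello world")   # returns without waiting for the write
writer.close()
```

The trade-off is that the caller no longer learns whether the write succeeded, which is acceptable while the new store is being populated alongside the existing one.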

Additionally, we're testing and evaluating for basically everything
else in our stack.

We evaluated a lot of things: a custom MySQL impl, Voldemort, HBase,
MongoDB, MemcacheDB, Hypertable, and others.

-ryan

On Fri, Nov 20, 2009 at 1:17 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> Hi all,
>
> I'd love to get a better feel for who is using Cassandra and what kind
> of applications it is seeing.  If you are using Cassandra, could you
> share what you're using it for and what stage you are at with it
> (evaluation / testing / production)? Also, what alternatives you
> evaluated/are evaluating would be useful.  Finally, feel free to throw
> in "I'd love to use Cassandra if only it did X" wishes. :)
>
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing).  We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.
>
> Thanks,
>
> -Jonathan
>
> (If you're in stealth mode or don't want to say anything in public,
> feel free to reply to me privately and I will keep it off the record.)
>

Re: Cassandra users survey

Posted by Ian Holsman <ia...@holsman.net>.
We're looking at it to be part of a near-real-time web analytics engine, which sounds similar to Ooyala.
At the moment I'm pushing to get the thing open sourced if possible.

We're looking at combining Cassandra + Esper, but we are still in the very early stages.

On Nov 21, 2009, at 8:17 AM, Jonathan Ellis wrote:

> Hi all,
> 
> I'd love to get a better feel for who is using Cassandra and what kind
> of applications it is seeing.  If you are using Cassandra, could you
> share what you're using it for and what stage you are at with it
> (evaluation / testing / production)? Also, what alternatives you
> evaluated/are evaluating would be useful.  Finally, feel free to throw
> in "I'd love to use Cassandra if only it did X" wishes. :)
> 
> I can start: Rackspace is using Cassandra for stats collection
> (testing, almost production) and as a backend for the Mail & Apps
> division (early testing).  We evaluated HBase, Hypertable, dynomite,
> and Voldemort as well.
> 
> Thanks,
> 
> -Jonathan
> 
> (If you're in stealth mode or don't want to say anything in public,
> feel free to reply to me privately and I will keep it off the record.)

--
Ian Holsman
Ian@Holsman.net




Re: Cassandra users survey

Posted by Joseph Bowman <bo...@gmail.com>.
Hi Jonathan,

I'd say I am at the evaluation stage. The only reason I am looking at NoSQL
type applications instead of using MySQL is the vain hope my application
will one day scale to the point that MySQL won't be the best option.
Cassandra appears to be the best fit for the requirement I have that
everything must scale horizontally.

The information I will store will be user accounts, user configuration
options, and other small data sets to start. Eventually, the largest
implementation will be comments functionality on URLs for a search engine
interface I am building.

The only things I'd like to see are the atomic operations discussed a while
back, and an easier interface for Python. Honestly, the latter can be built
on top of Thrift (lazyboy is one attempt), so I could just write it myself,
but you did ask.

On Nov 20, 2009 4:18 PM, "Jonathan Ellis" <jb...@gmail.com> wrote:

Hi all,

I'd love to get a better feel for who is using Cassandra and what kind
of applications it is seeing.  If you are using Cassandra, could you
share what you're using it for and what stage you are at with it
(evaluation / testing / production)? Also, what alternatives you
evaluated/are evaluating would be useful.  Finally, feel free to throw
in "I'd love to use Cassandra if only it did X" wishes. :)

I can start: Rackspace is using Cassandra for stats collection
(testing, almost production) and as a backend for the Mail & Apps
division (early testing).  We evaluated HBase, Hypertable, dynomite,
and Voldemort as well.

Thanks,

-Jonathan

(If you're in stealth mode or don't want to say anything in public,
feel free to reply to me privately and I will keep it off the record.)