Posted to user@cassandra.apache.org by "Kochheiser,Todd W - TOK-DITT-1" <tw...@bpa.gov> on 2010/07/10 07:25:25 UTC

TechCrunch article on Twitter and Cassandra

A good read.

http://techcrunch.com/2010/07/09/twitter-analytics-mysql/

Todd

Re: TechCrunch article on Twitter and Cassandra

Posted by Colin Clark <co...@cloudeventprocessing.com>.
Glad to hear it; and I'm thrilled to see innovation occurring in this 
space.  But one consulting company that's been in business for a couple 
of months now wouldn't help me with tier 1 deployments.

I'm looking forward to having an ecosystem around Cassandra - I think 
that Cassandra, and other NoSQL databases, are potentially a game 
changer in the business.

But Cassandra just isn't ready for the work I do yet.  That doesn't mean 
it's not ready for *a lot* of other use cases.

+1 315 886 3422 cell
+1 701 212 4314 office
http://blog.cloudeventprocessing.com
http://twitter.com/EventCloudPro <http://twitter.com/EventCloudPro%20>

On 7/12/2010 7:07 PM, Jonathan Ellis wrote:
> On Sat, Jul 10, 2010 at 2:22 PM, Colin Clark
> <co...@cloudeventprocessing.com>  wrote:
>> Although I'm a fan of Cassandra, there's no way I'd use it today for my tier
>> 1 deployments, because I don't have the resources of Facebook, and even
>> though Cassandra is open source, that doesn't mean I can fix it when it goes
>> down.  And, because it's open source, there's no one to call to have it
>> fixed reliably and within production constraints.
> For the record, that hasn't been true for a couple months now. :)
>

Re: TechCrunch article on Twitter and Cassandra

Posted by Jonathan Ellis <jb...@gmail.com>.
On Sat, Jul 10, 2010 at 2:22 PM, Colin Clark
<co...@cloudeventprocessing.com> wrote:
> Although I'm a fan of Cassandra, there's no way I'd use it today for my tier
> 1 deployments, because I don't have the resources of Facebook, and even
> though Cassandra is open source, that doesn't mean I can fix it when it goes
> down.  And, because it's open source, there's no one to call to have it
> fixed reliably and within production constraints.

For the record, that hasn't been true for a couple months now. :)

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: TechCrunch article on Twitter and Cassandra

Posted by Schubert Zhang <zs...@gmail.com>.
It is being ardently discussed at http://news.ycombinator.com/item?id=1502756
Here are my comments:
1. Cassandra is very young! In particular, the design and implementation of
local storage and local indexing are immature and not good.
2. Poor read performance is also due to the poor local storage
implementation.
3. The local storage, indexing and persistence structures are not stable.
They need to be redesigned/reimplemented. If Twitter moves its data to the current
version of Cassandra, it will have to do another move later to a new local storage,
indexing and persistence structure.

4. Twitter has a lot of experience with MySQL, but not with Cassandra. Building
and maintaining a product such as Cassandra requires more smart, practised
engineers.
5. There are many good techniques in Cassandra and other open-source
projects (such as Hadoop and HBase). But they are not ready for
production. Understand the details of these techniques and implement them in
your own projects/products.


Re: TechCrunch article on Twitter and Cassandra

Posted by Colin Clark <co...@cloudeventprocessing.com>.
Benjamin,

Please see below - it sounds like you're taking this a little personally 
and I'm not sure why.  You've made some errors in your reply.

Colin
+1 315 886 3422 cell
+1 701 212 4314 office
http://blog.cloudeventprocessing.com
http://twitter.com/EventCloudPro <http://twitter.com/EventCloudPro%20>

On 7/10/2010 5:21 PM, Benjamin Black wrote:
> On Sat, Jul 10, 2010 at 12:22 PM, Colin Clark
> <co...@cloudeventprocessing.com>  wrote:
>    
>> Although I'm a fan of Cassandra, there's no way I'd use it today for my tier
>> 1 deployments, because I don't have the resources of Facebook, and even
>> though Cassandra is open source, that doesn't mean I can fix it when it goes
>> down.  And, because it's open source, there's no one to call to have it
>> fixed reliably and within production constraints.  Cassandra's strength is
>> its greatest weakness right now.
>>
>>      
> There are others, however, who do have the skills not just to fix it
> when it goes down, but to improve the code in a variety of ways and
> contribute that code back to the project.  That you do not have those
> skills is a good indication you should stick to what you know, not an
> indictment of Cassandra (or any other non-SQL store).
>
>    
I didn't say 'didn't have the skills.'  I said 'resources.'  Those are 
two very different things.  While I and my team have nothing to prove to 
you, working on Cassandra is completely within our realm of ability and 
expertise.  Not having the resources means that, relative to our current 
focus, we, our customers, and our investors get a bigger bang for each 
engineering $ spent by having us focus on different problems.  Using a 
piece of software isn't just an engineering issue; it has to make 
business sense as well.  So if I really wanted to use Cassandra in a 
mission critical way, I'd have to be able to justify the investment 
involved in creating an internal Cassandra team.  This is why there's so 
much 'flap' over what Twitter and Facebook are or are not using 
Cassandra for.
>> The bloom is starting to come off NoSQL, which is normal - it means that
>> people & firms are trying to do more with it and most probably realizing
>> that all of the tools, support, infrastructure, etc. surrounding alternative
>> solutions aren't such a bad thing.  And that the world of NoSQL needs to
>> come up with a better mantra than "joins are bad, dude", and "you're just
>> protecting the status quo."  There's a *lot more* big data wrapped up inside
>> of SQL databases and only a fraction of that in NoSQL - and there are a lot of
>> reasons for it.
>>
>>      
> You are, for whatever reason, using the dullest of cliches as if they
> were informed opinion.  Nobody with actual knowledge of the space says
> "joins are bad, dude".  What they might say is "When you have
> petabytes and low latency requirements, joins are an expensive
> proposition".  That is clearly a true statement and constructing
> indices in a column store to avoid joins is a reasonable decision to
> avoid that expense.  Is it free?  Of course not, nothing is.
>
>    
Again, I'm a fan of NoSQL, and of Cassandra.  When I said, 'the world of 
NoSQL,' I was including myself in that world.  And, I agree that those 
cliches are dull, overused, and ill-informed (anyone who's actually done 
anything with a lot of data knows how expensive joins are - with or 
without petabytes).  But again, this is what business sees when they 
listen to Twitter, or subscribe to these mailing lists.  This is how 
opinions are formed in the minds of analysts and they then influence 
their customers.  We need to do a better job, and yet again, this is why 
understanding what Twitter and Facebook are or are not doing with 
Cassandra is important.
>> For example, do I *really* need Cassandra if MySQL will work for me and I
>> just want to get up and running quickly without writing a bunch of code?  My
>> team was pushing greater than 20k updates per second into, GASP, Oracle 5
>> years ago.  Sure, it was expensive.  But it worked.  And it was worth it -
>> or we wouldn't have spent the $$.  What's your data worth if you don't have
>> your data? zero.
>>
>>      
> Had you spent any time on the irc channel you would've seen this
> advice given repeatedly.  If you don't need what Cassandra does, don't
> use it.  That you have seen 20k updates/sec on really expensive
> hardware with a SQL store is neither surprising nor relevant.  As you
> must realize, though you choose to ignore it, Cassandra is about more than
> just high, per-node write throughput.  It is about seamless scale-out
> of a single cluster, robustness in the face of node failure and
> network partition, etc.  Can you do that with a SQL store?  Certainly.
> Expect to pay 5x in hardware and not be able to operate multi-DC.
> It's what folks call a trade-off.
>    

So that's a trade-off?  Thanks - maybe Facebook and Twitter missed that 
before spending hundreds of thousands of $$ on a project only to later 
change course.  Include opportunity cost in that, and you're easily in 
the millions of wasted $- or do we call that a 'learning exercise?'  I'd 
love to hear what Twitter & Facebook's boards (there I am again with 
that whole pesky 'business' thing) had to say about that.  And I'm 
assuming that the same thing might just happen to a tech team that chose 
to spend valuable cycles on evaluating/implementing Cassandra only to 
change course - they'd have to explain that as well.  And then they'd 
hear something like, "Dudes, you did what?  Even Facebook & Twitter 
decided not to use Cassandra that way!"  This is not as far-fetched as 
it sounds.  Someone on my advisory board asked me a very similar 
question about our use of Cassandra and given the recent news, whether 
or not that impacted our plans.

And I'm assuming that if you're going to frantically wave arms with "SQL 
costs 5x more and you can't do that multi-DC..." that you've got 
something to back that up?  'Cuz Facebook is using a SQL store, they're 
using it multi-DC, and they're running on commodity hardware, right?


>> And then there's support - internal support.  Picking a database du jour is
>> organizationally expensive.  Especially when there's probably one or two
>> databases that Twitter could have bought off the shelf that would have
>> solved their problems.
>>      
> You have no idea what their actual problems are and are merely
> engaging in the favorite game of HN and similar venues: armchair
> engineering.
>
>    
Sure I do, but from a business perspective.  Their architecture doesn't 
scale very well right now.  They're running with reduced API limits and 
you still get the 'fail whale' more than occasionally.  People lose 
followers.  People lose tweets.  Privacy has been compromised.  Need I 
go on?  All of this would make me, as a potential customer of Twitter, 
ask a question: "So, what's up with the scalability thing?  What happens 
if I miss a critical time window with my sponsored Tweets?  Do I get 
that $ back?  I didn't get 'imprints' but the opportunity is gone."  But 
you're right, from an engineering point of view, I have no idea what 
their problems are.  I do know that Cassandra was supposed to fix some 
of them, and now it's not and I don't know anything about that from an 
engineering point of view either.

Also, I have no idea what 'HN or similar venues' refers to.

> b
>    

Re: TechCrunch article on Twitter and Cassandra

Posted by Benjamin Black <b...@b3k.us>.
On Sat, Jul 10, 2010 at 12:22 PM, Colin Clark
<co...@cloudeventprocessing.com> wrote:
>
> Although I'm a fan of Cassandra, there's no way I'd use it today for my tier
> 1 deployments, because I don't have the resources of Facebook, and even
> though Cassandra is open source, that doesn't mean I can fix it when it goes
> down.  And, because it's open source, there's no one to call to have it
> fixed reliably and within production constraints.  Cassandra's strength is
> its greatest weakness right now.
>

There are others, however, who do have the skills not just to fix it
when it goes down, but to improve the code in a variety of ways and
contribute that code back to the project.  That you do not have those
skills is a good indication you should stick to what you know, not an
indictment of Cassandra (or any other non-SQL store).

> The bloom is starting to come off NoSQL, which is normal - it means that
> people & firms are trying to do more with it and most probably realizing
> that all of the tools, support, infrastructure, etc. surrounding alternative
> solutions aren't such a bad thing.  And that the world of NoSQL needs to
> come up with a better mantra than "joins are bad, dude", and "you're just
> protecting the status quo."  There's a *lot more* big data wrapped up inside
> of SQL databases and only a fraction of that in NoSQL - and there are a lot of
> reasons for it.
>

You are, for whatever reason, using the dullest of cliches as if they
were informed opinion.  Nobody with actual knowledge of the space says
"joins are bad, dude".  What they might say is "When you have
petabytes and low latency requirements, joins are an expensive
proposition".  That is clearly a true statement and constructing
indices in a column store to avoid joins is a reasonable decision to
avoid that expense.  Is it free?  Of course not, nothing is.
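To make the index-instead-of-join idea concrete, here is a minimal sketch in Python. It assumes plain dicts as stand-ins for two column families; the names tweets, user_timeline, post_tweet and recent_tweets are hypothetical and are not Cassandra's API. The index row is written at write time, so reading a user's recent tweets needs no join.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for two column families:
#   tweets:        row key = tweet id, columns = tweet fields
#   user_timeline: row key = user id, columns = {timestamp: tweet id}
tweets = {}
user_timeline = defaultdict(dict)


def post_tweet(tweet_id, user_id, ts, text):
    """Write the tweet AND its index entry at write time (denormalization)."""
    tweets[tweet_id] = {"user_id": user_id, "ts": ts, "text": text}
    user_timeline[user_id][ts] = tweet_id


def recent_tweets(user_id, limit=10):
    """One index-row lookup plus point reads; no join across tables."""
    row = user_timeline.get(user_id, {})  # .get does not create an empty row
    newest_first = sorted(row, reverse=True)[:limit]
    return [tweets[row[ts]]["text"] for ts in newest_first]


post_tweet("t1", "alice", 100, "hello")
post_tweet("t2", "alice", 200, "second")
post_tweet("t3", "bob", 150, "hi")
print(recent_tweets("alice"))  # ['second', 'hello']
```

The price is the one named above: every post costs two writes, the tweet id is stored twice, and the application has to keep the index in sync itself.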

> For example, do I *really* need Cassandra if MySQL will work for me and I
> just want to get up and running quickly without writing a bunch of code?  My
> team was pushing greater than 20k updates per second into, GASP, Oracle 5
> years ago.  Sure, it was expensive.  But it worked.  And it was worth it -
> or we wouldn't have spent the $$.  What's your data worth if you don't have
> your data? zero.
>

Had you spent any time on the irc channel you would've seen this
advice given repeatedly.  If you don't need what Cassandra does, don't
use it.  That you have seen 20k updates/sec on really expensive
hardware with a SQL store is neither surprising nor relevant.  As you
must realize, though you choose to ignore it, Cassandra is about more than
just high, per-node write throughput.  It is about seamless scale-out
of a single cluster, robustness in the face of node failure and
network partition, etc.  Can you do that with a SQL store?  Certainly.
Expect to pay 5x in hardware and not be able to operate multi-DC.
It's what folks call a trade-off.
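A toy illustration of the scale-out and node-failure point, assuming a consistent-hashing token ring with replicas on successive nodes (roughly how Cassandra's rack-unaware replica placement works). This is a hypothetical sketch, not Cassandra code: the Ring class and node names are made up, and it ignores failure detection, repair and everything else a real cluster does.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key onto the ring (MD5, loosely like Cassandra's RandomPartitioner)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical minimal token ring in which each node owns a single token."""

    def __init__(self, nodes):
        self.entries = sorted((token(n), n) for n in nodes)

    def add_node(self, node):
        # Only keys whose first replica becomes the new node change owner.
        bisect.insort(self.entries, (token(node), node))

    def replicas(self, key, rf=3):
        # The first replica is the first node whose token is >= the key's token
        # (wrapping around the ring); the rest are the next nodes clockwise.
        tokens = [t for t, _ in self.entries]
        start = bisect.bisect_left(tokens, token(key)) % len(self.entries)
        count = min(rf, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
print(ring.replicas("user:42"))  # three distinct nodes
```

With four nodes and three replicas, losing any one node still leaves two copies of every key, and adding a node only moves the keys that now fall into the new node's range.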

> And then there's support - internal support.  Picking a database du jour is
> organizationally expensive.  Especially when there's probably one or two
> databases that Twitter could have bought off the shelf that would have
> solved their problems.

You have no idea what their actual problems are and are merely
engaging in the favorite game of HN and similar venues: armchair
engineering.


b

Re: TechCrunch article on Twitter and Cassandra

Posted by Eric Evans <ee...@rackspace.com>.
On Sun, 2010-07-11 at 01:06 +0530, Sumit Datta wrote:
> What I do not see are details as to why Cassandra is not being used to
> store tweets. Or the details of the implementation that does have
> Cassandra. 

I wouldn't let that stop you. You should consider doing what so many
others are: treat all of the wild-eyed speculation as fact.

-- 
Eric Evans
eevans@rackspace.com


Re: TechCrunch article on Twitter and Cassandra

Posted by Jason Dixon <jd...@omniti.com>.
On Sun, Jul 11, 2010 at 01:06:31AM +0530, Sumit Datta wrote:
> Hello,
> I have been a silent spectator in this list for a long while, and
> while I like reading much mail traffic, this one I thought I should
> reply to.
> You know what I see in all this? More "Twitter" and "Facebook" than
> "Cassandra". Are we here to discuss them or the software?
> What I do not see are details as to why Cassandra is not being used to
> store tweets. Or the details of the implementation that does have
> Cassandra.
> I for one would love to see discussion on code, design, architecture.
> Not just thoughts on companies' decisions. Why Twitter made those
> decisions is not something we should guess at.

Seriously, you should plan to attend Surge.  This is exactly what the
conference is all about.  People talking about technological growth,
failures that forced a re-engineer, and examples of working solutions.

http://omniti.com/surge/2010

-- 
Jason Dixon
OmniTI Computer Consulting, Inc.
jdixon@omniti.com
443.325.1357 x.241

Re: TechCrunch article on Twitter and Cassandra

Posted by Brett Thomas <br...@gmail.com>.
I dunno, I think understanding those companies' decisions is extremely
relevant for anybody working with Cassandra. I really like this thread and
hope it keeps going.

On Jul 10, 2010 3:38 PM, "Sumit Datta" <su...@gmail.com> wrote:

Hello,
I have been a silent spectator in this list for a long while, and
while I like reading much mail traffic, this one I thought I should
reply to.
You know what I see in all this? More "Twitter" and "Facebook" than
"Cassandra". Are we here to discuss them or the software?
What I do not see are details as to why Cassandra is not being used to
store tweets. Or the details of the implementation that does have
Cassandra.
I for one would love to see discussion on code, design, architecture.
Not just thoughts on companies' decisions. Why Twitter made those
decisions is not something we should guess at.
Thanks
--
Sumit Datta
brainless.in

Re: TechCrunch article on Twitter and Cassandra

Posted by Sumit Datta <su...@gmail.com>.
Hello,
I have been a silent spectator in this list for a long while, and
while I like reading much mail traffic, this one I thought I should
reply to.
You know what I see in all this? More "Twitter" and "Facebook" than
"Cassandra". Are we here to discuss them or the software?
What I do not see are details as to why Cassandra is not being used to
store tweets. Or the details of the implementation that does have
Cassandra.
I for one would love to see discussion on code, design, architecture.
Not just thoughts on companies' decisions. Why Twitter made those
decisions is not something we should guess at.
Thanks
--
Sumit Datta
brainless.in

Re: TechCrunch article on Twitter and Cassandra

Posted by Colin Clark <co...@cloudeventprocessing.com>.
I'm not aware of anyone classifying what twitter is doing today as 
'working.'  In fact, I believe that twitter's problems are much larger 
than just technology but that's a whole different subject.

What twitter may have realized is that they don't have the resources of 
Facebook, that Facebook's use case is fairly limited (although a large 
deployment), and that they may have been trudging off into the great 
unknown.

Although I'm a fan of Cassandra, there's no way I'd use it today for my 
tier 1 deployments, because I don't have the resources of Facebook, and 
even though Cassandra is open source, that doesn't mean I can fix it 
when it goes down.  And, because it's open source, there's no one to 
call to have it fixed reliably and within production constraints.  
Cassandra's strength is its greatest weakness right now.

The bloom is starting to come off NoSQL, which is normal - it means that 
people & firms are trying to do more with it and most probably realizing 
that all of the tools, support, infrastructure, etc. surrounding 
alternative solutions aren't such a bad thing.  And that the world of 
NoSQL needs to come up with a better mantra than "joins are bad, 
dude", and "you're just protecting the status quo."  There's a *lot 
more* big data wrapped up inside of SQL databases and only a fraction of 
that in NoSQL - and there are a lot of reasons for it.

For example, do I *really* need Cassandra if MySQL will work for me and 
I just want to get up and running quickly without writing a bunch of 
code?  My team was pushing greater than 20k updates per second into, 
GASP, Oracle 5 years ago.  Sure, it was expensive.  But it worked.  And 
it was worth it - or we wouldn't have spent the $$.  What's your data 
worth if you don't have your data? zero.

And then there's support - internal support.  Picking a database du jour 
is organizationally expensive.  Especially when there's probably one or 
two databases that Twitter could have bought off the shelf that would 
have solved their problems.  But instead of bolstering the reliability 
and robustness of their internal architecture, they've gone and used 
very expensive equity for acquisitions.   Running multiple databases in 
a fault-tolerant, geographically dispersed deployment isn't easy (yes, 
I've done it) and having multiple databases in the mix really 
complicates things.  And at this stage in Twitter's growth, I frankly 
don't understand why they're looking to complicate their technological 
landscape any more than absolutely required.

So, this entire rant can be summarized really quite succinctly:

"If data is your business (like Facebook & Twitter), if you don't have 
the resources to cost effectively handle all of your data management 
needs internally (Facebook does, Twitter doesn't), then basing your 
solution on unproven storage solutions (commercial or open source, SQL 
or NoSQL) is a risky and short-sighted strategy."

Please send death threats via the channels listed below:


Colin
+1 315 886 3422 cell
+1 701 212 4314 office
http://blog.cloudeventprocessing.com
http://twitter.com/EventCloudPro <http://twitter.com/EventCloudPro%20>

On 7/10/2010 2:02 PM, Ryan King wrote:
> On Sat, Jul 10, 2010 at 10:33 AM, Marty Greenia<ma...@gmail.com>  wrote:
>    
>> It almost seems counter-intuitive. For analytics, you'd think they'd want a
>> database that supports more sophisticated query functionality (sql). Whereas
>> for everyday tweet storage, something fast and high-throughput (cassandra)
>> makes sense.
>>
>> I'd be curious to hear the details as well.
>>      
> These decisions aren't made in a vacuum. One of these use cases has an
> existing system that works, one doesn't.
>
> -ryan
>    

Re: TechCrunch article on Twitter and Cassandra

Posted by Ryan King <ry...@twitter.com>.
On Sat, Jul 10, 2010 at 10:33 AM, Marty Greenia <ma...@gmail.com> wrote:
> It almost seems counter-intuitive. For analytics, you'd think they'd want a
> database that supports more sophisticated query functionality (sql). Whereas
> for everyday tweet storage, something fast and high-throughput (cassandra)
> makes sense.
>
> I'd be curious to hear the details as well.

These decisions aren't made in a vacuum. One of these use cases has an
existing system that works, one doesn't.

-ryan

Re: TechCrunch article on Twitter and Cassandra

Posted by Dan Di Spaltro <da...@gmail.com>.
This sounds more like high-throughput external analytics, aka they
will know all the queries consumers will use.  This isn't for internal
analytics.

On Sat, Jul 10, 2010 at 10:33 AM, Marty Greenia <ma...@gmail.com> wrote:
> It almost seems counter-intuitive. For analytics, you'd think they'd want a
> database that supports more sophisticated query functionality (sql). Whereas
> for everyday tweet storage, something fast and high-throughput (cassandra)
> makes sense.
>
> I'd be curious to hear the details as well.
>
> On Sat, Jul 10, 2010 at 10:25 AM, S Ahmed <sa...@gmail.com> wrote:
>>
>> Nice link.
>> From what I understood, they are not using it to store tweets but rather
>> will use mysql?  I wish they went into more detail as to why...
>>
>> On Sat, Jul 10, 2010 at 1:25 AM, Kochheiser,Todd W - TOK-DITT-1
>> <tw...@bpa.gov> wrote:
>>>
>>> A good read.
>>>
>>> http://techcrunch.com/2010/07/09/twitter-analytics-mysql/
>>>
>>> Todd
>
>



-- 
Dan Di Spaltro

Re: TechCrunch article on Twitter and Cassandra

Posted by Marty Greenia <ma...@gmail.com>.
It almost seems counter-intuitive. For analytics, you'd think they'd want a
database that supports more sophisticated query functionality (sql). Whereas
for everyday tweet storage, something fast and high-throughput (cassandra)
makes sense.

I'd be curious to hear the details as well.

On Sat, Jul 10, 2010 at 10:25 AM, S Ahmed <sa...@gmail.com> wrote:

> Nice link.
>
> From what I understood, they are not using it to store tweets but rather
> will use mysql?  I wish they went into more detail as to why...
>
>
> On Sat, Jul 10, 2010 at 1:25 AM, Kochheiser,Todd W - TOK-DITT-1 <
> twkochheiser@bpa.gov> wrote:
>
>> A good read.
>>
>> http://techcrunch.com/2010/07/09/twitter-analytics-mysql/
>>
>> Todd
>
>
>

Re: TechCrunch article on Twitter and Cassandra

Posted by S Ahmed <sa...@gmail.com>.
Nice link.

From what I understood, they are not using it to store tweets but rather
will use mysql?  I wish they went into more detail as to why...

On Sat, Jul 10, 2010 at 1:25 AM, Kochheiser,Todd W - TOK-DITT-1 <
twkochheiser@bpa.gov> wrote:

> A good read.
>
> http://techcrunch.com/2010/07/09/twitter-analytics-mysql/
>
> Todd