Posted to user@cassandra.apache.org by Chris Goffinet <go...@digg.com> on 2010/04/06 18:54:30 UTC

Re: if cassandra isn't ideal for keep track of counts, how does digg count diggs?

That's not true. We have been using the ZooKeeper work we posted on JIRA. That's what we are using internally and have been for months. We are now just wrapping up our vector clocks + distributed counter patch so we can begin transitioning away from the ZooKeeper approach because there are problems with it long-term.
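The message doesn't describe how the vector clock + distributed counter patch works. As a hedged illustration of the general idea only (not Digg's code), here is a minimal Python sketch of a per-replica counter. Each replica increments only its own slot. Replicas reconcile by keeping the per-slot maximum, the way vector clock entries are merged, and the counter's value is the sum of the slots. The class name and replica ids are hypothetical, and the sketch assumes increments only; decrements would need a second set of slots.

```python
from typing import Dict


class ReplicaCounter:
    """Hypothetical per-replica counter: one slot per replica, value = sum of slots."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.slots: Dict[str, int] = {}  # replica id -> count contributed by that replica

    def increment(self, amount: int = 1) -> None:
        # Only the local slot is touched, so no lock or coordinator is needed.
        # Assumes amount > 0: max-based merging only works for monotonic slots.
        self.slots[self.replica_id] = self.slots.get(self.replica_id, 0) + amount

    def merge(self, other: "ReplicaCounter") -> None:
        # Reconcile like a vector clock: keep the larger value seen for each slot.
        for rid, count in other.slots.items():
            self.slots[rid] = max(self.slots.get(rid, 0), count)

    def value(self) -> int:
        # The counter's value is the sum of every replica's contribution.
        return sum(self.slots.values())


a = ReplicaCounter("node-a")
b = ReplicaCounter("node-b")
a.increment()
a.increment()            # two diggs land on node-a
b.increment()            # one digg lands on node-b concurrently
a.merge(b)
b.merge(a)               # replicas exchange state
print(a.value(), b.value())  # 3 3
```

Because merging takes a per-slot maximum, exchanging state more than once is harmless, which is what lets replicas reconcile without a central coordinator.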

-Chris

On Apr 6, 2010, at 9:50 AM, Ryan King wrote:

> They don't use Cassandra for it yet.
> 
> -ryan
> 
> On Tue, Apr 6, 2010 at 9:00 AM, S Ahmed <sa...@gmail.com> wrote:
>> From what I read in another thread, Cassandra isn't 'ideal'
>> for keeping track of counts.
>> For example, I would understand this to mean keeping track of which stories
>> were dugg.
>> If this is true, how would a site like Digg keep track of the 'dugg'
>> counter?
>> Also, I am assuming that with eventual consistency the number *may* not be 100%
>> accurate. If you wanted it to be accurate, would you just use the QUORUM
>> consistency level? (I believe QUORUM is meant to ensure all writes are written to disk)
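On the QUORUM question quoted above: in Cassandra, QUORUM means a majority of the replicas, floor(RF / 2) + 1, must acknowledge a write (or answer a read); it is not about flushing to disk. If both writes and reads use QUORUM, the two replica sets always overlap (R + W > N), so a read should see the latest acknowledged write. That still does not make a read-then-write increment atomic: two clients can read the same count and both write back count + 1. A small Python sketch of the arithmetic (the function names are illustrative):

```python
def quorum(replication_factor: int) -> int:
    # QUORUM is a majority of the replicas for a key.
    return replication_factor // 2 + 1


def read_overlaps_write(n: int, w: int, r: int) -> bool:
    # With R + W > N, at least one replica answering the read also
    # acknowledged the write, so the newest value is among the responses.
    return r + w > n


for rf in (1, 3, 5):
    q = quorum(rf)
    print(f"RF={rf} QUORUM={q} overlap={read_overlaps_write(rf, q, q)}")

# ONE for both writes and reads with RF=3 gives no overlap guarantee.
print(read_overlaps_write(3, 1, 1))  # False
```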

Re: if cassandra isn't ideal for keep track of counts, how does digg count diggs?

Posted by Paul Prescod <pr...@gmail.com>.

On Sun, Apr 11, 2010 at 3:30 AM, Mark Robson <ma...@gmail.com> wrote:
> Can we not implement counts by just storing all the deltas in a row, and
> then summing them all up to achieve a count.
>
> If a row ends up with too many deltas, a reader could just summarise the
> deltas occasionally into a single value (in a way which avoids race
> conditions, of course).

How do you avoid the race condition? Don't you need a lock?
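The race can be made concrete with a small in-memory Python model of the delta row quoted above (a dict standing in for one row; the helper names are hypothetical and no Cassandra API is used). If two readers snapshot the same deltas before either has applied its summary, both summary columns survive and the deltas are counted twice. Running the same two readers one after the other, which is what a lock would enforce, gives the right total.

```python
from typing import Dict, List, Tuple

Row = Dict[str, int]  # column name -> delta (or summary value)
Plan = Tuple[str, int, List[str]]


def plan_summary(row: Row, summary_name: str) -> Plan:
    # Snapshot the row: sum what this reader saw and remember which columns it saw.
    seen = list(row)
    return summary_name, sum(row[c] for c in seen), seen


def apply_summary(row: Row, plan: Plan) -> None:
    summary_name, total, seen = plan
    for column in seen:
        row.pop(column, None)  # deleting an already-deleted column is a no-op
    row[summary_name] = total  # write the single summary column


# Interleaved, no lock: both readers snapshot before either applies.
row: Row = {"d1": 1, "d2": 1, "d3": 1}
p1 = plan_summary(row, "summary-reader1")
p2 = plan_summary(row, "summary-reader2")
apply_summary(row, p1)
apply_summary(row, p2)
interleaved_total = sum(row.values())

# Serialized, as a lock would enforce: the second reader sees the first summary.
row = {"d1": 1, "d2": 1, "d3": 1}
apply_summary(row, plan_summary(row, "summary-reader1"))
apply_summary(row, plan_summary(row, "summary-reader2"))
serial_total = sum(row.values())

print(interleaved_total, serial_total)  # 6 3
```

Deltas written after a reader's snapshot are not in its `seen` list, so they survive that reader's pass; the only problem modeled here is two summarisers acting on the same snapshot.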

 Paul Prescod
 Ayogo, Inc.

Re: if cassandra isn't ideal for keep track of counts, how does digg count diggs?

Posted by Mark Robson <ma...@gmail.com>.

Can we not implement counts by just storing all the deltas in a row, and
then summing them all up to achieve a count.

If a row ends up with too many deltas, a reader could just summarise the
deltas occasionally into a single value (in a way which avoids race
conditions, of course).

So you'd map

key => { uniqueid1: delta1, uniqueid2: delta2 }

Every column in Cassandra also has a timestamp, so your app can decide, when
it does a read, which deltas to summarise.
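This mapping can be sketched in Python with a dict standing in for one row, where each column name is a unique id and each value is a (delta, timestamp) pair. The helper names and integer timestamps are hypothetical, and this is an in-memory model rather than Cassandra's API. The summariser folds only columns older than a cutoff into one new column, so later deltas are untouched; it does not address two summarisers running at once, which is the race Paul raises.

```python
import uuid
from typing import Dict, Tuple

# column name (unique id) -> (delta, timestamp)
Row = Dict[str, Tuple[int, int]]


def add_delta(row: Row, delta: int, ts: int) -> None:
    # Every increment gets its own uniquely named column, so concurrent
    # writers never overwrite each other.
    row[uuid.uuid4().hex] = (delta, ts)


def read_count(row: Row) -> int:
    # The count is the sum of all deltas, including earlier summaries.
    return sum(delta for delta, _ in row.values())


def summarise_before(row: Row, cutoff: int, ts: int) -> None:
    # Fold only columns written before `cutoff` into one summary column.
    old = [name for name, (_, t) in row.items() if t < cutoff]
    if not old:
        return
    total = sum(row[name][0] for name in old)
    for name in old:
        del row[name]
    row[uuid.uuid4().hex] = (total, ts)


row: Row = {}
add_delta(row, 1, ts=100)
add_delta(row, 1, ts=101)
add_delta(row, -1, ts=102)  # a decrement
add_delta(row, 1, ts=200)   # written after the cutoff below
before = read_count(row)
summarise_before(row, cutoff=150, ts=150)
after = read_count(row)
print(before, after, len(row))  # 2 2 2
```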

Mark

Re: if cassandra isn't ideal for keep track of counts, how does digg count diggs?

Posted by Boris Shulman <sh...@gmail.com>.

What will be the latency for the ZooKeeper-based atomic increment?

On Tue, Apr 6, 2010 at 8:22 PM, Chris Goffinet <go...@digg.com> wrote:
> http://issues.apache.org/jira/browse/CASSANDRA-704
> http://issues.apache.org/jira/browse/CASSANDRA-721
> We have our own internal codebase of Cassandra at Digg, but we are using
> the patches above until we have the vector clock work cleaned up; that
> patch will also go to JIRA. Most likely the vector clock work will go into
> 0.7, but since we run 0.6 and built it for that version, we will share that
> patch too.
> -Chris

Re: if cassandra isn't ideal for keep track of counts, how does digg count diggs?

Posted by Chris Goffinet <go...@digg.com>.

http://issues.apache.org/jira/browse/CASSANDRA-704
http://issues.apache.org/jira/browse/CASSANDRA-721

We have our own internal codebase of Cassandra at Digg, but we are using the patches above until we have the vector clock work cleaned up; that patch will also go to JIRA. Most likely the vector clock work will go into 0.7, but since we run 0.6 and built it for that version, we will share that patch too.

-Chris

On Apr 6, 2010, at 10:17 AM, S Ahmed wrote:

> Chris,
> 
> When you say patch, do you mean a patch for Cassandra or for your own internal codebase?
> 
> Sounds interesting, thanks!
> 

Re: if cassandra isn't ideal for keep track of counts, how does digg count diggs?

Posted by S Ahmed <sa...@gmail.com>.

Chris,

When you say patch, do you mean a patch for Cassandra or for your own
internal codebase?

Sounds interesting, thanks!

On Tue, Apr 6, 2010 at 12:54 PM, Chris Goffinet <go...@digg.com> wrote:

> That's not true. We have been using the ZooKeeper work we posted on JIRA.
> That's what we are using internally and have been for months. We are now
> just wrapping up our vector clocks + distributed counter patch so we can
> begin transitioning away from the ZooKeeper approach because there are
> problems with it long-term.
>
> -Chris