Posted to dev@cassandra.apache.org by Robin Bowes <ro...@robinbowes.com> on 2010/08/12 10:28:46 UTC

cassandra increment counters, Jira #1072

Hi Jonathan,

I'm contacting you in your capacity as project lead for the cassandra
project. I am wondering how close ticket #1072 is to implementation [1]

We are about to do a proof of concept with cassandra to replace around
20 MySQL partitions (1 partition = 4 machines: master/slave in DC A,
master/slave in DC B).

We're essentially just counting web hits - around 10k/second at peak
times - so increment counters is pretty much essential functionality for us.

How close is the patch in #1072 to being acceptable? What is blocking it?

Thanks,

R.

[1] https://issues.apache.org/jira/browse/CASSANDRA-1072

Re: cassandra increment counters, Jira #1072

Posted by Kelvin Kakugawa <ka...@gmail.com>.

If anyone on the thread hasn't read Helland's Building on Quicksand
paper yet, it discusses the abstract strategy behind #1072's
distributed counter implementation.  It's here:
http://blogs.msdn.com/b/pathelland/archive/2008/12/12/building-on-quicksand-paper-for-cidr-conference-on-innovative-database-research.aspx

The relevant sections are 5 & 6 (around 3-4 pages).  I would highly
recommend reviewing it.

Let me clarify some issues for the thread:
#580 (version vectors) and Cages (distributed locks via ZK; see:
http://code.google.com/p/cages/) both require a read before every
write.

If you read the Helland paper, he postulates that idempotence is only
required between every partition--not every update.  #1072 coalesces
commutative operations on each partition (i.e. replica) and
idempotently repairs them between partitions / replicas.  The
alternative approach proposed, inserting UUID columns into a row,
needs to repair every update between replicas, unless there is an
aggregation step that I'm not seeing.  Not to mention, I'm with Ben in
that I'd rather not push the aggregation step back up to the client.
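
To make the "coalesce per replica, repair idempotently between replicas"
idea concrete, here is a minimal in-memory sketch.  It is not the #1072
patch: the class name ReplicaCounter, the per-replica dictionary, and the
max-based merge are assumptions chosen for illustration, and the merge
only works for non-negative increments.

```python
from collections import defaultdict


class ReplicaCounter:
    """One replica's view of a distributed counter (illustrative only)."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        # Running total of increments accepted by each replica, keyed by replica id.
        self.partials = defaultdict(int)

    def increment(self, delta=1):
        # Commutative: local increments are coalesced into one running total,
        # so nothing is read from other replicas before the write.
        if delta < 0:
            raise ValueError("this sketch only supports non-negative deltas")
        self.partials[self.replica_id] += delta

    def repair_from(self, other):
        # Idempotent: keeping the larger total per remote replica means that
        # replaying the same repair leaves the state unchanged.
        for rid, total in other.partials.items():
            if rid != self.replica_id:
                self.partials[rid] = max(self.partials[rid], total)

    def value(self):
        # The counter's value is the sum of every replica's partial total.
        return sum(self.partials.values())
```

Only one partial total per replica crosses the wire during repair, rather
than one entry per update, which is the contrast with the UUID-column
approach.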

#1072 does provide a building block--distributed commutative
operations.  Addition is just the most salient commutative operation.
However, I can imagine that there are many other commutative
operations that ppl would find useful to perform at scale.

#1072 does fit into the EC model.  Granted, the level of consistency
is not tunable, though.  It's fixed at CL.ONE writes.  However, any
use case that relies on CL.ONE writes requires CL.ALL reads for strong
consistency.  If you're willing to tolerate a certain amount of
inconsistency, then CL.ONE reads are fine.  At Digg, we do CL.ONE
writes and CL.ONE reads and tolerate a certain amount of
inconsistency.
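
The CL.ONE write / CL.ALL read pairing follows from the usual replica
overlap rule: a read is guaranteed to touch at least one replica that
took the write only when read replicas plus write replicas exceed the
replication factor.  A small sketch of that arithmetic; the function
names are hypothetical and nothing here calls Cassandra:

```python
def read_overlaps_write(write_replicas, read_replicas, replication_factor):
    """True when every read set must intersect every write set."""
    return write_replicas + read_replicas > replication_factor


def min_read_replicas(write_replicas, replication_factor):
    """Smallest read set that is guaranteed to intersect any write set."""
    return replication_factor - write_replicas + 1


# With a replication factor of 3 and CL.ONE writes, only a read of all
# three replicas is guaranteed to overlap the write.
rf = 3
needed = min_read_replicas(write_replicas=1, replication_factor=rf)
print(needed == rf)
```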

And, as Sylvain brought up on the #1072 issue, we have a variant of
read repair called repair-on-write: on write, the current replica's
count is read, then written to the other replicas.  It's not
implemented in the patch submitted to the issue, but it's been
discussed and implemented elsewhere.

-Kelvin

On Fri, Aug 13, 2010 at 8:49 AM, Benjamin Black <b...@b3k.us> wrote:
> On Fri, Aug 13, 2010 at 6:24 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>>>
>>> This is simply not an acceptable alternative and just can't be called
>>> handling it "well".
>>
>> What part is it handling poorly, at a technical level?  This is almost
>> exactly what 1072 does internally -- we are concerned here with the
>> high write, low read volume case.
>>
>
> Requiring clients directly manage the counter rows in order to
> periodically compress or segment them.  Yes, you can emulate the
> behavior.  No, that is not handling it well.
>
>>>  It is equivalent to "make the users do it", which
>>> is the case for almost anything.
>>
>> I strongly feel we should be in the business of providing building
>> blocks, not special cases on top of that.  (But see below, I *do*
>> think the 580 version vectors is the kind of building block we want!)
>>
>
> I agree, 580 is really valuable and should be in.  The problem for
> high write rate, distributed counters is the requirement of read
> before write inherent in such vector-based approaches.  Am I missing
> some aspect of 580 that precludes that?
>
>>>  The reasons #1072 is so valuable:
>>>
>>> 1) Does not require _any_ user action.
>>
>> This can be addressed at the library level.  Just as our first stab at
>> ZK integration was a rather clunky patch; "cages" is better.
>>
>
> Certainly, but it would be hard to argue (and I am not) that the
> tightly synchronized behavior of ZK is a good match for Cassandra
> (mixing in Paxos could make for some neat options, but that's another
> debate...).
>
>>> 2) Does not change the EC-centric model of Cassandra.
>>
>> It does, though.  1072 is *not* a version vector-based approach --
>> that would be 580.  Read the 1072 design doc, if you haven't.  (Thanks
>> to Kelvin for writing that up!)
>>
>
> Nor is Cassandra right now.  I know 1072 isn't vector based, and I
> think that is in its favor _for this application_.
>
>> I'm referring in particular to reads requiring CL.ALL.  (My
>> understanding is that in the previous design, a "master" replica was
>> chosen and was always written to first.)  Both of these break "the
>> EC-centric model" and that is precisely the objection I made when I
>> said "ConsistencyLevel is not respected."  I don't think this is
>> fixable in the 1072 approach.  I would be thrilled to be wrong.
>>
>
> It is EC in that the total for a counter is unknown until resolved on
> read.  Yes, it does not respect CL, but since it can only be used in 1
> way, I don't see that as a disadvantage.
>
>>>> The second is that the approach in 1072 resembles an entirely separate
>>>> system that happens to use part of Cassandra infrastructure -- the
>>>> thrift API, the MessagingService, the sstable format -- but isn't
>>>> really part of it.  ConsistencyLevel is not respected, and special
>>>> cases abound to weld things in that don't fit, e.g. the AES/Streaming
>>>> business.
>>>
>>> Then let's find ways to make it as elegant as it can be.  Ultimately,
>>> this functionality needs to be in Cassandra or users will simply
>>> migrate someplace else for this extremely common use case.
>>
>> This is what I've been pushing for.  The version vector approach to
>> counting (i.e. 580 as opposed to 1072) is exactly the more elegant,
>> EC-centric approach that addresses a case that we *don't* currently
>> handle well (counters with a higher read volume than 1072).
>>
>
> Perhaps I missed something: does counting 580 require read before
> counter update (local to the node, not a client read)?
>
>
> b
>

Re: cassandra increment counters, Jira #1072

Posted by Benjamin Black <b...@b3k.us>.

On Fri, Aug 13, 2010 at 6:24 AM, Jonathan Ellis <jb...@gmail.com> wrote:
>>
>> This is simply not an acceptable alternative and just can't be called
>> handling it "well".
>
> What part is it handling poorly, at a technical level?  This is almost
> exactly what 1072 does internally -- we are concerned here with the
> high write, low read volume case.
>

Requiring clients directly manage the counter rows in order to
periodically compress or segment them.  Yes, you can emulate the
behavior.  No, that is not handling it well.

>>  It is equivalent to "make the users do it", which
>> is the case for almost anything.
>
> I strongly feel we should be in the business of providing building
> blocks, not special cases on top of that.  (But see below, I *do*
> think the 580 version vectors is the kind of building block we want!)
>

I agree, 580 is really valuable and should be in.  The problem for
high write rate, distributed counters is the requirement of read
before write inherent in such vector-based approaches.  Am I missing
some aspect of 580 that precludes that?

>>  The reasons #1072 is so valuable:
>>
>> 1) Does not require _any_ user action.
>
> This can be addressed at the library level.  Just as our first stab at
> ZK integration was a rather clunky patch; "cages" is better.
>

Certainly, but it would be hard to argue (and I am not) that the
tightly synchronized behavior of ZK is a good match for Cassandra
(mixing in Paxos could make for some neat options, but that's another
debate...).

>> 2) Does not change the EC-centric model of Cassandra.
>
> It does, though.  1072 is *not* a version vector-based approach --
> that would be 580.  Read the 1072 design doc, if you haven't.  (Thanks
> to Kelvin for writing that up!)
>

Nor is Cassandra right now.  I know 1072 isn't vector based, and I
think that is in its favor _for this application_.

> I'm referring in particular to reads requiring CL.ALL.  (My
> understanding is that in the previous design, a "master" replica was
> chosen and was always written to first.)  Both of these break "the
> EC-centric model" and that is precisely the objection I made when I
> said "ConsistencyLevel is not respected."  I don't think this is
> fixable in the 1072 approach.  I would be thrilled to be wrong.
>

It is EC in that the total for a counter is unknown until resolved on
read.  Yes, it does not respect CL, but since it can only be used in 1
way, I don't see that as a disadvantage.

>>> The second is that the approach in 1072 resembles an entirely separate
>>> system that happens to use part of Cassandra infrastructure -- the
>>> thrift API, the MessagingService, the sstable format -- but isn't
>>> really part of it.  ConsistencyLevel is not respected, and special
>>> cases abound to weld things in that don't fit, e.g. the AES/Streaming
>>> business.
>>
>> Then let's find ways to make it as elegant as it can be.  Ultimately,
>> this functionality needs to be in Cassandra or users will simply
>> migrate someplace else for this extremely common use case.
>
> This is what I've been pushing for.  The version vector approach to
> counting (i.e. 580 as opposed to 1072) is exactly the more elegant,
> EC-centric approach that addresses a case that we *don't* currently
> handle well (counters with a higher read volume than 1072).
>

Perhaps I missed something: does counting with 580 require a read before
each counter update (local to the node, not a client read)?

b

Re: cassandra increment counters, Jira #1072

Posted by Jonathan Ellis <jb...@gmail.com>.

On Fri, Aug 13, 2010 at 1:11 AM, Benjamin Black <b...@b3k.us> wrote:
> On Thu, Aug 12, 2010 at 8:54 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>> There are two concerns that give me pause.
>>
>> The first is that 1072 is tackling a use case that Cassandra already
>> handles well: high volume of writes to a counter, with low volume
>> reads.  (This can be done by inserting uuids into a counter row, and
>> aggregating them either in the background or at read time or with some
>> combination of these.  The counter rows can be sharded if necessary.)
>>
>
> This is simply not an acceptable alternative and just can't be called
> handling it "well".

What part is it handling poorly, at a technical level?  This is almost
exactly what 1072 does internally -- we are concerned here with the
high write, low read volume case.

>  It is equivalent to "make the users do it", which
> is the case for almost anything.

I strongly feel we should be in the business of providing building
blocks, not special cases on top of that.  (But see below, I *do*
think the 580 version vectors is the kind of building block we want!)

>  The reasons #1072 is so valuable:
>
> 1) Does not require _any_ user action.

This can be addressed at the library level.  Just as our first stab at
ZK integration was a rather clunky patch; "cages" is better.

> 2) Does not change the EC-centric model of Cassandra.

It does, though.  1072 is *not* a version vector-based approach --
that would be 580.  Read the 1072 design doc, if you haven't.  (Thanks
to Kelvin for writing that up!)

I'm referring in particular to reads requiring CL.ALL.  (My
understanding is that in the previous design, a "master" replica was
chosen and was always written to first.)  Both of these break "the
EC-centric model" and that is precisely the objection I made when I
said "ConsistencyLevel is not respected."  I don't think this is
fixable in the 1072 approach.  I would be thrilled to be wrong.

>> The second is that the approach in 1072 resembles an entirely separate
>> system that happens to use part of Cassandra infrastructure -- the
>> thrift API, the MessagingService, the sstable format -- but isn't
>> really part of it.  ConsistencyLevel is not respected, and special
>> cases abound to weld things in that don't fit, e.g. the AES/Streaming
>> business.
>
> Then let's find ways to make it as elegant as it can be.  Ultimately,
> this functionality needs to be in Cassandra or users will simply
> migrate someplace else for this extremely common use case.

This is what I've been pushing for.  The version vector approach to
counting (i.e. 580 as opposed to 1072) is exactly the more elegant,
EC-centric approach that addresses a case that we *don't* currently
handle well (counters with a higher read volume than 1072).

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: cassandra increment counters, Jira #1072

Posted by Benjamin Black <b...@b3k.us>.

On Thu, Aug 12, 2010 at 8:54 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> There are two concerns that give me pause.
>
> The first is that 1072 is tackling a use case that Cassandra already
> handles well: high volume of writes to a counter, with low volume
> reads.  (This can be done by inserting uuids into a counter row, and
> aggregating them either in the background or at read time or with some
> combination of these.  The counter rows can be sharded if necessary.)
>

This is simply not an acceptable alternative and just can't be called
handling it "well".  It is equivalent to "make the users do it", which
is the case for almost anything.  The reasons #1072 is so valuable:

1) Does not require _any_ user action.
2) Does not change the EC-centric model of Cassandra.
3) Meets the requirements of many, major users who would otherwise
have to use another storage system.

> The second is that the approach in 1072 resembles an entirely separate
> system that happens to use part of Cassandra infrastructure -- the
> thrift API, the MessagingService, the sstable format -- but isn't
> really part of it.  ConsistencyLevel is not respected, and special
> cases abound to weld things in that don't fit, e.g. the AES/Streaming
> business.
>

Then let's find ways to make it as elegant as it can be.  Ultimately,
this functionality needs to be in Cassandra or users will simply
migrate someplace else for this extremely common use case.

b

Re: cassandra increment counters, Jira #1072

Posted by Ben Standefer <be...@gmail.com>.

Interesting idea with the counter row approach.  I think it puts a dubious
responsibility on the Cassandra user.  Sure, Cassandra users are expected to
maintain the size of a row, but asking Cassandra users to constantly
aggregate counts of uuids in a situation where the rows are growing rapidly
to maintain a counter seems out of the realm of the average Cassandra end
user.

My napkin math may be slightly off, but if a "counter row aggregator"
stopped functioning, crashed, or didn't do its job correctly on a counter
receiving 2,000 increments per second, you end up with a single row at
>2.57GB after 24 hours (2,000/sec x 86,400 seconds x 16 bytes per uuid).
This approaches the magnitude of memory on a single node and would seem
(to me?) to significantly impact load and load distribution.  Maybe there is
a way Cassandra could perform the counter row aggregation internally (with
read repair?) and offer it to end users as a clean, simple, intuitive
interface.
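
The napkin math above can be checked in a few lines.  This counts only
the 16 bytes of each uuid; any per-column overhead is ignored, so the
real row would be larger.  The constant names are made up for the sketch.

```python
INCREMENTS_PER_SECOND = 2_000
SECONDS_PER_DAY = 86_400
BYTES_PER_UUID = 16

# Bytes of uuids accumulated if nothing aggregates the row for 24 hours.
row_bytes = INCREMENTS_PER_SECOND * SECONDS_PER_DAY * BYTES_PER_UUID

# Convert to binary gigabytes (GiB), which is where the 2.57 figure comes from.
row_gib = row_bytes / 2**30

print(f"{row_bytes:,} bytes is about {row_gib:.2f} GiB")
```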

I have never thought counters were something Cassandra handles well.  If
there is not a satisfactory way to integrate counters into the Cassandra
internals, I think it'd be great for somebody in-the-know to provide
in-depth and detailed documentation on best practices for how to implement
counters.  I think distributed and scalable counters can be a killer app for
Cassandra, and circumventing locking systems such as ZooKeeper is key.

Disclaimer: I'm not quite a Cassandra developer, more of an Ops guy and
user, just trying to add perspective.  I do not want a pony.

-Ben Standefer

On Thu, Aug 12, 2010 at 8:54 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> There are two concerns that give me pause.
>
> The first is that 1072 is tackling a use case that Cassandra already
> handles well: high volume of writes to a counter, with low volume
> reads.  (This can be done by inserting uuids into a counter row, and
> aggregating them either in the background or at read time or with some
> combination of these.  The counter rows can be sharded if necessary.)
>
> The second is that the approach in 1072 resembles an entirely separate
> system that happens to use part of Cassandra infrastructure -- the
> thrift API, the MessagingService, the sstable format -- but isn't
> really part of it.  ConsistencyLevel is not respected, and special
> cases abound to weld things in that don't fit, e.g. the AES/Streaming
> business.
>
> On Thu, Aug 12, 2010 at 1:28 AM, Robin Bowes <ro...@robinbowes.com> wrote:
> > Hi Jonathan,
> >
> > I'm contacting you in your capacity as project lead for the cassandra
> > project. I am wondering how close ticket #1072 is to implementation [1]
> >
> > We are about to do a proof of concept with cassandra to replace around
> > 20 MySQL partitions (1 partition = 4 machines: master/slave in DC A,
> > master/slave in DC B).
> >
> > We're essentially just counting web hits - around 10k/second at peak
> > times - so increment counters is pretty much essential functionality for us.
> >
> > How close is the patch in #1072 to being acceptable? What is blocking it?
> >
> > Thanks,
> >
> > R.
> >
> > [1] https://issues.apache.org/jira/browse/CASSANDRA-1072
> >
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>

Re: cassandra increment counters, Jira #1072

Posted by Jonathan Ellis <jb...@gmail.com>.

There are two concerns that give me pause.

The first is that 1072 is tackling a use case that Cassandra already
handles well: high volume of writes to a counter, with low volume
reads.  (This can be done by inserting uuids into a counter row, and
aggregating them either in the background or at read time or with some
combination of these.  The counter rows can be sharded if necessary.)
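
As a rough illustration of the uuid-per-increment scheme just described,
here is a toy in-memory model.  It is not Cassandra client code: the
class name UuidCounterRow, the shard count, and the compact() step are
all assumptions made for the sketch.

```python
import uuid


class UuidCounterRow:
    """Toy model: one uuid column per increment, spread over sharded rows."""

    def __init__(self, shards=4):
        # Each shard stands in for one counter row holding uuid columns.
        self.shards = [set() for _ in range(shards)]
        # Total already folded in by background aggregation.
        self.compacted = 0

    def increment(self, event_id=None):
        # Writing the same uuid twice is a no-op, so a retried write is
        # safe as long as the uuid has not been compacted away yet.
        event_id = event_id or uuid.uuid4()
        self.shards[event_id.int % len(self.shards)].add(event_id)
        return event_id

    def read(self):
        # Aggregate at read time: compacted total plus uncompacted columns.
        return self.compacted + sum(len(shard) for shard in self.shards)

    def compact(self):
        # Background aggregation: fold the columns into a total and drop them.
        for shard in self.shards:
            self.compacted += len(shard)
            shard.clear()
```

In this model the reader or a background job has to do the aggregation,
and the rows grow until compact() runs.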

The second is that the approach in 1072 resembles an entirely separate
system that happens to use part of Cassandra infrastructure -- the
thrift API, the MessagingService, the sstable format -- but isn't
really part of it.  ConsistencyLevel is not respected, and special
cases abound to weld things in that don't fit, e.g. the AES/Streaming
business.

On Thu, Aug 12, 2010 at 1:28 AM, Robin Bowes <ro...@robinbowes.com> wrote:
> Hi Jonathan,
>
> I'm contacting you in your capacity as project lead for the cassandra
> project. I am wondering how close ticket #1072 is to implementation [1]
>
> We are about to do a proof of concept with cassandra to replace around
> 20 MySQL partitions (1 partition = 4 machines: master/slave in DC A,
> master/slave in DC B).
>
> We're essentially just counting web hits - around 10k/second at peak
> times - so increment counters is pretty much essential functionality for us.
>
> How close is the patch in #1072 to being acceptable? What is blocking it?
>
> Thanks,
>
> R.
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-1072
>
>

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: cassandra increment counters, Jira #1072

Posted by Dave Revell <da...@meebo-inc.com>.

For what it's worth, my team would very much like to see counters in trunk.

Right now we're trying to think of ways to implement counters by inserting
columns and counting the sizes of slices, and it seems difficult to do it
quickly and correctly at scale, even with low consistency.

-Dave

On Thu, Aug 12, 2010 at 4:31 PM, Colin Taylor <co...@gmail.com> wrote:

> Would it help prioritizing  if silent majority chimed in if keen on
> this functionality which is so key to large scale analytical apps?
> in which case  :
>
> +1
>
> Although perhaps I should encourage signing up on jira and vote there.
>
> https://issues.apache.org/jira/secure/Signup!default.jspa
> https://issues.apache.org/jira/browse/CASSANDRA-1072
>
> [We intend counting various attributes of the 100 million documents
> coming through our system a day]
>
> On Fri, Aug 13, 2010 at 11:15 AM, Benjamin Black <b...@b3k.us> wrote:
> > On Thu, Aug 12, 2010 at 10:23 AM, Kelvin Kakugawa <ka...@gmail.com> wrote:
> >>
> >> I think the underlying unanswered question is whether #1072 is a niche
> >> feature or whether it should be brought into trunk.
> >>
> >
> > This should not be an unanswered question!  #1072 should be considered
> > essential, as it enables numerous use cases that currently require
> > bolting something like memcache or redis onto the side to handle
> > counters.
> >
> > +100000000 on getting this into trunk ASAP.
> >
> >
> > b
> >
>

Re: cassandra increment counters, Jira #1072

Posted by Lenin Gali <ga...@gmail.com>.

+1M , We need this too.

Lenin Gali
Dir, Infrastructure and BI

Cell:513.382.3371
lenin@sharethis.com
1883 Landings Drive,
Mountain View CA 94043

On Thu, Aug 12, 2010 at 4:31 PM, Colin Taylor <co...@gmail.com> wrote:

> Would it help prioritizing  if silent majority chimed in if keen on
> this functionality which is so key to large scale analytical apps?
> in which case  :
>
> +1
>
> Although perhaps I should encourage signing up on jira and vote there.
>
> https://issues.apache.org/jira/secure/Signup!default.jspa
> https://issues.apache.org/jira/browse/CASSANDRA-1072
>
> [We intend counting various attributes of the 100 million documents
> coming through our system a day]
>
> On Fri, Aug 13, 2010 at 11:15 AM, Benjamin Black <b...@b3k.us> wrote:
> > On Thu, Aug 12, 2010 at 10:23 AM, Kelvin Kakugawa <ka...@gmail.com> wrote:
> >>
> >> I think the underlying unanswered question is whether #1072 is a niche
> >> feature or whether it should be brought into trunk.
> >>
> >
> > This should not be an unanswered question!  #1072 should be considered
> > essential, as it enables numerous use cases that currently require
> > bolting something like memcache or redis onto the side to handle
> > counters.
> >
> > +100000000 on getting this into trunk ASAP.
> >
> >
> > b
> >
>



-- 
twitter: leningali
skype: galilenin
Cell:513.382.3371

Re: cassandra increment counters, Jira #1072

Posted by Colin Taylor <co...@gmail.com>.

Would it help with prioritization if the silent majority chimed in when
keen on this functionality, which is so key to large-scale analytical apps?
In which case:

+1

Although perhaps I should encourage signing up on Jira and voting there.

https://issues.apache.org/jira/secure/Signup!default.jspa
https://issues.apache.org/jira/browse/CASSANDRA-1072

[We intend to count various attributes of the 100 million documents
coming through our system each day]

On Fri, Aug 13, 2010 at 11:15 AM, Benjamin Black <b...@b3k.us> wrote:
> On Thu, Aug 12, 2010 at 10:23 AM, Kelvin Kakugawa <ka...@gmail.com> wrote:
>>
>> I think the underlying unanswered question is whether #1072 is a niche
>> feature or whether it should be brought into trunk.
>>
>
> This should not be an unanswered question!  #1072 should be considered
> essential, as it enables numerous use cases that currently require
> bolting something like memcache or redis onto the side to handle
> counters.
>
> +100000000 on getting this into trunk ASAP.
>
>
> b
>

Re: cassandra increment counters, Jira #1072

Posted by Benjamin Black <b...@b3k.us>.

On Thu, Aug 12, 2010 at 10:23 AM, Kelvin Kakugawa <ka...@gmail.com> wrote:
>
> I think the underlying unanswered question is whether #1072 is a niche
> feature or whether it should be brought into trunk.
>

This should not be an unanswered question!  #1072 should be considered
essential, as it enables numerous use cases that currently require
bolting something like memcache or redis onto the side to handle
counters.

+100000000 on getting this into trunk ASAP.

b

Re: cassandra increment counters, Jira #1072

Posted by Kelvin Kakugawa <ka...@gmail.com>.

Hi Robin,

Johan and I have brought the code up to trunk.  It's ready to be
reviewed.  However, in Jonathan's defense, it does require separate
code paths, since we're aggregating commutative operations, not
updating a value.

I think the underlying unanswered question is whether #1072 is a niche
feature or whether it should be brought into trunk.

-Kelvin

On Thu, Aug 12, 2010 at 1:28 AM, Robin Bowes <ro...@robinbowes.com> wrote:
> Hi Jonathan,
>
> I'm contacting you in your capacity as project lead for the cassandra
> project. I am wondering how close ticket #1072 is to implementation [1]
>
> We are about to do a proof of concept with cassandra to replace around
> 20 MySQL partitions (1 partition = 4 machines: master/slave in DC A,
> master/slave in DC B).
>
> We're essentially just counting web hits - around 10k/second at peak
> times - so increment counters is pretty much essential functionality for us.
>
> How close is the patch in #1072 to being acceptable? What is blocking it?
>
> Thanks,
>
> R.
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-1072
>
>

Re: cassandra increment counters, Jira #1072

Posted by Robin Bowes <ro...@robinbowes.com>.

On 12/08/10 19:21, Jesse McConnell wrote:
> out of curiosity are you shooting for incrementing these counters 10k
> times a second for sustained periods of time?

Jesse,

Our traffic pattern varies between 5.5k and 10k connections/hits per
second. We currently process the hits and log to MySQL (partitioned
DBs). We're looking into the possibility of using cassandra. I don't
think we'll be sending each hit to the DB individually, ie. 10k hits/sec
won't correspond to 10k updates/sec, but I imagine the counter updates
will be fairly high volume. We'll bottom that out in our initial testing.
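
One way to picture why 10k hits/sec need not mean 10k updates/sec is to
coalesce hits in the application before writing.  This is only a sketch
under assumptions: IncrementBatcher, flush_fn, and max_pending are
hypothetical names, and flush_fn stands in for whatever call would send
the summed increments to the store.

```python
from collections import Counter


class IncrementBatcher:
    """Coalesce per-key hits in memory and write them out as summed deltas."""

    def __init__(self, flush_fn, max_pending=1000):
        self.flush_fn = flush_fn        # called with {key: delta} on flush
        self.max_pending = max_pending  # hits to buffer before flushing
        self.pending = Counter()
        self.pending_hits = 0

    def hit(self, key):
        self.pending[key] += 1
        self.pending_hits += 1
        if self.pending_hits >= self.max_pending:
            self.flush()

    def flush(self):
        # One update per distinct key, however many hits it received.
        if self.pending:
            self.flush_fn(dict(self.pending))
            self.pending.clear()
            self.pending_hits = 0
```

The trade-off is that buffered hits are lost if the process dies before
a flush, so max_pending bounds how much can go missing.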

R.

RE: cassandra increment counters, Jira #1072

Posted by Viktor Jevdokimov <Vi...@adform.com>.

We're also looking into increment counters with the same load. It will not be for periods; it will be constant.

Viktor


-----Original Message-----
From: Jesse McConnell [mailto:jesse.mcconnell@gmail.com] 
Sent: Thursday, August 12, 2010 9:21 PM
To: dev@cassandra.apache.org
Subject: Re: cassandra increment counters, Jira #1072

out of curiosity are you shooting for incrementing these counters 10k
times a second for sustained periods of time?

cheers,
jesse

--
jesse mcconnell
jesse.mcconnell@gmail.com



On Thu, Aug 12, 2010 at 03:28, Robin Bowes <ro...@robinbowes.com> wrote:
> Hi Jonathan,
>
> I'm contacting you in your capacity as project lead for the cassandra
> project. I am wondering how close ticket #1072 is to implementation [1]
>
> We are about to do a proof of concept with cassandra to replace around
> 20 MySQL partitions (1 partition = 4 machines: master/slave in DC A,
> master/slave in DC B).
>
> We're essentially just counting web hits - around 10k/second at peak
> times - so increment counters is pretty much essential functionality for us.
>
> How close is the patch in #1072 to being acceptable? What is blocking it?
>
> Thanks,
>
> R.
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-1072
>
>

Re: cassandra increment counters, Jira #1072

Posted by Ryan King <ry...@twitter.com>.

On Thu, Aug 12, 2010 at 11:21 AM, Jesse McConnell
<je...@gmail.com> wrote:
> out of curiosity are you shooting for incrementing these counters 10k
> times a second for sustained periods of time?

Our use cases include hundreds of thousands of increments a second, but
most of the values will only be incremented for a relatively short window
of time.

This is for a real-time analytics system we're working on for both
business and technical analytics (like system monitoring).

We're hoping to open source this system at some point, but the
architecture is dependent on having distributed counters in cassandra.

-ryan

Re: cassandra increment counters, Jira #1072

Posted by Jesse McConnell <je...@gmail.com>.

out of curiosity are you shooting for incrementing these counters 10k
times a second for sustained periods of time?

cheers,
jesse

--
jesse mcconnell
jesse.mcconnell@gmail.com



On Thu, Aug 12, 2010 at 03:28, Robin Bowes <ro...@robinbowes.com> wrote:
> Hi Jonathan,
>
> I'm contacting you in your capacity as project lead for the cassandra
> project. I am wondering how close ticket #1072 is to implementation [1]
>
> We are about to do a proof of concept with cassandra to replace around
> 20 MySQL partitions (1 partition = 4 machines: master/slave in DC A,
> master/slave in DC B).
>
> We're essentially just counting web hits - around 10k/second at peak
> times - so increment counters is pretty much essential functionality for us.
>
> How close is the patch in #1072 to being acceptable? What is blocking it?
>
> Thanks,
>
> R.
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-1072
>
>