Posted to user@cassandra.apache.org by ad...@pm.me.INVALID on 2019/10/30 14:00:41 UTC

What is the status of counters? Should I use them?

Hi,

I would like to use counters but I am not sure I should.

I have read a lot of articles on the Internet about how counters are bad / wrong / inaccurate, etc.

Let's be honest, counters in Cassandra have quite a bad reputation.

But everything I read about that was quite old. I know there were significant improvements in that area, especially around the 2.1 / 2.2 releases, but I can't get my head around it well enough to be sure whether I should use them or not.

The articles I read were:

1) The first discusses counters from a node-lifecycle perspective and shows that there are still some over- / undercounting problems.

2) The second explains the differences between the pre- and post-2.1 implementations and suggests that once the counter cache is removed, the implementation will be even better and simpler. But I am not sure what the takeaway of that article is. Does it say that everything that was "wrong" with counters (as we knew them in the pre-2.x era) has been corrected and we should all be good to use them?

3) The authors of the third one say they have not found any bugs... huh.

So, what is the overall state of counters in 3.11.4 (and hence 3.11.5)? Would you recommend using them in production?

My use case: I have 2 DCs with 3 nodes each and a table in which I want to track the number of page visits.

My perception is that "they will be inconsistent, you cannot repair them, and they are not idempotent", but in my own test, when I took 1 node down, brought it back and read the counter, everything was fine and the numbers were correct.

So I am not sure whether my testing is very naive, but there is a lot of mystery around counters, and imho authoritative advice on their general status and where they can go wrong is lacking.

Are the links below obsolete? Do I have a strong guarantee that counters will "just work"? What are the downsides, and why wouldn't you use them? Honestly, after reading a lot about this, I don't trust counters much, but I am not sure whether my opinion is just biased by what I have read so far.

Thanks

Links

1) http://datastrophic.io/evaluating-cassandra-2-1-counters-consistency/

2) https://www.datastax.com/blog/2014/05/whats-new-cassandra-21-better-implementation-counters

3) https://www.datastax.com/blog/2016/01/testing-apache-cassandra-jepsen


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org


Re: What is the status of counters? Should I use them?

Posted by Jon Haddad <jo...@jonhaddad.com>.
It's possible to overcount when a server is overwhelmed or slow to respond
and you're getting exceptions on the client.  If you retry your query, it's
possible you'll increment twice: once for the original query (which may have
thrown an exception even though it was applied) and again on the retry.
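To make the retry case concrete, here is a toy Python simulation. It is not the Cassandra driver API: `FlakyCounterStore`, `increment_with_retry` and `simulate` are hypothetical names, and the model assumes, pessimistically, that every timed-out increment was actually applied on the server, so each client retry adds one more:

```python
import random


class FlakyCounterStore:
    """Toy stand-in for one counter cell; NOT a real driver or Cassandra API."""

    def __init__(self, ack_loss_rate, seed=0):
        self.value = 0
        self.ack_loss_rate = ack_loss_rate
        self.rng = random.Random(seed)

    def increment(self, delta=1):
        # Assumption: the increment is applied on the server first...
        self.value += delta
        # ...and only then may the acknowledgement be lost (e.g. a timeout).
        if self.rng.random() < self.ack_loss_rate:
            raise TimeoutError("increment applied, acknowledgement lost")


def increment_with_retry(store, max_attempts=3):
    """Naive client-side retry: each attempt is a brand new increment."""
    for _ in range(max_attempts):
        try:
            store.increment()
            return
        except TimeoutError:
            continue  # the failed attempt already counted; the retry counts again
    # Giving up here: the client thinks the write failed, yet every attempt landed.


def simulate(visits, ack_loss_rate, seed=0):
    """Return the counter value after `visits` page views."""
    store = FlakyCounterStore(ack_loss_rate, seed)
    for _ in range(visits):
        increment_with_retry(store)
    return store.value


print(simulate(10_000, ack_loss_rate=0.0))   # exact: 10000
print(simulate(10_000, ack_loss_rate=0.05))  # overcounts
```

With no lost acknowledgements the count is exact; every lost acknowledgement followed by a retry pushes it above the true number of visits. In a real cluster a timed-out write may or may not have been applied, so retrying can overcount and giving up can undercount.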

Use counters if you're OK with approximate values that are right _most
of the time_ and wrong _when the cluster is a dumpster fire_.  You can also
track your individual requests and reconcile the counters later on if you
want to eventually be right.  You may find you want to discard some
increments anyway, if they come from bots or are signs of abuse, so IMO this
is a better approach than blindly incrementing on the assumption that no one
is doing anything nefarious.
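One way to read "track your individual requests and reconcile": keep a raw log of visits next to the counter and periodically recount from it. The sketch below assumes each visit carries a client-generated `request_id` that is reused on retries. In Cassandra that log could be a regular (non-counter) table whose primary key includes the request id, so a retried INSERT overwrites the same row instead of adding a new one. `Visit`, `reconcile`, `drift` and the bot rule are all hypothetical:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    request_id: str  # hypothetical: generated once per visit, reused on retries
    page: str
    user_agent: str


def reconcile(visits, is_bot):
    """Recount page views from the raw log: dedupe retries, drop bots."""
    seen = set()
    counts = defaultdict(int)
    for visit in visits:
        if visit.request_id in seen:
            continue  # same visit written twice (e.g. a retry): count it once
        seen.add(visit.request_id)
        if is_bot(visit):
            continue  # filtered out on purpose, not just miscounted
        counts[visit.page] += 1
    return dict(counts)


def drift(counter_values, reconciled):
    """Pages whose live counter disagrees with the recount, and by how much."""
    pages = set(counter_values) | set(reconciled)
    return {
        page: counter_values.get(page, 0) - reconciled.get(page, 0)
        for page in pages
        if counter_values.get(page, 0) != reconciled.get(page, 0)
    }


log = [
    Visit("r1", "/home", "Mozilla/5.0"),
    Visit("r1", "/home", "Mozilla/5.0"),  # retried write of the same visit
    Visit("r2", "/home", "Googlebot/2.1"),
    Visit("r3", "/about", "Mozilla/5.0"),
]
truth = reconcile(log, is_bot=lambda v: "bot" in v.user_agent.lower())
print(truth)                                    # {'/home': 1, '/about': 1}
print(drift({"/home": 3, "/about": 1}, truth))  # {'/home': 2}
```

The recount is the authoritative number; the counter stays the cheap, approximate read path, and `drift` tells you how far it has wandered.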

I can't say for sure if there's an issue or not with repairs.  Given the
way they're written now I don't think there is, but I haven't had the need
to investigate it, and I don't see anything in JIRA to suggest it's a
problem with modern counters.  Someone else may know something I don't
though.
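For what it's worth, here is a heavily simplified Python model in the spirit of the post-2.1 design described in link 2) above. The replica that leads an increment reads its own shard (the "read before a write" from the tuning list), adds the delta, bumps that shard's clock, and replicates the resulting total rather than the delta. Replicas reconcile by keeping the shard with the highest clock per counter id, so receiving the same shards twice, as can happen during repair, does not double count. This is an assumption-laden sketch, not Cassandra's code: it ignores pre-2.1 local/remote shards, the counter cache, locking and deletes, and it says nothing about whether repair has bugs:

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica holding one counter cell as {counter_id: (clock, count)}."""
    name: str
    shards: dict = field(default_factory=dict)

    def lead_increment(self, delta):
        """Apply an increment with this replica as the leader."""
        # Read before write: fetch this replica's own current shard.
        clock, count = self.shards.get(self.name, (0, 0))
        new_shard = (clock + 1, count + delta)
        self.shards[self.name] = new_shard
        # What gets replicated is the new total for this shard, not the delta.
        return {self.name: new_shard}

    def merge(self, incoming):
        """Reconcile: per counter id, keep the shard with the higher clock."""
        for counter_id, (clock, count) in incoming.items():
            current = self.shards.get(counter_id)
            if current is None or clock > current[0]:
                self.shards[counter_id] = (clock, count)

    def value(self):
        return sum(count for _, count in self.shards.values())


a, b, c = Replica("a"), Replica("b"), Replica("c")
b.merge(a.lead_increment(5))  # c is down and misses this update
a.merge(b.lead_increment(2))  # ...and this one
for _ in range(2):            # c comes back; shards are sent to it twice
    c.merge(a.shards)
    c.merge(b.shards)
print(a.value(), b.value(), c.value())  # 7 7 7
```

If replicas exchanged deltas and summed them instead, the second pass in that loop would have added another 7, which is the kind of double counting that a highest-clock-wins merge avoids.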

Jon

On Wed, Oct 30, 2019 at 9:41 AM <ad...@pm.me.invalid> wrote:

> What about repairs? Can I just repair that table on a regular basis like any
> other?

Re: What is the status of counters? Should I use them?

Posted by ad...@pm.me.INVALID.
What about repairs? Can I just repair that table on a regular basis like any other table?

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, 30 October 2019 16:26, Jon Haddad <jo...@jonhaddad.com> wrote:

> Counters are good for things like page views, bad for money.  Yes they can under or overcount in certain situations.  If your cluster is stable, you'll see very little of it in practice.

Re: What is the status of counters? Should I use them?

Posted by Jon Haddad <jo...@jonhaddad.com>.
Counters are good for things like page views, bad for money.  Yes they can
under or overcount in certain situations.  If your cluster is stable,
you'll see very little of it in practice.

I've done quite a bit of tuning of counters.  Here are the main takeaways:

* They do a read before a write, so use low latency disks (SSD)
* Dial back read ahead to 4KB.  This is a big deal (in fact, if you're using
SSDs, always do this, even if you're not using counters)
* Use a 4KB compression chunk length
* Bump up your counter cache
* Some basic JVM tuning (ParNew + CMS, 16GB heap, 10GB new gen, max
tenuring threshold 4, survivor ratio 6)

The last 3 will give you a 10-20x perf improvement over stock Cassandra if
you've got a lot of counters.
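Some of the numbers in the list can be sanity-checked with back-of-the-envelope arithmetic. The Python below assumes the JVM bullet maps to `-Xmx16G -Xmn10G -XX:SurvivorRatio=6 -XX:MaxTenuringThreshold=4` with `-XX:+UseParNewGC -XX:+UseConcMarkSweepGC` on Java 8, and, for the storage bullets, the 3.11 default compression chunk length of 64KB and a common Linux default read ahead of 128KB (check your own devices, e.g. with `blockdev --getra`). The read model is a crude upper bound that ignores compression ratio and the page cache:

```python
GB = 1024 ** 3
KB = 1024


def heap_regions(heap_bytes, young_bytes, survivor_ratio):
    """HotSpot generational sizing: SurvivorRatio is eden / one survivor space,
    and the young generation holds eden plus two survivor spaces."""
    survivor = young_bytes / (survivor_ratio + 2)
    eden = young_bytes - 2 * survivor
    old = heap_bytes - young_bytes
    return {"eden": eden, "survivor_each": survivor, "old": old}


def worst_case_bytes_per_lookup(chunk_bytes, read_ahead_bytes):
    """Crude upper bound for one small single-partition read: treat the whole
    compression chunk as touched (ignoring compression ratio), and assume the
    kernel may pull up to read_ahead_bytes from disk around it."""
    return max(chunk_bytes, read_ahead_bytes)


for name, size in heap_regions(16 * GB, 10 * GB, survivor_ratio=6).items():
    print(f"{name:>13}: {size / GB:.2f} GB")
# values: eden 7.50, survivor_each 1.25 (two of them), old 6.00 (GB)

stock = worst_case_bytes_per_lookup(64 * KB, 128 * KB)
tuned = worst_case_bytes_per_lookup(4 * KB, 4 * KB)
print(f"per lookup: {stock // KB}KB -> {tuned // KB}KB ({stock // tuned}x less)")
# per lookup: 128KB -> 4KB (32x less)
```

The 32x is only the ratio of bytes touched per lookup under these assumptions, not a predicted speedup.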

Jon


