Posted to user@cassandra.apache.org by Branton Davis <br...@spanning.com> on 2015/10/18 20:55:06 UTC

"invalid global counter shard detected" warning on 2.1.3 and 2.1.10

Hey all.

We've been seeing this warning on one of our clusters:

2015-10-18 14:28:52,898 WARN  [ValidationExecutor:14]
org.apache.cassandra.db.context.CounterContext invalid global counter shard
detected; (4aa69016-4cf8-4585-8f23-e59af050d174, 1, 67158) and
(4aa69016-4cf8-4585-8f23-e59af050d174, 1, 21486) differ only in count; will
pick highest to self-heal on compaction
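Each parenthesised tuple in the warning appears to be (counter id, logical clock, count), and the two shards share the same id and the same clock. The rule the message itself describes can be illustrated with a minimal Python sketch under a simplified model: the higher clock wins, and if the clocks tie but the counts differ, it warns and keeps the higher count. GlobalShard and merge_global are hypothetical names; this is not Cassandra's CounterContext code.

```python
import logging
from typing import NamedTuple

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("counter_merge_sketch")


class GlobalShard(NamedTuple):
    """One shard as printed in the warning: (counter id, logical clock, count)."""
    counter_id: str
    clock: int
    count: int


def merge_global(a: GlobalShard, b: GlobalShard) -> GlobalShard:
    """Reconcile two copies of the same counter id's shard (simplified model).

    A higher clock means a newer state, so it wins outright. Equal clocks
    should imply equal counts; if they do not, the state is inconsistent,
    so log a warning and keep the higher count.
    """
    if a.counter_id != b.counter_id:
        raise ValueError("can only merge shards for the same counter id")
    if a.clock != b.clock:
        return a if a.clock > b.clock else b
    if a.count != b.count:
        log.warning("invalid global counter shard detected; %s and %s differ only "
                    "in count; will pick highest to self-heal on compaction", a, b)
    return a if a.count >= b.count else b


cid = "4aa69016-4cf8-4585-8f23-e59af050d174"
merged = merge_global(GlobalShard(cid, 1, 67158), GlobalShard(cid, 1, 21486))
print(merged.count)  # 67158
```

Picking the higher count lets the replicas converge on one value, but nothing in this rule guarantees that the value is the true total.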


From what I've read and heard in the IRC channel, this warning could be
related to not running upgradesstables after upgrading from 2.0.x to
2.1.x.  I don't think we ran that then, but we've been on 2.1 since last
November.  Looking back, the warnings started appearing around June, when no
maintenance had been performed on the cluster.  At that time, we had been
on 2.1.3 for a couple of months.  We've been on 2.1.10 for the last week
(the upgrade was when we noticed this warning for the first time).

From a suggestion in IRC, I went ahead and ran upgradesstables on all the
nodes.  Our weekly repair also ran this morning.  But the warnings still
show up throughout the day.

So, we have many questions:

   - How much should we be freaking out?
   - Why is this recurring?  If I understand what's happening, this is a
   self-healing process.  So, why would it keep happening?  Are we possibly
   using counters incorrectly?
   - What does it even mean that there were multiple shards for the same
   counter?  How does that situation even occur?

We're pretty lost here, so any help would be greatly appreciated.

Thanks!

Re: "invalid global counter shard detected" warning on 2.1.3 and 2.1.10

Posted by Branton Davis <br...@spanning.com>.
Sebastián, thanks so much for the info!

On Tue, Oct 20, 2015 at 11:34 AM, Sebastian Estevez <sebastian.estevez@datastax.com> wrote:


Re: "invalid global counter shard detected" warning on 2.1.3 and 2.1.10

Posted by Sebastian Estevez <se...@datastax.com>.
Hi Branton,


>    - How much should we be freaking out?

The impact of this is possible counter inaccuracy (over-counting or
under-counting). If you are expecting counters to be exactly accurate, you
are already in trouble, because they are not: counter updates are not
idempotent operations, and they run in a distributed system (you've
probably read Aleksey's
<http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters>
post by now).
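Why non-idempotent increments can drift in both directions can be shown with a toy Python simulation, assuming a client that occasionally times out without learning whether its +1 was applied. The simulate function and the 50/50 outcome on timeout are hypothetical modelling choices, not driver or server behavior.

```python
import random


def simulate(n_ops: int, timeout_rate: float, retry_on_timeout: bool, seed: int = 42) -> int:
    """Return the stored counter value after a client issues n_ops increments of +1."""
    rng = random.Random(seed)
    stored = 0
    for _ in range(n_ops):
        if rng.random() < timeout_rate:
            # The request timed out: the client cannot tell whether the
            # increment was applied. Model both outcomes as equally likely.
            applied = rng.random() < 0.5
            stored += 1 if applied else 0
            if retry_on_timeout:
                # The retry succeeds; if the first attempt also landed,
                # the same logical +1 is now counted twice.
                stored += 1
        else:
            stored += 1
    return stored


n = 100_000
print(simulate(n, 0.01, retry_on_timeout=True) >= n)   # True: retrying can only over-count
print(simulate(n, 0.01, retry_on_timeout=False) <= n)  # True: not retrying can only under-count
```

An absolute write such as "set to 7" could be retried safely; "+1" cannot, so the client effectively chooses which direction of error it prefers.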

>    - Why is this recurring?  If I understand what's happening, this is a
>    self-healing process.  So, why would it keep happening?  Are we possibly
>    using counters incorrectly?

Even after running sstableupgrade, your counter cells will not be upgraded
until each of them has been incremented again. You may still be seeing the
warning on pre-2.1 counter cells that have not been incremented yet.
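A toy Python model of the behavior described above, assuming (as stated in the reply) that rewriting the SSTables leaves legacy counter shards in place and that a cell moves to the new layout only when it is next incremented. CounterCell, upgrade_sstables and increment are hypothetical names, and the model ignores replicas and shard contents entirely.

```python
from dataclasses import dataclass


@dataclass
class CounterCell:
    """Hypothetical model of one counter cell on disk."""
    value: int
    legacy_shards: bool  # True while the cell still carries pre-2.1 shards


def upgrade_sstables(cells: dict[str, CounterCell]) -> dict[str, CounterCell]:
    # In this model, rewriting the files copies each cell's shard layout unchanged.
    return {key: CounterCell(c.value, c.legacy_shards) for key, c in cells.items()}


def increment(cells: dict[str, CounterCell], key: str, delta: int) -> None:
    # In this model, only an increment rewrites a cell into the post-2.1 layout.
    cell = cells[key]
    cells[key] = CounterCell(cell.value + delta, legacy_shards=False)


cells = {"a": CounterCell(10, True), "b": CounterCell(5, True)}
cells = upgrade_sstables(cells)
increment(cells, "a", 1)
print(sorted(k for k, c in cells.items() if c.legacy_shards))  # ['b']
```

Cells that are never incremented again, such as counters for past time buckets, would keep their legacy shards indefinitely in this model, so the warning can keep recurring whenever those cells are compacted or validated during repair.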

>    - What does it even mean that there were multiple shards for the same
>    counter?  How does that situation even occur?

We used to maintain "counter shards" at the SSTable level in pre-2.1
counters. This means that on compaction or reads we would essentially add
the shards together when computing the value or merging the cells. This
caused a series of problems, including the warning you are still seeing.
TL;DR: post-2.1 counters store the final value of the counter (not the
increment/shard) at the commitlog level and beyond, so this is no longer an
issue. Again, read Aleksey's post
<http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters>.
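The difference between the two storage models can be sketched in Python, assuming a single counter whose fragments are spread across files. In the pre-2.1-style toy, fragments are deltas that are summed on merge; in the post-2.1-style toy, each fragment carries a clock and the running total, so a merge just keeps the newest. Both functions are hypothetical simplifications: real shards also carry per-replica counter ids.

```python
from typing import Iterable


def merge_pre21(fragments: Iterable[int]) -> int:
    """Pre-2.1-style toy: each fragment is an increment (delta); merging sums them."""
    return sum(fragments)


def merge_post21(fragments: Iterable[tuple[int, int]]) -> int:
    """Post-2.1-style toy: each fragment is (clock, total); merging keeps the newest total."""
    return max(fragments)[1]


# Two increments, +3 then +4, where the +4 fragment is encountered twice.
deltas = [3, 4, 4]
totals = [(1, 3), (2, 7), (2, 7)]
print(merge_pre21(deltas))   # 11: the duplicate delta is counted again
print(merge_post21(totals))  # 7: the duplicate is harmless
```

Encountering a fragment twice corrupts the summing model, but is harmless once the stored value is already the total.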

Many users started fresh tables after upgrading to 2.1, updated only the new
tables, and added application logic to decide which table to read from.
Something like monthly tables works well if you're doing time-series
counters; it ensures that you stop seeing the warnings on the new/active
tables and get the benefits of 2.1 counters quickly.
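That routing logic might look like the following minimal Python sketch, assuming a hypothetical legacy table named page_views, monthly tables named page_views_YYYY_MM, and a cutover month of November 2015. None of these names or dates come from the thread.

```python
from datetime import date

# Hypothetical: first month whose counters live in the new post-upgrade tables.
CUTOVER = date(2015, 11, 1)
# Hypothetical name of the table that holds pre-upgrade counters.
LEGACY_TABLE = "page_views"


def table_for(day: date) -> str:
    """Pick the counter table that holds data for the given day."""
    if day < CUTOVER:
        return LEGACY_TABLE
    return f"page_views_{day.year:04d}_{day.month:02d}"


print(table_for(date(2015, 10, 18)))  # page_views
print(table_for(date(2015, 12, 3)))   # page_views_2015_12
```

Increments always target the current month's table, which only exists after the upgrade, while reads for months before the cutover fall back to the legacy table.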




All the best,



Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com



On Tue, Oct 20, 2015 at 12:21 PM, Branton Davis <br...@spanning.com>
wrote:


Re: "invalid global counter shard detected" warning on 2.1.3 and 2.1.10

Posted by Branton Davis <br...@spanning.com>.
Howdy Cassandra folks.

Crickets here and it's sort of unsettling that we're alone with this
issue.  Is it appropriate to create a JIRA issue for this or is there maybe
another way to deal with it?

Thanks!

On Sun, Oct 18, 2015 at 1:55 PM, Branton Davis <br...@spanning.com>
wrote:
