Posted to user@cassandra.apache.org by Alain RODRIGUEZ <ar...@gmail.com> on 2012/09/03 10:31:03 UTC

Re: Invalid Counter Shard errors?

Hello,

I'm running a two-node Cassandra 1.1.2 cluster with RF=2 (CL = 1; the
nodes are Amazon m1.large instances).

I saw this error 524 times last month on node 1 and 2805 times on
node 2.

Should I worry about it? How can I fix these errors?

Alain

2012/6/2 Peter Schuller <pe...@infidyne.com>:
>> We're running a three node cluster of cassandra 1.1 servers, originally
>> 1.0.7 and immediately after the upgrade the error logs of all three servers
>> began filling up with the following message:
>
> The message you are receiving is new, but the problem it identifies is
> not. The checking for this condition, and the logging, was added so
> that certain kinds of counter corruption would be self-healed
> eventually instead of remaining forever incorrect. Likely nothing is
> wrong that wasn't before; you're just seeing it being logged now.
>
> And I can confirm having seen this on 1.1, so the root cause remains
> unknown as far as I can tell (had previously hoped the root cause were
> thread-unsafe shard merging, or one of the other counter related
> issues fixed during the 0.8 run).
>
> --
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: Invalid Counter Shard errors?

Posted by Alain RODRIGUEZ <ar...@gmail.com>.

Oh, I just saw your first mail.

"I don't see a negative number in your paste?"

(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 1, -1) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 1, 1)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 4, -5000) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 4, 20000)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 19, -3) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 19, 19)

The counts in the left-hand tuples are negative values, and we
never decrement counters.

Thanks for your explanations.

Alain

2012/9/20 Alain RODRIGUEZ <ar...@gmail.com>

> "I think that's inconsistent with the hypothesis that unclean shutdown is
> the sole cause of these problems"
>
> I agree: we have never shut down any node or had any crash, and yet
> we have these bugs.
>
> About your side note:
>
> We know about it, but we couldn't find any other way to provide
> real-time analytics. If you know of one, we would be really glad to hear
> about it. We need to serve statistics in real time and be accurate about
> prices, and we need consistency between what's shown in our graphs and
> tables and the invoices we provide to our customers.
> What we do is try to avoid timeouts as much as possible (by increasing the
> timeout and keeping the CPU load as low as possible). In order
> to keep latency low for the user, we first write the events to a message
> queue (Kestrel) and then process them with Storm, which writes the
> events and increments the counters in Cassandra.
>
> Once again, if you have a clue about a better way of doing this, we are
> always happy to learn and to try to improve our architecture and our process.
>
> Alain
>
>
> 2012/9/20 Peter Schuller <pe...@infidyne.com>
>
>> The significance I think is: If it is indeed the case that the higher
>> value is always *in fact* correct, I think that's inconsistent with
>> the hypothesis that unclean shutdown is the sole cause of these
>> problems - as long as the client is truly submitting non-idempotent
>> counter increments without a read-before-write.
>>
>> As a side note: If you're using these counters for stuff like
>> determining amounts of money to be paid by somebody, consider the
>> non-idempotence of counter increments. Any write that increments a
>> counter, that fails by e.g. Timeout *MAY OR MAY NOT* have been applied
>> and cannot be safely retried. Cassandra counters are generally not
>> useful if *strict* correctness is desired, for this reason.
>>
>> --
>> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
>>
>
>

Re: Invalid Counter Shard errors?

Posted by Alain RODRIGUEZ <ar...@gmail.com>.

"I think that's inconsistent with the hypothesis that unclean shutdown is
the sole cause of these problems"

I agree: we have never shut down any node or had any crash, and yet
we have these bugs.

About your side note:

We know about it, but we couldn't find any other way to provide
real-time analytics. If you know of one, we would be really glad to hear
about it. We need to serve statistics in real time and be accurate about
prices, and we need consistency between what's shown in our graphs and
tables and the invoices we provide to our customers.
What we do is try to avoid timeouts as much as possible (by increasing the
timeout and keeping the CPU load as low as possible). In order
to keep latency low for the user, we first write the events to a message
queue (Kestrel) and then process them with Storm, which writes the
events and increments the counters in Cassandra.

Once again, if you have a clue about a better way of doing this, we are
always happy to learn and to try to improve our architecture and our process.

Alain

2012/9/20 Peter Schuller <pe...@infidyne.com>

> The significance I think is: If it is indeed the case that the higher
> value is always *in fact* correct, I think that's inconsistent with
> the hypothesis that unclean shutdown is the sole cause of these
> problems - as long as the client is truly submitting non-idempotent
> counter increments without a read-before-write.
>
> As a side note: If you're using these counters for stuff like
> determining amounts of money to be paid by somebody, consider the
> non-idempotence of counter increments. Any write that increments a
> counter, that fails by e.g. Timeout *MAY OR MAY NOT* have been applied
> and cannot be safely retried. Cassandra counters are generally not
> useful if *strict* correctness is desired, for this reason.
>
> --
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
>

Re: Invalid Counter Shard errors?

Posted by Peter Schuller <pe...@infidyne.com>.

The significance I think is: If it is indeed the case that the higher
value is always *in fact* correct, I think that's inconsistent with
the hypothesis that unclean shutdown is the sole cause of these
problems - as long as the client is truly submitting non-idempotent
counter increments without a read-before-write.

As a side note: If you're using these counters for stuff like
determining amounts of money to be paid by somebody, consider the
non-idempotence of counter increments. Any write that increments a
counter, that fails by e.g. Timeout *MAY OR MAY NOT* have been applied
and cannot be safely retried. Cassandra counters are generally not
useful if *strict* correctness is desired, for this reason.
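
To illustrate the side note above, here is a minimal Python sketch of why a
timed-out increment cannot be safely retried. It is a toy in-memory model, not
Cassandra client code: ToyCounter, increment_with_retry and the ack_lost flag
are hypothetical names introduced only for this illustration.

```python
class WriteTimeout(Exception):
    """Raised when the client does not receive an acknowledgement in time."""


class ToyCounter:
    """Toy in-memory counter; a stand-in for a counter column, not a real client."""

    def __init__(self):
        self.value = 0

    def increment(self, delta, ack_lost=False):
        # The increment is applied on the "server" side first...
        self.value += delta
        # ...but the acknowledgement may never reach the client.
        if ack_lost:
            raise WriteTimeout("no acknowledgement received")


def increment_with_retry(counter, delta, first_ack_lost):
    """Naive retry policy: on timeout, send the same increment again."""
    try:
        counter.increment(delta, ack_lost=first_ack_lost)
    except WriteTimeout:
        # The client cannot tell whether the first attempt was applied.
        counter.increment(delta)


counter = ToyCounter()
increment_with_retry(counter, 5, first_ack_lost=True)
print(counter.value)  # 10: the increment of 5 was applied twice
```

Not retrying carries the opposite risk: if the timeout happened before the
write was applied, the increment is simply lost.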

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: Invalid Counter Shard errors?

Posted by Peter Schuller <pe...@infidyne.com>.

> I don't understand exactly what the three values in parentheses are. I guess
> the last number is the count and the middle one is the number of increments;
> is that true? What is the first string (identical in all the errors)?

It's (UUID, clock, increment). Very briefly, counter columns in
Cassandra are made up of multiple "shards". In the write path, a
particular counter increment is executed by one "leader", which is one
of the replicas of the counter. The leader will increment its own
value, read its own full value (this is why "Replicate On Write" has
to do reads in the write path for counters) and replicate it to the
other nodes.

A UUID "roughly" corresponds to a node in the cluster (UUIDs are
sometimes refreshed, so it's not a strict correlation). The clock is
supposed to be monotonically increasing for a given UUID.
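
As a rough sketch of how the (UUID, clock, increment) tuples in these log lines
could be pulled apart for analysis, the Python below parses pairs of shards and
checks whether both sides share the same UUID and clock. The Shard type, the
regular expression and the helper names are assumptions made for this example;
they are not part of Cassandra. The sample text is one pair copied from the
logs in this thread.

```python
import re
from typing import NamedTuple


class Shard(NamedTuple):
    node_id: str  # the UUID, roughly identifying a node
    clock: int    # expected to increase monotonically for a given UUID
    value: int    # the third field of the tuple


# Matches "(uuid, clock, value)" even when the tuple is wrapped across lines.
SHARD_RE = re.compile(r"\(\s*([0-9a-fA-F-]{36})\s*,\s*(-?\d+)\s*,\s*(-?\d+)\s*\)")


def parse_shards(text):
    """Return every (UUID, clock, value) tuple found in text, in order."""
    return [Shard(u, int(c), int(v)) for u, c, v in SHARD_RE.findall(text)]


def conflicting_pairs(shards):
    """Pair consecutive shards (the log prints them as 'A and B')."""
    return list(zip(shards[0::2], shards[1::2]))


sample = """
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 19, -3) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff,
19, 19)
"""
for left, right in conflicting_pairs(parse_shards(sample)):
    same_key = (left.node_id, left.clock) == (right.node_id, right.clock)
    print(same_key, left.value, right.value)  # True -3 19
```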

> How can the last number (assuming it's the count) be negative knowing that I
> only sum positive numbers ?

I don't see a negative number in your paste?

> Another point is that the highest value seems to be *always* the correct one
> (assuming this time that the middle number is the number of increments).

DISCLAIMER: This is me responding off the cuff without digging into it further.

Depends on the source of the problem. If the problem, as theorized in
the ticket, is caused by unclean shutdown of nodes, the expected
result *should* be that we effectively "lose" counter increments.
Given a particular leader among the replicas, suppose you increment
counter C by N1, followed by an unclean shutdown with the value never
having been written to the commit log. On the next increment of C by
N2, a counter shard would be generated which has the value
base+N2 instead of base+N1 (assuming the memtable wasn't flushed and
no other writes to the same counter column happened).

When this gets replicated to other nodes, they would see a value based
on N1 and a value based on N2, both with the same clock. They would
choose the higher one. In either case as far as I can tell (off the
top of my head), *some* counter increment is lost. The only way I can
see (again off the top of my head) the resulting value being correct
is if the later increment (N2 in this case) is somehow including N1 as
well (e.g., because it was generated by first reading the current
counter value).
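
The scenario above can be walked through with a few lines of Python. This is
a toy model under stated assumptions, not Cassandra's implementation: the
leader_increment helper, the "clock + 1 per increment" rule, and the concrete
numbers are all hypothetical, and it assumes the first shard reached a replica
before the leader crashed.

```python
def leader_increment(known_value, clock, delta):
    """Hypothetical leader step: add delta to the locally known value
    and emit a shard (clock, value) for replication."""
    return (clock + 1, known_value + delta)


base, clock = 100, 7
n1, n2 = 5, 3

# First increment: the shard is sent out, but the leader shuts down uncleanly
# before the new value reaches its commit log.
shard_a = leader_increment(base, clock, n1)   # (8, 105)

# After restart the leader still believes the value is `base` at `clock`,
# so the next increment reuses the same clock with a different value.
shard_b = leader_increment(base, clock, n2)   # (8, 103)

assert shard_a[0] == shard_b[0]  # same clock, different values

# A replica that sees both shards keeps the higher value.
resolved = max(shard_a[1], shard_b[1])

expected = base + n1 + n2
print(resolved, expected, expected - resolved)  # 105 108 3
```

Whichever shard wins, the resolved value is short of base+N1+N2, which is the
"some counter increment is lost" outcome described above.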

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)

Re: Invalid Counter Shard errors?

Posted by Alain RODRIGUEZ <ar...@gmail.com>.

I would like to understand this issue, or at least do my best to help you
understand it.

I got the following (shortened) logs:

(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 3, 3) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 3, 0)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 6, 6) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 6, 2)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 5, 3) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 5, 5)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 3, 45000) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 3, 15000)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 13, 13) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 13, 7)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 3, 1) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 3, 3)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 2, 0) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 2, 10000)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 1, -1) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 1, 1)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 6, 2) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 6, 6)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 2, 30000) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 2, 0)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 2, 10000) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 2, 0)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 4, 4) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 4, 0)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 8, 5) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 8, 8)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 62, 62) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 62, 57)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 3, 36000) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 3, 12000)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 5, 1) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 5, 5)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 4, 2) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 4, 4)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 3, 5000) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 3, 15000)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 2, 0) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 2, 10000)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 45, 504000) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 45, 540000)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 3, 15000) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 3, 45000)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 2, 10000) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 2, 0)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 2, 2) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 2, 0)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 3, 0) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 3, 3)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 4, 0) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 4, 4)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 2, 0) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 2, 2)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 5, 2) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 5, 5)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 28, 25) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 28, 28)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 57, 7) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 57, 57)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 19, -3) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 19, 19)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 1, -1) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 1, 1)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 3, 3) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 3, 0)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 3, 1) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 3, 3)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 28, 588000) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 28, 294000)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 16, 0) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 16, 16)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 3, 5000) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 3, 15000)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 4, -5000) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 4, 20000)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 5, 60000) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 5, 36000)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 2, 0) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 2, 10000)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 4, 0) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 4, 4)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 3, 3) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 3, 0)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 4, 0) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 4, 4)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 4, 0) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 4, 4)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 4, 4) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 4, 0)
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 3, 0) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 3, 3)

I don't understand exactly what the three values in parentheses are. I
guess the last number is the count and the middle one is the number of
increments; is that true? What is the first string (identical in all the
errors)?

How can the last number (assuming it's the count) be negative, given that
I only sum positive numbers?

Another point is that the highest value seems to be *always* the correct one
(assuming this time that the middle number is the number of increments).

I'll try to make myself clear:

I count events, and I accumulate the income or cost attached to each event.

In the following line I'm sure that I'm summing costs:

(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 28, 588000) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 28, 294000)

588000/28 = 21000 and 294000/28 = 10500. These are euros, stored multiplied
by 100000. So I get a unit price of 0.21 euros in the first case and 0.105
euros in the second. We never put 3 decimals in our prices and
we do have a lot of products that cost 0.21, which makes me think that the
first value, the higher of the two, is the correct one.
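
The arithmetic above can be checked with a short Python sketch. It relies on
the same assumption made in this message, namely that the middle number (28)
is the number of increments; SCALE and unit_price_eur are names made up for
the example.

```python
SCALE = 100_000  # amounts are stored as euros multiplied by 100000


def unit_price_eur(total_stored, events):
    """Average price per event in euros, assuming `events` increments."""
    return total_stored / events / SCALE


for total in (588000, 294000):
    print(total, unit_price_eur(total, 28))  # 0.21, then 0.105
```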

In the following line I guess that I'm summing events (so one at a time):

(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 19, -3) and
(03a227f0-a5c3-11e1-0000-b7f5e49dceff, 19, 19)

So obviously, if all my assumptions are right, the correct value is again the
highest. And I don't get how the count could possibly be "-3".

So the correct value would always be the highest one, which would make your fix
effective (in my case at least).

I hope this could be useful to Sylvain Lebresne or anyone working on this
bug.

Alain

Re: Invalid Counter Shard errors?

Posted by Charles Brophy <cb...@zulily.com>.

I have a very reliable repro case on our cluster involving nodetool repair.
I posted a summary in a comment on the issue. Let me know if more details
are needed.

Charles

On Fri, Sep 7, 2012 at 8:35 AM, Sylvain Lebresne <sy...@datastax.com> wrote:

> > Is there a way to fix this error ? What is its impact on my data ?
>
> The fact that the message shows up means that Cassandra has attempted to
> "repair" the problem, so there isn't much to do. However, the fact that
> you get the messages in the first place means that there is a bug
> somewhere that generates them.
> Now, as Peter said, we don't know what the bug behind this problem is.
>
> > What is its impact on my data ?
>
> The problem is that, as Peter said, we actually don't know what is
> causing it. What the message says, though, is that two
> different values have been found for a given counter (it's two
> different values for a sub-part of the counter, but that's a technical
> detail). What the code does to "repair" in that case is to pick
> the higher of the two values it has. But honestly that's random:
> there's a 50/50 chance that it will pick the right value.
>
> The main problem is that I have no clue how to reproduce this easily,
> which makes it really hard to track down. If someone finds a way to
> reproduce it, please do share by all means (typically on
> https://issues.apache.org/jira/browse/CASSANDRA-4417). What
> I can suggest is that if you have a log with multiple instances of
> said log message, you attach it to the ticket. I can have a look to
> see if there is some pattern between the different occurrences that
> suggests a reason why this happens. But to be honest, I have some doubts
> that it will help much short of having a way to reproduce.
>
> I will also note that we did fix a bug that was affecting counters
> in 1.1.3 (https://issues.apache.org/jira/browse/CASSANDRA-4436). I
> don't really think this could be the cause of what you are seeing, but
> there is a slim chance that I'm wrong on that. So it's probably worth
> upgrading to be sure.
>
> --
> Sylvain
>

Re: Invalid Counter Shard errors?

Posted by Sylvain Lebresne <sy...@datastax.com>.

> Is there a way to fix this error ? What is its impact on my data ?

The fact that the message shows up means that Cassandra has attempted to
"repair" the problem, so there isn't much to do. However, the fact that
you get the messages in the first place means that there is a bug
somewhere that generates them.
Now, as Peter said, we don't know what the bug behind this problem is.

> What is its impact on my data ?

The problem is that, as Peter said, we actually don't know what is
causing it. What the message says, though, is that two
different values have been found for a given counter (it's two
different values for a sub-part of the counter, but that's a technical
detail). What the code does to "repair" in that case is to pick
the higher of the two values it has. But honestly that's random:
there's a 50/50 chance that it will pick the right value.

The main problem is that I have no clue how to reproduce this easily,
which makes it really hard to track down. If someone finds a way to
reproduce it, please do share by all means (typically on
https://issues.apache.org/jira/browse/CASSANDRA-4417). What
I can suggest is that if you have a log with multiple instances of
said log message, you attach it to the ticket. I can have a look to
see if there is some pattern between the different occurrences that
suggests a reason why this happens. But to be honest, I have some doubts
that it will help much short of having a way to reproduce.

I will also note that we did fix a bug that was affecting counters
in 1.1.3 (https://issues.apache.org/jira/browse/CASSANDRA-4436). I
don't really think this could be the cause of what you are seeing, but
there is a slim chance that I'm wrong on that. So it's probably worth
upgrading to be sure.

--
Sylvain

Re: Invalid Counter Shard errors?

Posted by Alain RODRIGUEZ <ar...@gmail.com>.

"This problem is not new to 1.1"

That's what I understood from your last comment.

Is there a way to fix this error? What is its impact on my data?

Alain

2012/9/7 Peter Schuller <pe...@infidyne.com>

> This problem is not new to 1.1.
> On Sep 6, 2012 5:51 AM, "Radim Kolar" <hs...@filez.com> wrote:
>
>> I would migrate to 1.0 because 1.1 is highly unstable.
>>
>

Re: Invalid Counter Shard errors?

Posted by Peter Schuller <pe...@infidyne.com>.

This problem is not new to 1.1.
On Sep 6, 2012 5:51 AM, "Radim Kolar" <hs...@filez.com> wrote:

> I would migrate to 1.0 because 1.1 is highly unstable.
>

Re: Invalid Counter Shard errors?

Posted by Radim Kolar <hs...@filez.com>.

I would migrate to 1.0 because 1.1 is highly unstable.

Re: Invalid Counter Shard errors?

Posted by Alain RODRIGUEZ <ar...@gmail.com>.

Hi, does nobody know about this?

Alain

2012/9/3 Alain RODRIGUEZ <ar...@gmail.com>

> Hello,
>
> I'm running a two-node Cassandra 1.1.2 cluster with RF=2 (CL = 1; the
> nodes are Amazon m1.large instances).
>
> I saw this error 524 times last month on node 1 and 2805 times on
> node 2.
>
> Should I worry about it? How can I fix these errors?
>
> Alain
>
> 2012/6/2 Peter Schuller <pe...@infidyne.com>:
> >> We're running a three node cluster of cassandra 1.1 servers, originally
> >> 1.0.7 and immediately after the upgrade the error logs of all three servers
> >> began filling up with the following message:
> >
> > The message you are receiving is new, but the problem it identifies is
> > not. The checking for this condition, and the logging, was added so
> > that certain kinds of counter corruption would be self-healed
> > eventually instead of remaining forever incorrect. Likely nothing is
> > wrong that wasn't before; you're just seeing it being logged now.
> >
> > And I can confirm having seen this on 1.1, so the root cause remains
> > unknown as far as I can tell (had previously hoped the root cause were
> > thread-unsafe shard merging, or one of the other counter related
> > issues fixed during the 0.8 run).
> >
> > --
> > / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
>