Posted to user@cassandra.apache.org by Boris Yen <yu...@gmail.com> on 2011/08/09 07:28:06 UTC

Enormous counter problem?

Hi,

I am not sure whether this is a bug or whether we are using counters the
wrong way, but I keep getting an enormous counter value in our deployment.
After a few tries, I am finally able to reproduce it. The following is the
setup of my development environment:
-----------------------------------------------------
I have a two-node cluster with the following keyspace and column family
settings.

Cluster Information:
   Snitch: org.apache.cassandra.locator.SimpleSnitch
   Partitioner: org.apache.cassandra.dht.RandomPartitioner
   Schema versions:
63fda700-c243-11e0-0000-2d03dcafebdf: [172.17.19.151, 172.17.19.152]

Keyspace: test:
  Replication Strategy: org.apache.cassandra.locator.NetworkTopologyStrategy
  Durable Writes: true
    Options: [datacenter1:2]
  Column Families:
    ColumnFamily: testCounter (Super)
    "APP status information."
      Key Validation Class: org.apache.cassandra.db.marshal.BytesType
      Default column value validator:
org.apache.cassandra.db.marshal.CounterColumnType
      Columns sorted by:
org.apache.cassandra.db.marshal.BytesType/org.apache.cassandra.db.marshal.BytesType
      Row cache size / save period in seconds: 0.0/0
      Key cache size / save period in seconds: 200000.0/14400
      Memtable thresholds: 1.1578125/1440/247 (millions of ops/MB/minutes)
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 1.0
      Replicate on write: true
      Built indexes: []
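
The schema above was created from cassandra-cli with roughly the following
commands (a sketch in 0.8-era cli syntax, not the exact statements I ran):

create keyspace test
    with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
    and strategy_options = [{datacenter1:2}];
use test;
create column family testCounter
    with column_type = 'Super'
    and comment = 'APP status information.'
    and default_validation_class = 'CounterColumnType'
    and replicate_on_write = true;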

Then, I use a test program based on Hector to increment a counter column
(testCounter[sc][column]) 1000 times. In the middle of this process, I
intentionally shut down the node 172.17.19.152. The test program is smart
enough to switch the consistency level from Quorum to One, so that the
subsequent increments do not fail. The core of the loop looks roughly like
the sketch below.
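
This is a minimal sketch assuming the Hector 0.8.x API; the cluster name,
host list, row key, and the retry logic are illustrative simplifications,
not the exact test code:

import java.util.Arrays;

import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HCounterSuperColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class CounterRepro {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster",
                new CassandraHostConfigurator("172.17.19.151,172.17.19.152"));

        // Start writing at QUORUM; the policy object stays mutable so the
        // program can drop to ONE when a replica dies.
        ConfigurableConsistencyLevel policy = new ConfigurableConsistencyLevel();
        policy.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);
        Keyspace keyspace = HFactory.createKeyspace("test", cluster, policy);

        for (int i = 0; i < 1000; i++) {
            try {
                increment(keyspace);
            } catch (Exception e) {
                // One node is down and QUORUM (2 of 2 replicas here) is no
                // longer reachable: fall back to ONE and retry the increment.
                policy.setDefaultWriteConsistencyLevel(HConsistencyLevel.ONE);
                increment(keyspace);
            }
        }
    }

    // Adds 1 to testCounter[sc][column] for a fixed (illustrative) row key.
    private static void increment(Keyspace keyspace) {
        StringSerializer ss = StringSerializer.get();
        Mutator<String> mutator = HFactory.createMutator(keyspace, ss);
        HCounterSuperColumn<String, String> sc =
                HFactory.createCounterSuperColumn("sc",
                        Arrays.asList(HFactory.createCounterColumn("column", 1L, ss)),
                        ss, ss);
        mutator.insertCounter("rowKey", "testCounter", sc);
    }
}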

After all the increments are done, I start Cassandra on 172.17.19.152
again, and I use cassandra-cli to check whether the counter is correct on
both nodes. I get a result of 1001, which seems reasonable because Hector
retries once. However, I then shut down 172.17.19.151, and
once 172.17.19.152 is aware that 172.17.19.151 is down, I start Cassandra
on 172.17.19.151 again. When I check the counter this time, I get a result
of 481387, which is wildly wrong.
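
The check on each node looks roughly like this in cassandra-cli (again
with the illustrative row key from the sketch above):

use test;
assume testCounter keys as utf8;
get testCounter['rowKey']['sc']['column'];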

I was wondering if anyone could explain why this happens. Is this a bug,
or are we using counters the wrong way?

Regards
Boris

Re: Enormous counter problem?

Posted by Boris Yen <yu...@gmail.com>.
Ticket opened: https://issues.apache.org/jira/browse/CASSANDRA-3006

On Tue, Aug 9, 2011 at 5:38 PM, Boris Yen <yu...@gmail.com> wrote:

> Actually, I reproduced this on 0.8.3, so it seems to me that it is not
> fixed yet.
>
> Boris

Re: Enormous counter problem?

Posted by Boris Yen <yu...@gmail.com>.
Actually, I reproduced this on 0.8.3, so it seems to me that it is not fixed
yet.

Boris

On Tue, Aug 9, 2011 at 5:32 PM, Sylvain Lebresne <sy...@datastax.com> wrote:

> Yes, if you can reproduce this easily, please see if 0.8.3 fixes it by
> any chance.
> Otherwise, please open a JIRA ticket with as much info as possible on
> how to reproduce it.
>
> --
> Sylvain

Re: Enormous counter problem?

Posted by Sylvain Lebresne <sy...@datastax.com>.
Yes, if you can reproduce this easily, please see if 0.8.3 fixes it by
any chance.
Otherwise, please open a JIRA ticket with as much info as possible on how
to reproduce it.

--
Sylvain

On Tue, Aug 9, 2011 at 11:04 AM, Andrii Denysenko <an...@gmail.com> wrote:
> Try 0.8.3.
> They fixed https://issues.apache.org/jira/browse/CASSANDRA-2968, and that
> bug produced erroneous records for counters.
> Not sure this is exactly your issue, but it is similar.

Re: Enormous counter problem?

Posted by Andrii Denysenko <an...@gmail.com>.
Try 0.8.3.
They fixed https://issues.apache.org/jira/browse/CASSANDRA-2968, and that
bug produced erroneous records for counters.
Not sure this is exactly your issue, but it is similar.

-- 
Regards,
Andriy Denysenko