
Posted to user@cassandra.apache.org by Greg Saylor <gr...@net-virtual.com> on 2017/08/17 08:45:12 UTC

write time corrupted and not sure how

Hello,

We have a Cassandra database that is about 5 years old and has gone through multiple upgrades. Today I noticed a very odd thing (the current timestamp would be around 1502957436214912):

   cqlsh:siq_prod> select id,account_id,sweep_id from items where id=34681132;

    id       | account_id | sweep_id
   ----------+------------+----------
    34681132 |      13896 |         

I then attempted to delete it:

cqlsh:siq_prod> delete from items where id=34681132;

But it’s still there, so I thought I’d look at the writetime of sweep_id:

cqlsh:siq_prod> select id,account_id,sweep_id,writetime(sweep_id) from items where id=34681132;

id       | account_id | sweep_id | writetime(sweep_id)
----------+------------+----------+---------------------
34681132 |       null |          |    1718969631988312

That is Friday, June 21, 2024 11:33:51.988 AM UTC.
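For readers checking the arithmetic: writetime() returns the cell's timestamp, which by convention is microseconds since the Unix epoch, so turning it into a date is plain integer arithmetic. A minimal sketch; the helper name writetime_to_utc is hypothetical (not part of any driver), and it uses timedelta rather than float division so no microseconds are lost to rounding.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def writetime_to_utc(writetime_us: int) -> datetime:
    """Convert a writetime() value (microseconds since the epoch) to an aware UTC datetime."""
    # Exact integer arithmetic: timedelta keeps full microsecond precision.
    return EPOCH + timedelta(microseconds=writetime_us)

if __name__ == "__main__":
    # The "current" timestamp quoted in the message and the suspicious cell's writetime.
    for ts in (1502957436214912, 1718969631988312):
        print(ts, writetime_to_utc(ts).isoformat())
```

Run as a script, it prints 2017-08-17T08:10:36.214912+00:00 for the first value and 2024-06-21T11:33:51.988312+00:00 for the second, matching the June 21, 2024 date above.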

Is there any way to get rid of this record or update the writetime? I’ve looked around the database and there are many more examples of this. There’s nothing we can think of that would have caused this: this record was inserted back in 2013, and there are other records within seconds of that one which are just fine.
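One way to size a problem like this is to compare every writetime against the current clock. A minimal sketch, assuming the (id, writetime) pairs have already been exported, for example from a paged `SELECT id, writetime(sweep_id) FROM items` scan; the function name and the one-hour default slack are illustrative choices, not anything from the thread.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple

def future_writetimes(
    rows: Iterable[Tuple[int, Optional[int]]],
    now_us: Optional[int] = None,
    slack_us: int = 3_600_000_000,  # tolerate up to one hour of ordinary clock skew
) -> Iterator[Tuple[int, int]]:
    """Yield (id, writetime) pairs whose writetime lies beyond now + slack.

    writetime() is null for a column with no value, so None entries are skipped.
    """
    if now_us is None:
        now_us = time.time_ns() // 1_000  # current wall-clock time in microseconds
    limit = now_us + slack_us
    for row_id, writetime in rows:
        if writetime is not None and writetime > limit:
            yield row_id, writetime
```

Passing an explicit now_us (for example, the 2017 timestamp quoted above) keeps the check reproducible when re-running it against an old export.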

I suspect something must have gone awry during an upgrade or there was a subtle bug in the version of Cassandra we were running at the time.

What started us down this path was a tool, written in NodeJS, that an engineer was running; apparently the Cassandra driver for Node can’t parse this:

{ RangeError: Index out of range
    at checkOffset (buffer.js:821:11)
    at Buffer.readInt32BE (buffer.js:986:5)

Most other drivers I’ve tested just return nil.

Is there any way to get out of this situation?  I can’t delete it or update it.  This table has about 1 billion rows in it.

Thank you,

Greg Saylor
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org

Re: write time corrupted and not sure how

Posted by Jeff Jirsa <jj...@gmail.com>.

There are certainly cases where corruption has happened in Cassandra (rare,
thankfully), but as I mentioned, I'm not aware of any that only corrupted
timestamps. It wouldn't surprise me to see a really broken clock, and it
wouldn't surprise me to see bit flips on bad hardware (even hardware with
checksumming isn't guaranteed to catch all corruption issues, but if you're
using non-ECC RAM, or if you're doing TCP offloading on the NICs, bits
definitely flip from time to time).
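If bit flips are the suspect, one rough heuristic is to list which bit positions separate a bad writetime from a plausible one, such as the writetime of a neighbouring row written seconds earlier. This is a sketch for inspection, not a diagnosis; the helper name is hypothetical and the comparison value has to come from your own data.

```python
from typing import List

def differing_bits(a: int, b: int, width: int = 64) -> List[int]:
    """Return the bit positions (0 = least significant) where two timestamps differ.

    Both values are masked to `width` bits, so negative numbers are treated as
    their two's-complement bit patterns, as a 64-bit writetime is stored.
    """
    mask = (1 << width) - 1
    diff = (a & mask) ^ (b & mask)
    return [i for i in range(width) if (diff >> i) & 1]
```

Rows written a few seconds apart will always differ in the low-order bits (2^22 microseconds is about 4 seconds); isolated differences far above that range are what a flipped bit would look like, while a long run of differing high bits points elsewhere.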

Good luck.


On Thu, Aug 17, 2017 at 10:09 AM, Greg Saylor <gr...@net-virtual.com> wrote:

> Thanks for your help, I wrote a script to cycle through these early
> records and try to update them (some columns were missing, but could be
> gleaned from another db), then do the update, re-read, and if its not
> correct figure out the write time and re-issue the update with a timestamp
> + 1.  We’re exporting the data to a cluster so we can bring it into one
> with murmur3 partition, so hopefully this will address it.
>
> I don’t think this was a time shift thing, so far I’ve found 4 records in
> 2013 and one of them has a date like 3673-01-26 16:46:00 +0000 - that would
> be quite a clock skew :)
>
> Thanks for the thoughtful reply.
>
> - Greg
>
> > On Aug 17, 2017, at 9:22 AM, Jeff Jirsa <jj...@gmail.com> wrote:
> >
> > It's a long, so you can't grab it with readInt - 8 bytes instead of 4
> >
> > You can delete it by issuing a delete with an explicit time stamp at
> least 1 higher the. The timestamp on the cell
> >
> > DELETE FROM table USING TIMESTAMP=? WHERE ....
> >
> > https://cassandra.apache.org/doc/latest/cql/dml.html#delete
> >
> > This could happen if - in the past - one of your clients or servers had
> a very incorrect clock. I'm not aware of any bugs that corrupted timestamp
> anytime in the past 6-7 years of the database, but my memory isn't perfect.
> >
> >
> > --
> > Jeff Jirsa
> >
> >
> > On Aug 17, 2017, at 1:45 AM, Greg Saylor <gr...@net-virtual.com> wrote:
> >
> >> Hello,
> >>
> >> We have a Cassandra database that is about 5 years old and has gone
> through multiple upgrades.   Today I noticed a very odd thing (current
> timestamp would be around 1502957436214912):
> >>
> >>   cqlsh:siq_prod> select id,account_id,sweep_id from items where
> id=34681132;
> >>
> >>    id       | account_id | sweep_id
> >>   ----------+------------+----------
> >>    34681132 |      13896 |
> >>
> >> I then attempted to delete it:
> >>
> >>
> >> cqlsh:siq_prod> delete from items where id=34681132;
> >>
> >> But its still there, so I thought I’d look at the writteime of sweep_id:
> >>
> >> cqlsh:siq_prod> select id,account_id,sweep_id,writetime(sweep_id) from
> items where id=34681132;
> >>
> >> id       | account_id | sweep_id | writetime(sweep_id)
> >> ----------+------------+----------+---------------------
> >> 34681132 |       null |          |    1718969631988312
> >>
> >> That is Friday, June 21, 2024 11:33:51.988 AM
> >>
> >> Is there any way to get rid of this record or update the writetime?
> I’ve done a look around the database and there are many more examples of
> this.  There’s nothing we can think of that would have caused this, this
> record was inserted back in 2013 and there are other records within seconds
> of that one which are just fine.
> >>
> >> I suspect something must have gone awry during an upgrade or there was
> a subtle bug in the version of Cassandra we were running at the time.
> >>
> >> What started down this path was a tool an engineer was running that was
> written in NodeJS, apparently the cassandra driver for Node can’t parse
> this:
> >>
> >> { RangeError: Index out of range
> >>    at checkOffset (buffer.js:821:11)
> >>    at Buffer.readInt32BE (buffer.js:986:5)
> >>
> >> Most other drivers I’ve tested just return nil.
> >>
> >> Is there any way to get out of this situation?  I can’t delete it or
> update it.  This table has about 1 billion rows in it.
> >>
> >> Thank you,
> >>
> >> Greg Saylor
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
> >> For additional commands, e-mail: user-help@cassandra.apache.org
> >>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
> For additional commands, e-mail: user-help@cassandra.apache.org
>
>

Re: write time corrupted and not sure how

Posted by Greg Saylor <gr...@net-virtual.com>.

Thanks for your help. I wrote a script to cycle through these early records and update them (some columns were missing, but could be gleaned from another DB): do the update, re-read, and if it’s not correct, figure out the write time and re-issue the update with that timestamp + 1. We’re exporting the data so we can bring it into a cluster that uses the Murmur3 partitioner, so hopefully this will address it.
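The write, re-read, re-issue loop described above can be modelled without a cluster. The sketch below is a toy in-memory last-write-wins store, not Greg's script and not a driver API: the Cell and Row classes and repair_row are hypothetical, and the model simplifies Cassandra's tie-breaking by letting a write with an equal timestamp lose.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class Cell:
    value: Any
    timestamp: int  # microseconds since the epoch, as reported by writetime()

@dataclass
class Row:
    cells: Dict[str, Cell] = field(default_factory=dict)

    def write(self, column: str, value: Any, timestamp: int) -> None:
        # Last-write-wins: the new value replaces the stored one only if its
        # timestamp is strictly higher (ties lose in this simplified model).
        current = self.cells.get(column)
        if current is None or timestamp > current.timestamp:
            self.cells[column] = Cell(value, timestamp)

def repair_row(row: Row, desired: Dict[str, Any], now_us: int) -> List[str]:
    """Write each desired value at now_us, re-read it, and if the write was
    shadowed by a future-dated cell, re-issue it at that cell's writetime + 1.
    Returns the columns that needed the second write."""
    reissued = []
    for column, value in desired.items():
        row.write(column, value, now_us)
        cell = row.cells[column]  # the "re-read" step
        if cell.value != value:
            row.write(column, value, cell.timestamp + 1)
            reissued.append(column)
    return reissued
```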

I don’t think this was a time-shift thing: so far I’ve found 4 records from 2013, and one of them has a date like 3673-01-26 16:46:00 +0000. That would be quite a clock skew :)

Thanks for the thoughtful reply.

- Greg

> On Aug 17, 2017, at 9:22 AM, Jeff Jirsa <jj...@gmail.com> wrote:

Re: write time corrupted and not sure how

Posted by Jeff Jirsa <jj...@gmail.com>.

It's a long, so you can't read it with readInt: it's 8 bytes, not 4.
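To illustrate the size mismatch: writetime() returns a CQL bigint, which the native protocol encodes as an 8-byte big-endian two's-complement integer. The sketch uses Python's struct module as an analogy to calling readInt32BE versus doing an 8-byte read in Node; it is not the Node driver's code, and the helper names are hypothetical.

```python
import struct

def encode_bigint(value: int) -> bytes:
    """Encode a value like a CQL bigint: 8 bytes, big-endian, signed."""
    return struct.pack(">q", value)

def decode_bigint(buf: bytes) -> int:
    """Decode all 8 bytes, which is the read a bigint such as writetime() needs."""
    (value,) = struct.unpack(">q", buf)
    return value

def decode_int32_prefix(buf: bytes) -> int:
    """Decode only the first 4 bytes, the analogue of readInt32BE on the same buffer."""
    (value,) = struct.unpack_from(">i", buf, 0)
    return value

if __name__ == "__main__":
    buf = encode_bigint(1718969631988312)
    print(len(buf), decode_bigint(buf), decode_int32_prefix(buf))
```

The 4-byte read silently returns only the high half of the timestamp, which is why a 64-bit read is required.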

You can delete it by issuing a DELETE with an explicit timestamp at least 1 higher than the timestamp on the cell:

DELETE FROM table USING TIMESTAMP ? WHERE .... 

https://cassandra.apache.org/doc/latest/cql/dml.html#delete
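Filling in that pattern for the row from the original message, the delete's timestamp is the observed writetime plus one. A minimal sketch that only builds the CQL text; the helper name is hypothetical, and in practice you would send the statement (or a prepared statement with the timestamp bound) through whichever driver you use.

```python
def delete_newer_than(table: str, key_column: str, key: int, writetime_us: int) -> str:
    """Build a DELETE whose timestamp is one above a cell written at writetime_us.

    Only for trusted identifiers and integer keys: values are interpolated
    directly into the CQL text rather than bound as parameters.
    """
    tombstone_ts = writetime_us + 1  # "at least 1 higher" than the cell's timestamp
    return (
        f"DELETE FROM {table} USING TIMESTAMP {tombstone_ts} "
        f"WHERE {key_column} = {key};"
    )
```

One consequence to keep in mind: the resulting tombstone is itself dated June 2024, so any ordinary write to id 34681132 made before then will be shadowed unless it also carries an explicit, higher timestamp.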

This could happen if, in the past, one of your clients or servers had a very incorrect clock. I'm not aware of any bugs that corrupted timestamps at any time in the past 6-7 years of the database, but my memory isn't perfect.


-- 
Jeff Jirsa


> On Aug 17, 2017, at 1:45 AM, Greg Saylor <gr...@net-virtual.com> wrote: