Posted to user@cassandra.apache.org by Mathijs Vogelzang <ma...@apptornado.com> on 2013/12/11 15:27:48 UTC

Data tombstoned during bulk loading 1.2.10 -> 2.0.3

Hi all,

We're running into a weird problem trying to migrate our data from a
1.2.10 cluster to a 2.0.3 one.

I've taken a snapshot on the old cluster, and for each host there, I'm running
sstableloader -d <host of new cluster> KEYSPACE/COLUMNFAMILY
(using the sstableloader from the 2.0.3 distribution; the one from
1.2.10 only fails with java.lang.RuntimeException: java.io.IOException:
Connection reset by peer)
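
For reference, a rough sketch of that per-host loading loop (the target host
and snapshot directories below are placeholders, and it assumes the snapshots
have already been copied into KEYSPACE/COLUMNFAMILY directories):

import subprocess

TARGET_HOST = "new-cluster-node"            # placeholder for a node of the new cluster
SSTABLE_DIRS = ["KEYSPACE/COLUMNFAMILY"]    # one entry per snapshotted column family

for sstable_dir in SSTABLE_DIRS:
    # sstableloader (from the 2.0.3 distribution) streams every sstable in
    # the directory to the cluster that the -d node belongs to.
    subprocess.check_call(["sstableloader", "-d", TARGET_HOST, sstable_dir])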

It then copies the data successfully, but when checking the data I
noticed that some rows seemed to be missing. It turned out the data is
not missing, but has been tombstoned.
When I use sstable2json on the sstable on the destination cluster, it has
"metadata": {"deletionInfo":
{"markedForDeleteAt":1796952039620607,"localDeletionTime":0}}, whereas
it doesn't have that in the source sstable.
(Yes, this is a timestamp far into the future. All our hosts are
properly synced through ntp).

This has happened for a bunch of seemingly random rows. How is this possible?
Naturally, copying the data again doesn't fix it, as the
tombstone is far in the future. Apart from preventing this from happening in
the first place, how can it be fixed?

Best regards,

Mathijs

Re: Data tombstoned during bulk loading 1.2.10 -> 2.0.3

Posted by "olek.stasiak@gmail.com" <ol...@gmail.com>.
OK, but will the upgrade "resurrect" my data? Or should I perform some
additional action to bring my system back to a correct state?

best regards

Aleksander
On Feb 3, 2014 at 17:08, "Yuki Morishita" <mo...@gmail.com> wrote:

> if you are using < 2.0.4, then you are hitting
> https://issues.apache.org/jira/browse/CASSANDRA-6527
>
>
> On Mon, Feb 3, 2014 at 2:51 AM, olek.stasiak@gmail.com
> <ol...@gmail.com> wrote:
> > Hi All,
> > We've faced a very similar effect after an upgrade from 1.1.7 to 2.0 (via
> > 1.2.10). Probably after upgradesstables (but it's only a guess,
> > because we noticed the problem a few weeks later), some rows became
> > tombstoned. They just disappear from query results. After
> > investigation I've noticed that they are reachable via sstable2json.
> > Example output for "non-existent" row:
> >
> > {"key": "6e6e37716c6d665f6f61695f6463","metadata": {"deletionInfo":
> > {"markedForDeleteAt":2201170739199,"localDeletionTime":0}},"columns":
> > [["DATA","3c6f61695f64633a64(...)",1357677928108]]}
> > ]
> >
> > If I understand correctly, the row is marked as deleted with a timestamp in
> > the far future, but it's still on disk. Also, localDeletionTime is
> > set to 0, which may mean that it's some kind of internal bug, not the effect
> > of a client error. So my question is: is it true that upgradesstables
> > may do something like that? How can we find the reasons for such strange
> > Cassandra behaviour? Is there any way to recover such strangely
> > marked rows?
> > This problem affects about 500K of the 14M rows in our database, so
> > the percentage is quite big.
> > best regards
> > Aleksander
> >
> > 2013-12-12 Robert Coli <rc...@eventbrite.com>:
> >> On Wed, Dec 11, 2013 at 6:27 AM, Mathijs Vogelzang <
> mathijs@apptornado.com>
> >> wrote:
> >>>
> >>> When I use sstable2json on the sstable on the destination cluster, it
> has
> >>> "metadata": {"deletionInfo":
> >>> {"markedForDeleteAt":1796952039620607,"localDeletionTime":0}}, whereas
> >>> it doesn't have that in the source sstable.
> >>> (Yes, this is a timestamp far into the future. All our hosts are
> >>> properly synced through ntp).
> >>
> >>
> >> This seems like a bug in sstableloader, I would report it on JIRA.
> >>
> >>>
> >>> Naturally, copying the data again doesn't work to fix it, as the
> >>> tombstone is far in the future. Apart from not having this happen at
> >>> all, how can it be fixed?
> >>
> >>
> >> Briefly, you'll want to purge that tombstone and then reload the data
> with a
> >> reasonable timestamp.
> >>
> >> Dealing with rows with data (and tombstones) in the far future is
> described
> >> in detail here :
> >>
> >>
> http://thelastpickle.com/blog/2011/12/15/Anatomy-of-a-Cassandra-Partition.html
> >>
> >> =Rob
> >>
>
>
>
> --
> Yuki Morishita
>  t:yukim (http://twitter.com/yukim)
>

Re: Data tombstoned during bulk loading 1.2.10 -> 2.0.3

Posted by Yuki Morishita <mo...@gmail.com>.
if you are using < 2.0.4, then you are hitting
https://issues.apache.org/jira/browse/CASSANDRA-6527


On Mon, Feb 3, 2014 at 2:51 AM, olek.stasiak@gmail.com
<ol...@gmail.com> wrote:
> Hi All,
> We've faced a very similar effect after an upgrade from 1.1.7 to 2.0 (via
> 1.2.10). Probably after upgradesstables (but it's only a guess,
> because we noticed the problem a few weeks later), some rows became
> tombstoned. They just disappear from query results. After
> investigation I've noticed that they are reachable via sstable2json.
> Example output for "non-existent" row:
>
> {"key": "6e6e37716c6d665f6f61695f6463","metadata": {"deletionInfo":
> {"markedForDeleteAt":2201170739199,"localDeletionTime":0}},"columns":
> [["DATA","3c6f61695f64633a64(...)",1357677928108]]}
> ]
>
> If I understand correctly, the row is marked as deleted with a timestamp in
> the far future, but it's still on disk. Also, localDeletionTime is
> set to 0, which may mean that it's some kind of internal bug, not the effect
> of a client error. So my question is: is it true that upgradesstables
> may do something like that? How can we find the reasons for such strange
> Cassandra behaviour? Is there any way to recover such strangely
> marked rows?
> This problem affects about 500K of the 14M rows in our database, so
> the percentage is quite big.
> best regards
> Aleksander
>
> 2013-12-12 Robert Coli <rc...@eventbrite.com>:
>> On Wed, Dec 11, 2013 at 6:27 AM, Mathijs Vogelzang <ma...@apptornado.com>
>> wrote:
>>>
>>> When I use sstable2json on the sstable on the destination cluster, it has
>>> "metadata": {"deletionInfo":
>>> {"markedForDeleteAt":1796952039620607,"localDeletionTime":0}}, whereas
>>> it doesn't have that in the source sstable.
>>> (Yes, this is a timestamp far into the future. All our hosts are
>>> properly synced through ntp).
>>
>>
>> This seems like a bug in sstableloader, I would report it on JIRA.
>>
>>>
>>> Naturally, copying the data again doesn't work to fix it, as the
>>> tombstone is far in the future. Apart from not having this happen at
>>> all, how can it be fixed?
>>
>>
>> Briefly, you'll want to purge that tombstone and then reload the data with a
>> reasonable timestamp.
>>
>> Dealing with rows with data (and tombstones) in the far future is described
>> in detail here :
>>
>> http://thelastpickle.com/blog/2011/12/15/Anatomy-of-a-Cassandra-Partition.html
>>
>> =Rob
>>



-- 
Yuki Morishita
 t:yukim (http://twitter.com/yukim)

Re: Data tombstoned during bulk loading 1.2.10 -> 2.0.3

Posted by "olek.stasiak@gmail.com" <ol...@gmail.com>.
Seems good. I'll discuss it with the data owners and we'll choose the best method.
Best regards,
Aleksander
On Feb 4, 2014 at 19:40, "Robert Coli" <rc...@eventbrite.com> wrote:

> On Tue, Feb 4, 2014 at 12:21 AM, olek.stasiak@gmail.com <
> olek.stasiak@gmail.com> wrote:
>
>> I don't know what the real cause of my problem is. We are still guessing.
>> All operations I have done on the cluster are described on this timeline:
>> 1.1.7 -> 1.2.10 -> upgradesstables -> 2.0.2 -> normal operations -> 2.0.3
>> -> normal operations -> now
>> "normal operations" means reads/writes/repairs.
>> Could you please briefly describe how to recover the data? I have a
>> problem with the scenario described at this link:
>>
>> http://thelastpickle.com/blog/2011/12/15/Anatomy-of-a-Cassandra-Partition.html,
>> I can't apply this solution to my case.
>>
>
> I think your only option is the following :
>
> 1) determine which SSTables contain rows that have doomstones (tombstones from
> the far future)
> 2) determine whether these tombstones mask a live or dead version of the
> row, by looking at other row fragments
> 3) dump/filter/re-write all your data via some method, probably
> sstable2json/json2sstable
> 4) load the corrected sstables by starting a node with the sstables in the
> data directory
>
> I understand you have a lot of data, but I am pretty sure there is no way
> for you to fix it within Cassandra. Perhaps ask for advice on the JIRA
> ticket mentioned upthread if this answer is not sufficient?
>
> =Rob
>
>

Re: Data tombstoned during bulk loading 1.2.10 -> 2.0.3

Posted by Robert Coli <rc...@eventbrite.com>.
On Tue, Feb 4, 2014 at 12:21 AM, olek.stasiak@gmail.com <
olek.stasiak@gmail.com> wrote:

> I don't know what the real cause of my problem is. We are still guessing.
> All operations I have done on the cluster are described on this timeline:
> 1.1.7 -> 1.2.10 -> upgradesstables -> 2.0.2 -> normal operations -> 2.0.3
> -> normal operations -> now
> "normal operations" means reads/writes/repairs.
> Could you please briefly describe how to recover the data? I have a
> problem with the scenario described at this link:
>
> http://thelastpickle.com/blog/2011/12/15/Anatomy-of-a-Cassandra-Partition.html,
> I can't apply this solution to my case.
>

I think your only option is the following :

1) determine which SSTables contain rows that have doomstones (tombstones from
the far future)
2) determine whether these tombstones mask a live or dead version of the
row, by looking at other row fragments
3) dump/filter/re-write all your data via some method, probably
sstable2json/json2sstable
4) load the corrected sstables by starting a node with the sstables in the
data directory

I understand you have a lot of data, but I am pretty sure there is no way
for you to fix it within Cassandra. Perhaps ask for advice on the JIRA
ticket mentioned upthread if this answer is not sufficient?

=Rob
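
A minimal sketch of what steps 1 and 3 might look like against sstable2json
output, assuming the dump format shown earlier in the thread and assuming the
masked rows are in fact live (step 2 still needs human judgement); the cutoff,
file names and script name are hypothetical:

import json
import sys
import time

# Treat any row-level tombstone marked more than a day into the future as a
# "doomstone"; markedForDeleteAt is assumed to be a microsecond timestamp.
CUTOFF_MICROS = (time.time() + 86400) * 1000000

def is_doomstone(row):
    info = row.get("metadata", {}).get("deletionInfo", {})
    return info.get("markedForDeleteAt", 0) > CUTOFF_MICROS

def clean(rows):
    cleaned = []
    for row in rows:
        if is_doomstone(row):
            # Report the affected key and drop the bogus row-level tombstone,
            # keeping the columns so json2sstable rewrites the data without it.
            sys.stderr.write("doomstone on key %s\n" % row["key"])
            row.get("metadata", {}).pop("deletionInfo", None)
        cleaned.append(row)
    return cleaned

if __name__ == "__main__":
    # Usage sketch:
    #   sstable2json <sstable> > dump.json
    #   python clean_doomstones.py dump.json > cleaned.json
    #   json2sstable (with the appropriate keyspace/column family options)
    # and then start a node with the rewritten sstables in its data directory.
    with open(sys.argv[1]) as f:
        rows = json.load(f)
    json.dump(clean(rows), sys.stdout)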

Re: Data tombstoned during bulk loading 1.2.10 -> 2.0.3

Posted by "olek.stasiak@gmail.com" <ol...@gmail.com>.
I don't know what the real cause of my problem is. We are still guessing.
All operations I have done on the cluster are described on this timeline:
1.1.7 -> 1.2.10 -> upgradesstables -> 2.0.2 -> normal operations -> 2.0.3
-> normal operations -> now
"normal operations" means reads/writes/repairs.
Could you please briefly describe how to recover the data? I have a
problem with the scenario described at this link:
http://thelastpickle.com/blog/2011/12/15/Anatomy-of-a-Cassandra-Partition.html ,
I can't apply this solution to my case.
regards
Olek

2014-02-03 Robert Coli <rc...@eventbrite.com>:
> On Mon, Feb 3, 2014 at 2:17 PM, olek.stasiak@gmail.com
> <ol...@gmail.com> wrote:
>>
>> No, I've done the repair after upgrading sstables. In fact it was about 4
>> weeks after, because of a bug:
>
>
> If you only did a repair after you upgraded SSTables, when did you have an
> opportunity to hit :
>
> https://issues.apache.org/jira/browse/CASSANDRA-6527
>
> ... which relies on you having multiple versions of SSTables while
> streaming?
>
> Did you do any operation which involves streaming? (Add/Remove/Replace a
> node?)
>
> =Rob
>

Re: Data tombstoned during bulk loading 1.2.10 -> 2.0.3

Posted by Robert Coli <rc...@eventbrite.com>.
On Mon, Feb 3, 2014 at 2:17 PM, olek.stasiak@gmail.com <
olek.stasiak@gmail.com> wrote:

> No, I've done the repair after upgrading sstables. In fact it was about 4
> weeks after, because of a bug:
>

If you only did a repair after you upgraded SSTables, when did you have an
opportunity to hit :

https://issues.apache.org/jira/browse/CASSANDRA-6527

... which relies on you having multiple versions of SSTables while
streaming?

Did you do any operation which involves streaming? (Add/Remove/Replace a
node?)

=Rob

Re: Data tombstoned during bulk loading 1.2.10 -> 2.0.3

Posted by "olek.stasiak@gmail.com" <ol...@gmail.com>.
2014-02-03 Robert Coli <rc...@eventbrite.com>:
> On Mon, Feb 3, 2014 at 1:02 PM, olek.stasiak@gmail.com
> <ol...@gmail.com> wrote:
>>
>> Today I've noticed that the oldest files with broken values appeared during
>> repair (we do repair once a week on each node). Maybe it's the repair
>> operation that caused the data loss?
>
>
> Yes, unless you added or removed or replaced nodes, it would have to be the
> repair operation, which streams SSTables. Did you run the repair during the
> upgradesstables?

No, I've done the repair after upgrading sstables. In fact it was about 4
weeks after, because of a bug:
https://issues.apache.org/jira/browse/CASSANDRA-6277. We upgraded Cassandra
to 2.0.2 and then, after about a month, to 2.0.3 because of 6277. Only then
were we able to do repair, so I set up cron to do it weekly on each node
(that was around 10 Dec 2013); the loss was discovered around New Year's
Eve.

>
>>
>> I've no idea. Currently our cluster
>> is running version 2.0.3.
>
>
> 2.0.3 has serious bugs, upgrade to 2.0.4 ASAP.
OK
>
>>
>> But our most crucial question is: can we recover the lost rows, or should we
>> start thinking about how to re-gather them?
>
>
> If I were you, I would do the latter. You can to some extent recover them
> via manual processes dumping with sstable2json and so forth, but it will be
> quite painful.
>
> http://thelastpickle.com/2011/12/15/Anatomy-of-a-Cassandra-Partition/
>
> Contains an explanation of how one could deal with it.
Sorry, but I have to admit that I can't map this solution onto my
problem. Could you briefly describe the steps I should perform to recover?
best regards
Aleksander

>
> =Rob
>
>
>
>>
>> best regards
>> Aleksander
>> P.S. I like your link, Rob, I'll pin it over my desk ;) In Oracle there
>> was a rule: never deploy an RDBMS before release 2 ;)
>>
>> 2014-02-03 Robert Coli <rc...@eventbrite.com>:
>> > On Mon, Feb 3, 2014 at 12:51 AM, olek.stasiak@gmail.com
>> > <ol...@gmail.com> wrote:
>> >>
>> >> We've faced a very similar effect after an upgrade from 1.1.7 to 2.0 (via
>> >> 1.2.10). Probably after upgradesstables (but it's only a guess,
>> >> because we noticed the problem a few weeks later), some rows became
>> >> tombstoned.
>> >
>> >
>> > To be clear, you didn't run SSTableloader at all? If so, this is the
>> > hypothetical case where normal streaming operations (replacing a node?
>> > what streaming did you do?) result in data loss...
>> >
>> > Also, CASSANDRA-6527 is a good reminder regarding the following :
>> >
>> >
>> > https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
>> >
>> > =Rob
>
>

Re: Data tombstoned during bulk loading 1.2.10 -> 2.0.3

Posted by Robert Coli <rc...@eventbrite.com>.
On Mon, Feb 3, 2014 at 1:02 PM, olek.stasiak@gmail.com <
olek.stasiak@gmail.com> wrote:

> Today I've noticed that the oldest files with broken values appeared during
> repair (we do repair once a week on each node). Maybe it's the repair
> operation that caused the data loss?


Yes, unless you added or removed or replaced nodes, it would have to be the
repair operation, which streams SSTables. Did you run the repair during the
upgradesstables?


> I've no idea. Currently our cluster
> is running version 2.0.3.
>

2.0.3 has serious bugs, upgrade to 2.0.4 ASAP.


> But our most crucial question is: can we recover the lost rows, or should we
> start thinking about how to re-gather them?
>

If I were you, I would do the latter. You can to some extent recover them
via manual processes dumping with sstable2json and so forth, but it will be
quite painful.

http://thelastpickle.com/2011/12/15/Anatomy-of-a-Cassandra-Partition/

Contains an explanation of how one could deal with it.

=Rob




> best regards
> Aleksander
> P.S. I like your link, Rob, I'll pin it over my desk ;) In Oracle there
> was a rule: never deploy an RDBMS before release 2 ;)
>
> 2014-02-03 Robert Coli <rc...@eventbrite.com>:
> > On Mon, Feb 3, 2014 at 12:51 AM, olek.stasiak@gmail.com
> > <ol...@gmail.com> wrote:
> >>
> >> We've faced a very similar effect after an upgrade from 1.1.7 to 2.0 (via
> >> 1.2.10). Probably after upgradesstables (but it's only a guess,
> >> because we noticed the problem a few weeks later), some rows became
> >> tombstoned.
> >
> >
> > To be clear, you didn't run SSTableloader at all? If so, this is the
> > hypothetical case where normal streaming operations (replacing a node?
> > what streaming did you do?) result in data loss...
> >
> > Also, CASSANDRA-6527 is a good reminder regarding the following :
> >
> >
> https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
> >
> > =Rob
>

Re: Data tombstoned during bulk loading 1.2.10 -> 2.0.3

Posted by "olek.stasiak@gmail.com" <ol...@gmail.com>.
Yes, I haven't run sstableloader at all. The data loss appeared somewhere along the line:
1.1.7 -> 1.2.10 -> upgradesstables -> 2.0.2 -> normal operations -> 2.0.3
-> normal operations -> now
Today I've noticed that the oldest files with broken values appeared during
repair (we do repair once a week on each node). Maybe it's the repair
operation that caused the data loss? I've no idea. Currently our cluster
is running version 2.0.3.
We can do some tests on the data to give you all the info needed to track the bug.
But our most crucial question is: can we recover the lost rows, or should we
start thinking about how to re-gather them?
best regards
Aleksander
P.S. I like your link, Rob, I'll pin it over my desk ;) In Oracle there
was a rule: never deploy an RDBMS before release 2 ;)

2014-02-03 Robert Coli <rc...@eventbrite.com>:
> On Mon, Feb 3, 2014 at 12:51 AM, olek.stasiak@gmail.com
> <ol...@gmail.com> wrote:
>>
>> We've faced a very similar effect after an upgrade from 1.1.7 to 2.0 (via
>> 1.2.10). Probably after upgradesstables (but it's only a guess,
>> because we noticed the problem a few weeks later), some rows became
>> tombstoned.
>
>
> To be clear, you didn't run SSTableloader at all? If so, this is the
> hypothetical case where normal streaming operations (replacing a node? what
> streaming did you do?) result in data loss...
>
> Also, CASSANDRA-6527 is a good reminder regarding the following :
>
> https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/
>
> =Rob

Re: Data tombstoned during bulk loading 1.2.10 -> 2.0.3

Posted by Robert Coli <rc...@eventbrite.com>.
On Mon, Feb 3, 2014 at 12:51 AM, olek.stasiak@gmail.com <
olek.stasiak@gmail.com> wrote:

> We've faced a very similar effect after an upgrade from 1.1.7 to 2.0 (via
> 1.2.10). Probably after upgradesstables (but it's only a guess,
> because we noticed the problem a few weeks later), some rows became
> tombstoned.


To be clear, you didn't run SSTableloader at all? If so, this is the
hypothetical case where normal streaming operations (replacing a node? what
streaming did you do?) result in data loss...

Also, CASSANDRA-6527 is a good reminder regarding the following :

https://engineering.eventbrite.com/what-version-of-cassandra-should-i-run/

=Rob

Re: Data tombstoned during bulk loading 1.2.10 -> 2.0.3

Posted by "olek.stasiak@gmail.com" <ol...@gmail.com>.
Hi All,
We've faced a very similar effect after an upgrade from 1.1.7 to 2.0 (via
1.2.10). Probably after upgradesstables (but it's only a guess,
because we noticed the problem a few weeks later), some rows became
tombstoned. They just disappear from query results. After
investigation I've noticed that they are reachable via sstable2json.
Example output for "non-existent" row:

{"key": "6e6e37716c6d665f6f61695f6463","metadata": {"deletionInfo":
{"markedForDeleteAt":2201170739199,"localDeletionTime":0}},"columns":
[["DATA","3c6f61695f64633a64(...)",1357677928108]]}
]

If I understand correctly, the row is marked as deleted with a timestamp in
the far future, but it's still on disk. Also, localDeletionTime is
set to 0, which may mean that it's some kind of internal bug, not the effect
of a client error. So my question is: is it true that upgradesstables
may do something like that? How can we find the reasons for such strange
Cassandra behaviour? Is there any way to recover such strangely
marked rows?
This problem affects about 500K of the 14M rows in our database, so
the percentage is quite big.
best regards
Aleksander

2013-12-12 Robert Coli <rc...@eventbrite.com>:
> On Wed, Dec 11, 2013 at 6:27 AM, Mathijs Vogelzang <ma...@apptornado.com>
> wrote:
>>
>> When I use sstable2json on the sstable on the destination cluster, it has
>> "metadata": {"deletionInfo":
>> {"markedForDeleteAt":1796952039620607,"localDeletionTime":0}}, whereas
>> it doesn't have that in the source sstable.
>> (Yes, this is a timestamp far into the future. All our hosts are
>> properly synced through ntp).
>
>
> This seems like a bug in sstableloader, I would report it on JIRA.
>
>>
>> Naturally, copying the data again doesn't work to fix it, as the
>> tombstone is far in the future. Apart from not having this happen at
>> all, how can it be fixed?
>
>
> Briefly, you'll want to purge that tombstone and then reload the data with a
> reasonable timestamp.
>
> Dealing with rows with data (and tombstones) in the far future is described
> in detail here :
>
> http://thelastpickle.com/blog/2011/12/15/Anatomy-of-a-Cassandra-Partition.html
>
> =Rob
>

Re: Data tombstoned during bulk loading 1.2.10 -> 2.0.3

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Dec 11, 2013 at 6:27 AM, Mathijs Vogelzang
<ma...@apptornado.com> wrote:

> When I use sstable2json on the sstable on the destination cluster, it has
> "metadata": {"deletionInfo":
> {"markedForDeleteAt":1796952039620607,"localDeletionTime":0}}, whereas
> it doesn't have that in the source sstable.
> (Yes, this is a timestamp far into the future. All our hosts are
> properly synced through ntp).
>

This seems like a bug in sstableloader, I would report it on JIRA.


> Naturally, copying the data again doesn't work to fix it, as the
> tombstone is far in the future. Apart from not having this happen at
> all, how can it be fixed?
>

Briefly, you'll want to purge that tombstone and then reload the data with
a reasonable timestamp.

Dealing with rows with data (and tombstones) in the far future is described
in detail here :

http://thelastpickle.com/blog/2011/12/15/Anatomy-of-a-Cassandra-Partition.html

=Rob
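
As a quick way to spot these, a small sketch (not from the thread; the dump
path is a placeholder and markedForDeleteAt is assumed to be in microseconds)
that lists the rows in an sstable2json dump whose row-level deletion marker
lies in the future:

import json
import sys
import time

cutoff_micros = time.time() * 1000000

# e.g. first run: sstable2json <sstable> > dump.json
with open(sys.argv[1]) as f:
    for row in json.load(f):
        info = row.get("metadata", {}).get("deletionInfo", {})
        marked = info.get("markedForDeleteAt", 0)
        if marked > cutoff_micros:
            # This row carries a far-future tombstone and will be invisible to reads.
            print(row["key"], marked)

Running it against dumps of the source and destination sstables shows which
sstables picked up the bogus deletion info during loading.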