Posted to user@cassandra.apache.org by Shashi Yachavaram <sh...@gmail.com> on 2017/11/02 22:12:21 UTC

sstablescrub fails with OOM

We are on Cassandra 2.0.17 and have corrupted sstables. We ran an offline
sstablescrub, but it fails with an OOM. Increasing MAX_HEAP_SIZE to 8G did
not help; it still fails.
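
For reference, this is roughly how we invoke it (KS and CF stand in for our
real keyspace/table names; the heap default and the MAX_HEAP_SIZE override
live in the bin/sstablescrub wrapper script, so check your copy, and adjust
the service commands for your install):

  # the node must be down first; sstablescrub is an offline tool
  nodetool drain && sudo service cassandra stop

  # the 2.0-era wrapper defaults to a small heap and honors MAX_HEAP_SIZE
  MAX_HEAP_SIZE=8G sstablescrub KS CF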

Can we move the corrupted sstable file out of the way and rerun
sstablescrub, followed by a repair?

-shashi..

Re: sstablescrub fails with OOM

Posted by kurt greaves <ku...@instaclustr.com>.
Try running nodetool refresh or restarting Cassandra after removing the
corrupted file.
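
A sketch of both options (paths, KS/CF, and the generation number are
placeholders; note that nodetool refresh only picks up newly placed
sstables, so a restart is the surer way to make the node forget a removed
one):

  # option 1: remove the corrupt sstable's components while the node is down
  nodetool drain && sudo service cassandra stop
  mv /var/lib/cassandra/data/KS/CF/KS-CF-ka-10143-* /var/tmp/quarantine/
  sudo service cassandra start

  # option 2: on a running node, rescan the table's data directory
  nodetool refresh KS CF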

Re: sstablescrub fails with OOM

Posted by Shashi Yachavaram <sh...@gmail.com>.
When I tried to simulate this in the lab, I moved the files (mv
KS-CF-ka-10143-* /tmp/files) and then ran a repair, but it fails during
snapshot creation. Where does Cassandra get the list of files, and how do
we update that list so we can get rid of the corrupted files, update the
list/index, and move on with the offline scrub/repair?
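
My guess is that the repair snapshot hard-links the sstables the node still
considers live, so moving files out from under a running node breaks that
step. The sequence I will try next (placeholder paths and names):

  # flush and stop so the on-disk state is the whole truth
  nodetool drain && sudo service cassandra stop

  # quarantine every component of the corrupt generation, not just Data.db
  mv /var/lib/cassandra/data/KS/CF/KS-CF-ka-10143-* /var/tmp/quarantine/

  # the restarted node rebuilds its sstable list from the data directory
  sudo service cassandra start
  nodetool repair KS CF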

Thanks

shashi


Re: sstablescrub fails with OOM

Posted by Jeff Jirsa <jj...@gmail.com>.
This is not guaranteed to be safe.

If the corrupted sstable has a tombstone past gc grace, and another sstable still holds the data that tombstone shadows, removing the corrupt sstable will cause that deleted data to come back to life, and repair will spread it around the ring.

If that’s problematic for you, you should consider the entire node failed, run repair among the surviving replicas, and then replace the down server.

If you don’t do deletes, and you write with a consistency level higher than ONE, there’s a bit less risk in removing a single sstable.
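
If you go the replacement route, roughly the standard 2.0-era procedure
(the address is an example; the option goes wherever your packaging sets
JVM options, usually cassandra-env.sh):

  # on the fresh replacement host, before the very first start,
  # point it at the dead node's IP so it takes over those tokens
  JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.0.12"

  # then start; the node bootstraps by streaming from surviving replicas
  sudo service cassandra start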


-- 
Jeff Jirsa


Re: sstablescrub fails with OOM

Posted by sai krishnam raju potturi <ps...@gmail.com>.
Yes. Move the corrupt sstable, and run a repair on this node so that it
gets back in sync with its peers.
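
Roughly, with placeholder keyspace/table names and paths, and assuming you
are comfortable that nothing in that sstable is a tombstone you still need:

  # with the node stopped, move aside every component of that generation
  # (Data, Index, Filter, Statistics, ...), not just the Data.db file
  mv /var/lib/cassandra/data/KS/CF/KS-CF-ka-10143-* /var/tmp/quarantine/
  sudo service cassandra start

  # repair all ranges this node replicates for the affected table
  nodetool repair KS CF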
