Posted to user@cassandra.apache.org by Roy Burstein <bu...@gmail.com> on 2019/05/06 14:20:06 UTC

Corrupted sstables

Hi,
We are having issues with Cassandra 3.11.4: after adding a node to the
cluster we get many corrupted files across the cluster (on almost all nodes).
This is reproducible in our environment.
We have 69 nodes in the cluster, with disk_access_mode: standard.

The stack trace:

WARN  [ReadStage-4] 2019-05-06 06:44:19,843 AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread Thread[ReadStage-4,5,main]: {}
java.lang.RuntimeException: org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /var/lib/cassandra/data/disk1/sessions_rawdata/sessions_v2_2019_05_06-9cae0c20585411e99aa867a11519e31c/md-816-big-Index.db
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2588) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0-zing_19.03.0.0]
        at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:162) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:134) [apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:114) [apache-cassandra-3.11.4.jar:3.11.4]
        at java.lang.Thread.run(Thread.java:748) [na:1.8.0-zing_19.03.0.0]
Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException: Corrupted: /var/lib/cassandra/data/disk1/sessions_rawdata/sessions_v2_2019_05_06-9cae0c20585411e99aa867a11519e31c/md-816-big-Index.db
        at org.apache.cassandra.io.sstable.format.big.BigTableReader.getPosition(BigTableReader.java:275) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.io.sstable.format.SSTableReader.getPosition(SSTableReader.java:1586) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.io.sstable.format.big.BigTableReader.iterator(BigTableReader.java:64) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.db.rows.UnfilteredRowIteratorWithLowerBound.initializeIterator(UnfilteredRowIteratorWithLowerBound.java:108) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.computeNext(LazilyInitializedUnfilteredRowIterator.java:99) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.db.rows.UnfilteredRowIteratorWithLowerBound.computeNext(UnfilteredRowIteratorWithLowerBound.java:119) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.db.rows.UnfilteredRowIteratorWithLowerBound.computeNext(UnfilteredRowIteratorWithLowerBound.java:48) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:374) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:186) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:155) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:525) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:385) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.db.rows.UnfilteredRowIterator.isEmpty(UnfilteredRowIterator.java:67) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.db.SinglePartitionReadCommand.withSSTablesIterated(SinglePartitionReadCommand.java:853) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDiskInternal(SinglePartitionReadCommand.java:797) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDisk(SinglePartitionReadCommand.java:670) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.db.SinglePartitionReadCommand.queryStorage(SinglePartitionReadCommand.java:504) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.db.ReadCommand.executeLocally(ReadCommand.java:423) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1874) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2584) ~[apache-cassandra-3.11.4.jar:3.11.4]
        ... 5 common frames omitted

Caused by: java.io.EOFException: EOF after 508 bytes out of 1154
        at org.apache.cassandra.io.util.DataInputPlus.skipBytesFully(DataInputPlus.java:58) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.db.RowIndexEntry$Serializer.skipPromotedIndex(RowIndexEntry.java:385) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.db.RowIndexEntry$Serializer.skip(RowIndexEntry.java:376) ~[apache-cassandra-3.11.4.jar:3.11.4]
        at org.apache.cassandra.io.sstable.format.big.BigTableReader.getPosition(BigTableReader.java:269) ~[apache-cassandra-3.11.4.jar:3.11.4]

Thanks,
Roy
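
Since the report says the corruption shows up on almost all nodes, it may help to tally which SSTable files the logs actually name. The following is only a rough sketch, not a Cassandra tool: it assumes the log text contains the same "CorruptSSTableException: Corrupted: <path>" wording as the trace above, and the function name corrupted_sstables and the shortened sample path are made up for illustration.

```python
import re
from collections import Counter

# Matches the "Corrupted: <path>" fragment of a CorruptSSTableException message.
# \s+ also spans a newline, because the "Caused by:" form in the trace puts the
# path on the following line. A path that is itself split across lines is missed.
CORRUPTED_RE = re.compile(r"CorruptSSTableException:\s+Corrupted:\s+(\S+\.db)")


def corrupted_sstables(log_text: str) -> Counter:
    """Count how often each corrupted SSTable component path is reported."""
    return Counter(CORRUPTED_RE.findall(log_text))


if __name__ == "__main__":
    # Hypothetical, shortened log excerpt in the same shape as the trace above.
    sample = (
        "java.lang.RuntimeException: org.apache.cassandra.io.sstable."
        "CorruptSSTableException: Corrupted: /data/ks/tbl/md-816-big-Index.db\n"
        "Caused by: org.apache.cassandra.io.sstable.CorruptSSTableException:\n"
        "Corrupted: /data/ks/tbl/md-816-big-Index.db\n"
    )
    for path, hits in corrupted_sstables(sample).most_common():
        print(hits, path)
```

Running this over the log file of each node would give a per-node list of affected files, which could then be compared against the time the new node was added.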

Re: Corrupted sstables

Posted by Roy Burstein <bu...@gmail.com>.
It happened on all the servers in the cluster every time I added a node.
This is a new cluster; nothing was upgraded here. We have a similar cluster
running on C* 2.1.15 with no issues.
We are aware of the scrub utility, but the corruption reproduces every time
we add a node to the cluster.

We have many tables there; the DDL of the tables with corrupted sstables looks the same:
CREATE TABLE rawdata.a1 (
    session_start_time_timeslice bigint,
    uid_bucket int,
    vid_bucket int,
    pid int,
    uid text,
    sid bigint,
    vid bigint,
    data_type text,
    data_id bigint,
    data blob,
    PRIMARY KEY ((session_start_time_timeslice, uid_bucket, vid_bucket), pid, uid, sid, vid, data_type, data_id)
) WITH CLUSTERING ORDER BY (pid ASC, uid ASC, sid ASC, vid ASC, data_type ASC, data_id ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.2
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

CREATE TABLE rawdata.a2 (
    session_start_time_timeslice bigint,
    uid_bucket int,
    vid_bucket int,
    pid int,
    uid text,
    sid bigint,
    vid bigint,
    data_type text,
    data_id bigint,
    data blob,
    PRIMARY KEY ((session_start_time_timeslice, uid_bucket, vid_bucket), pid, uid, sid, vid, data_type, data_id)
) WITH CLUSTERING ORDER BY (pid ASC, uid ASC, sid ASC, vid ASC, data_type ASC, data_id ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.2
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

CREATE TABLE rawdata.a3 (
    session_start_time_timeslice bigint,
    uid_bucket int,
    vid_bucket int,
    pid int,
    uid text,
    sid bigint,
    vid bigint,
    data_type text,
    data_id bigint,
    data blob,
    PRIMARY KEY ((session_start_time_timeslice, uid_bucket, vid_bucket), pid, uid, sid, vid, data_type, data_id)
) WITH CLUSTERING ORDER BY (pid ASC, uid ASC, sid ASC, vid ASC, data_type ASC, data_id ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.2
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';


CREATE TABLE rawdata.a4 (
    session_start_time_timeslice bigint,
    uid_bucket int,
    vid_bucket int,
    pid int,
    uid text,
    sid bigint,
    vid bigint,
    data_type text,
    data_id bigint,
    data blob,
    PRIMARY KEY ((session_start_time_timeslice, uid_bucket, vid_bucket), pid, uid, sid, vid, data_type, data_id)
) WITH CLUSTERING ORDER BY (pid ASC, uid ASC, sid ASC, vid ASC, data_type ASC, data_id ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.2
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';



On Mon, May 6, 2019 at 9:44 PM Jeff Jirsa <jj...@gmail.com> wrote:

> Before you scrub, from which version were you upgrading and can you post
> a(n anonymized) schema?
>
> --
> Jeff Jirsa
>
>
> On May 6, 2019, at 11:37 AM, Nitan Kainth <ni...@gmail.com> wrote:
>
> Did you try sstablescrub?
> If that doesn't work, you can delete all files of this sstable id and then
> run repair -pr on this node.
>

Re: Corrupted sstables

Posted by Jeff Jirsa <jj...@gmail.com>.
Before you scrub, from which version were you upgrading and can you post a(n anonymized) schema?

-- 
Jeff Jirsa


> On May 6, 2019, at 11:37 AM, Nitan Kainth <ni...@gmail.com> wrote:
> 
> Did you try sstablescrub?
> If that doesn't work, you can delete all files of this sstable id and then run repair -pr on this node.
> 

Re: Corrupted sstables

Posted by Nitan Kainth <ni...@gmail.com>.
Did you try sstablescrub?
If that doesn't work, you can delete all files of this sstable id and then
run repair -pr on this node.
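
To make "all files of this sstable id" concrete: the corrupted file in the trace is md-816-big-Index.db, and the other components of the same SSTable should share the md-816-big- prefix in the same directory. Below is a minimal sketch that only lists those sibling files and deletes nothing. The helper name sstable_components and the file-name pattern are assumptions based on the single name shown in the trace, not an official Cassandra API.

```python
import re
from pathlib import Path
from typing import List

# Assumed component naming, inferred from "md-816-big-Index.db":
# <version>-<generation>-<format>-<Component>
COMPONENT_RE = re.compile(r"^(?P<prefix>[a-z]{2}-\d+-[a-z]+)-(?P<component>.+)$")


def sstable_components(corrupted_file: str) -> List[Path]:
    """Return every file in the same directory that shares the SSTable prefix."""
    path = Path(corrupted_file)
    match = COMPONENT_RE.match(path.name)
    if match is None:
        raise ValueError(f"not an SSTable component name: {path.name}")
    # The trailing "-" keeps generation 816 from also matching 8160.
    prefix = match.group("prefix") + "-"
    return sorted(p for p in path.parent.iterdir() if p.name.startswith(prefix))
```

Calling sstable_components() with the path from the exception message returns the list to review. Removing those files is left as a manual step, presumably with the node stopped first, before running the repair.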
