Posted to user@cassandra.apache.org by Maxim Parkachov <la...@gmail.com> on 2020/12/16 08:42:08 UTC

Repairs on table with daily full load

Hi everyone,

There are a lot of articles about this, and this question has probably
been asked many times already, but I am still not 100% sure.

We have a table which we reload almost in full every night with a Spark
job, writing at consistency LOCAL_QUORUM with a record TTL of 7 days. The
TTL is there to expire records that have not appeared in the last 7
imports. The table is replicated across 2 DCs, and we are interested only
in the latest state of each record. The table definition is below. After
the load, we run a repair on this table with Reaper, which takes a lot of
time and resources. We have multiple such tables, and most of our repair
time is spent on them. Running the full load again takes less time than
repairing this table.

The question is: do we actually need to run repairs on this table at all?
If yes, how often: daily or weekly?

Thanks in advance,
Maxim.

WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.LeveledCompactionStrategy'}
    AND compression = {'chunk_length_in_kb': '16', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
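
For concreteness, each write in the nightly load amounts to something like
the following. This is a minimal sketch with the Python cassandra-driver;
the keyspace, table, and column names are hypothetical stand-ins, not the
real schema:

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("my_keyspace")  # hypothetical keyspace

    # Each record is (re)written at LOCAL_QUORUM with a 7-day TTL
    # (604800 s), so rows that stop appearing in the nightly import
    # expire on their own and never need an explicit DELETE.
    insert = SimpleStatement(
        "INSERT INTO my_table (id, payload) VALUES (%s, %s) USING TTL 604800",
        consistency_level=ConsistencyLevel.LOCAL_QUORUM,
    )
    records = [("k1", "v1"), ("k2", "v2")]  # stand-in for the Spark import
    for key, payload in records:
        session.execute(insert, (key, payload))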

Re: Repairs on table with daily full load

Posted by Jeff Jirsa <jj...@gmail.com>.
Or, write with LOCAL_QUORUM, and only do a repair if you're going to
replace a host (if one host fails, repair the surviving replicas before you
bootstrap the replacement), and let read repairs handle consistency. This
is only strictly safe because all of your writes are TTL'd and you never
delete anything explicitly.
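
To illustrate the read-repair half of this, any LOCAL_QUORUM read does the
work. A minimal sketch with the Python cassandra-driver (keyspace, table,
and key are hypothetical):

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("my_keyspace")  # hypothetical keyspace

    # A LOCAL_QUORUM read makes the coordinator compare a quorum of local
    # replica responses; on a mismatch it resolves the newest version and
    # repairs the stale replicas it contacted before returning the row.
    query = SimpleStatement(
        "SELECT payload FROM my_table WHERE id = %s",
        consistency_level=ConsistencyLevel.LOCAL_QUORUM,
    )
    row = session.execute(query, ("some-key",)).one()

Because the coordinator writes the resolved row back to the stale replicas
it contacted, ordinary quorum reads gradually converge replicas that a
failed or partial write left behind.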

Re: Repairs on table with daily full load

Posted by Elliott Sims <el...@backblaze.com>.
Are you running with RF=3 and QUORUM on both read and write?
If so, I think that as long as your fill job reports errors and retries,
you can probably get away without repairing: any successful quorum write
overlaps any later quorum read in at least one replica.
You can also hedge your bets by doing the data load with ALL, though of
course that has an availability tradeoff.

Personally, I'd probably look at running the initial load with ALL, falling
back on QUORUM and recording which data had to fall back.  That way you'll
know if there were inconsistencies and can correct them manually (full
repair or rebuild of a host that was down, or replaying the write with ALL
later), but without adding significant overhead to the process.
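
A minimal sketch of that try-ALL-then-QUORUM pattern with the Python
cassandra-driver; the statement, names, and replay list are hypothetical:

    from cassandra import ConsistencyLevel, Unavailable, WriteTimeout
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["127.0.0.1"])
    session = cluster.connect("my_keyspace")  # hypothetical keyspace

    INSERT_CQL = "INSERT INTO my_table (id, payload) VALUES (%s, %s) USING TTL 604800"
    fallback_keys = []  # keys that only reached QUORUM, for later replay at ALL

    def write_record(key, payload):
        """Write at ALL; on failure retry at QUORUM and remember the key."""
        try:
            session.execute(
                SimpleStatement(INSERT_CQL, consistency_level=ConsistencyLevel.ALL),
                (key, payload))
        except (Unavailable, WriteTimeout):
            # At least one replica missed the write; QUORUM still keeps the
            # row readable, but record the key so it can be replayed at ALL.
            session.execute(
                SimpleStatement(INSERT_CQL, consistency_level=ConsistencyLevel.QUORUM),
                (key, payload))
            fallback_keys.append(key)

Replaying fallback_keys at ALL once every host is healthy again closes the
gap without a full repair.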
