Posted to commits@cassandra.apache.org by "Jeremy Hanna (Jira)" <ji...@apache.org> on 2019/09/03 17:58:00 UTC

[jira] [Commented] (CASSANDRA-15298) Cassandra node cannot be restored using documented backup method

    [ https://issues.apache.org/jira/browse/CASSANDRA-15298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921614#comment-16921614 ] 

Jeremy Hanna commented on CASSANDRA-15298:
------------------------------------------

The summary is a little misleading in that a restore works for many cases but has a gap when the schema has dropped columns and the sstables still contain values for those dropped columns. My guess is that if you run offline scrub on those sstables, it will clear out the dropped columns and you'll be able to run sstableloader on the files with the given schema.
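As a rough sketch of that suggestion (keyspace/table names, the host address and the path below are placeholders, and it assumes the snapshot files have been copied back into the table's data directory of an offline node):

{noformat}
# Offline scrub should rewrite the sstables and drop cells for columns the
# current schema no longer contains (keyspace/table names are placeholders).
sstablescrub mykeyspace testcf

# Afterwards the scrubbed files should load via sstableloader against the
# schema produced by DESCRIBE SCHEMA (host and path are placeholders).
sstableloader -d 10.0.0.1 /path/to/mykeyspace/testcf
{noformat}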

> Cassandra node cannot be restored using documented backup method
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-15298
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15298
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Charlemange Lasse
>            Priority: Normal
>
> I have a single Cassandra 3.11.4 node. It contains various tables and UDFs. The [documentation describes a method to back up this node|https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsBackupTakesSnapshot.html] (a command-level sketch follows this list):
>  * use "DESCRIBE SCHEMA" in cqlsh to get the schema
>  * create a snapshot using nodetool
>  * copy the snapshot + schema to a new (completely disconnected) node
>  * load schema into new node
>  * load sstables again using nodetool
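> As a rough sketch (the keyspace name, snapshot tag and paths below are illustrative placeholders, not taken from the documentation), these steps correspond to commands roughly like:
> {noformat}
> # 1. Export the full schema on the source node.
> cqlsh -e "DESCRIBE SCHEMA" > schema.cql
> # 2. Create a snapshot of the keyspace.
> nodetool snapshot -t mybackup mykeyspace
> # 3. Copy schema.cql and the snapshots/mybackup directories to the new node.
> # 4. On the (disconnected) new node, load the schema.
> cqlsh -f schema.cql
> # 5. Copy the snapshot sstables into the matching table data directories and
> #    load them, e.g. with "nodetool refresh mykeyspace testcf".
> {noformat}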
> But this method is completely bogus. It will result in errors like:
>  
> {noformat}
> java.lang.RuntimeException: Unknown column deleted_column during deserialization {noformat}
> And all data in this column is now lost.
> The problem is that the "DESCRIBE SCHEMA" output doesn't correctly reproduce the schema for columns that were already dropped (but whose data still exists in the sstables). It looks, for example, like:
> {noformat}
> CREATE TABLE mykeyspace.testcf (
>     primary_uuid uuid,
>     secondary_uuid uuid,
>     name text,
>     PRIMARY KEY (primary_uuid, secondary_uuid)
> ) WITH CLUSTERING ORDER BY (secondary_uuid ASC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND comment = ''
>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
>     AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND crc_check_chance = 1.0
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99PERCENTILE';
> {noformat}
> But it must actually look like:
> {noformat}
> CREATE TABLE IF NOT EXISTS mykeyspace.testcf (
>         primary_uuid uuid,
>         secondary_uuid uuid,
>         name text,
>         deleted_column boolean,
>         PRIMARY KEY (primary_uuid, secondary_uuid))
>         WITH ID = a1afdd4d-b61e-4f2a-b806-57c296be3948
>         AND CLUSTERING ORDER BY (secondary_uuid ASC)
>         AND bloom_filter_fp_chance = 0.01
>         AND dclocal_read_repair_chance = 0.1
>         AND crc_check_chance = 1.0
>         AND default_time_to_live = 0
>         AND gc_grace_seconds = 864000
>         AND min_index_interval = 128
>         AND max_index_interval = 2048
>         AND memtable_flush_period_in_ms = 0
>         AND read_repair_chance = 0.0
>         AND speculative_retry = '99PERCENTILE'
>         AND comment = ''
>         AND caching = { 'keys': 'ALL', 'rows_per_partition': 'NONE' }
>         AND compaction = { 'max_threshold': '32', 'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy' }
>         AND compression = { 'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor' }
>         AND cdc = false
>         AND extensions = {  };
> ALTER TABLE mykeyspace.testcf DROP deleted_column USING TIMESTAMP 1563978151561000;
> {noformat}
> This was taken from the snapshot's (column-family-specific) schema.cql. That file of course is not compatible with the main schema dump: it only creates the tables if they don't exist (and they already do, because the main "DESCRIBE SCHEMA" file has created them), and it is missing everything else, such as UDFs.
> It is currently not possible (using the built-in mechanisms of Cassandra 3.11.4) to migrate a keyspace from one isolated server to another isolated server.
> This behavior also breaks various backup systems that try to store Cassandra cluster information in offline storage.
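> A possible manual workaround (a sketch only, not a documented procedure; it assumes the dropped column's original name and type are known, as in the example above) is to re-create the dropped-column metadata after loading the "DESCRIBE SCHEMA" output, mirroring what the snapshot's schema.cql does with DROP ... USING TIMESTAMP:
> {noformat}
> # Re-add the column with its original type, then drop it again, so the table
> # metadata records it under dropped_columns; sstables that still contain
> # values for it can then be loaded without the deserialization error.
> cqlsh -e "ALTER TABLE mykeyspace.testcf ADD deleted_column boolean;"
> cqlsh -e "ALTER TABLE mykeyspace.testcf DROP deleted_column;"
> {noformat}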


