Posted to commits@cassandra.apache.org by "Jeremy Hanna (Jira)" <ji...@apache.org> on 2019/09/03 19:05:00 UTC

[jira] [Comment Edited] (CASSANDRA-15298) Cassandra node cannot be restored using documented backup method

    [ https://issues.apache.org/jira/browse/CASSANDRA-15298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16921665#comment-16921665 ] 

Jeremy Hanna edited comment on CASSANDRA-15298 at 9/3/19 7:04 PM:
------------------------------------------------------------------

I was just speaking with Jeremiah about this as well, and you're right: scrub requires the schema to do its work. The schema is recorded in the schema.cql that is written when the snapshot is taken, though, so to your original question about the normal flow, that file can be used by users and tools to recreate the schema in a way that lets the restore work properly. The schema produced by DESCRIBE in cqlsh isn't going to contain that information, though, as you say.

And you're correct: the documentation is misleading. I was just saying that if you have never dropped columns, then the documentation works as written.
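
For example, a restore tool could replay each table's snapshot schema.cql on the fresh node before loading the sstables. A minimal sketch, assuming the default data directory and a snapshot tag of "mybackup" (both illustrative), and noting that schema.cql only contains the table definition, so the keyspaces have to be created first:
{noformat}
# Replay the per-table schema recorded at snapshot time; it includes the
# ALTER TABLE ... DROP ... USING TIMESTAMP statements for dropped columns.
for f in /var/lib/cassandra/data/*/*/snapshots/mybackup/schema.cql; do
    cqlsh -f "$f"
done
{noformat}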


> Cassandra node cannot be restored using documented backup method
> ----------------------------------------------------------------
>
>                 Key: CASSANDRA-15298
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15298
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Charlemange Lasse
>            Priority: Normal
>
> I have a single Cassandra 3.11.4 node. It contains various tables and UDFs. The [documentation describes a method to back up this node|https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsBackupTakesSnapshot.html]:
>  * use "DESCRIBE SCHEMA" in cqlsh to get the schema
>  * create a snapshot using nodetool
>  * copy the snapshot + schema to a new (completely disconnected) node
> * load the schema into the new node
> * load the sstables using nodetool (see the command sketch after this list)
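> For reference, the documented flow reduces to roughly the following commands. This is only a sketch; the keyspace/table names, snapshot tag, and paths are illustrative:
> {noformat}
> # on the source node
> cqlsh -e "DESCRIBE SCHEMA" > schema.cql
> nodetool snapshot -t mybackup mykeyspace
> # copy schema.cql and the snapshot contents, e.g.
> #   /var/lib/cassandra/data/mykeyspace/testcf-<id>/snapshots/mybackup/
> # into the matching table directory on the new node, then there:
> cqlsh -f schema.cql
> nodetool refresh mykeyspace testcf
> {noformat}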
> But as documented this method is bogus. Restoring from such a backup fails with errors like:
> {noformat}
> java.lang.RuntimeException: Unknown column deleted_column during deserialization
> {noformat}
> And all data in that column is then lost.
> The problem is that the CQL emitted by "DESCRIBE SCHEMA" doesn't correctly record columns that were dropped but whose data still exists in the sstables. Its output looks, for example, like:
> {noformat}
> CREATE TABLE mykeyspace.testcf (
>     primary_uuid uuid,
>     secondary_uuid uuid,
>     name text,
>     PRIMARY KEY (primary_uuid, secondary_uuid)
> ) WITH CLUSTERING ORDER BY (secondary_uuid ASC)
>     AND bloom_filter_fp_chance = 0.01
>     AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
>     AND comment = ''
>     AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
>     AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
>     AND crc_check_chance = 1.0
>     AND dclocal_read_repair_chance = 0.1
>     AND default_time_to_live = 0
>     AND gc_grace_seconds = 864000
>     AND max_index_interval = 2048
>     AND memtable_flush_period_in_ms = 0
>     AND min_index_interval = 128
>     AND read_repair_chance = 0.0
>     AND speculative_retry = '99PERCENTILE';
> {noformat}
> But it must actually look like:
> {noformat}
> CREATE TABLE IF NOT EXISTS mykeyspace.testcf (
>         primary_uuid uuid,
>         secondary_uuid uuid,
>         name text,
>         deleted_column boolean,
>         PRIMARY KEY (primary_uuid, secondary_uuid))
>         WITH ID = a1afdd4d-b61e-4f2a-b806-57c296be3948
>         AND CLUSTERING ORDER BY (secondary_uuid ASC)
>         AND bloom_filter_fp_chance = 0.01
>         AND dclocal_read_repair_chance = 0.1
>         AND crc_check_chance = 1.0
>         AND default_time_to_live = 0
>         AND gc_grace_seconds = 864000
>         AND min_index_interval = 128
>         AND max_index_interval = 2048
>         AND memtable_flush_period_in_ms = 0
>         AND read_repair_chance = 0.0
>         AND speculative_retry = '99PERCENTILE'
>         AND comment = ''
>         AND caching = { 'keys': 'ALL', 'rows_per_partition': 'NONE' }
>         AND compaction = { 'max_threshold': '32', 'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy' }
>         AND compression = { 'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor' }
>         AND cdc = false
>         AND extensions = {  };
> ALTER TABLE mykeyspace.testcf DROP deleted_column USING TIMESTAMP 1563978151561000;
> {noformat}
> This was taken from the snapshot's (column-family-specific) schema.cql. That file is of course not compatible with the main schema dump: it only creates the table when it doesn't already exist (which it does, because the main "DESCRIBE SCHEMA" file has already created it), and it is missing everything else, such as UDFs.
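> A possible manual workaround (untested, and re-adding a dropped column may be rejected for some column types): take the column name, type, and drop timestamp from the snapshot's schema.cql, then re-register the drop on the restored schema before loading the sstables:
> {noformat}
> # re-create the dropped column, then drop it again with its original
> # timestamp, so the sstable deserializer knows about it
> cqlsh -e "ALTER TABLE mykeyspace.testcf ADD deleted_column boolean;"
> cqlsh -e "ALTER TABLE mykeyspace.testcf DROP deleted_column USING TIMESTAMP 1563978151561000;"
> {noformat}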
> It is currently not possible (using the built-in mechanisms of Cassandra 3.11.4) to migrate a keyspace from one isolated server to another.
> This behavior also breaks various backup systems that try to store Cassandra cluster information in offline storage.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org