Posted to commits@cassandra.apache.org by "maxwellguo (Jira)" <ji...@apache.org> on 2020/06/02 03:44:00 UTC
[jira] [Updated] (CASSANDRA-15844) Create table Asynchronously or creating table contact the same node from many client threads at same time may causing data lose

     [ https://issues.apache.org/jira/browse/CASSANDRA-15844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

maxwellguo updated CASSANDRA-15844:
-----------------------------------
    Summary: Create table Asynchronously or creating table contact the same node from many client threads at same time may causing data lose  (was: Create table Asynchronously or contact the same node from many client threads at same time may causing data lose)

> Create table Asynchronously or creating table contact the same node from many client threads at same time may causing data lose
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15844
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15844
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Schema
>            Reporter: maxwellguo
>            Assignee: maxwellguo
>            Priority: Normal
>         Attachments: createkeyspace.jpg, keyspace inner.jpg, schemaversion.jpg
>
>
> When several client threads create the same table through one coordinator node at the same time, or when tables are created with the session.executeAsync() method, the schema information can end up incorrect. In the worst case this causes data loss.
> In my test I used executeAsync() to issue several CREATE TABLE statements one after another, all with the same table name. (I do know that tables should be created synchronously, but some of our customers may create tables with executeAsync().) My expectation was that the last CQL statement
> {code:java}
> CREATE TABLE ks.tb (name text PRIMARY KEY , age int, adds text, height text)
> {code}
> should take effect.
>  !createkeyspace.jpg! 
> But after running the code, I found that the result was not what I expected. The resulting schema is:
> {code:java}
> CREATE TABLE ks.tb (name text PRIMARY KEY , age int, adds text, sex int, height int)
> {code}
>  !keyspace inner.jpg! 
> The schema version in memory also differs from the one on disk.
>  !schemaversion.jpg! 
> The problem occurs when a new column family is added (a new table is created) and requests to create the same table with different schema definitions arrive at the same time, either from different clients or through the executeAsync() method.
> {code:java}
>  private static void announceNewColumnFamily(CFMetaData cfm, boolean announceLocally, boolean throwOnDuplicate, long timestamp) throws ConfigurationException
>     {
>         cfm.validate();
>         KeyspaceMetadata ksm = Schema.instance.getKSMetaData(cfm.ksName);
>         if (ksm == null)
>             throw new ConfigurationException(String.format("Cannot add table '%s' to non existing keyspace '%s'.", cfm.cfName, cfm.ksName));
>         // If we have a table or a view which has the same name, we can't add a new one
>         else if (throwOnDuplicate && ksm.getTableOrViewNullable(cfm.cfName) != null)
>             throw new AlreadyExistsException(cfm.ksName, cfm.cfName);
>         logger.info("Create new table: {}", cfm);
>         announce(SchemaKeyspace.makeCreateTableMutation(ksm, cfm, timestamp), announceLocally);
>     }
> {code}
> The table-existence check may fail to catch the duplicate. All requests for the same table may then go on to call the announce() method, and their mutations are merged in mergeSchema().
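The check-then-act pattern described above can be illustrated with a small standalone sketch. It uses no Cassandra classes: the map, the latch, and all names (CheckThenActRace, createCheckThenAct, createAtomically) are hypothetical stand-ins, and the latch exists only to force the unlucky interleaving deterministically. It shows that two callers can both pass the existence check before either one registers the table, whereas an atomic check-and-write lets only one through.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class CheckThenActRace {
    // Hypothetical stand-in for the in-memory schema: table name -> definition.
    private final Map<String, String> tables = new ConcurrentHashMap<>();
    private final AtomicInteger announcements = new AtomicInteger();

    // Same shape as the reported code: check first, write later.
    // The latch holds every caller between the two steps, so both callers
    // have passed the check before either one writes.
    void createCheckThenAct(String name, String definition, CountDownLatch allChecked) {
        if (tables.containsKey(name))
            throw new IllegalStateException("already exists: " + name);
        allChecked.countDown();
        try {
            allChecked.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        tables.put(name, definition);
        announcements.incrementAndGet();
    }

    // Check and write as one atomic step: only the first caller wins.
    boolean createAtomically(String name, String definition) {
        boolean created = tables.putIfAbsent(name, definition) == null;
        if (created)
            announcements.incrementAndGet();
        return created;
    }

    // Runs two concurrent creations of the same table and returns how many were announced.
    static int run(boolean atomic) {
        CheckThenActRace schema = new CheckThenActRace();
        CountDownLatch allChecked = new CountDownLatch(2);
        // Illustrative definitions only; they are not the ones from the report.
        String[] definitions = { "(name text PRIMARY KEY, sex int)", "(name text PRIMARY KEY, height text)" };
        Thread[] clients = new Thread[definitions.length];
        for (int i = 0; i < clients.length; i++) {
            String definition = definitions[i];
            clients[i] = new Thread(() -> {
                if (atomic)
                    schema.createAtomically("ks.tb", definition);
                else
                    schema.createCheckThenAct("ks.tb", definition, allChecked);
            });
            clients[i].start();
        }
        for (Thread client : clients) {
            try {
                client.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return schema.announcements.get();
    }

    public static void main(String[] args) {
        System.out.println("check-then-act announcements: " + run(false));
        System.out.println("atomic announcements: " + run(true));
    }
}
```

In this sketch the check-then-act variant announces the table twice, while the atomic variant announces it once.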
> {code:java}
> public static synchronized void mergeSchema(Collection<Mutation> mutations, boolean forDynamoTTL)
>     {
>         // only compare the keyspaces affected by this set of schema mutations
>         Set<String> affectedKeyspaces =
>         mutations.stream()
>                  .map(m -> UTF8Type.instance.compose(m.key().getKey()))
>                  .collect(Collectors.toSet());
>         // fetch the current state of schema for the affected keyspaces only
>         Keyspaces before = Schema.instance.getKeyspaces(affectedKeyspaces);
>         // apply the schema mutations and flush
>         mutations.forEach(Mutation::apply);
>         if (FLUSH_SCHEMA_TABLES)
>             flush();
>         // fetch the new state of schema from schema tables (not applied to Schema.instance yet)
>         Keyspaces after = fetchKeyspacesOnly(affectedKeyspaces);
>         mergeSchema(before, after);
>         scheduleDynamoTTLClean(forDynamoTTL, mutations);
>     }
> {code}
> Because each request may write its new table definition to disk, we ended up with
> {code:java}
> CREATE TABLE ks.tb (name text PRIMARY KEY , age int, adds text, sex int, height int)
> {code}
> in our case.
> We also saw different schema versions in memory and on disk.
> Writes use the schema held in memory, but after a node restart the schema definition on disk is used instead. This mismatch may cause data loss.
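
On the client side, one way to avoid this situation is to issue DDL statements one at a time and wait for each to finish. The following is a minimal sketch under stated assumptions, not a fix for the server-side race: SerialDdl and its Consumer<String> parameter are hypothetical, the consumer stands for any blocking call that runs one statement (for example a synchronous execute on a driver session), and the lock only serializes DDL within a single client process.

```java
import java.util.function.Consumer;

class SerialDdl {
    // Hypothetical blocking runner: returns only after the statement has completed.
    private final Consumer<String> blockingExecute;
    private final Object ddlLock = new Object();

    SerialDdl(Consumer<String> blockingExecute) {
        this.blockingExecute = blockingExecute;
    }

    // Runs one DDL statement at a time, even when called from many threads.
    void run(String ddl) {
        synchronized (ddlLock) {
            blockingExecute.accept(ddl);
        }
    }

    public static void main(String[] args) {
        // The lambda below only prints; a real client would call its driver here.
        SerialDdl ddl = new SerialDdl(cql -> System.out.println("executing: " + cql));
        ddl.run("CREATE TABLE ks.tb (name text PRIMARY KEY, age int, adds text, height text)");
    }
}
```

Several clients running in separate processes can still race with each other, so this only narrows the window described in the report.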


