You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2017/01/10 10:03:59 UTC

[jira] [Commented] (CASSANDRA-9425) Make node-local schema fully immutable

    [ https://issues.apache.org/jira/browse/CASSANDRA-9425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15814547#comment-15814547 ] 

Sylvain Lebresne commented on CASSANDRA-9425:
---------------------------------------------

I very much like the changes of this patch in general. I also agree with [~snazy] small follow-up, getting rid of {{keyspaceAndTablePair}} sounds good. The patch is big though so I have a bunch of remarks/suggestions, some of which may or may not be very useful. Some of the comments may also not be entirely new to this patch, I haven't double-checked everything, but as the main goal of this patch is to clean stuff up, let say I don't feel too bad if there is a few such remarks.

I have also a few things, mostly nits, that I just wrote while reviewing and pushed in branch below. One noticeable commit that is not entirely nitty is the introduction of a {{TableId}} class (that just "wrap" a UUID). It felts like this make the code read better with no real downside. It's a trivial patch, just with a big-ish diff.
| [9425-review|https://github.com/pcmanus/cassandra/commits/9425-review] | [utests|http://cassci.datastax.com/job/pcmanus-9425-review-testall] | [dtests|http://cassci.datastax.com/job/pcmanus-9425-review-dtest] |


Other points:
* {{TableMetadataRef.to()}} feels like it could be misused easily so I'd suggest making things a bit more explicit. In practice, it has 3 main usages:
*# in {{Schema}}, which is kind of the one true use, where I'd directly use the ctor (which would be package-protected and have the same comment than on {{set()}}).
*# for offline tools: I'd rename {{to()}} to something like {{forOfflineTool()}} that make it clear it's ok only in that context. A comment would be nice too.
*# for indexes, where we kind of hack around them not having their own ID. I think it's worth pointing out we're doing something specific by having a {{forIndex()}} method with proper comments (and probably an assertion that validate it's indeed called on an index). Or maybe go with my next point.
* Talking of index handling, {{TableMetadataRef.inPlaceUpdateIndexedTableMetadata}} feels dangerous. The {{set()}} method is very explicit about not being for public consumption, but that {{inPlaceUpdateIndexedTableMetadata}} method calls {{set()}} directly and is public. It's also particularly hard to convince oneself that it's used in a safe maner (you check first that {{SecondaryIndexManager.reload()}} is indeed only called protected by migration synchronization, but the call to {{inPlaceUpdateIndexedTableMetadata}} is on a task executed by an {{ExecutorService}} so you have to double-check it's not really asynchronous (despite what the javadoc in {{Index.java}} claims!) and thus is indeed protected).
  Anyway, at a minimum some comments are imo missing, but I wonder if we could make this more obviously correct by pushing it to {{Schema}}, having a map of 'index name' -> TableMetadataRef specific to indexes. Updating that in {{Schema.load()}} shouldn't be terribly complex. Or am I missing a reason why that wouldn't work?
* I'm unclear on the changes to {{AlterTypeStatement}}. The patch removes the code that propagate type changes to other table/view/types, but I neither see where that has moved, nor why that wouldn't be needed anymore (as an aside, if it does is not needed anymore, than we could remove the {{updateTypes()}} and {{updateWith()}} methods that are unused in the patch).
* I'd rename {{addRegularColumn}} and {{alterColumnType}} in {{ViewMetadata}} as they are not modifying the {{ViewMetadata}} but creating copies (say to {{withAddedRegularColumn()}} and {{withAlteredColumnType}}).
* The patch changes some system table options (compared to current trunk), namely:
** {{AuthKeyspace}} tables don't get his {{memtableFlushPeriod}} to 1h.
** {{SystemDistributedKeyspace}} gets a default gcGrace, but it was of 10 days.
** None of the system table gets {{readRepairChance == 0}} (only {{dcLocaReadRepairChance}} is set to 0).
* In {{Indexes.validate()}}, not sure it's worth doing the duplicate name check since we do it at the keyspace level already (and only doing it withing a single table is a false sense of safety).
* {{Keyspaces}} has both a {{get(String)}} that return {{Optional}} and a {{getNullable(String)}}. We should probably remove one.
* There is {{TODO: FIXME}} in ColumnFamilyStore.setCompressionParameters.
* There is a {{TODO FIXME}} in SchemaKeyspace (and not 100% sure what it's about).
* Nit: In {{SSTableTxnWriter}}, it feels slightly inconsistent that {{createRangeAware}} takes a {{TableMetadata}}, but {{create}} takes a {{TableMetadataRef}}.
* In {{Schema}}:
** Realized afterwards that this wasn't new to this patch, but mentioning it nonetheless: loading the system keyspaces in Schema ctor feels dodgy, especially since it's conditional on some initialization methods (and it's a singleton). What if Schema somehow happens to be loaded too early for instance? Anyway, would be nice to move that code to some {{loadSystemTables()}} function called in the proper place. Also, the {{Schema}} ctor should probably be private.
** slightly uncomfortable with {{load()}} as it is. It allow both new and updated keyspace, but doesn't do everything it should on an update (it doesn't reload {{ColumnFamilyStore}}, ...). And while those other parts are done properly by {{merge()}} and true schema migrations, {{load()}} is public so it feels easy to misuse. Even for brand new keyspaces, it's a bit confusing that {{load()}} is only a sub-part of what {{Schema.createKeyspace()}} (it works due to 1) my next point and 2) the fact we happen to not care about notifications when load() is called, but it's a bit messy).
** Some of {{Schema.createKeyspace()}} is actually redundant: {{Keyspace}} ctor already initialize all tables and views, so it ends up being duplicated. It's largely harmless though we do end up reloading each {{ColumnFamilyStore}} for no reason. Taken with my previous point, I wonder if things could be refactored a bit so it's clearer when initialization happens.
** {{Schema.load}} and {{Schema.unload}} should probably be synchronized for their modification of {{keyspaces}}. I acknowledge that those method are called by other methods that _are_ synchronized, but 1) it feels adding synchronization directly on those method at least adds clarify and prevent future misues and 2) not all path are actually synchronized since {{load()}} is public (here again, the concrete usage are actually fine, so not claiming a bug, just suggesting making things more future proof).
** {{validateTable}} seems to replace {{Validation.validateColumnFamily}} but the latter is still there and used in a few instances. Would be nice to clean up. {{Validation.validateKeyspaceNotSystem}} could probably be migrated too for consistency.
** I'd maybe suggest adding a private {{handleDiff(mapDiff, Function onDropped, Function onAdded, Function onUpdated)}} method to simplify a bit.
** Is there a rational for notifying everything at the end, versus calling each notification in the "appropriate" sub-method (notifying table alter in {{alterTable()}}, ....)? The later would feels a tad more readable to me as it's slightly easier to check we haven't forgotten something.
* In {{TableMetadata}}, validateCompatibility() should probably check more stuffs (we can't alter non-PK column types as we want, nor change the partition key size for instance).

Also, as the patch does quite a bit of renaming, allow me a few additional suggestions. I won't get mad if you'd rather not do those though:
* I'd rename {{TableMetadata.table}} to {{TableMetadata.name}} for consistency with other classes ({{KeyspaceMetadata}} and {{IndexMetadata}} at least). While at it, I'd also rename {{ksName}} to {{keyspace}} and {{viewName}} to {{name}} in {{ViewMetadata}}.
* I'd also probably rename {{PartitionColumns}} to {{RegularAndStaticColumns}} now.

> Make node-local schema fully immutable
> --------------------------------------
>
>                 Key: CASSANDRA-9425
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9425
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Aleksey Yeschenko
>            Assignee: Aleksey Yeschenko
>             Fix For: 4.0
>
>
> The way we handle schema changes currently is inherently racy.
> All of our {{SchemaAlteringStatement}} s perform validation on a schema state that's won't necessarily be there when the statement gets executed and mutates schema.
> We should make all the *Metadata classes ({{KeyspaceMetadata, TableMetadata}}, {{ColumnMetadata}}, immutable, and local schema persistently snapshottable, with a single top-level {{AtomicReference}} to the current snapshot. Have DDL statements perform validation and transformation on the same state.
> In pseudo-code, think
> {code}
> public interface DDLStatement
> {
>     /**
>      * Validates that the DDL statement can be applied to the provided schema snapshot.
>      *
>      * @param schema snapshot of schema before executing CREATE KEYSPACE
>      */
>     void validate(SchemaSnapshot schema);
>  
>     /**
>      * Applies the DDL statement to the provided schema snapshot.
>      * Implies that validate() has already been called on the provided snapshot.
>      *
>      * @param schema snapshot of schema before executing the statement
>      * @return snapshot of schema as it would be after executing the statement
>      */
>     SchemaSnapshot transform(SchemaSnapshot schema);
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)