You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Jacek Lewandowski (Jira)" <ji...@apache.org> on 2021/11/04 21:12:00 UTC
[jira] [Commented] (CASSANDRA-17044) Refactor schema management to allow for schema source pluggability

    [ https://issues.apache.org/jira/browse/CASSANDRA-17044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17438940#comment-17438940 ] 

Jacek Lewandowski commented on CASSANDRA-17044:
-----------------------------------------------

I reverted {{SchemaManager}} back to {{Schema}} as it seems there is no consensus for that rename. As I said, it is not crucial for this ticket and I can be made any time later if needed.

bq. Even though I generally like that SchemaChangeListener is now an interface rather than abstract class, I’m concerned about potential users who might have relied on it being implemented the way it was. Given this was a useful interface, I think we should be more careful with it. I think it’s possible to bring backwards compatibility back by creating a wrapper class with the old interface. Would still require recompilation for the users, but could will make it painless to migrate.

According to the later discussion I assumed we can keep just an interface

bq. Is there any reason we wouldn’t want to make the maps in TableMetadataRefCache immutable? I realize they can be sizeable, but in this case we could also hide them behind the interface, make default implementation immutable and let other implementations be more efficient if necessary. This way we can atomically swap/publish to all three maps.

Ok, I've implemented it so that it is immutable. However, I have some concerns in that case because the {{TableMetadataRef}} objects are purposely mutable and thus we probably should keep the synchronization (although I removed it in the commit)

bq. I’m not sure about the name SharedSchema class and nomenclature:
bq. Class itself, if the whole point was to add a version to Keyspaces, maybe we should just add the version to Keyspaces, and make sure keyspaces are being updated synchronously with the version. Moreover, I think having version here and version left in SchemaManager is making it more confusing. I realise that you’ve tried to hide the version for SharedSchema in the Default handler, it’s unclear whose responsibility the version actually is.

As far as I understand, version not always makes sense with {{Keyspaces}} - for example, we only care about the version calculated for non-local keyspaces. So I'd prefer the approach with removing that class entirely and keep version separate. On the other hand, in a couple of places keyspaces and version are passed together to a method and it is quite convenient to keep them in a single immutable object.

bq. Nomenclature: I don’t think this schema is “shared” in any way: all keyspaces that aren’t local are just nonlocal, and a subset of them is just “userDefined”. Maybe we could use “replicated” or “distributed” instead of “nonlocal”.

I really had a problem with that. I like {{DistributedSchema}} name and I committed the renaming.

bq. Is there anything that justifies the existing of LocalKeyspaces class and prevents it from being just an instance of Keyspaces which is more functional but seems to be at least equivalent for its current usages?

I have no explanation, perhaps I was working on this for too long and another pair of eyes was needed to catch this. I committed removal of {{LocalKeyspaces}} in favor of using just {{Keyspaces}}.

bq. reloadSchemaAndAnnounceVersion seems to pick all keyspaces and then filter out all system/local ones. Maybe we can just use “shared” (in nomenclature of your patch)

It gets modified in later commits.

bq. in SchemaDiagnostics, “schemata” doesn’t seem to be a typo, but rather a plural of “schema”. There are a few other mentions of schemata in this file, so I’d check it again.

I checked - my understanding is that the schema is a set of all keyspaces metadata rather than metadata of a single keyspace. That's why we have {{Schema}} class to represent those keyspaces. We only deal with multiple schemata during the transformation, in particular we have the old schema and the new schema but this is unrelated to the fixed typo.

bq. Should we make a separate implementation of IEndpointStateChangeSubscriber within SchemaUpdateHandler, since

I'm not sure if I get what do you mean by that - do you think I should create a kinda inner class implementing that interface instead of implementing it on the handler level?

bq. There seems to be an unintentional use of DSE here. I do not mind the presence of this method though even though it seems to be unused.
bq. Nit: here probably you have meant “local keyspace definitions” or something similar.

Hopefully fixed.

bq. Unfortunately, I could not find a full set of passing tests accompanying the patch. Could you, when posting the updates, make sure to include a link to CircleCI?

Indeed, I haven't run CI yet because I expected (possibly significant) changes to be made as a result of the reviews. However, it is not an unproven solution - it was first implemented on a fork in DS and all the tests (excpect upgrade dtests and flaky) were passing

bq. Lastly, I feel like for a 4k lines of code patch, this one has surprisingly few tests. I realise that schema was always a generally undertested area of code, but I think our current quality standards do not allow us to do massive refactoring without substantial testing. Therefore, I suggest we create fuzz test that verifies schema eventual propagation, reads/writes during schema changes, and cross-node consistency of schema operations themselves, including concurrent ones. What seems to be particularly important to test is the gossip propagation of the version ID, which would lead to schema pulls, alongside with active schema pushes that is done when there’s a non-local schema change, and picking up schema during bootstrap.

This will take some time and seems to be a separate task, perhaps a dependency of this ticket. I mean, I'm not fixing schema synchronization or introducing any new functionality. If the implemented tests that you mentioned detect a problem on trunk, I don't know if this patch should address them (maybe it should, but it wouldn't be just a refactoring then)

Not very important in the context of the tests mentioned above, but the PR has over 80% coverage on the added/modified lines by running only JVM tests. 


> Refactor schema management to allow for schema source pluggability
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-17044
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17044
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Cluster/Schema
>            Reporter: Jacek Lewandowski
>            Assignee: Jacek Lewandowski
>            Priority: Normal
>             Fix For: 4.1
>
>
> The idea is decompose `Schema` into separate entities responsible for different things. In particular extract what is related to schema storage and synchronization into a separate class so that it is possible to create an extension point there and store schema in a different way than `system_schema` keyspace, for example in etcd. 
> This would also simplify the logic and reduce the number of special cases, make all the things more testable and the logic of internal classes encapsulated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org