You are viewing a plain text version of this content. The canonical link for it is here.
Posted to pr@cassandra.apache.org by GitBox <gi...@apache.org> on 2020/07/20 05:00:36 UTC

[GitHub] [cassandra-diff] yifan-c opened a new pull request #10: Support auto discover user tables for comparison

yifan-c opened a new pull request #10:
URL: https://github.com/apache/cassandra-diff/pull/10


   - Auto discover user tables to run the diff job, if `keyspace_tables` is not defined. The discovered tables are the intersection of the schema from source and target clusters. 
   - Added a `disallowed_keyspaces` field to exclude tables from those keyspaces. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra-diff] yifan-c commented on a change in pull request #10: CASSANDRA-15953: Support auto discover user tables for comparison

Posted by GitBox <gi...@apache.org>.
yifan-c commented on a change in pull request #10:
URL: https://github.com/apache/cassandra-diff/pull/10#discussion_r458968188



##########
File path: spark-job/src/main/java/org/apache/cassandra/diff/DiffJob.java
##########
@@ -87,15 +89,28 @@ public void run(JobConfiguration configuration, JavaSparkContext sc) {
         ClusterProvider targetProvider = ClusterProvider.getProvider(configuration.clusterConfig("target"), "target");
         String sourcePartitioner;
         String targetPartitioner;
+        List<KeyspaceTablePair> tablesToCompare = configuration.filteredKeyspaceTables();
         try (Cluster sourceCluster = sourceProvider.getCluster();
              Cluster targetCluster = targetProvider.getCluster()) {
             sourcePartitioner = sourceCluster.getMetadata().getPartitioner();
             targetPartitioner = targetCluster.getMetadata().getPartitioner();
+
+            if (!sourcePartitioner.equals(targetPartitioner)) {
+                throw new IllegalStateException(String.format("Cluster partitioners do not match; Source: %s, Target: %s,",
+                                                              sourcePartitioner, targetPartitioner));
+            }
+
+            if (configuration.shouldAutoDiscoverTables()) {
+                Schema sourceSchema = new Schema(sourceCluster.getMetadata(), configuration);
+                Schema targetSchema = new Schema(targetCluster.getMetadata(), configuration);
+                Schema commonSchema = sourceSchema.intersect(targetSchema);
+                if (commonSchema.size() != sourceSchema.size()) {
+                    logger.warn("Found tables that only exist in either source or target cluster. Ignoring those tables for comparision. ");

Review comment:
       The diff job expects the schema from the compared clusters to be the same. If we found schema mismatch, the comparison might not be valid. Therefore, I choose to log it at `warn` level to highlight the unexpected. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra-diff] beobal closed pull request #10: CASSANDRA-15953: Support auto discover user tables for comparison

Posted by GitBox <gi...@apache.org>.
beobal closed pull request #10:
URL: https://github.com/apache/cassandra-diff/pull/10


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra-diff] beobal commented on a change in pull request #10: CASSANDRA-15953: Support auto discover user tables for comparison

Posted by GitBox <gi...@apache.org>.
beobal commented on a change in pull request #10:
URL: https://github.com/apache/cassandra-diff/pull/10#discussion_r458980163



##########
File path: spark-job/src/main/java/org/apache/cassandra/diff/DiffJob.java
##########
@@ -87,15 +89,28 @@ public void run(JobConfiguration configuration, JavaSparkContext sc) {
         ClusterProvider targetProvider = ClusterProvider.getProvider(configuration.clusterConfig("target"), "target");
         String sourcePartitioner;
         String targetPartitioner;
+        List<KeyspaceTablePair> tablesToCompare = configuration.filteredKeyspaceTables();
         try (Cluster sourceCluster = sourceProvider.getCluster();
              Cluster targetCluster = targetProvider.getCluster()) {
             sourcePartitioner = sourceCluster.getMetadata().getPartitioner();
             targetPartitioner = targetCluster.getMetadata().getPartitioner();
+
+            if (!sourcePartitioner.equals(targetPartitioner)) {
+                throw new IllegalStateException(String.format("Cluster partitioners do not match; Source: %s, Target: %s,",
+                                                              sourcePartitioner, targetPartitioner));
+            }
+
+            if (configuration.shouldAutoDiscoverTables()) {
+                Schema sourceSchema = new Schema(sourceCluster.getMetadata(), configuration);
+                Schema targetSchema = new Schema(targetCluster.getMetadata(), configuration);
+                Schema commonSchema = sourceSchema.intersect(targetSchema);
+                if (commonSchema.size() != sourceSchema.size()) {
+                    logger.warn("Found tables that only exist in either source or target cluster. Ignoring those tables for comparision. ");

Review comment:
       Well, it used to expect the two schema to be the same, now it expects them to intersect :) It's not a big deal, I was just thinking that it might not be that unusual for some cases - e.g. if you run the metadata db on either the source or target clusters.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra-diff] yifan-c commented on a change in pull request #10: CASSANDRA-15953: Support auto discover user tables for comparison

Posted by GitBox <gi...@apache.org>.
yifan-c commented on a change in pull request #10:
URL: https://github.com/apache/cassandra-diff/pull/10#discussion_r459205469



##########
File path: spark-job/src/main/java/org/apache/cassandra/diff/DiffJob.java
##########
@@ -87,15 +89,28 @@ public void run(JobConfiguration configuration, JavaSparkContext sc) {
         ClusterProvider targetProvider = ClusterProvider.getProvider(configuration.clusterConfig("target"), "target");
         String sourcePartitioner;
         String targetPartitioner;
+        List<KeyspaceTablePair> tablesToCompare = configuration.filteredKeyspaceTables();
         try (Cluster sourceCluster = sourceProvider.getCluster();
              Cluster targetCluster = targetProvider.getCluster()) {
             sourcePartitioner = sourceCluster.getMetadata().getPartitioner();
             targetPartitioner = targetCluster.getMetadata().getPartitioner();
+
+            if (!sourcePartitioner.equals(targetPartitioner)) {
+                throw new IllegalStateException(String.format("Cluster partitioners do not match; Source: %s, Target: %s,",
+                                                              sourcePartitioner, targetPartitioner));
+            }
+
+            if (configuration.shouldAutoDiscoverTables()) {
+                Schema sourceSchema = new Schema(sourceCluster.getMetadata(), configuration);
+                Schema targetSchema = new Schema(targetCluster.getMetadata(), configuration);
+                Schema commonSchema = sourceSchema.intersect(targetSchema);
+                if (commonSchema.size() != sourceSchema.size()) {
+                    logger.warn("Found tables that only exist in either source or target cluster. Ignoring those tables for comparision. ");

Review comment:
       Yes. It could be sort of annoying when metadata db is one of the clusters. It is easy to reason about IMO. 
   Maybe add the difference of 2 sets in the log statement?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra-diff] beobal commented on a change in pull request #10: CASSANDRA-15953: Support auto discover user tables for comparison

Posted by GitBox <gi...@apache.org>.
beobal commented on a change in pull request #10:
URL: https://github.com/apache/cassandra-diff/pull/10#discussion_r458685641



##########
File path: spark-job/src/main/java/org/apache/cassandra/diff/Schema.java
##########
@@ -0,0 +1,65 @@
+package org.apache.cassandra.diff;
+
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+import com.google.common.collect.ImmutableSet;
+import com.google.common.collect.Sets;
+
+import com.datastax.driver.core.KeyspaceMetadata;
+import com.datastax.driver.core.Metadata;
+import com.datastax.driver.core.TableMetadata;
+
+public class Schema {
+    // Filter out all system keyspaces.
+    private static final Set<String> DEFAULT_FILETER = ImmutableSet.of(

Review comment:
       Nit: `s/FILETER/FILTER`

##########
File path: spark-job/src/main/java/org/apache/cassandra/diff/Schema.java
##########
@@ -0,0 +1,65 @@
+package org.apache.cassandra.diff;
+
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+import com.google.common.collect.ImmutableSet;
+import com.google.common.collect.Sets;
+
+import com.datastax.driver.core.KeyspaceMetadata;
+import com.datastax.driver.core.Metadata;
+import com.datastax.driver.core.TableMetadata;
+
+public class Schema {
+    // Filter out all system keyspaces.
+    private static final Set<String> DEFAULT_FILETER = ImmutableSet.of(
+        "system", "system_schema", "system_traces", "system_auth", "system_distributed", "system_virtual_schema", "system_views"
+    );
+
+    private final Set<KeyspaceTablePair> qualifiedTables;
+
+    public Schema(Metadata metadata, JobConfiguration configuration) {
+        Set<String> keyspaceFilter = getKeyspaceFilter(configuration);
+        qualifiedTables = new HashSet<>();
+        for (KeyspaceMetadata keyspaceMetadata : metadata.getKeyspaces()) {
+            if (keyspaceFilter.contains(keyspaceMetadata.getName()))
+                continue;
+
+            for (TableMetadata tableMetadata : keyspaceMetadata.getTables()) {
+                qualifiedTables.add(KeyspaceTablePair.from(tableMetadata));
+            }
+        }
+    }
+
+    public Schema(Set<KeyspaceTablePair> qualifiedTables) {
+        this.qualifiedTables = qualifiedTables;
+    }
+
+    public Schema intersect(Schema other) {
+        if (this != other) {
+            Set<KeyspaceTablePair> intersection = Sets.intersection(this.qualifiedTables, other.qualifiedTables);
+            return new Schema(intersection);
+        }
+        return this;
+    }
+
+    public List<KeyspaceTablePair> toQualifiedTableList() {
+        return new ArrayList<>(qualifiedTables);
+    }
+
+    public int size() {
+        return qualifiedTables.size();
+    }
+
+    private Set<String> getKeyspaceFilter(JobConfiguration configuration) {
+        List<String> disallowedKeyspaces = configuration.disallowedKeyspaces();
+        if (null == disallowedKeyspaces) {
+            return DEFAULT_FILETER;
+        } else {
+            return ImmutableSet.copyOf(disallowedKeyspaces);

Review comment:
       Shouldn't this be the union of `disallowedKeyspaces` and the default, system keyspaces?

##########
File path: spark-job/src/main/java/org/apache/cassandra/diff/DiffJob.java
##########
@@ -87,15 +89,28 @@ public void run(JobConfiguration configuration, JavaSparkContext sc) {
         ClusterProvider targetProvider = ClusterProvider.getProvider(configuration.clusterConfig("target"), "target");
         String sourcePartitioner;
         String targetPartitioner;
+        List<KeyspaceTablePair> tablesToCompare = configuration.filteredKeyspaceTables();
         try (Cluster sourceCluster = sourceProvider.getCluster();
              Cluster targetCluster = targetProvider.getCluster()) {
             sourcePartitioner = sourceCluster.getMetadata().getPartitioner();
             targetPartitioner = targetCluster.getMetadata().getPartitioner();
+
+            if (!sourcePartitioner.equals(targetPartitioner)) {
+                throw new IllegalStateException(String.format("Cluster partitioners do not match; Source: %s, Target: %s,",
+                                                              sourcePartitioner, targetPartitioner));
+            }
+
+            if (configuration.shouldAutoDiscoverTables()) {
+                Schema sourceSchema = new Schema(sourceCluster.getMetadata(), configuration);
+                Schema targetSchema = new Schema(targetCluster.getMetadata(), configuration);
+                Schema commonSchema = sourceSchema.intersect(targetSchema);
+                if (commonSchema.size() != sourceSchema.size()) {
+                    logger.warn("Found tables that only exist in either source or target cluster. Ignoring those tables for comparision. ");

Review comment:
       Nit: I wonder if this needs to be a warning, rather than just info. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org