You are viewing a plain text version of this content. The canonical link for it is here.
Posted to pr@cassandra.apache.org by GitBox <gi...@apache.org> on 2020/04/29 19:32:04 UTC

[GitHub] [cassandra] dcapwell opened a new pull request #575: CASSANDRA-15776 port nodetool_test.py test_refresh_size_estimates_clears_invalid_entries to jvm dtest and fixed bug where size_estimates isn't used for clearing tables

dcapwell opened a new pull request #575:
URL: https://github.com/apache/cassandra/pull/575


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] dcapwell commented on a change in pull request #575: CASSANDRA-15776 port nodetool_test.py test_refresh_size_estimates_clears_invalid_entries to jvm dtest and fixed bug where size_estimates isn't used for clearing tables

Posted by GitBox <gi...@apache.org>.
dcapwell commented on a change in pull request #575:
URL: https://github.com/apache/cassandra/pull/575#discussion_r421674154



##########
File path: src/java/org/apache/cassandra/db/SystemKeyspace.java
##########
@@ -1344,12 +1344,17 @@ public static void clearEstimates(String keyspace)
     public static synchronized SetMultimap<String, String> getTablesWithSizeEstimates()
     {
         SetMultimap<String, String> keyspaceTableMap = HashMultimap.create();
-        String cql = String.format("SELECT keyspace_name, table_name FROM %s", TableEstimates.toString(), TABLE_ESTIMATES_TYPE_PRIMARY);
-        UntypedResultSet rs = executeInternal(cql);
-        for (UntypedResultSet.Row row : rs)
+        // Its possible that size_estimates knows about a different set of keyspace/tables than table_estimates (mostly
+        // caused by external systems modifying the tables, such as dtest) so query both
+        for (String cql : Arrays.asList(
+            "SELECT keyspace_name, table_name FROM " + TableEstimates.toString(),
+            "SELECT keyspace_name, table_name FROM " + LegacySizeEstimates.toString()))

Review comment:
       > Do we expect the set of keyspaces in table_estimates and size_estimates to be mutually exclusive? If not, aren't you clobbering the estimates from table_estimates with the values from size_estimates table? Is that the expected behavior?
   
   @dineshjoshi I expect them to conflict 99% of the time (1% for random users writing random data), which is why a set is used to dedup; after a reboot they should have identical tables.
   
   >  clobbering the estimates from table_estimates with the values from size_estimates table?
   
   This function only looks for table names, not values, so no clobbering estimates.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] dcapwell commented on a change in pull request #575: CASSANDRA-15776 port nodetool_test.py test_refresh_size_estimates_clears_invalid_entries to jvm dtest and fixed bug where size_estimates isn't used for clearing tables

Posted by GitBox <gi...@apache.org>.
dcapwell commented on a change in pull request #575:
URL: https://github.com/apache/cassandra/pull/575#discussion_r421670428



##########
File path: src/java/org/apache/cassandra/db/SystemKeyspace.java
##########
@@ -1344,12 +1344,17 @@ public static void clearEstimates(String keyspace)
     public static synchronized SetMultimap<String, String> getTablesWithSizeEstimates()
     {
         SetMultimap<String, String> keyspaceTableMap = HashMultimap.create();
-        String cql = String.format("SELECT keyspace_name, table_name FROM %s", TableEstimates.toString(), TABLE_ESTIMATES_TYPE_PRIMARY);
-        UntypedResultSet rs = executeInternal(cql);
-        for (UntypedResultSet.Row row : rs)
+        // Its possible that size_estimates knows about a different set of keyspace/tables than table_estimates (mostly
+        // caused by external systems modifying the tables, such as dtest) so query both
+        for (String cql : Arrays.asList(
+            "SELECT keyspace_name, table_name FROM " + TableEstimates.toString(),
+            "SELECT keyspace_name, table_name FROM " + LegacySizeEstimates.toString()))

Review comment:
       @nastra not sure I agree with the comment that toString is less obvious, is there a reason you say that?  In this case we don't care about quoting since it is known, but the `TableMetadata.toString` is a safer way to do this in general, so should be the new norm (in my opinion) rather than the common `format("%s.%s", ks_name, name)` (which is unsafe in general).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] nastra commented on a change in pull request #575: CASSANDRA-15776 port nodetool_test.py test_refresh_size_estimates_clears_invalid_entries to jvm dtest and fixed bug where size_estimates isn't used for clearing tables

Posted by GitBox <gi...@apache.org>.
nastra commented on a change in pull request #575:
URL: https://github.com/apache/cassandra/pull/575#discussion_r422004654



##########
File path: src/java/org/apache/cassandra/db/SystemKeyspace.java
##########
@@ -1344,12 +1344,17 @@ public static void clearEstimates(String keyspace)
     public static synchronized SetMultimap<String, String> getTablesWithSizeEstimates()
     {
         SetMultimap<String, String> keyspaceTableMap = HashMultimap.create();
-        String cql = String.format("SELECT keyspace_name, table_name FROM %s", TableEstimates.toString(), TABLE_ESTIMATES_TYPE_PRIMARY);
-        UntypedResultSet rs = executeInternal(cql);
-        for (UntypedResultSet.Row row : rs)
+        // Its possible that size_estimates knows about a different set of keyspace/tables than table_estimates (mostly
+        // caused by external systems modifying the tables, such as dtest) so query both
+        for (String cql : Arrays.asList(
+            "SELECT keyspace_name, table_name FROM " + TableEstimates.toString(),
+            "SELECT keyspace_name, table_name FROM " + LegacySizeEstimates.toString()))

Review comment:
       > @nastra not sure I agree with the comment that toString is less obvious, is there a reason you say that? In this case we don't care about quoting since it is known, but the `TableMetadata.toString` is a safer way to do this in general, so should be the new norm (in my opinion) rather than the common `format("%s.%s", ks_name, name)` (which is unsafe in general).
   
   @dcapwell it's really just a personal preference and I don't have any strong objections around how it's done at the end. That's why I marked it as a nit. Your approach makes sense and is perfectly fine




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] dineshjoshi commented on a change in pull request #575: CASSANDRA-15776 port nodetool_test.py test_refresh_size_estimates_clears_invalid_entries to jvm dtest and fixed bug where size_estimates isn't used for clearing tables

Posted by GitBox <gi...@apache.org>.
dineshjoshi commented on a change in pull request #575:
URL: https://github.com/apache/cassandra/pull/575#discussion_r421061551



##########
File path: src/java/org/apache/cassandra/db/SystemKeyspace.java
##########
@@ -1344,12 +1344,17 @@ public static void clearEstimates(String keyspace)
     public static synchronized SetMultimap<String, String> getTablesWithSizeEstimates()
     {
         SetMultimap<String, String> keyspaceTableMap = HashMultimap.create();
-        String cql = String.format("SELECT keyspace_name, table_name FROM %s", TableEstimates.toString(), TABLE_ESTIMATES_TYPE_PRIMARY);
-        UntypedResultSet rs = executeInternal(cql);
-        for (UntypedResultSet.Row row : rs)
+        // Its possible that size_estimates knows about a different set of keyspace/tables than table_estimates (mostly
+        // caused by external systems modifying the tables, such as dtest) so query both
+        for (String cql : Arrays.asList(
+            "SELECT keyspace_name, table_name FROM " + TableEstimates.toString(),
+            "SELECT keyspace_name, table_name FROM " + LegacySizeEstimates.toString()))

Review comment:
       Do we expect the set of keyspaces in `table_estimates` and `size_estimates` to be mutually exclusive? If not, aren't you clobbering the estimates from `table_estimates` with the values from `size_estimates` table? Is that the expected behavior?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] michaelsembwever commented on a change in pull request #575: CASSANDRA-15776 port nodetool_test.py test_refresh_size_estimates_clears_invalid_entries to jvm dtest and fixed bug where size_estimates isn't used for clearing tables

Posted by GitBox <gi...@apache.org>.
michaelsembwever commented on a change in pull request #575:
URL: https://github.com/apache/cassandra/pull/575#discussion_r420302886



##########
File path: src/java/org/apache/cassandra/db/SystemKeyspace.java
##########
@@ -1344,12 +1344,17 @@ public static void clearEstimates(String keyspace)
     public static synchronized SetMultimap<String, String> getTablesWithSizeEstimates()
     {
         SetMultimap<String, String> keyspaceTableMap = HashMultimap.create();
-        String cql = String.format("SELECT keyspace_name, table_name FROM %s", TableEstimates.toString(), TABLE_ESTIMATES_TYPE_PRIMARY);
-        UntypedResultSet rs = executeInternal(cql);
-        for (UntypedResultSet.Row row : rs)
+        // Its possible that size_estimates knows about a different set of keyspace/tables than table_estimates (mostly
+        // caused by external systems modifying the tables, such as dtest) so query both
+        for (String cql : Arrays.asList(
+            "SELECT keyspace_name, table_name FROM " + TableEstimates.toString(),
+            "SELECT keyspace_name, table_name FROM " + LegacySizeEstimates.toString()))

Review comment:
       what's the performance impact of going from one table scan to two here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] michaelsembwever closed pull request #575: CASSANDRA-15776 port nodetool_test.py test_refresh_size_estimates_clears_invalid_entries to jvm dtest and fixed bug where size_estimates isn't used for clearing tables

Posted by GitBox <gi...@apache.org>.
michaelsembwever closed pull request #575:
URL: https://github.com/apache/cassandra/pull/575


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] dcapwell commented on a change in pull request #575: CASSANDRA-15776 port nodetool_test.py test_refresh_size_estimates_clears_invalid_entries to jvm dtest and fixed bug where size_estimates isn't used for clearing tables

Posted by GitBox <gi...@apache.org>.
dcapwell commented on a change in pull request #575:
URL: https://github.com/apache/cassandra/pull/575#discussion_r421676438



##########
File path: src/java/org/apache/cassandra/db/SystemKeyspace.java
##########
@@ -1344,12 +1344,17 @@ public static void clearEstimates(String keyspace)
     public static synchronized SetMultimap<String, String> getTablesWithSizeEstimates()
     {
         SetMultimap<String, String> keyspaceTableMap = HashMultimap.create();
-        String cql = String.format("SELECT keyspace_name, table_name FROM %s", TableEstimates.toString(), TABLE_ESTIMATES_TYPE_PRIMARY);
-        UntypedResultSet rs = executeInternal(cql);
-        for (UntypedResultSet.Row row : rs)
+        // Its possible that size_estimates knows about a different set of keyspace/tables than table_estimates (mostly
+        // caused by external systems modifying the tables, such as dtest) so query both
+        for (String cql : Arrays.asList(
+            "SELECT keyspace_name, table_name FROM " + TableEstimates.toString(),
+            "SELECT keyspace_name, table_name FROM " + LegacySizeEstimates.toString()))
         {
-            keyspaceTableMap.put(row.getString("keyspace_name"), row.getString("table_name"));
+            UntypedResultSet rs = executeInternal(cql);
+            for (UntypedResultSet.Row row : rs)
+                keyspaceTableMap.put(row.getString("keyspace_name"), row.getString("table_name"));

Review comment:
       This function doesn't look at estimates but rather finds all table names.  This uses the `SetMultimap` to dedup keyspace and table names, so its fine (and expected) if the two tables have the same string values.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] michaelsembwever commented on pull request #575: CASSANDRA-15776 port nodetool_test.py test_refresh_size_estimates_clears_invalid_entries to jvm dtest and fixed bug where size_estimates isn't used for clearing tables

Posted by GitBox <gi...@apache.org>.
michaelsembwever commented on pull request #575:
URL: https://github.com/apache/cassandra/pull/575#issuecomment-628453321


   Merged with 01103111ae08b51ccd18bb1c54ac60546546d9df


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] nastra commented on a change in pull request #575: CASSANDRA-15776 port nodetool_test.py test_refresh_size_estimates_clears_invalid_entries to jvm dtest and fixed bug where size_estimates isn't used for clearing tables

Posted by GitBox <gi...@apache.org>.
nastra commented on a change in pull request #575:
URL: https://github.com/apache/cassandra/pull/575#discussion_r420574041



##########
File path: src/java/org/apache/cassandra/db/SystemKeyspace.java
##########
@@ -1344,12 +1344,17 @@ public static void clearEstimates(String keyspace)
     public static synchronized SetMultimap<String, String> getTablesWithSizeEstimates()
     {
         SetMultimap<String, String> keyspaceTableMap = HashMultimap.create();
-        String cql = String.format("SELECT keyspace_name, table_name FROM %s", TableEstimates.toString(), TABLE_ESTIMATES_TYPE_PRIMARY);
-        UntypedResultSet rs = executeInternal(cql);
-        for (UntypedResultSet.Row row : rs)
+        // Its possible that size_estimates knows about a different set of keyspace/tables than table_estimates (mostly
+        // caused by external systems modifying the tables, such as dtest) so query both
+        for (String cql : Arrays.asList(
+            "SELECT keyspace_name, table_name FROM " + TableEstimates.toString(),
+            "SELECT keyspace_name, table_name FROM " + LegacySizeEstimates.toString()))

Review comment:
       I believe we have to do 2 table scans now until the table `size_estimates` goes away.
   
   

##########
File path: src/java/org/apache/cassandra/db/SystemKeyspace.java
##########
@@ -1344,12 +1344,17 @@ public static void clearEstimates(String keyspace)
     public static synchronized SetMultimap<String, String> getTablesWithSizeEstimates()
     {
         SetMultimap<String, String> keyspaceTableMap = HashMultimap.create();
-        String cql = String.format("SELECT keyspace_name, table_name FROM %s", TableEstimates.toString(), TABLE_ESTIMATES_TYPE_PRIMARY);
-        UntypedResultSet rs = executeInternal(cql);
-        for (UntypedResultSet.Row row : rs)
+        // Its possible that size_estimates knows about a different set of keyspace/tables than table_estimates (mostly
+        // caused by external systems modifying the tables, such as dtest) so query both
+        for (String cql : Arrays.asList(
+            "SELECT keyspace_name, table_name FROM " + TableEstimates.toString(),
+            "SELECT keyspace_name, table_name FROM " + LegacySizeEstimates.toString()))
         {
-            keyspaceTableMap.put(row.getString("keyspace_name"), row.getString("table_name"));
+            UntypedResultSet rs = executeInternal(cql);
+            for (UntypedResultSet.Row row : rs)
+                keyspaceTableMap.put(row.getString("keyspace_name"), row.getString("table_name"));

Review comment:
       are we potentially overwriting estimates that we read from `TABLE_ESTIMATES` with estimates that we read from `LEGACY_SIZE_ESTIMATES` on purpose?

##########
File path: src/java/org/apache/cassandra/db/SystemKeyspace.java
##########
@@ -1344,12 +1344,17 @@ public static void clearEstimates(String keyspace)
     public static synchronized SetMultimap<String, String> getTablesWithSizeEstimates()
     {
         SetMultimap<String, String> keyspaceTableMap = HashMultimap.create();
-        String cql = String.format("SELECT keyspace_name, table_name FROM %s", TableEstimates.toString(), TABLE_ESTIMATES_TYPE_PRIMARY);
-        UntypedResultSet rs = executeInternal(cql);
-        for (UntypedResultSet.Row row : rs)
+        // Its possible that size_estimates knows about a different set of keyspace/tables than table_estimates (mostly
+        // caused by external systems modifying the tables, such as dtest) so query both
+        for (String cql : Arrays.asList(
+            "SELECT keyspace_name, table_name FROM " + TableEstimates.toString(),
+            "SELECT keyspace_name, table_name FROM " + LegacySizeEstimates.toString()))

Review comment:
       nit: imo it would be more obvious to use `SchemaConstants.SYSTEM_KEYSPACE_NAME, TABLE_ESTIMATES` /  `SchemaConstants.SYSTEM_KEYSPACE_NAME, LEGACY_SIZE_ESTIMATES` in the constructed CQL query than using `TableEstimates.toString()` / `LegacySizeEstimates.toString()`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] dcapwell commented on a change in pull request #575: CASSANDRA-15776 port nodetool_test.py test_refresh_size_estimates_clears_invalid_entries to jvm dtest and fixed bug where size_estimates isn't used for clearing tables

Posted by GitBox <gi...@apache.org>.
dcapwell commented on a change in pull request #575:
URL: https://github.com/apache/cassandra/pull/575#discussion_r421669222



##########
File path: src/java/org/apache/cassandra/db/SystemKeyspace.java
##########
@@ -1344,12 +1344,17 @@ public static void clearEstimates(String keyspace)
     public static synchronized SetMultimap<String, String> getTablesWithSizeEstimates()
     {
         SetMultimap<String, String> keyspaceTableMap = HashMultimap.create();
-        String cql = String.format("SELECT keyspace_name, table_name FROM %s", TableEstimates.toString(), TABLE_ESTIMATES_TYPE_PRIMARY);
-        UntypedResultSet rs = executeInternal(cql);
-        for (UntypedResultSet.Row row : rs)
+        // Its possible that size_estimates knows about a different set of keyspace/tables than table_estimates (mostly
+        // caused by external systems modifying the tables, such as dtest) so query both
+        for (String cql : Arrays.asList(
+            "SELECT keyspace_name, table_name FROM " + TableEstimates.toString(),
+            "SELECT keyspace_name, table_name FROM " + LegacySizeEstimates.toString()))

Review comment:
       @michaelsembwever the impact is related to the number of tables in the cluster; if you have 10 tables this is small, if you have 1k its much larger cost.  To @nastra point, its sadly unavoidable since the two tables could drift.  If you think about it from the migration standpoint, on T1 we have size_estimates as the source of truth, in T2 its now table_estimates, so if there happens to exist tables in size_estimates which are no longer in the new table_estimates they would live there forever.
   
   The function is only called at two places:
   
   1) CassandraDaemon startup.  This could impact startup times as its now twice the cost; impacts clusters with more tables; Sadly there is no monitoring around this, so only know via stack traces...
   2) JMX. triggered via nodetool or via random JMX call. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org


[GitHub] [cassandra] dcapwell commented on a change in pull request #575: CASSANDRA-15776 port nodetool_test.py test_refresh_size_estimates_clears_invalid_entries to jvm dtest and fixed bug where size_estimates isn't used for clearing tables

Posted by GitBox <gi...@apache.org>.
dcapwell commented on a change in pull request #575:
URL: https://github.com/apache/cassandra/pull/575#discussion_r421697121



##########
File path: src/java/org/apache/cassandra/db/SystemKeyspace.java
##########
@@ -1344,12 +1344,17 @@ public static void clearEstimates(String keyspace)
     public static synchronized SetMultimap<String, String> getTablesWithSizeEstimates()
     {
         SetMultimap<String, String> keyspaceTableMap = HashMultimap.create();
-        String cql = String.format("SELECT keyspace_name, table_name FROM %s", TableEstimates.toString(), TABLE_ESTIMATES_TYPE_PRIMARY);
-        UntypedResultSet rs = executeInternal(cql);
-        for (UntypedResultSet.Row row : rs)
+        // Its possible that size_estimates knows about a different set of keyspace/tables than table_estimates (mostly
+        // caused by external systems modifying the tables, such as dtest) so query both
+        for (String cql : Arrays.asList(
+            "SELECT keyspace_name, table_name FROM " + TableEstimates.toString(),
+            "SELECT keyspace_name, table_name FROM " + LegacySizeEstimates.toString()))

Review comment:
       I switched to truncate as this was a more expensive truncate....  That should resolve all comments another than the toString one.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org