You are viewing a plain text version of this content. The canonical link for it is here.
Posted to pr@cassandra.apache.org by GitBox <gi...@apache.org> on 2020/05/07 17:24:06 UTC

[GitHub] [cassandra] dcapwell commented on a change in pull request #575: CASSANDRA-15776 port nodetool_test.py test_refresh_size_estimates_clears_invalid_entries to jvm dtest and fixed bug where size_estimates isn't used for clearing tables

dcapwell commented on a change in pull request #575:
URL: https://github.com/apache/cassandra/pull/575#discussion_r421669222



##########
File path: src/java/org/apache/cassandra/db/SystemKeyspace.java
##########
@@ -1344,12 +1344,17 @@ public static void clearEstimates(String keyspace)
     public static synchronized SetMultimap<String, String> getTablesWithSizeEstimates()
     {
         SetMultimap<String, String> keyspaceTableMap = HashMultimap.create();
-        String cql = String.format("SELECT keyspace_name, table_name FROM %s", TableEstimates.toString(), TABLE_ESTIMATES_TYPE_PRIMARY);
-        UntypedResultSet rs = executeInternal(cql);
-        for (UntypedResultSet.Row row : rs)
+        // Its possible that size_estimates knows about a different set of keyspace/tables than table_estimates (mostly
+        // caused by external systems modifying the tables, such as dtest) so query both
+        for (String cql : Arrays.asList(
+            "SELECT keyspace_name, table_name FROM " + TableEstimates.toString(),
+            "SELECT keyspace_name, table_name FROM " + LegacySizeEstimates.toString()))

Review comment:
       @michaelsembwever the impact is related to the number of tables in the cluster; if you have 10 tables this is small, if you have 1k its much larger cost.  To @nastra point, its sadly unavoidable since the two tables could drift.  If you think about it from the migration standpoint, on T1 we have size_estimates as the source of truth, in T2 its now table_estimates, so if there happens to exist tables in size_estimates which are no longer in the new table_estimates they would live there forever.
   
   The function is only called at two places:
   
   1) CassandraDaemon startup.  This could impact startup times as its now twice the cost; impacts clusters with more tables; Sadly there is no monitoring around this, so only know via stack traces...
   2) JMX. triggered via nodetool or via random JMX call. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org