You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/09/13 02:33:16 UTC
[GitHub] [druid] clintropolis commented on a change in pull request #10366: Proposed changes for making joins cacheable

clintropolis commented on a change in pull request #10366:
URL: https://github.com/apache/druid/pull/10366#discussion_r487467291



##########
File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
##########
@@ -118,6 +126,47 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
     );
   }
 
+  /**
+   * Compute a cache key prefix for data sources that is to be used in segment level and result level caches. The
+   * data source can either be base (clauses is empty) or RHS of a join (clauses is non-empty). In both of the cases,
+   * a non-null cache is returned. However, the cache key is null if there is a join and some of the right data sources
+   * participating in the join do not support caching yet
+   *
+   * @param dataSourceAnalysis
+   * @param joinableFactory
+   * @return
+   */
+  public static Optional<byte[]> computeDataSourceCacheKey(
+      final DataSourceAnalysis dataSourceAnalysis,
+      final JoinableFactory joinableFactory
+  )
+  {
+    final CacheKeyBuilder keyBuilder;
+    final List<PreJoinableClause> clauses = dataSourceAnalysis.getPreJoinableClauses();
+    if (clauses.isEmpty()) {
+      keyBuilder = new CacheKeyBuilder(REGULAR_OPERATION);
+    } else {
+      keyBuilder = new CacheKeyBuilder(JOIN_OPERATION);
+      for (PreJoinableClause clause : clauses) {
+        if (!clause.getCondition().canHashJoin()) {

Review comment:
       Why does this matter? Shouldn't it just be on the joinable factory to decide if it can cache or not and we should supply the `PreJoinableClause` or the `JoinConditionAnalysis` from `PreJoinableClause.getCondition` to it?

##########
File path: server/src/main/java/org/apache/druid/server/coordination/ServerManager.java
##########
@@ -293,21 +302,28 @@ public ServerManager(
         queryMetrics -> queryMetrics.segment(segmentIdString)
     );
 
-    CachingQueryRunner<T> cachingQueryRunner = new CachingQueryRunner<>(
-        segmentIdString,
-        segmentDescriptor,
-        objectMapper,
-        cache,
-        toolChest,
-        metricsEmittingQueryRunnerInner,
-        cachePopulator,
-        cacheConfig
-    );
+    QueryRunner<T> queryRunner;

Review comment:
       `CachingQueryRunner` already has the machinery to decide whether or not to use the caches, should we just pass the optional in and use it as another input to the existing `useCache`/`populateCache` equation so we don't need these if statements in `ServerManager` and `SinkQuerySegmentWalker`?

##########
File path: processing/src/main/java/org/apache/druid/segment/join/MapJoinableFactory.java
##########
@@ -80,4 +80,21 @@ public boolean isDirectlyJoinable(DataSource dataSource)
     }
     return maybeJoinable;
   }
+
+  @Override
+  public Optional<byte[]> computeJoinCacheKey(DataSource dataSource)

Review comment:
       Since multiple joinable factories can support the same class of datasource, I guess this means that this method needs to mimic the same selection logic as `build` to make sure that a joinable factory only computes a cache key for a table that it would build a joinable for in the query? I think we should probably supply the `PreJoinableClause` or the `JoinConditionAnalysis ` to this method so that it can have the `JoinConditionAnalysis` that `build` has access to. Then these factories can make better decisions about if they can compute a key for the query or not.

##########
File path: processing/src/main/java/org/apache/druid/segment/join/table/BroadcastSegmentIndexedTable.java
##########
@@ -241,6 +244,22 @@ public ColumnSelectorFactory makeColumnSelectorFactory(ReadableOffset offset, bo
     );
   }
 
+  @Override
+  public Optional<byte[]> computeCacheKey()
+  {
+    SegmentId segmentId = segment.getId();

Review comment:
       Could this just be a method on `SegmentId`?

##########
File path: server/src/main/java/org/apache/druid/query/ResultLevelCachingQueryRunner.java
##########
@@ -86,7 +92,15 @@ public ResultLevelCachingQueryRunner(
     if (useResultCache || populateResultCache) {
 
       final String cacheKeyStr = StringUtils.fromUtf8(strategy.computeResultLevelCacheKey(query));
-      final byte[] cachedResultSet = fetchResultsFromResultLevelCache(cacheKeyStr);
+      DataSourceAnalysis analysis = DataSourceAnalysis.forDataSource(query.getDataSource());
+      byte[] dataSourceCacheKey = Joinables.computeDataSourceCacheKey(analysis, joinableFactory).orElse(null);
+      if (null == dataSourceCacheKey) {
+        return baseRunner.run(
+            queryPlus,
+            responseContext
+        );
+      }
+      final byte[] cachedResultSet = fetchResultsFromResultLevelCache(cacheKeyStr, dataSourceCacheKey);

Review comment:
       nit: this seems a little strange that the cache key is turned into the namespace and another key is used as the bytes, rather than making some sort of composite, but maybe it is ok?

##########
File path: server/src/main/java/org/apache/druid/segment/join/BroadcastTableJoinableFactory.java
##########
@@ -76,4 +76,27 @@ public boolean isDirectlyJoinable(DataSource dataSource)
     }
     return Optional.empty();
   }
+
+  @Override
+  public Optional<byte[]> computeJoinCacheKey(DataSource dataSource)

Review comment:
       seems like this and `build` could share a method that gets the `ReferenceCountingIndexedTable` if possible, and build can make the joinable and this method compute the key on the results

##########
File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
##########
@@ -118,6 +126,47 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
     );
   }
 
+  /**
+   * Compute a cache key prefix for data sources that is to be used in segment level and result level caches. The
+   * data source can either be base (clauses is empty) or RHS of a join (clauses is non-empty). In both of the cases,
+   * a non-null cache is returned. However, the cache key is null if there is a join and some of the right data sources
+   * participating in the join do not support caching yet
+   *
+   * @param dataSourceAnalysis
+   * @param joinableFactory
+   * @return
+   */
+  public static Optional<byte[]> computeDataSourceCacheKey(
+      final DataSourceAnalysis dataSourceAnalysis,
+      final JoinableFactory joinableFactory
+  )
+  {
+    final CacheKeyBuilder keyBuilder;
+    final List<PreJoinableClause> clauses = dataSourceAnalysis.getPreJoinableClauses();
+    if (clauses.isEmpty()) {
+      keyBuilder = new CacheKeyBuilder(REGULAR_OPERATION);

Review comment:
       I guess this would bust cache on upgrade since all non-join cache keys would have this new byte prefix? Could we just not add a prefix if the query isn't a join query?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org