You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/06/19 04:36:38 UTC

[GitHub] [druid] jon-wei opened a new pull request #10056: Ensure that join filter pre-analysis operates on optimized filters

jon-wei opened a new pull request #10056:
URL: https://github.com/apache/druid/pull/10056


   This PR is a small follow-on to #10015, which adjusts `Joinables.createSegmentMapFn` so that the join filter pre-analysis operates on the optimized forms of the filters in the query (using DimFilter.optimize()), since `HashJoinSegmentStorageAdapter.makeCursors()` will be passed the optimized filter forms.
   
   This PR has:
   - [x] been self-reviewed.
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] ccaominh merged pull request #10056: Add DimFilter.toOptimizedFilter(), ensure that join filter pre-analysis operates on optimized filters

Posted by GitBox <gi...@apache.org>.

ccaominh merged pull request #10056:
URL: https://github.com/apache/druid/pull/10056


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] gianm commented on a change in pull request #10056: Add DimFilter.toOptimizedFilter(), ensure that join filter pre-analysis operates on optimized filters

Posted by GitBox <gi...@apache.org>.

gianm commented on a change in pull request #10056:
URL: https://github.com/apache/druid/pull/10056#discussion_r448606808



##########
File path: processing/src/main/java/org/apache/druid/query/filter/AbstractOptimizableDimFilter.java
##########
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.filter;
+
+import com.fasterxml.jackson.annotation.JsonIgnore;
+
+/**
+ * Base class for DimFilters that support optimization.
+ */
+abstract class AbstractOptimizableDimFilter implements DimFilter
+{
+  private Filter cachedOptimizedFilter = null;
+
+  @JsonIgnore
+  @Override
+  public Filter toOptimizedFilter()

Review comment:
       I think this is going to be called by different processing threads simultaneously, so it should be thread-safe. Perhaps use Suppliers.memoize.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] ccaominh commented on a change in pull request #10056: Add DimFilter.toOptimizedFilter(), ensure that join filter pre-analysis operates on optimized filters

Posted by GitBox <gi...@apache.org>.

ccaominh commented on a change in pull request #10056:
URL: https://github.com/apache/druid/pull/10056#discussion_r448715436



##########
File path: processing/src/main/java/org/apache/druid/query/filter/DimFilter.java
##########
@@ -23,7 +23,11 @@
 import com.fasterxml.jackson.annotation.JsonTypeInfo;
 import com.google.common.collect.RangeSet;
 import org.apache.druid.java.util.common.Cacheable;
+import org.apache.druid.java.util.common.granularity.Granularity;
+import org.apache.druid.query.QueryMetrics;
 import org.apache.druid.query.extraction.ExtractionFn;
+import org.apache.druid.segment.VirtualColumns;
+import org.joda.time.Interval;

Review comment:
       CI is flagging these 4 as unused imports




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] jon-wei commented on a change in pull request #10056: Ensure that join filter pre-analysis operates on optimized filters

Posted by GitBox <gi...@apache.org>.

jon-wei commented on a change in pull request #10056:
URL: https://github.com/apache/druid/pull/10056#discussion_r447372491



##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
##########
@@ -11987,6 +11992,83 @@ public void testNestedGroupByOnInlineDataSourceWithFilter(Map<String, Object> qu
     );
   }
 
+  @Test
+  @Parameters(source = QueryContextForJoinProvider.class)
+  public void testGroupByJoinAsNativeQueryWithUnoptimizedFilter(Map<String, Object> queryContext) throws Exception

Review comment:
       Removed this




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] jon-wei commented on a change in pull request #10056: Ensure that join filter pre-analysis operates on optimized filters

Posted by GitBox <gi...@apache.org>.

jon-wei commented on a change in pull request #10056:
URL: https://github.com/apache/druid/pull/10056#discussion_r447373185



##########
File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
##########
@@ -107,8 +107,17 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
             );
 
             for (Query joinQuery : joinQueryLevels) {
+              // The pre-analysis needs to apply to the optimized form of filters, as this is what will be
+              // passed to HashJoinSegmentAdapter.makeCursors().
+              // The optimize() call here means that the filter optimization will happen twice,
+              // since the query toolchests will call optimize() later.
+              // We do this for simplicity as we cannot override what query will get run later from this context.
+              // A more complicated approach involving wrapping the query runner after the pre-merge decoration
+              // and moving the pre-analysis might be viable.
+              // The additional overhead of the simple approach should be low, as the additional optimize() call
+              // on each filter will only occur once per query.
               preAnalysisGroup.computeJoinFilterPreAnalysisIfAbsent(
-                  joinQuery.getFilter() == null ? null : joinQuery.getFilter().toFilter(),
+                  joinQuery.getFilter() == null ? null : joinQuery.getFilter().optimize().toFilter(),

Review comment:
       I tried to move this test into a new test class in processing, but I found that there's quite a lot of test machinery that has to be created (around making a SegmentWalker)
   
   I don't think that's worth doing at this point for a single test with pretty special requirements.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] jon-wei commented on a change in pull request #10056: Ensure that join filter pre-analysis operates on optimized filters

Posted by GitBox <gi...@apache.org>.

jon-wei commented on a change in pull request #10056:
URL: https://github.com/apache/druid/pull/10056#discussion_r443893331



##########
File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
##########
@@ -107,8 +107,17 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
             );
 
             for (Query joinQuery : joinQueryLevels) {
+              // The pre-analysis needs to apply to the optimized form of filters, as this is what will be
+              // passed to HashJoinSegmentAdapter.makeCursors().
+              // The optimize() call here means that the filter optimization will happen twice,
+              // since the query toolchests will call optimize() later.
+              // We do this for simplicity as we cannot override what query will get run later from this context.
+              // A more complicated approach involving wrapping the query runner after the pre-merge decoration
+              // and moving the pre-analysis might be viable.
+              // The additional overhead of the simple approach should be low, as the additional optimize() call
+              // on each filter will only occur once per query.
               preAnalysisGroup.computeJoinFilterPreAnalysisIfAbsent(
-                  joinQuery.getFilter() == null ? null : joinQuery.getFilter().toFilter(),
+                  joinQuery.getFilter() == null ? null : joinQuery.getFilter().optimize().toFilter(),

Review comment:
       I'll look into moving the test into `processing`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] ccaominh commented on a change in pull request #10056: Ensure that join filter pre-analysis operates on optimized filters

Posted by GitBox <gi...@apache.org>.

ccaominh commented on a change in pull request #10056:
URL: https://github.com/apache/druid/pull/10056#discussion_r443886344



##########
File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
##########
@@ -107,8 +107,17 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
             );
 
             for (Query joinQuery : joinQueryLevels) {
+              // The pre-analysis needs to apply to the optimized form of filters, as this is what will be
+              // passed to HashJoinSegmentAdapter.makeCursors().
+              // The optimize() call here means that the filter optimization will happen twice,
+              // since the query toolchests will call optimize() later.
+              // We do this for simplicity as we cannot override what query will get run later from this context.
+              // A more complicated approach involving wrapping the query runner after the pre-merge decoration
+              // and moving the pre-analysis might be viable.
+              // The additional overhead of the simple approach should be low, as the additional optimize() call
+              // on each filter will only occur once per query.
               preAnalysisGroup.computeJoinFilterPreAnalysisIfAbsent(
-                  joinQuery.getFilter() == null ? null : joinQuery.getFilter().toFilter(),
+                  joinQuery.getFilter() == null ? null : joinQuery.getFilter().optimize().toFilter(),

Review comment:
       The code coverage checker is flagging this change as uncovered (by the druid-processing unit tests):
   https://travis-ci.org/github/apache/druid/jobs/699956397#L1415
   
   Note: The new `CalciteQueryTest` added in this PR to druid-sql does hit this code path. 

##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
##########
@@ -11987,6 +11992,83 @@ public void testNestedGroupByOnInlineDataSourceWithFilter(Map<String, Object> qu
     );
   }
 
+  @Test
+  @Parameters(source = QueryContextForJoinProvider.class)
+  public void testGroupByJoinAsNativeQueryWithUnoptimizedFilter(Map<String, Object> queryContext) throws Exception

Review comment:
       Intellij inspection check is flagging this line since `Exception` is not thrown by the method body:
   https://travis-ci.org/github/apache/druid/jobs/699956394#L11326




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] jon-wei commented on a change in pull request #10056: Add DimFilter.toOptimizedFilter(), ensure that join filter pre-analysis operates on optimized filters

Posted by GitBox <gi...@apache.org>.

jon-wei commented on a change in pull request #10056:
URL: https://github.com/apache/druid/pull/10056#discussion_r448611416



##########
File path: processing/src/main/java/org/apache/druid/query/filter/AbstractOptimizableDimFilter.java
##########
@@ -0,0 +1,40 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.filter;
+
+import com.fasterxml.jackson.annotation.JsonIgnore;
+
+/**
+ * Base class for DimFilters that support optimization.
+ */
+abstract class AbstractOptimizableDimFilter implements DimFilter
+{
+  private Filter cachedOptimizedFilter = null;
+
+  @JsonIgnore
+  @Override
+  public Filter toOptimizedFilter()

Review comment:
       Ah, I was originally thinking of just letting the processing threads for non-joins possibly do some redundant computation (for joins, the pre-analysis would call toOptimizedFilter before the processing threads run the query), but I can adjust it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] jon-wei commented on a change in pull request #10056: Ensure that join filter pre-analysis operates on optimized filters

Posted by GitBox <gi...@apache.org>.

jon-wei commented on a change in pull request #10056:
URL: https://github.com/apache/druid/pull/10056#discussion_r447373185



##########
File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
##########
@@ -107,8 +107,17 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
             );
 
             for (Query joinQuery : joinQueryLevels) {
+              // The pre-analysis needs to apply to the optimized form of filters, as this is what will be
+              // passed to HashJoinSegmentAdapter.makeCursors().
+              // The optimize() call here means that the filter optimization will happen twice,
+              // since the query toolchests will call optimize() later.
+              // We do this for simplicity as we cannot override what query will get run later from this context.
+              // A more complicated approach involving wrapping the query runner after the pre-merge decoration
+              // and moving the pre-analysis might be viable.
+              // The additional overhead of the simple approach should be low, as the additional optimize() call
+              // on each filter will only occur once per query.
               preAnalysisGroup.computeJoinFilterPreAnalysisIfAbsent(
-                  joinQuery.getFilter() == null ? null : joinQuery.getFilter().toFilter(),
+                  joinQuery.getFilter() == null ? null : joinQuery.getFilter().optimize().toFilter(),

Review comment:
       I tried to move this test into a new test class in processing, but I found that there's quite a lot of test machinery that has to be created (around making a SegmentWalker). The existing test infrastructure for that all exists outside of the processing module.
   
   I don't think that's worth doing at this point for a single test with pretty special requirements.

##########
File path: processing/src/main/java/org/apache/druid/segment/join/Joinables.java
##########
@@ -107,8 +107,17 @@ public static boolean isPrefixedBy(final String columnName, final String prefix)
             );
 
             for (Query joinQuery : joinQueryLevels) {
+              // The pre-analysis needs to apply to the optimized form of filters, as this is what will be
+              // passed to HashJoinSegmentAdapter.makeCursors().
+              // The optimize() call here means that the filter optimization will happen twice,
+              // since the query toolchests will call optimize() later.
+              // We do this for simplicity as we cannot override what query will get run later from this context.
+              // A more complicated approach involving wrapping the query runner after the pre-merge decoration
+              // and moving the pre-analysis might be viable.
+              // The additional overhead of the simple approach should be low, as the additional optimize() call
+              // on each filter will only occur once per query.
               preAnalysisGroup.computeJoinFilterPreAnalysisIfAbsent(
-                  joinQuery.getFilter() == null ? null : joinQuery.getFilter().toFilter(),
+                  joinQuery.getFilter() == null ? null : joinQuery.getFilter().optimize().toFilter(),

Review comment:
       I tried to move this test into a new test class in processing, but I found that there's quite a lot of test machinery that has to be created (around making a SegmentWalker). The existing test infrastructure for that all exists outside of the processing module, so it cannot be reused.
   
   I don't think that's worth doing at this point for a single test with pretty special requirements.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org