You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/01/14 01:33:13 UTC

[GitHub] [druid] gianm opened a new pull request #10758: Retain order of AND, OR filter children.

gianm opened a new pull request #10758:
URL: https://github.com/apache/druid/pull/10758


   If we retain the order, it enables short-circuiting. People can put a
   more selective filter earlier in the list and lower the chance that
   later filters will need to be evaluated.
   
   Short-circuiting was working before #9608, which switched to unordered
   sets to solve a different problem. This patch tries to solve that
   problem a different way.
   
   This patch moves filter simplification logic from "optimize" to
   "toFilter", because that allows the code to be shared with Filters.and
   and Filters.or. The simplification has become more complicated and so
   it's useful to share it.
   
   This patch also removes code from CalciteCnfHelper that is no longer
   necessary because Filters.and and Filters.or are now doing the work.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #10758: Retain order of AND, OR filter children.

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #10758:
URL: https://github.com/apache/druid/pull/10758#discussion_r560733953



##########
File path: processing/src/main/java/org/apache/druid/segment/filter/Filters.java
##########
@@ -480,68 +485,105 @@ public static boolean shouldUseBitmapIndex(
   }
 
   /**
-   * Create a filter representing an AND relationship across a list of filters.
+   * Create a filter representing an AND relationship across a list of filters. Deduplicates filters, flattens stacks,
+   * and removes literal "false" filters.
    *
-   * @param filterList List of filters
+   * @param filters List of filters
    *
-   * @return If filterList has more than one element, return an AND filter composed of the filters from filterList
-   * If filterList has a single element, return that element alone
-   * If filterList is empty, return null
+   * @return If "filters" has more than one filter remaining after processing, returns {@link AndFilter}.
+   * If "filters" has a single element remaining after processing, return that filter alone.
+   *
+   * @throws IllegalArgumentException if "filters" is empty
    */
-  @Nullable
-  public static Filter and(List<Filter> filterList)
+  public static Filter and(final List<Filter> filters)
   {
-    if (filterList.isEmpty()) {
-      return null;
-    }
+    return maybeAnd(filters).orElseThrow(() -> new IAE("Expected nonempty filters list"));
+  }
 
-    if (filterList.size() == 1) {
-      return filterList.get(0);
+  /**
+   * Like {@link #and}, but returns an empty Optional instead of throwing an exception if "filters" is empty.
+   */
+  public static Optional<Filter> maybeAnd(List<Filter> filters)
+  {
+    if (filters.isEmpty()) {
+      return Optional.empty();
     }
 
-    return new AndFilter(filterList);
+    final List<Filter> filtersToUse = flattenAndChildren(filters);
+
+    if (filtersToUse.isEmpty()) {
+      assert !filters.isEmpty();
+      // Original "filters" list must have been 100% literally-true filters.
+      return Optional.of(TrueFilter.instance());
+    } else if (filtersToUse.stream().anyMatch(filter -> filter instanceof FalseFilter)) {

Review comment:
       I'm not sure exactly what you're suggesting, so I left this as-is.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #10758: Retain order of AND, OR filter children.

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #10758:
URL: https://github.com/apache/druid/pull/10758#discussion_r558088939



##########
File path: processing/src/test/java/org/apache/druid/segment/join/JoinFilterAnalyzerTest.java
##########
@@ -2545,18 +2545,10 @@ public void test_filterPushDown_factToRegionToCountryLeftFilterOnTwoRHSColumnsSa
     JoinFilterSplit expectedFilterSplit = new JoinFilterSplit(
         new AndFilter(
             ImmutableList.of(
-                new AndFilter(
-                    ImmutableList.of(
-                        new InDimFilter("countryIsoCode", ImmutableSet.of("MMMM", "AAAA"), null, null).toFilter(),
-                        new InDimFilter("regionIsoCode", ImmutableSet.of("MMMM", "AAAA"), null, null).toFilter()
-                    )
-                ),
-                new AndFilter(
-                    ImmutableList.of(
-                        new InDimFilter("countryIsoCode", ImmutableSet.of("US"), null, null).toFilter(),
-                        new InDimFilter("regionIsoCode", ImmutableSet.of("CA"), null, null).toFilter()
-                    )
-                )
+                new InDimFilter("countryIsoCode", ImmutableSet.of("US"), null, null).toFilter(),

Review comment:
       This rewrite got better!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #10758: Retain order of AND, OR filter children.

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #10758:
URL: https://github.com/apache/druid/pull/10758#discussion_r558804246



##########
File path: processing/src/main/java/org/apache/druid/segment/filter/Filters.java
##########
@@ -480,68 +485,105 @@ public static boolean shouldUseBitmapIndex(
   }
 
   /**
-   * Create a filter representing an AND relationship across a list of filters.
+   * Create a filter representing an AND relationship across a list of filters. Deduplicates filters, flattens stacks,
+   * and removes literal "false" filters.
    *
-   * @param filterList List of filters
+   * @param filters List of filters
    *
-   * @return If filterList has more than one element, return an AND filter composed of the filters from filterList
-   * If filterList has a single element, return that element alone
-   * If filterList is empty, return null
+   * @return If "filters" has more than one filter remaining after processing, returns {@link AndFilter}.
+   * If "filters" has a single element remaining after processing, return that filter alone.
+   *
+   * @throws IllegalArgumentException if "filters" is empty
    */
-  @Nullable
-  public static Filter and(List<Filter> filterList)
+  public static Filter and(final List<Filter> filters)
   {
-    if (filterList.isEmpty()) {
-      return null;
-    }
+    return maybeAnd(filters).orElseThrow(() -> new IAE("Expected nonempty filters list"));
+  }
 
-    if (filterList.size() == 1) {
-      return filterList.get(0);
+  /**
+   * Like {@link #and}, but returns an empty Optional instead of throwing an exception if "filters" is empty.
+   */
+  public static Optional<Filter> maybeAnd(List<Filter> filters)
+  {
+    if (filters.isEmpty()) {
+      return Optional.empty();
     }
 
-    return new AndFilter(filterList);
+    final List<Filter> filtersToUse = flattenAndChildren(filters);
+
+    if (filtersToUse.isEmpty()) {
+      assert !filters.isEmpty();
+      // Original "filters" list must have been 100% literally-true filters.
+      return Optional.of(TrueFilter.instance());
+    } else if (filtersToUse.stream().anyMatch(filter -> filter instanceof FalseFilter)) {

Review comment:
       nit: You can avoid this `anyMatch` check by doing the FalseFilter optimization in `flattenAndChildren` by making it return a single `FalseFilter`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #10758: Retain order of AND, OR filter children.

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #10758:
URL: https://github.com/apache/druid/pull/10758#discussion_r559635634



##########
File path: processing/src/main/java/org/apache/druid/segment/filter/Filters.java
##########
@@ -552,4 +594,96 @@ public static boolean filterMatchesNull(Filter filter)
     ValueMatcher valueMatcher = filter.makeMatcher(ALL_NULL_COLUMN_SELECTOR_FACTORY);
     return valueMatcher.matches();
   }
+
+  /**
+   * Flattens children of an AND, removes duplicates, and removes literally-true filters.
+   */
+  private static List<Filter> flattenAndChildren(final List<Filter> filters)
+  {
+    final List<Filter> retVal = new ArrayList<>();
+    final Set<EquivalenceCheckedFilter> seenFilters = new HashSet<>();
+
+    for (Filter child : filters) {
+      if (child instanceof AndFilter) {
+        retVal.addAll(flattenAndChildren(((AndFilter) child).getFilters()));
+      } else if (!(child instanceof TrueFilter) && seenFilters.add(new EquivalenceCheckedFilter(child))) {
+        retVal.add(child);
+      }
+    }
+
+    return retVal;
+  }
+
+  /**
+   * Flattens children of an OR, removes duplicates, and removes literally-false filters.
+   */
+  private static List<Filter> flattenOrChildren(final List<Filter> filters)
+  {
+    final List<Filter> retVal = new ArrayList<>();
+    final Set<EquivalenceCheckedFilter> seenFilters = new HashSet<>();
+
+    for (Filter child : filters) {
+      if (child instanceof OrFilter) {
+        retVal.addAll(flattenOrChildren(((OrFilter) child).getFilters()));
+      } else if (!(child instanceof FalseFilter) && seenFilters.add(new EquivalenceCheckedFilter(child))) {
+        retVal.add(child);
+      }
+    }
+
+    return retVal;
+  }
+
+  /**
+   * Wraps a {@link Filter} in an object whose {@link #equals} and {@link #hashCode} methods are overridden to
+   * ignore order of children for {@link AndFilter} and {@link OrFilter}. Useful for deduplication in
+   * {@link #flattenAndChildren} and {@link #flattenOrChildren}.
+   */
+  private static class EquivalenceCheckedFilter
+  {
+    private final Filter filter;
+
+    EquivalenceCheckedFilter(Filter filter)
+    {
+      this.filter = filter;
+    }
+
+    @Override
+    public boolean equals(Object o)
+    {
+      if (this == o) {
+        return true;
+      }
+      if (o == null || getClass() != o.getClass()) {
+        return false;
+      }
+      EquivalenceCheckedFilter that = (EquivalenceCheckedFilter) o;
+
+      if (filter instanceof BooleanFilter && filter.getClass().equals(that.filter.getClass())) {
+        // Order of children doesn't matter for equivalence.
+        final Set<Filter> ourFilterSet = new HashSet<>(((BooleanFilter) filter).getFilters());

Review comment:
       will your changes fix https://github.com/apache/druid/pull/10316? I am guessing, your changes will make the situations describe in that PR less likely as Set is being created only when two AndFilter/OrFilters are compared. could we also compare the size of child filter and return early before creating a possible expensive hashSet? 
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s merged pull request #10758: Retain order of AND, OR filter children.

Posted by GitBox <gi...@apache.org>.
suneet-s merged pull request #10758:
URL: https://github.com/apache/druid/pull/10758


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #10758: Retain order of AND, OR filter children.

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #10758:
URL: https://github.com/apache/druid/pull/10758#discussion_r560744780



##########
File path: processing/src/main/java/org/apache/druid/segment/filter/Filters.java
##########
@@ -552,4 +594,96 @@ public static boolean filterMatchesNull(Filter filter)
     ValueMatcher valueMatcher = filter.makeMatcher(ALL_NULL_COLUMN_SELECTOR_FACTORY);
     return valueMatcher.matches();
   }
+
+  /**
+   * Flattens children of an AND, removes duplicates, and removes literally-true filters.
+   */
+  private static List<Filter> flattenAndChildren(final List<Filter> filters)
+  {
+    final List<Filter> retVal = new ArrayList<>();
+    final Set<EquivalenceCheckedFilter> seenFilters = new HashSet<>();
+
+    for (Filter child : filters) {
+      if (child instanceof AndFilter) {
+        retVal.addAll(flattenAndChildren(((AndFilter) child).getFilters()));
+      } else if (!(child instanceof TrueFilter) && seenFilters.add(new EquivalenceCheckedFilter(child))) {
+        retVal.add(child);
+      }
+    }
+
+    return retVal;
+  }
+
+  /**
+   * Flattens children of an OR, removes duplicates, and removes literally-false filters.
+   */
+  private static List<Filter> flattenOrChildren(final List<Filter> filters)
+  {
+    final List<Filter> retVal = new ArrayList<>();
+    final Set<EquivalenceCheckedFilter> seenFilters = new HashSet<>();
+
+    for (Filter child : filters) {
+      if (child instanceof OrFilter) {
+        retVal.addAll(flattenOrChildren(((OrFilter) child).getFilters()));
+      } else if (!(child instanceof FalseFilter) && seenFilters.add(new EquivalenceCheckedFilter(child))) {
+        retVal.add(child);
+      }
+    }
+
+    return retVal;
+  }
+
+  /**
+   * Wraps a {@link Filter} in an object whose {@link #equals} and {@link #hashCode} methods are overridden to
+   * ignore order of children for {@link AndFilter} and {@link OrFilter}. Useful for deduplication in
+   * {@link #flattenAndChildren} and {@link #flattenOrChildren}.
+   */
+  private static class EquivalenceCheckedFilter
+  {
+    private final Filter filter;
+
+    EquivalenceCheckedFilter(Filter filter)
+    {
+      this.filter = filter;
+    }
+
+    @Override
+    public boolean equals(Object o)
+    {
+      if (this == o) {
+        return true;
+      }
+      if (o == null || getClass() != o.getClass()) {
+        return false;
+      }
+      EquivalenceCheckedFilter that = (EquivalenceCheckedFilter) o;
+
+      if (filter instanceof BooleanFilter && filter.getClass().equals(that.filter.getClass())) {
+        // Order of children doesn't matter for equivalence.
+        final Set<Filter> ourFilterSet = new HashSet<>(((BooleanFilter) filter).getFilters());
+        final Set<Filter> theirFilterSet = new HashSet<>(((BooleanFilter) that.filter).getFilters());
+        return ourFilterSet.equals(theirFilterSet);

Review comment:
       It handles them correctly in that it's not wrong. But, it isn't necessarily doing the best possible optimization. I edited it to use EquivalenceCheckedFilter recursively and added some more tests (see FiltersTest for the new ones).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #10758: Retain order of AND, OR filter children.

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #10758:
URL: https://github.com/apache/druid/pull/10758#discussion_r558804322



##########
File path: processing/src/main/java/org/apache/druid/segment/filter/Filters.java
##########
@@ -480,68 +485,105 @@ public static boolean shouldUseBitmapIndex(
   }
 
   /**
-   * Create a filter representing an AND relationship across a list of filters.
+   * Create a filter representing an AND relationship across a list of filters. Deduplicates filters, flattens stacks,
+   * and removes literal "false" filters.
    *
-   * @param filterList List of filters
+   * @param filters List of filters
    *
-   * @return If filterList has more than one element, return an AND filter composed of the filters from filterList
-   * If filterList has a single element, return that element alone
-   * If filterList is empty, return null
+   * @return If "filters" has more than one filter remaining after processing, returns {@link AndFilter}.
+   * If "filters" has a single element remaining after processing, return that filter alone.
+   *
+   * @throws IllegalArgumentException if "filters" is empty
    */
-  @Nullable
-  public static Filter and(List<Filter> filterList)
+  public static Filter and(final List<Filter> filters)
   {
-    if (filterList.isEmpty()) {
-      return null;
-    }
+    return maybeAnd(filters).orElseThrow(() -> new IAE("Expected nonempty filters list"));
+  }
 
-    if (filterList.size() == 1) {
-      return filterList.get(0);
+  /**
+   * Like {@link #and}, but returns an empty Optional instead of throwing an exception if "filters" is empty.
+   */
+  public static Optional<Filter> maybeAnd(List<Filter> filters)
+  {
+    if (filters.isEmpty()) {
+      return Optional.empty();
     }
 
-    return new AndFilter(filterList);
+    final List<Filter> filtersToUse = flattenAndChildren(filters);
+
+    if (filtersToUse.isEmpty()) {
+      assert !filters.isEmpty();
+      // Original "filters" list must have been 100% literally-true filters.
+      return Optional.of(TrueFilter.instance());
+    } else if (filtersToUse.stream().anyMatch(filter -> filter instanceof FalseFilter)) {
+      // AND of FALSE with anything is FALSE.
+      return Optional.of(FalseFilter.instance());
+    } else if (filtersToUse.size() == 1) {
+      return Optional.of(filtersToUse.get(0));
+    } else {
+      return Optional.of(new AndFilter(filtersToUse));
+    }
   }
 
   /**
-   * Create a filter representing an OR relationship across a set of filters.
+   * Create a filter representing an OR relationship across a list of filters. Deduplicates filters, flattens stacks,
+   * and removes literal "false" filters.
+   *
+   * @param filters List of filters
    *
-   * @param filterSet Set of filters
+   * @return If "filters" has more than one filter remaining after processing, returns {@link OrFilter}.
+   * If "filters" has a single element remaining after processing, return that filter alone.
    *
-   * @return If filterSet has more than one element, return an OR filter composed of the filters from filterSet
-   * If filterSet has a single element, return that element alone
-   * If filterSet is empty, return null
+   * @throws IllegalArgumentException if "filters" is empty
    */
-  @Nullable
-  public static Filter or(Set<Filter> filterSet)
+  public static Filter or(final List<Filter> filters)
   {
-    if (filterSet.isEmpty()) {
-      return null;
-    }
+    return maybeOr(filters).orElseThrow(() -> new IAE("Expected nonempty filters list"));
+  }
 
-    if (filterSet.size() == 1) {
-      return filterSet.iterator().next();
+  /**
+   * Like {@link #or}, but returns an empty Optional instead of throwing an exception if "filters" is empty.
+   */
+  public static Optional<Filter> maybeOr(final List<Filter> filters)
+  {
+    if (filters.isEmpty()) {
+      return Optional.empty();
     }
 
-    return new OrFilter(filterSet);
+    final List<Filter> filtersToUse = flattenOrChildren(filters);
+
+    if (filtersToUse.isEmpty()) {
+      assert !filters.isEmpty();
+      // Original "filters" list must have been 100% literally-false filters.
+      return Optional.of(FalseFilter.instance());
+    } else if (filtersToUse.stream().anyMatch(filter -> filter instanceof TrueFilter)) {

Review comment:
       nit: similar comment to line 518




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #10758: Retain order of AND, OR filter children.

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #10758:
URL: https://github.com/apache/druid/pull/10758#discussion_r560758303



##########
File path: processing/src/main/java/org/apache/druid/segment/filter/Filters.java
##########
@@ -552,4 +594,96 @@ public static boolean filterMatchesNull(Filter filter)
     ValueMatcher valueMatcher = filter.makeMatcher(ALL_NULL_COLUMN_SELECTOR_FACTORY);
     return valueMatcher.matches();
   }
+
+  /**
+   * Flattens children of an AND, removes duplicates, and removes literally-true filters.
+   */
+  private static List<Filter> flattenAndChildren(final List<Filter> filters)
+  {
+    final List<Filter> retVal = new ArrayList<>();
+    final Set<EquivalenceCheckedFilter> seenFilters = new HashSet<>();
+
+    for (Filter child : filters) {
+      if (child instanceof AndFilter) {
+        retVal.addAll(flattenAndChildren(((AndFilter) child).getFilters()));
+      } else if (!(child instanceof TrueFilter) && seenFilters.add(new EquivalenceCheckedFilter(child))) {
+        retVal.add(child);
+      }
+    }
+
+    return retVal;
+  }
+
+  /**
+   * Flattens children of an OR, removes duplicates, and removes literally-false filters.
+   */
+  private static List<Filter> flattenOrChildren(final List<Filter> filters)
+  {
+    final List<Filter> retVal = new ArrayList<>();
+    final Set<EquivalenceCheckedFilter> seenFilters = new HashSet<>();
+
+    for (Filter child : filters) {
+      if (child instanceof OrFilter) {
+        retVal.addAll(flattenOrChildren(((OrFilter) child).getFilters()));
+      } else if (!(child instanceof FalseFilter) && seenFilters.add(new EquivalenceCheckedFilter(child))) {
+        retVal.add(child);
+      }
+    }
+
+    return retVal;
+  }
+
+  /**
+   * Wraps a {@link Filter} in an object whose {@link #equals} and {@link #hashCode} methods are overridden to
+   * ignore order of children for {@link AndFilter} and {@link OrFilter}. Useful for deduplication in
+   * {@link #flattenAndChildren} and {@link #flattenOrChildren}.
+   */
+  private static class EquivalenceCheckedFilter
+  {
+    private final Filter filter;
+
+    EquivalenceCheckedFilter(Filter filter)
+    {
+      this.filter = filter;
+    }
+
+    @Override
+    public boolean equals(Object o)
+    {
+      if (this == o) {
+        return true;
+      }
+      if (o == null || getClass() != o.getClass()) {
+        return false;
+      }
+      EquivalenceCheckedFilter that = (EquivalenceCheckedFilter) o;
+
+      if (filter instanceof BooleanFilter && filter.getClass().equals(that.filter.getClass())) {
+        // Order of children doesn't matter for equivalence.
+        final Set<Filter> ourFilterSet = new HashSet<>(((BooleanFilter) filter).getFilters());

Review comment:
       We're back to a Set due to https://github.com/apache/druid/pull/10758#discussion_r558805842, so this won't change much. Probably memoizing the hashCode is still the way to go.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #10758: Retain order of AND, OR filter children.

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #10758:
URL: https://github.com/apache/druid/pull/10758#discussion_r558087010



##########
File path: processing/src/test/java/org/apache/druid/segment/filter/FilterCnfConversionTest.java
##########
@@ -253,54 +253,54 @@ public void testToCnfWithComplexFilterIncludingNotAndOr()
         )
     );
     final Filter expected = FilterTestUtils.and(
+        // The below OR filter could be eliminated because this filter also has

Review comment:
       note: these result changes were just re-orderings.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #10758: Retain order of AND, OR filter children.

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #10758:
URL: https://github.com/apache/druid/pull/10758#discussion_r558805842



##########
File path: processing/src/main/java/org/apache/druid/query/filter/BooleanFilter.java
##########
@@ -23,13 +23,14 @@
 import org.apache.druid.segment.ColumnSelectorFactory;
 
 import java.util.HashSet;
+import java.util.List;
 import java.util.Set;
 
 public interface BooleanFilter extends Filter
 {
   ValueMatcher[] EMPTY_VALUE_MATCHER_ARRAY = new ValueMatcher[0];
 
-  Set<Filter> getFilters();
+  List<Filter> getFilters();

Review comment:
       Could you add a javadoc to this function explaining why we a list is used here so that we can preserve the order of the filters to enable short-circuiting.
   
   Writing out this comment, made me think perhaps we could achieve the same outcome by using a LinkedHashSet instead - that would make it clearer in the rest of the code that there are no duplicates in the filters




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #10758: Retain order of AND, OR filter children.

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #10758:
URL: https://github.com/apache/druid/pull/10758#discussion_r558087569



##########
File path: processing/src/test/java/org/apache/druid/segment/filter/FilterCnfConversionTest.java
##########
@@ -356,65 +356,65 @@ public void testToNormalizedOrClausesWithComplexFilterIncludingNotAndOr()
             )
         )
     );
-    final Set<Filter> expected = ImmutableSet.of(
+    final List<Filter> expected = ImmutableList.of(

Review comment:
       another note: these were also just reorderings.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #10758: Retain order of AND, OR filter children.

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #10758:
URL: https://github.com/apache/druid/pull/10758#discussion_r560756111



##########
File path: processing/src/main/java/org/apache/druid/segment/filter/Filters.java
##########
@@ -552,4 +594,96 @@ public static boolean filterMatchesNull(Filter filter)
     ValueMatcher valueMatcher = filter.makeMatcher(ALL_NULL_COLUMN_SELECTOR_FACTORY);
     return valueMatcher.matches();
   }
+
+  /**
+   * Flattens children of an AND, removes duplicates, and removes literally-true filters.
+   */
+  private static List<Filter> flattenAndChildren(final List<Filter> filters)
+  {
+    final List<Filter> retVal = new ArrayList<>();
+    final Set<EquivalenceCheckedFilter> seenFilters = new HashSet<>();
+
+    for (Filter child : filters) {
+      if (child instanceof AndFilter) {
+        retVal.addAll(flattenAndChildren(((AndFilter) child).getFilters()));
+      } else if (!(child instanceof TrueFilter) && seenFilters.add(new EquivalenceCheckedFilter(child))) {
+        retVal.add(child);
+      }
+    }
+
+    return retVal;
+  }
+
+  /**
+   * Flattens children of an OR, removes duplicates, and removes literally-false filters.
+   */
+  private static List<Filter> flattenOrChildren(final List<Filter> filters)
+  {
+    final List<Filter> retVal = new ArrayList<>();
+    final Set<EquivalenceCheckedFilter> seenFilters = new HashSet<>();
+
+    for (Filter child : filters) {
+      if (child instanceof OrFilter) {
+        retVal.addAll(flattenOrChildren(((OrFilter) child).getFilters()));
+      } else if (!(child instanceof FalseFilter) && seenFilters.add(new EquivalenceCheckedFilter(child))) {
+        retVal.add(child);
+      }
+    }
+
+    return retVal;
+  }
+
+  /**
+   * Wraps a {@link Filter} in an object whose {@link #equals} and {@link #hashCode} methods are overridden to
+   * ignore order of children for {@link AndFilter} and {@link OrFilter}. Useful for deduplication in
+   * {@link #flattenAndChildren} and {@link #flattenOrChildren}.
+   */
+  private static class EquivalenceCheckedFilter
+  {
+    private final Filter filter;
+
+    EquivalenceCheckedFilter(Filter filter)
+    {
+      this.filter = filter;
+    }
+
+    @Override
+    public boolean equals(Object o)
+    {
+      if (this == o) {
+        return true;
+      }
+      if (o == null || getClass() != o.getClass()) {
+        return false;
+      }
+      EquivalenceCheckedFilter that = (EquivalenceCheckedFilter) o;
+
+      if (filter instanceof BooleanFilter && filter.getClass().equals(that.filter.getClass())) {
+        // Order of children doesn't matter for equivalence.
+        final Set<Filter> ourFilterSet = new HashSet<>(((BooleanFilter) filter).getFilters());
+        final Set<Filter> theirFilterSet = new HashSet<>(((BooleanFilter) that.filter).getFilters());
+        return ourFilterSet.equals(theirFilterSet);

Review comment:
       Nevermind, I decided to take your advice and switch the List to a LinkedHashSet, which makes this EquivalenceCheckedFilter technique no longer necessary.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #10758: Retain order of AND, OR filter children.

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #10758:
URL: https://github.com/apache/druid/pull/10758#discussion_r560756535



##########
File path: processing/src/main/java/org/apache/druid/query/filter/BooleanFilter.java
##########
@@ -23,13 +23,14 @@
 import org.apache.druid.segment.ColumnSelectorFactory;
 
 import java.util.HashSet;
+import java.util.List;
 import java.util.Set;
 
 public interface BooleanFilter extends Filter
 {
   ValueMatcher[] EMPTY_VALUE_MATCHER_ARRAY = new ValueMatcher[0];
 
-  Set<Filter> getFilters();
+  List<Filter> getFilters();

Review comment:
       I think doing the LinkedHashSet makes sense, so I went with that approach.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #10758: Retain order of AND, OR filter children.

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #10758:
URL: https://github.com/apache/druid/pull/10758#discussion_r558805012



##########
File path: processing/src/main/java/org/apache/druid/segment/filter/Filters.java
##########
@@ -552,4 +594,96 @@ public static boolean filterMatchesNull(Filter filter)
     ValueMatcher valueMatcher = filter.makeMatcher(ALL_NULL_COLUMN_SELECTOR_FACTORY);
     return valueMatcher.matches();
   }
+
+  /**
+   * Flattens children of an AND, removes duplicates, and removes literally-true filters.
+   */
+  private static List<Filter> flattenAndChildren(final List<Filter> filters)
+  {
+    final List<Filter> retVal = new ArrayList<>();
+    final Set<EquivalenceCheckedFilter> seenFilters = new HashSet<>();
+
+    for (Filter child : filters) {
+      if (child instanceof AndFilter) {
+        retVal.addAll(flattenAndChildren(((AndFilter) child).getFilters()));
+      } else if (!(child instanceof TrueFilter) && seenFilters.add(new EquivalenceCheckedFilter(child))) {
+        retVal.add(child);
+      }
+    }
+
+    return retVal;
+  }
+
+  /**
+   * Flattens children of an OR, removes duplicates, and removes literally-false filters.
+   */
+  private static List<Filter> flattenOrChildren(final List<Filter> filters)
+  {
+    final List<Filter> retVal = new ArrayList<>();
+    final Set<EquivalenceCheckedFilter> seenFilters = new HashSet<>();
+
+    for (Filter child : filters) {
+      if (child instanceof OrFilter) {
+        retVal.addAll(flattenOrChildren(((OrFilter) child).getFilters()));
+      } else if (!(child instanceof FalseFilter) && seenFilters.add(new EquivalenceCheckedFilter(child))) {
+        retVal.add(child);
+      }
+    }
+
+    return retVal;
+  }
+
+  /**
+   * Wraps a {@link Filter} in an object whose {@link #equals} and {@link #hashCode} methods are overridden to
+   * ignore order of children for {@link AndFilter} and {@link OrFilter}. Useful for deduplication in
+   * {@link #flattenAndChildren} and {@link #flattenOrChildren}.
+   */
+  private static class EquivalenceCheckedFilter
+  {
+    private final Filter filter;
+
+    EquivalenceCheckedFilter(Filter filter)
+    {
+      this.filter = filter;
+    }
+
+    @Override
+    public boolean equals(Object o)
+    {
+      if (this == o) {
+        return true;
+      }
+      if (o == null || getClass() != o.getClass()) {
+        return false;
+      }
+      EquivalenceCheckedFilter that = (EquivalenceCheckedFilter) o;
+
+      if (filter instanceof BooleanFilter && filter.getClass().equals(that.filter.getClass())) {
+        // Order of children doesn't matter for equivalence.
+        final Set<Filter> ourFilterSet = new HashSet<>(((BooleanFilter) filter).getFilters());
+        final Set<Filter> theirFilterSet = new HashSet<>(((BooleanFilter) that.filter).getFilters());
+        return ourFilterSet.equals(theirFilterSet);

Review comment:
       Does this handle nested BooleanFilters correctly?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #10758: Retain order of AND, OR filter children.

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #10758:
URL: https://github.com/apache/druid/pull/10758#discussion_r560756535



##########
File path: processing/src/main/java/org/apache/druid/query/filter/BooleanFilter.java
##########
@@ -23,13 +23,14 @@
 import org.apache.druid.segment.ColumnSelectorFactory;
 
 import java.util.HashSet;
+import java.util.List;
 import java.util.Set;
 
 public interface BooleanFilter extends Filter
 {
   ValueMatcher[] EMPTY_VALUE_MATCHER_ARRAY = new ValueMatcher[0];
 
-  Set<Filter> getFilters();
+  List<Filter> getFilters();

Review comment:
       I think doing the LinkedHashSet makes sense, so I went with that approach. It simplified the code a bit, and let me delete the EquivalenceCheckedFilter stuff.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #10758: Retain order of AND, OR filter children.

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #10758:
URL: https://github.com/apache/druid/pull/10758#discussion_r558087569



##########
File path: processing/src/test/java/org/apache/druid/segment/filter/FilterCnfConversionTest.java
##########
@@ -356,65 +356,65 @@ public void testToNormalizedOrClausesWithComplexFilterIncludingNotAndOr()
             )
         )
     );
-    final Set<Filter> expected = ImmutableSet.of(
+    final List<Filter> expected = ImmutableList.of(

Review comment:
       Another note: these were also just reorderings.

##########
File path: processing/src/test/java/org/apache/druid/segment/filter/FilterCnfConversionTest.java
##########
@@ -253,54 +253,54 @@ public void testToCnfWithComplexFilterIncludingNotAndOr()
         )
     );
     final Filter expected = FilterTestUtils.and(
+        // The below OR filter could be eliminated because this filter also has

Review comment:
       Note: these result changes were just re-orderings.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on pull request #10758: Retain order of AND, OR filter children.

Posted by GitBox <gi...@apache.org>.
gianm commented on pull request #10758:
URL: https://github.com/apache/druid/pull/10758#issuecomment-759867448


   > This patch moves filter simplification logic from "optimize" to
   > "toFilter", because that allows the code to be shared with Filters.and
   > and Filters.or. The simplification has become more complicated and so
   > it's useful to share it.
   
   Note on this part: in theory, there is a performance risk here, since if `toFilter()` is called more often than `optimize()`, it means we'd do the simplification work more often. As far as I can tell, this isn't a big risk, since calls to `toFilter()` are generally either going through AbstractOptimizableDimFilter.toOptimizedFilter (which are cached) or are happening in places that are once per query, not once per segment (like FilteredAggregatorFactory's constructor).
   
   Doing the above analysis made me feel like collapsing DimFilter and Filter would be a good idea.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org