You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/04/07 06:14:40 UTC

[GitHub] [druid] jihoonson opened a new pull request #9634: More optimize CNF conversion of filters

jihoonson opened a new pull request #9634: More optimize CNF conversion of filters
URL: https://github.com/apache/druid/pull/9634
 
 
   ### Description
   
   A follow-up of #9608. Even with common subfilters elimination, the CNF conversion can result in a suboptimal CNF which is usually huge. One example is `A && B && ((C && D && E) || (C && D && F))`. The CNF conversion returned `((C || E) && (D || E) && (C || F) && (D || F) && C && D && (D || C) && A && B (F || E))` prior to this PR.
   
   I adopted the `RexUtil.pullFactors()` from Apache Calcite which seems to be able to address this issue. However, by its design, it can create a conjunctive form which is not necessarily a CNF. Since we still need CNF, I modified `toCnf()` method to call the adopted `pull()` method and then perform the existing CNF conversion. I added more unit tests to verify the modified behavior. Also, to verify that the filter is equivalent after conversion, I added a verification that checks the equivalence of filters by comparing their truth table.
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9634: More optimize CNF conversion of filters

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9634: More optimize CNF conversion of filters
URL: https://github.com/apache/druid/pull/9634#discussion_r405199617
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/filter/CnfHelper.java
 ##########
 @@ -0,0 +1,464 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.filter;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.Iterables;
+import org.apache.druid.query.filter.BooleanFilter;
+import org.apache.druid.query.filter.Filter;
+
+import javax.annotation.Nonnull;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+
+/**
+ * A helper class to convert a filter to CNF.
+ *
+ * The methods in this class are mainly adopted from Apache Hive and Apache Calcite.
+ */
+public class CnfHelper
+{
+  public static Filter toCnf(Filter current)
+  {
+    current = pushDownNot(current);
+    current = flatten(current);
+    current = pull(current);
+    current = convertToCNFInternal(current);
+    current = flatten(current);
+    return current;
+  }
+
+  // A helper function adapted from Apache Hive, see:
+  // https://github.com/apache/hive/blob/branch-2.0/storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java
+  @VisibleForTesting
+  static Filter pushDownNot(Filter current)
+  {
+    if (current instanceof NotFilter) {
+      Filter child = ((NotFilter) current).getBaseFilter();
+      if (child instanceof NotFilter) {
+        return pushDownNot(((NotFilter) child).getBaseFilter());
+      }
+      if (child instanceof AndFilter) {
+        Set<Filter> children = new HashSet<>();
+        for (Filter grandChild : ((AndFilter) child).getFilters()) {
+          children.add(pushDownNot(new NotFilter(grandChild)));
+        }
+        return new OrFilter(children);
+      }
+      if (child instanceof OrFilter) {
+        Set<Filter> children = new HashSet<>();
+        for (Filter grandChild : ((OrFilter) child).getFilters()) {
+          children.add(pushDownNot(new NotFilter(grandChild)));
+        }
+        return new AndFilter(children);
+      }
+    }
+
+    if (current instanceof AndFilter) {
+      Set<Filter> children = new HashSet<>();
+      for (Filter child : ((AndFilter) current).getFilters()) {
+        children.add(pushDownNot(child));
+      }
+      return new AndFilter(children);
+    }
+
+    if (current instanceof OrFilter) {
+      Set<Filter> children = new HashSet<>();
+      for (Filter child : ((OrFilter) current).getFilters()) {
+        children.add(pushDownNot(child));
+      }
+      return new OrFilter(children);
+    }
+    return current;
+  }
+
+  // A helper function adapted from Apache Hive, see:
+  // https://github.com/apache/hive/blob/branch-2.0/storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java
+  private static Filter convertToCNFInternal(Filter current)
+  {
+    if (current instanceof NotFilter) {
+      return new NotFilter(convertToCNFInternal(((NotFilter) current).getBaseFilter()));
+    }
+    if (current instanceof AndFilter) {
+      Set<Filter> children = new HashSet<>();
+      for (Filter child : ((AndFilter) current).getFilters()) {
+        children.add(convertToCNFInternal(child));
+      }
+      return new AndFilter(children);
+    }
+    if (current instanceof OrFilter) {
+      // a list of leaves that weren't under AND expressions
+      List<Filter> nonAndList = new ArrayList<Filter>();
+      // a list of AND expressions that we need to distribute
+      List<Filter> andList = new ArrayList<Filter>();
+      for (Filter child : ((OrFilter) current).getFilters()) {
+        if (child instanceof AndFilter) {
+          andList.add(child);
+        } else if (child instanceof OrFilter) {
+          // pull apart the kids of the OR expression
+          nonAndList.addAll(((OrFilter) child).getFilters());
+        } else {
+          nonAndList.add(child);
+        }
+      }
+      if (!andList.isEmpty()) {
+        Set<Filter> result = new HashSet<>();
+        generateAllCombinations(result, andList, nonAndList);
+        return new AndFilter(result);
+      }
+    }
+    return current;
+  }
+
+  // A helper function adapted from Apache Hive, see:
+  // https://github.com/apache/hive/blob/branch-2.0/storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java
+  @VisibleForTesting
+  static Filter flatten(Filter root)
+  {
+    if (root instanceof BooleanFilter) {
+      List<Filter> children = new ArrayList<>(((BooleanFilter) root).getFilters());
+      // iterate through the index, so that if we add more children,
+      // they don't get re-visited
+      for (int i = 0; i < children.size(); ++i) {
+        Filter child = flatten(children.get(i));
+        // do we need to flatten?
+        if (child.getClass() == root.getClass() && !(child instanceof NotFilter)) {
+          boolean first = true;
+          Set<Filter> grandKids = ((BooleanFilter) child).getFilters();
+          for (Filter grandkid : grandKids) {
+            // for the first grandkid replace the original parent
+            if (first) {
+              first = false;
+              children.set(i, grandkid);
+            } else {
+              children.add(++i, grandkid);
+            }
+          }
+        } else {
+          children.set(i, child);
+        }
+      }
+      // if we have a singleton AND or OR, just return the child
+      if (children.size() == 1 && (root instanceof AndFilter || root instanceof OrFilter)) {
+        return children.get(0);
+      }
+
+      if (root instanceof AndFilter) {
+        return new AndFilter(children);
+      } else if (root instanceof OrFilter) {
+        return new OrFilter(children);
+      }
+    }
+    return root;
+  }
+
+  // A helper function adapted from Apache Hive, see:
+  // https://github.com/apache/hive/blob/branch-2.0/storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java
+  private static void generateAllCombinations(
+      Set<Filter> result,
+      List<Filter> andList,
+      List<Filter> nonAndList
+  )
+  {
+    Set<Filter> children = ((AndFilter) andList.get(0)).getFilters();
+    if (result.isEmpty()) {
+      for (Filter child : children) {
+        Set<Filter> a = new HashSet<>(nonAndList);
+        a.add(child);
+        result.add(new OrFilter(a));
+      }
+    } else {
+      List<Filter> work = new ArrayList<>(result);
+      result.clear();
+      for (Filter child : children) {
+        for (Filter or : work) {
+          Set<Filter> a = new HashSet<>((((OrFilter) or).getFilters()));
+          a.add(child);
+          result.add(new OrFilter(a));
+        }
+      }
+    }
+    if (andList.size() > 1) {
+      generateAllCombinations(result, andList.subList(1, andList.size()), nonAndList);
+    }
+  }
+
+  // All functions below were basically adopted from Apache Calcite and modified to use them in Druid.
 
 Review comment:
   Looking at how the file has multiple inclusions from other projects, can we split the Calcite-based methods into their own class? I think it'd be easier to tell what's from what that way

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on issue #9634: More optimize CNF conversion of filters

Posted by GitBox <gi...@apache.org>.
jon-wei commented on issue #9634: More optimize CNF conversion of filters
URL: https://github.com/apache/druid/pull/9634#issuecomment-611024073
 
 
   there's a checkstyle failure

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] himanshug commented on a change in pull request #9634: More optimize CNF conversion of filters

Posted by GitBox <gi...@apache.org>.
himanshug commented on a change in pull request #9634: More optimize CNF conversion of filters
URL: https://github.com/apache/druid/pull/9634#discussion_r405769400
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/filter/cnf/HiveCnfHelper.java
 ##########
 @@ -0,0 +1,190 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.filter.cnf;
+
+import org.apache.druid.query.filter.BooleanFilter;
+import org.apache.druid.query.filter.Filter;
+import org.apache.druid.segment.filter.AndFilter;
+import org.apache.druid.segment.filter.NotFilter;
+import org.apache.druid.segment.filter.OrFilter;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+/**
+ * All functions in this class were basically adopted from Apache Hive and modified to use them in Druid.
+ * See https://github.com/apache/hive/blob/branch-2.0/storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java
+ * for original implementation.
+ */
+public class HiveCnfHelper
+{
+  public static Filter pushDownNot(Filter current)
+  {
+    if (current instanceof NotFilter) {
+      Filter child = ((NotFilter) current).getBaseFilter();
+      if (child instanceof NotFilter) {
+        return pushDownNot(((NotFilter) child).getBaseFilter());
+      }
+      if (child instanceof AndFilter) {
+        Set<Filter> children = new HashSet<>();
+        for (Filter grandChild : ((AndFilter) child).getFilters()) {
+          children.add(pushDownNot(new NotFilter(grandChild)));
+        }
+        return new OrFilter(children);
+      }
+      if (child instanceof OrFilter) {
+        Set<Filter> children = new HashSet<>();
+        for (Filter grandChild : ((OrFilter) child).getFilters()) {
+          children.add(pushDownNot(new NotFilter(grandChild)));
+        }
+        return new AndFilter(children);
+      }
+    }
+
+    if (current instanceof AndFilter) {
+      Set<Filter> children = new HashSet<>();
+      for (Filter child : ((AndFilter) current).getFilters()) {
+        children.add(pushDownNot(child));
+      }
+      return new AndFilter(children);
+    }
+
+    if (current instanceof OrFilter) {
+      Set<Filter> children = new HashSet<>();
+      for (Filter child : ((OrFilter) current).getFilters()) {
+        children.add(pushDownNot(child));
+      }
+      return new OrFilter(children);
+    }
+    return current;
+  }
+
+  public static Filter convertToCNFInternal(Filter current)
 
 Review comment:
   is this still "Internal" ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] himanshug commented on a change in pull request #9634: More optimize CNF conversion of filters

Posted by GitBox <gi...@apache.org>.
himanshug commented on a change in pull request #9634: More optimize CNF conversion of filters
URL: https://github.com/apache/druid/pull/9634#discussion_r405769028
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/filter/Filters.java
 ##########
 @@ -426,175 +424,21 @@ public static Filter convertToCNFFromQueryContext(Query query, @Nullable Filter
       return null;
     }
     boolean useCNF = query.getContextBoolean(CTX_KEY_USE_FILTER_CNF, false);
-    return useCNF ? toCNF(filter) : filter;
+    return useCNF ? Filters.toCnf(filter) : filter;
   }
 
-  public static Filter toCNF(Filter current)
+  public static Filter toCnf(Filter current)
   {
-    current = pushDownNot(current);
-    current = flatten(current);
-    current = convertToCNFInternal(current);
-    current = flatten(current);
+    current = HiveCnfHelper.pushDownNot(current);
+    current = HiveCnfHelper.flatten(current);
+    // Pull out AND filters first to convert the filter into a conjunctive form.
+    // This is important to not create a huge CNF.
+    current = CalciteCnfHelper.pull(current);
+    current = HiveCnfHelper.convertToCNFInternal(current);
+    current = HiveCnfHelper.flatten(current);
 
 Review comment:
   It would be nice to add some commentary on each line describing (possibly with example) why it exists. I know there is something in PR description but this is non-trivial.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9634: More optimize CNF conversion of filters

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9634: More optimize CNF conversion of filters
URL: https://github.com/apache/druid/pull/9634#discussion_r405205767
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/filter/CnfHelper.java
 ##########
 @@ -0,0 +1,464 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.filter;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.Iterables;
+import org.apache.druid.query.filter.BooleanFilter;
+import org.apache.druid.query.filter.Filter;
+
+import javax.annotation.Nonnull;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+
+/**
+ * A helper class to convert a filter to CNF.
+ *
+ * The methods in this class are mainly adopted from Apache Hive and Apache Calcite.
+ */
+public class CnfHelper
+{
+  public static Filter toCnf(Filter current)
+  {
+    current = pushDownNot(current);
+    current = flatten(current);
+    current = pull(current);
+    current = convertToCNFInternal(current);
+    current = flatten(current);
+    return current;
+  }
+
+  // A helper function adapted from Apache Hive, see:
+  // https://github.com/apache/hive/blob/branch-2.0/storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java
+  @VisibleForTesting
+  static Filter pushDownNot(Filter current)
+  {
+    if (current instanceof NotFilter) {
+      Filter child = ((NotFilter) current).getBaseFilter();
+      if (child instanceof NotFilter) {
+        return pushDownNot(((NotFilter) child).getBaseFilter());
+      }
+      if (child instanceof AndFilter) {
+        Set<Filter> children = new HashSet<>();
+        for (Filter grandChild : ((AndFilter) child).getFilters()) {
+          children.add(pushDownNot(new NotFilter(grandChild)));
+        }
+        return new OrFilter(children);
+      }
+      if (child instanceof OrFilter) {
+        Set<Filter> children = new HashSet<>();
+        for (Filter grandChild : ((OrFilter) child).getFilters()) {
+          children.add(pushDownNot(new NotFilter(grandChild)));
+        }
+        return new AndFilter(children);
+      }
+    }
+
+    if (current instanceof AndFilter) {
+      Set<Filter> children = new HashSet<>();
+      for (Filter child : ((AndFilter) current).getFilters()) {
+        children.add(pushDownNot(child));
+      }
+      return new AndFilter(children);
+    }
+
+    if (current instanceof OrFilter) {
+      Set<Filter> children = new HashSet<>();
+      for (Filter child : ((OrFilter) current).getFilters()) {
+        children.add(pushDownNot(child));
+      }
+      return new OrFilter(children);
+    }
+    return current;
+  }
+
+  // A helper function adapted from Apache Hive, see:
+  // https://github.com/apache/hive/blob/branch-2.0/storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java
+  private static Filter convertToCNFInternal(Filter current)
+  {
+    if (current instanceof NotFilter) {
+      return new NotFilter(convertToCNFInternal(((NotFilter) current).getBaseFilter()));
+    }
+    if (current instanceof AndFilter) {
+      Set<Filter> children = new HashSet<>();
+      for (Filter child : ((AndFilter) current).getFilters()) {
+        children.add(convertToCNFInternal(child));
+      }
+      return new AndFilter(children);
+    }
+    if (current instanceof OrFilter) {
+      // a list of leaves that weren't under AND expressions
+      List<Filter> nonAndList = new ArrayList<Filter>();
+      // a list of AND expressions that we need to distribute
+      List<Filter> andList = new ArrayList<Filter>();
+      for (Filter child : ((OrFilter) current).getFilters()) {
+        if (child instanceof AndFilter) {
+          andList.add(child);
+        } else if (child instanceof OrFilter) {
+          // pull apart the kids of the OR expression
+          nonAndList.addAll(((OrFilter) child).getFilters());
+        } else {
+          nonAndList.add(child);
+        }
+      }
+      if (!andList.isEmpty()) {
+        Set<Filter> result = new HashSet<>();
+        generateAllCombinations(result, andList, nonAndList);
+        return new AndFilter(result);
+      }
+    }
+    return current;
+  }
+
+  // A helper function adapted from Apache Hive, see:
+  // https://github.com/apache/hive/blob/branch-2.0/storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java
+  @VisibleForTesting
+  static Filter flatten(Filter root)
+  {
+    if (root instanceof BooleanFilter) {
+      List<Filter> children = new ArrayList<>(((BooleanFilter) root).getFilters());
+      // iterate through the index, so that if we add more children,
+      // they don't get re-visited
+      for (int i = 0; i < children.size(); ++i) {
+        Filter child = flatten(children.get(i));
+        // do we need to flatten?
+        if (child.getClass() == root.getClass() && !(child instanceof NotFilter)) {
+          boolean first = true;
+          Set<Filter> grandKids = ((BooleanFilter) child).getFilters();
+          for (Filter grandkid : grandKids) {
+            // for the first grandkid replace the original parent
+            if (first) {
+              first = false;
+              children.set(i, grandkid);
+            } else {
+              children.add(++i, grandkid);
+            }
+          }
+        } else {
+          children.set(i, child);
+        }
+      }
+      // if we have a singleton AND or OR, just return the child
+      if (children.size() == 1 && (root instanceof AndFilter || root instanceof OrFilter)) {
+        return children.get(0);
+      }
+
+      if (root instanceof AndFilter) {
+        return new AndFilter(children);
+      } else if (root instanceof OrFilter) {
+        return new OrFilter(children);
+      }
+    }
+    return root;
+  }
+
+  // A helper function adapted from Apache Hive, see:
+  // https://github.com/apache/hive/blob/branch-2.0/storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java
+  private static void generateAllCombinations(
+      Set<Filter> result,
+      List<Filter> andList,
+      List<Filter> nonAndList
+  )
+  {
+    Set<Filter> children = ((AndFilter) andList.get(0)).getFilters();
+    if (result.isEmpty()) {
+      for (Filter child : children) {
+        Set<Filter> a = new HashSet<>(nonAndList);
+        a.add(child);
+        result.add(new OrFilter(a));
+      }
+    } else {
+      List<Filter> work = new ArrayList<>(result);
+      result.clear();
+      for (Filter child : children) {
+        for (Filter or : work) {
+          Set<Filter> a = new HashSet<>((((OrFilter) or).getFilters()));
+          a.add(child);
+          result.add(new OrFilter(a));
+        }
+      }
+    }
+    if (andList.size() > 1) {
+      generateAllCombinations(result, andList.subList(1, andList.size()), nonAndList);
+    }
+  }
+
+  // All functions below were basically adopted from Apache Calcite and modified to use them in Druid.
 
 Review comment:
   Sounds good. Split into `CalciteCnfHelper` and `HiveCnfHelper`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] himanshug commented on issue #9634: More optimize CNF conversion of filters

Posted by GitBox <gi...@apache.org>.
himanshug commented on issue #9634: More optimize CNF conversion of filters
URL: https://github.com/apache/druid/pull/9634#issuecomment-611156592
 
 
   > The CNF conversion returned ((C || E) && (D || E) && (C || F) && (D || F) && C && D && (D || C) && A && B (F || E)) prior to this PR.
   
   what is the  result after this PR  :) ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9634: More optimize CNF conversion of filters

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9634: More optimize CNF conversion of filters
URL: https://github.com/apache/druid/pull/9634#discussion_r405814531
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/filter/Filters.java
 ##########
 @@ -426,175 +424,21 @@ public static Filter convertToCNFFromQueryContext(Query query, @Nullable Filter
       return null;
     }
     boolean useCNF = query.getContextBoolean(CTX_KEY_USE_FILTER_CNF, false);
-    return useCNF ? toCNF(filter) : filter;
+    return useCNF ? Filters.toCnf(filter) : filter;
   }
 
-  public static Filter toCNF(Filter current)
+  public static Filter toCnf(Filter current)
   {
-    current = pushDownNot(current);
-    current = flatten(current);
-    current = convertToCNFInternal(current);
-    current = flatten(current);
+    current = HiveCnfHelper.pushDownNot(current);
+    current = HiveCnfHelper.flatten(current);
+    // Pull out AND filters first to convert the filter into a conjunctive form.
+    // This is important to not create a huge CNF.
+    current = CalciteCnfHelper.pull(current);
+    current = HiveCnfHelper.convertToCNFInternal(current);
+    current = HiveCnfHelper.flatten(current);
 
 Review comment:
   This code was adopted from Apache Hive as well and I don't know the exact reason for why we are doing in this order.. I tried to add some. Hopefully it helps.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9634: More optimize CNF conversion of filters

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9634: More optimize CNF conversion of filters
URL: https://github.com/apache/druid/pull/9634#discussion_r405814703
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/filter/cnf/HiveCnfHelper.java
 ##########
 @@ -0,0 +1,190 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.filter.cnf;
+
+import org.apache.druid.query.filter.BooleanFilter;
+import org.apache.druid.query.filter.Filter;
+import org.apache.druid.segment.filter.AndFilter;
+import org.apache.druid.segment.filter.NotFilter;
+import org.apache.druid.segment.filter.OrFilter;
+
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+/**
+ * All functions in this class were basically adopted from Apache Hive and modified to use them in Druid.
+ * See https://github.com/apache/hive/blob/branch-2.0/storage-api/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java
+ * for original implementation.
+ */
+public class HiveCnfHelper
+{
+  public static Filter pushDownNot(Filter current)
+  {
+    if (current instanceof NotFilter) {
+      Filter child = ((NotFilter) current).getBaseFilter();
+      if (child instanceof NotFilter) {
+        return pushDownNot(((NotFilter) child).getBaseFilter());
+      }
+      if (child instanceof AndFilter) {
+        Set<Filter> children = new HashSet<>();
+        for (Filter grandChild : ((AndFilter) child).getFilters()) {
+          children.add(pushDownNot(new NotFilter(grandChild)));
+        }
+        return new OrFilter(children);
+      }
+      if (child instanceof OrFilter) {
+        Set<Filter> children = new HashSet<>();
+        for (Filter grandChild : ((OrFilter) child).getFilters()) {
+          children.add(pushDownNot(new NotFilter(grandChild)));
+        }
+        return new AndFilter(children);
+      }
+    }
+
+    if (current instanceof AndFilter) {
+      Set<Filter> children = new HashSet<>();
+      for (Filter child : ((AndFilter) current).getFilters()) {
+        children.add(pushDownNot(child));
+      }
+      return new AndFilter(children);
+    }
+
+    if (current instanceof OrFilter) {
+      Set<Filter> children = new HashSet<>();
+      for (Filter child : ((OrFilter) current).getFilters()) {
+        children.add(pushDownNot(child));
+      }
+      return new OrFilter(children);
+    }
+    return current;
+  }
+
+  public static Filter convertToCNFInternal(Filter current)
 
 Review comment:
   Oops, renamed to `convertToCnf()`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on issue #9634: More optimize CNF conversion of filters

Posted by GitBox <gi...@apache.org>.
jihoonson commented on issue #9634: More optimize CNF conversion of filters
URL: https://github.com/apache/druid/pull/9634#issuecomment-611195064
 
 
   @jon-wei thanks, fixed checkstyle.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on issue #9634: More optimize CNF conversion of filters

Posted by GitBox <gi...@apache.org>.
jihoonson commented on issue #9634: More optimize CNF conversion of filters
URL: https://github.com/apache/druid/pull/9634#issuecomment-611194940
 
 
   > > The CNF conversion returned ((C || E) && (D || E) && (C || F) && (D || F) && C && D && (D || C) && A && B (F || E)) prior to this PR.
   > 
   > what is the result after this PR :) ?
   
   Now it returns `A && B && C && D && (E || F)`. This is tested in `FilterCnfConversionTest.testToCnfCollapsibleBigFilter()`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei merged pull request #9634: More optimize CNF conversion of filters

Posted by GitBox <gi...@apache.org>.
jon-wei merged pull request #9634: More optimize CNF conversion of filters
URL: https://github.com/apache/druid/pull/9634
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9634: More optimize CNF conversion of filters

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9634: More optimize CNF conversion of filters
URL: https://github.com/apache/druid/pull/9634#discussion_r405897673
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/filter/Filters.java
 ##########
 @@ -426,175 +424,21 @@ public static Filter convertToCNFFromQueryContext(Query query, @Nullable Filter
       return null;
     }
     boolean useCNF = query.getContextBoolean(CTX_KEY_USE_FILTER_CNF, false);
-    return useCNF ? toCNF(filter) : filter;
+    return useCNF ? Filters.toCnf(filter) : filter;
   }
 
-  public static Filter toCNF(Filter current)
+  public static Filter toCnf(Filter current)
   {
-    current = pushDownNot(current);
-    current = flatten(current);
-    current = convertToCNFInternal(current);
-    current = flatten(current);
+    current = HiveCnfHelper.pushDownNot(current);
+    current = HiveCnfHelper.flatten(current);
+    // Pull out AND filters first to convert the filter into a conjunctive form.
+    // This is important to not create a huge CNF.
+    current = CalciteCnfHelper.pull(current);
+    current = HiveCnfHelper.convertToCNFInternal(current);
+    current = HiveCnfHelper.flatten(current);
 
 Review comment:
   Yeah, my two PRs are for creating more optimized CNFs in terms of size since we have seen that a filter of the same pattern with that at https://github.com/apache/druid/pull/9634#issuecomment-611156592 blows up memory after CNF conversion. Those PRs are mostly based on my heuristics. I happened to find `RexUtil.pullFactors()` of Calcite which seems useful to address the size issue. 
   
   > For example there are optimizations like (not sure if they are taken care of or not)
   > a OR (NOT a) = TRUE
   > a AND (NOT A) = FALSE
   
   Good point. I'm aware of that our CNF conversion doesn't reduce such filters by evaluating them, but I hope Calcite could reduce most of them in the SQL layer. After the join support was added, now there are two main use cases for the filter in CNF. One is cursor filter push down to use bitmaps if possible (the one you mentioned) and another is join filter push down to place filters at proper joins (both happen in historicals). I think it would be safe to assume that the filter is already optimized by Calcite for these use cases since SQL is now the primary way for querying.
   
   > unfortunate thing is that this is a major research topic with no proven algorithm that works in all cases.
   
   I totally agree. @jon-wei and I have been talking about it and perhaps the best would be using a well-tested third-party library. But I haven't seen such a library yet.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] himanshug commented on a change in pull request #9634: More optimize CNF conversion of filters

Posted by GitBox <gi...@apache.org>.
himanshug commented on a change in pull request #9634: More optimize CNF conversion of filters
URL: https://github.com/apache/druid/pull/9634#discussion_r405866028
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/filter/Filters.java
 ##########
 @@ -426,175 +424,21 @@ public static Filter convertToCNFFromQueryContext(Query query, @Nullable Filter
       return null;
     }
     boolean useCNF = query.getContextBoolean(CTX_KEY_USE_FILTER_CNF, false);
-    return useCNF ? toCNF(filter) : filter;
+    return useCNF ? Filters.toCnf(filter) : filter;
   }
 
-  public static Filter toCNF(Filter current)
+  public static Filter toCnf(Filter current)
   {
-    current = pushDownNot(current);
-    current = flatten(current);
-    current = convertToCNFInternal(current);
-    current = flatten(current);
+    current = HiveCnfHelper.pushDownNot(current);
+    current = HiveCnfHelper.flatten(current);
+    // Pull out AND filters first to convert the filter into a conjunctive form.
+    // This is important to not create a huge CNF.
+    current = CalciteCnfHelper.pull(current);
+    current = HiveCnfHelper.convertToCNFInternal(current);
+    current = HiveCnfHelper.flatten(current);
 
 Review comment:
   thanks, those comments do help.
   
   I think I got/remember it now.
   
   "pushDownNot, flatten, convertToCnf, flatten" sequence is to convert original boolean expression into the CNF form, which doesn't necessarily "optimize"(from evaluation perspective) but CNF form is useful for analysis on boolean expressions(often  use to  prove equivalence of two different boolean expressions), in  our case, separating filter evaluation into pre/post cursor use for applying filters.
   
   your PRs (this and #9608 ) are towards optimizing the boolean expression. Are your PRs driven by similar work done in Calcite or some know work or based  on your own heuristics ? For example there are optimizations like (not sure if they are taken care of or not)
   `a OR (NOT  a) = TRUE`
   `a AND (NOT A) = FALSE`
   ... and more listed in section "Ten Basic Rules of Boolean Algebra, in https://grace.bluegrass.kctcs.edu/~kdunn0001/files/Simplification/4_Simplification_print.html
   
   unfortunate thing is that this is a major research topic with no proven algorithm that works in all cases.
   
   BTW: this comment is not a PR code review but just general things I noted while thinking more about this and wanted to document here.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org