You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/04/30 07:34:24 UTC

[GitHub] [druid] jihoonson opened a new pull request #9800: Avoid sorting values in InDimFilter if possible

jihoonson opened a new pull request #9800:
URL: https://github.com/apache/druid/pull/9800


   ### Description
   
   The `values` in `InDimFilter` can be sometimes huge especially with query inlining. In that case, the constructor of `InDimFilter` can take a long time to build a `TreeSet` for storing filter values. AFAIT, `values` doesn't have to be a `SortedSet`, but can be a `Set` instead except for cache key. To avoid sorting in the constructor but generate cache key in a deterministic way, `getCacheKey()` method needs to sort values for generating cache key. Since this method can be called often, cache key is now cached in memory once it is created. Also, instead of storing all values in the cache key, a hash of values is stored now to avoid generating too long hash key.
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [x] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9800: Avoid sorting values in InDimFilter if possible

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9800:
URL: https://github.com/apache/druid/pull/9800#discussion_r420340485



##########
File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##########
@@ -132,31 +145,40 @@ public FilterTuning getFilterTuning()
   @Override
   public byte[] getCacheKey()
   {

Review comment:
       All `DimFilter`s are not thread-safe. It should be never called by two threads at the same time.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9800: Avoid sorting values in InDimFilter if possible

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9800:
URL: https://github.com/apache/druid/pull/9800#discussion_r421006133



##########
File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##########
@@ -132,31 +145,40 @@ public FilterTuning getFilterTuning()
   @Override
   public byte[] getCacheKey()
   {
-    boolean hasNull = false;
-    for (String value : values) {
-      if (value == null) {
-        hasNull = true;
-        break;
+    if (cacheKey == null) {
+      final List<String> sortedValues = new ArrayList<>(values);
+      sortedValues.sort(Comparator.nullsFirst(Ordering.natural()));
+      final Hasher hasher = Hashing.sha256().newHasher();

Review comment:
       The issue with storing all `values` in the cacheKey is that the cache key can become huge as the size of `values` grows which can result in pollution in the cache. 
   
   Regarding using SHA-256, I believe academical research is more trustable than my testing. [According to the probability table, the required number of hashed elements such that probability of at least one hash collision >= 10^−18 is 4.8×10^29 under assumption that the hash function is perfect](https://en.wikipedia.org/wiki/Birthday_problem). This is negligible when it comes to practice even though the SHA-256 hash function is not proven to be perfect.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9800: Avoid sorting values in InDimFilter if possible

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9800:
URL: https://github.com/apache/druid/pull/9800#discussion_r419817391



##########
File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##########
@@ -132,31 +145,41 @@ public FilterTuning getFilterTuning()
   @Override
   public byte[] getCacheKey()
   {
-    boolean hasNull = false;
-    for (String value : values) {
-      if (value == null) {
-        hasNull = true;
-        break;
+    if (cacheKey == null) {
+      final List<String> sortedValues = new ArrayList<>(values);
+      sortedValues.sort(Comparator.nullsFirst(Ordering.natural()));
+      final Hasher hasher = Hashing.sha256().newHasher();
+      for (String v : sortedValues) {
+        if (v == null) {
+          hasher.putInt(0);
+        } else {
+          hasher.putString(v, StandardCharsets.UTF_8);
+        }
       }
+      cacheKey = new CacheKeyBuilder(DimFilterUtils.IN_CACHE_ID)
+          .appendString(dimension)
+          .appendByte(DimFilterUtils.STRING_SEPARATOR)
+          .appendByteArray(extractionFn == null ? new byte[0] : extractionFn.getCacheKey())
+          .appendByte(DimFilterUtils.STRING_SEPARATOR)
+          .appendByte(DimFilterUtils.STRING_SEPARATOR)

Review comment:
       Oops




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9800: Avoid sorting values in InDimFilter if possible

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9800:
URL: https://github.com/apache/druid/pull/9800#discussion_r421069133



##########
File path: processing/src/main/java/org/apache/druid/query/filter/NotDimFilter.java
##########
@@ -60,10 +60,17 @@ public DimFilter getField()
     return ByteBuffer.allocate(1 + subKey.length).put(DimFilterUtils.NOT_CACHE_ID).put(subKey).array();
   }
 
+  @SuppressWarnings("ObjectEquality")

Review comment:
       Thanks. I think this is fine since I don't see any reason to change them to be not singletons in the near future. Also, we should be able to notice that this method is not valid anymore if the assumption did change because we would remove `instance()` methods and therefore modify this method.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9800: Avoid sorting values in InDimFilter if possible

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9800:
URL: https://github.com/apache/druid/pull/9800#discussion_r420339930



##########
File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##########
@@ -132,31 +145,40 @@ public FilterTuning getFilterTuning()
   @Override
   public byte[] getCacheKey()
   {
-    boolean hasNull = false;
-    for (String value : values) {
-      if (value == null) {
-        hasNull = true;
-        break;
+    if (cacheKey == null) {
+      final List<String> sortedValues = new ArrayList<>(values);

Review comment:
       This method is not called if populateCache = false and useCache = false. Will add some comments about it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9800: Avoid sorting values in InDimFilter if possible

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9800:
URL: https://github.com/apache/druid/pull/9800#discussion_r419604051



##########
File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##########
@@ -84,9 +90,11 @@ public InDimFilter(
     Preconditions.checkNotNull(dimension, "dimension can not be null");
     Preconditions.checkArgument(values != null, "values can not be null");
 
-    this.values = new TreeSet<>(Comparators.naturalNullsFirst());
-    for (String value : values) {
-      this.values.add(NullHandling.emptyToNullIfNeeded(value));
+    // The values set can be huge. Try to avoid copying the set if possible.
+    if (values instanceof Set && values.stream().noneMatch(NullHandling::needsEmptyToNull)) {
+      this.values = (Set<String>) values;

Review comment:
       Added a comment on the constructor.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9800: Avoid sorting values in InDimFilter if possible

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9800:
URL: https://github.com/apache/druid/pull/9800#discussion_r420309748



##########
File path: processing/src/main/java/org/apache/druid/query/filter/AndDimFilter.java
##########
@@ -72,8 +73,20 @@ public AndDimFilter(DimFilter... fields)
   @Override
   public DimFilter optimize()
   {
-    List<DimFilter> elements = DimFilters.optimize(fields);
-    return elements.size() == 1 ? elements.get(0) : new AndDimFilter(elements);
+    List<DimFilter> elements = DimFilters.optimize(fields)
+                                         .stream()
+                                         .filter(filter -> !(filter instanceof TrueDimFilter))
+                                         .collect(Collectors.toList());
+    if (elements.isEmpty()) {
+      // All elements were TrueDimFilter after optimization
+      return TrueDimFilter.instance();
+    } else if (elements.size() == 1) {
+      return elements.get(0);

Review comment:
       AndDimFilterTest is missing branch coverage for this and line 88




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9800: Avoid sorting values in InDimFilter if possible

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9800:
URL: https://github.com/apache/druid/pull/9800#discussion_r419212188



##########
File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##########
@@ -84,9 +90,11 @@ public InDimFilter(
     Preconditions.checkNotNull(dimension, "dimension can not be null");
     Preconditions.checkArgument(values != null, "values can not be null");
 
-    this.values = new TreeSet<>(Comparators.naturalNullsFirst());
-    for (String value : values) {
-      this.values.add(NullHandling.emptyToNullIfNeeded(value));
+    // The values set can be huge. Try to avoid copying the set if possible.
+    if (values instanceof Set && values.stream().noneMatch(NullHandling::needsEmptyToNull)) {
+      this.values = (Set<String>) values;

Review comment:
       I don't think we have any usages right now where this would be an issue, but maybe it's good to have a comment somewhere noting that the set passed in to InDimFilter shouldn't be modified afterwards since it can be reused

##########
File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##########
@@ -132,31 +141,43 @@ public FilterTuning getFilterTuning()
   @Override
   public byte[] getCacheKey()
   {
-    boolean hasNull = false;
-    for (String value : values) {
-      if (value == null) {
-        hasNull = true;
-        break;
-      }
+    if (cacheKey == null) {
+      final boolean hasNull = values.stream().anyMatch(Objects::isNull);

Review comment:
       If `values` is a set, I think you could determine `hasNull` beforehand if you've already scanned values for empty/nulls, and avoid scanning again

##########
File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##########
@@ -84,9 +90,11 @@ public InDimFilter(
     Preconditions.checkNotNull(dimension, "dimension can not be null");
     Preconditions.checkArgument(values != null, "values can not be null");
 
-    this.values = new TreeSet<>(Comparators.naturalNullsFirst());
-    for (String value : values) {
-      this.values.add(NullHandling.emptyToNullIfNeeded(value));
+    // The values set can be huge. Try to avoid copying the set if possible.
+    if (values instanceof Set && values.stream().noneMatch(NullHandling::needsEmptyToNull)) {

Review comment:
       If `NullHandling.replaceWithDefault()` returns false, I think you can skip the scan here and just reuse the set




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9800: Avoid sorting values in InDimFilter if possible

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9800:
URL: https://github.com/apache/druid/pull/9800#discussion_r421005456



##########
File path: processing/src/main/java/org/apache/druid/query/filter/AndDimFilter.java
##########
@@ -72,8 +73,20 @@ public AndDimFilter(DimFilter... fields)
   @Override
   public DimFilter optimize()
   {
-    List<DimFilter> elements = DimFilters.optimize(fields);
-    return elements.size() == 1 ? elements.get(0) : new AndDimFilter(elements);
+    List<DimFilter> elements = DimFilters.optimize(fields)
+                                         .stream()
+                                         .filter(filter -> !(filter instanceof TrueDimFilter))
+                                         .collect(Collectors.toList());
+    if (elements.isEmpty()) {
+      // All elements were TrueDimFilter after optimization
+      return TrueDimFilter.instance();
+    } else if (elements.size() == 1) {
+      return elements.get(0);

Review comment:
       Added.

##########
File path: processing/src/main/java/org/apache/druid/query/filter/FalseDimFilter.java
##########
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.filter;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.google.common.collect.ImmutableRangeSet;
+import com.google.common.collect.RangeSet;
+import org.apache.druid.query.cache.CacheKeyBuilder;
+import org.apache.druid.segment.filter.FalseFilter;
+
+import javax.annotation.Nullable;
+import java.util.Collections;
+import java.util.Set;
+
+public class FalseDimFilter implements DimFilter
+{
+  private static final FalseDimFilter INSTANCE = new FalseDimFilter();
+
+  @JsonCreator
+  public static FalseDimFilter instance()
+  {
+    return INSTANCE;
+  }
+
+  private FalseDimFilter()
+  {
+  }
+
+  @Override
+  public DimFilter optimize()
+  {
+    return this;
+  }
+
+  @Override
+  public Filter toFilter()
+  {
+    return FalseFilter.instance();
+  }
+
+  @Nullable
+  @Override
+  public RangeSet<String> getDimensionRangeSet(String dimension)
+  {
+    return ImmutableRangeSet.of();
+  }
+
+  @Override
+  public Set<String> getRequiredColumns()
+  {
+    return Collections.emptySet();
+  }
+
+  @Override
+  public byte[] getCacheKey()
+  {
+    return new CacheKeyBuilder(DimFilterUtils.FALSE_CACHE_ID).build();

Review comment:
       Done.

##########
File path: processing/src/main/java/org/apache/druid/query/filter/FalseDimFilter.java
##########
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.filter;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.google.common.collect.ImmutableRangeSet;
+import com.google.common.collect.RangeSet;
+import org.apache.druid.query.cache.CacheKeyBuilder;
+import org.apache.druid.segment.filter.FalseFilter;
+
+import javax.annotation.Nullable;
+import java.util.Collections;
+import java.util.Set;
+
+public class FalseDimFilter implements DimFilter
+{
+  private static final FalseDimFilter INSTANCE = new FalseDimFilter();
+
+  @JsonCreator
+  public static FalseDimFilter instance()
+  {
+    return INSTANCE;
+  }
+
+  private FalseDimFilter()
+  {
+  }
+
+  @Override
+  public DimFilter optimize()
+  {
+    return this;
+  }
+
+  @Override
+  public Filter toFilter()
+  {
+    return FalseFilter.instance();
+  }
+
+  @Nullable
+  @Override
+  public RangeSet<String> getDimensionRangeSet(String dimension)
+  {
+    return ImmutableRangeSet.of();
+  }
+
+  @Override
+  public Set<String> getRequiredColumns()
+  {
+    return Collections.emptySet();
+  }
+
+  @Override
+  public byte[] getCacheKey()
+  {
+    return new CacheKeyBuilder(DimFilterUtils.FALSE_CACHE_ID).build();
+  }
+
+  @Override
+  public int hashCode()
+  {
+    return DimFilterUtils.FALSE_CACHE_ID;
+  }
+
+  @Override
+  public boolean equals(Object o)
+  {
+    return o == this;
+  }

Review comment:
       Done.

##########
File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##########
@@ -73,9 +77,14 @@
   private final Supplier<DruidFloatPredicate> floatPredicateSupplier;
   private final Supplier<DruidDoublePredicate> doublePredicateSupplier;
 
+  @JsonIgnore
+  private byte[] cacheKey;
+
   @JsonCreator
   public InDimFilter(
       @JsonProperty("dimension") String dimension,
+      // This 'values' collection instance can be reused if possible to avoid copying a big collection.
+      // Callers should _not_ modify the collection after it is passed to this constructor.
       @JsonProperty("values") Collection<String> values,

Review comment:
       Done.

##########
File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##########
@@ -132,31 +145,40 @@ public FilterTuning getFilterTuning()
   @Override
   public byte[] getCacheKey()
   {
-    boolean hasNull = false;
-    for (String value : values) {
-      if (value == null) {
-        hasNull = true;
-        break;
+    if (cacheKey == null) {
+      final List<String> sortedValues = new ArrayList<>(values);

Review comment:
       Added.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9800: Avoid sorting values in InDimFilter if possible

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9800:
URL: https://github.com/apache/druid/pull/9800#discussion_r420311705



##########
File path: processing/src/main/java/org/apache/druid/query/filter/FalseDimFilter.java
##########
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.filter;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.google.common.collect.ImmutableRangeSet;
+import com.google.common.collect.RangeSet;
+import org.apache.druid.query.cache.CacheKeyBuilder;
+import org.apache.druid.segment.filter.FalseFilter;
+
+import javax.annotation.Nullable;
+import java.util.Collections;
+import java.util.Set;
+
+public class FalseDimFilter implements DimFilter
+{
+  private static final FalseDimFilter INSTANCE = new FalseDimFilter();
+
+  @JsonCreator
+  public static FalseDimFilter instance()
+  {
+    return INSTANCE;
+  }
+
+  private FalseDimFilter()
+  {
+  }
+
+  @Override
+  public DimFilter optimize()
+  {
+    return this;
+  }
+
+  @Override
+  public Filter toFilter()
+  {
+    return FalseFilter.instance();
+  }
+
+  @Nullable
+  @Override
+  public RangeSet<String> getDimensionRangeSet(String dimension)
+  {
+    return ImmutableRangeSet.of();
+  }
+
+  @Override
+  public Set<String> getRequiredColumns()
+  {
+    return Collections.emptySet();
+  }
+
+  @Override
+  public byte[] getCacheKey()
+  {
+    return new CacheKeyBuilder(DimFilterUtils.FALSE_CACHE_ID).build();

Review comment:
       nit: should this be a static instance

##########
File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##########
@@ -73,9 +77,14 @@
   private final Supplier<DruidFloatPredicate> floatPredicateSupplier;
   private final Supplier<DruidDoublePredicate> doublePredicateSupplier;
 
+  @JsonIgnore
+  private byte[] cacheKey;
+
   @JsonCreator
   public InDimFilter(
       @JsonProperty("dimension") String dimension,
+      // This 'values' collection instance can be reused if possible to avoid copying a big collection.
+      // Callers should _not_ modify the collection after it is passed to this constructor.
       @JsonProperty("values") Collection<String> values,

Review comment:
       Can we pass in a `Set<String>` instead of `Collection<String>` so jackson can do the de-dupe for us?

##########
File path: processing/src/main/java/org/apache/druid/query/filter/AndDimFilter.java
##########
@@ -72,8 +73,20 @@ public AndDimFilter(DimFilter... fields)
   @Override
   public DimFilter optimize()
   {
-    List<DimFilter> elements = DimFilters.optimize(fields);
-    return elements.size() == 1 ? elements.get(0) : new AndDimFilter(elements);
+    List<DimFilter> elements = DimFilters.optimize(fields)
+                                         .stream()
+                                         .filter(filter -> !(filter instanceof TrueDimFilter))
+                                         .collect(Collectors.toList());
+    if (elements.isEmpty()) {
+      // All elements were TrueDimFilter after optimization
+      return TrueDimFilter.instance();
+    } else if (elements.size() == 1) {
+      return elements.get(0);

Review comment:
       Looks like this code is hit by other tests in druid-processing, but I'm not sure which ones

##########
File path: processing/src/main/java/org/apache/druid/query/filter/FalseDimFilter.java
##########
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.filter;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.google.common.collect.ImmutableRangeSet;
+import com.google.common.collect.RangeSet;
+import org.apache.druid.query.cache.CacheKeyBuilder;
+import org.apache.druid.segment.filter.FalseFilter;
+
+import javax.annotation.Nullable;
+import java.util.Collections;
+import java.util.Set;
+
+public class FalseDimFilter implements DimFilter
+{
+  private static final FalseDimFilter INSTANCE = new FalseDimFilter();
+
+  @JsonCreator
+  public static FalseDimFilter instance()
+  {
+    return INSTANCE;
+  }
+
+  private FalseDimFilter()
+  {
+  }
+
+  @Override
+  public DimFilter optimize()
+  {
+    return this;
+  }
+
+  @Override
+  public Filter toFilter()
+  {
+    return FalseFilter.instance();
+  }
+
+  @Nullable
+  @Override
+  public RangeSet<String> getDimensionRangeSet(String dimension)
+  {
+    return ImmutableRangeSet.of();
+  }
+
+  @Override
+  public Set<String> getRequiredColumns()
+  {
+    return Collections.emptySet();
+  }
+
+  @Override
+  public byte[] getCacheKey()
+  {
+    return new CacheKeyBuilder(DimFilterUtils.FALSE_CACHE_ID).build();
+  }
+
+  @Override
+  public int hashCode()
+  {
+    return DimFilterUtils.FALSE_CACHE_ID;
+  }
+
+  @Override
+  public boolean equals(Object o)
+  {
+    return o == this;
+  }

Review comment:
       EqualsVerifierTest for this?

##########
File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##########
@@ -132,31 +145,40 @@ public FilterTuning getFilterTuning()
   @Override
   public byte[] getCacheKey()
   {

Review comment:
       Can multiple threads call `getCacheKey` at the same time?
   
   What happens if 2 threads try to build sortedValues?

##########
File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##########
@@ -132,31 +145,40 @@ public FilterTuning getFilterTuning()
   @Override
   public byte[] getCacheKey()
   {
-    boolean hasNull = false;
-    for (String value : values) {
-      if (value == null) {
-        hasNull = true;
-        break;
+    if (cacheKey == null) {
+      final List<String> sortedValues = new ArrayList<>(values);

Review comment:
       Does this negate your comment on line 96?
   
   We'd be copying all the values here to a list so that we can sort the list.

##########
File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##########
@@ -132,31 +145,40 @@ public FilterTuning getFilterTuning()
   @Override
   public byte[] getCacheKey()
   {
-    boolean hasNull = false;
-    for (String value : values) {
-      if (value == null) {
-        hasNull = true;
-        break;
+    if (cacheKey == null) {
+      final List<String> sortedValues = new ArrayList<>(values);
+      sortedValues.sort(Comparator.nullsFirst(Ordering.natural()));
+      final Hasher hasher = Hashing.sha256().newHasher();

Review comment:
       Why did you choose a sha256 hasher? Is it fast to construct a new hasher? or to perform a sha256 hash?
   
   I wrote a test here which fails to produce the same cacheKey even though the values are sorted
   
   ```
   @Test
     public void testCacheKey()
     {
       final InDimFilter dimFilter1 = new InDimFilter("dim", ImmutableList.of("v1", "v2", "v3"), null);
       final InDimFilter dimFilter2 = new InDimFilter("dim", ImmutableList.of("v3", "v2", "v1"), null);
       Assert.assertEquals(dimFilter1.getCacheKey(), dimFilter2.getCacheKey());
     }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9800: Avoid sorting values in InDimFilter if possible

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9800:
URL: https://github.com/apache/druid/pull/9800#discussion_r420360898



##########
File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##########
@@ -132,31 +145,40 @@ public FilterTuning getFilterTuning()
   @Override
   public byte[] getCacheKey()
   {
-    boolean hasNull = false;
-    for (String value : values) {
-      if (value == null) {
-        hasNull = true;
-        break;
+    if (cacheKey == null) {
+      final List<String> sortedValues = new ArrayList<>(values);
+      sortedValues.sort(Comparator.nullsFirst(Ordering.natural()));
+      final Hasher hasher = Hashing.sha256().newHasher();

Review comment:
       > I chose sha256 since it is long enough to distinguish different `values` sets even though I haven't really tested. I will do some tests.
   
   👍
   
   I think a simpler solution would be to revert back to the previous way of building a cacheKey, but use `.appendStringsIgnoringOrder()` instead of `.appendStrings()` This way the sort is done once for the list and we don't add any new memory overhead from the current implementation.
   
   Maybe consider creating a new utility function to the CacheKeyBuilder that sorts the list and writes whether or not any nulls existed in the list. This way we can do one pass of the list to build the cacheKey, but even at 10k entries in the list, this might be a micro optimization.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9800: Avoid sorting values in InDimFilter if possible

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9800:
URL: https://github.com/apache/druid/pull/9800#discussion_r421006641



##########
File path: processing/src/main/java/org/apache/druid/query/filter/NotDimFilter.java
##########
@@ -60,10 +60,17 @@ public DimFilter getField()
     return ByteBuffer.allocate(1 + subKey.length).put(DimFilterUtils.NOT_CACHE_ID).put(subKey).array();
   }
 
+  @SuppressWarnings("ObjectEquality")

Review comment:
       Why is it better?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9800: Avoid sorting values in InDimFilter if possible

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9800:
URL: https://github.com/apache/druid/pull/9800#discussion_r421011961



##########
File path: processing/src/main/java/org/apache/druid/query/filter/vector/FalseVectorMatcher.java
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.filter.vector;
+
+import org.apache.druid.segment.vector.VectorSizeInspector;
+
+public class FalseVectorMatcher implements VectorValueMatcher
+{
+  private final VectorSizeInspector vectorSizeInspector;
+
+  public FalseVectorMatcher(VectorSizeInspector vectorSizeInspector)
+  {
+    this.vectorSizeInspector = vectorSizeInspector;
+  }
+
+  @Override
+  public ReadableVectorMatch match(ReadableVectorMatch mask)
+  {
+    return VectorMatch.allFalse();
+  }
+
+  @Override
+  public int getMaxVectorSize()
+  {
+    return vectorSizeInspector.getMaxVectorSize();
+  }
+
+  @Override
+  public int getCurrentVectorSize()
+  {
+    return vectorSizeInspector.getCurrentVectorSize();
+  }

Review comment:
       ah cool. Thanks for the explanation!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9800: Avoid sorting values in InDimFilter if possible

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9800:
URL: https://github.com/apache/druid/pull/9800#discussion_r420342740



##########
File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##########
@@ -132,31 +145,40 @@ public FilterTuning getFilterTuning()
   @Override
   public byte[] getCacheKey()
   {
-    boolean hasNull = false;
-    for (String value : values) {
-      if (value == null) {
-        hasNull = true;
-        break;
+    if (cacheKey == null) {
+      final List<String> sortedValues = new ArrayList<>(values);
+      sortedValues.sort(Comparator.nullsFirst(Ordering.natural()));
+      final Hasher hasher = Hashing.sha256().newHasher();

Review comment:
       I believe your test will pass if you use `Assert.assertArrayEquals()` (`getCacheKey()` returns a byte array).
   
   > Why did you choose a sha256 hasher? Is it fast to construct a new hasher? or to perform a sha256 hash?
   
   I chose sha256 since it is long enough to distinguish different `values` sets even though I haven't really tested. I will do some tests.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei commented on a change in pull request #9800: Avoid sorting values in InDimFilter if possible

Posted by GitBox <gi...@apache.org>.
jon-wei commented on a change in pull request #9800:
URL: https://github.com/apache/druid/pull/9800#discussion_r419808059



##########
File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##########
@@ -132,31 +145,41 @@ public FilterTuning getFilterTuning()
   @Override
   public byte[] getCacheKey()
   {
-    boolean hasNull = false;
-    for (String value : values) {
-      if (value == null) {
-        hasNull = true;
-        break;
+    if (cacheKey == null) {
+      final List<String> sortedValues = new ArrayList<>(values);
+      sortedValues.sort(Comparator.nullsFirst(Ordering.natural()));
+      final Hasher hasher = Hashing.sha256().newHasher();
+      for (String v : sortedValues) {
+        if (v == null) {
+          hasher.putInt(0);
+        } else {
+          hasher.putString(v, StandardCharsets.UTF_8);
+        }
       }
+      cacheKey = new CacheKeyBuilder(DimFilterUtils.IN_CACHE_ID)
+          .appendString(dimension)
+          .appendByte(DimFilterUtils.STRING_SEPARATOR)
+          .appendByteArray(extractionFn == null ? new byte[0] : extractionFn.getCacheKey())
+          .appendByte(DimFilterUtils.STRING_SEPARATOR)
+          .appendByte(DimFilterUtils.STRING_SEPARATOR)

Review comment:
       There's an extra STRING_SEPARATOR now




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9800: Avoid sorting values in InDimFilter if possible

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9800:
URL: https://github.com/apache/druid/pull/9800#discussion_r420360898



##########
File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##########
@@ -132,31 +145,40 @@ public FilterTuning getFilterTuning()
   @Override
   public byte[] getCacheKey()
   {
-    boolean hasNull = false;
-    for (String value : values) {
-      if (value == null) {
-        hasNull = true;
-        break;
+    if (cacheKey == null) {
+      final List<String> sortedValues = new ArrayList<>(values);
+      sortedValues.sort(Comparator.nullsFirst(Ordering.natural()));
+      final Hasher hasher = Hashing.sha256().newHasher();

Review comment:
       > I chose sha256 since it is long enough to distinguish different `values` sets even though I haven't really tested. I will do some tests.
   
   👍
   
   I think a simpler solution would be to revert back to the previous way of building a cacheKey, but use `.appendStringsIgnoringOrder()` instead of `.appendStrings()` This way the sort is done once for the list and we don't add any new memory overhead from the current implementation.
   
   Maybe consider creating a new utility function to the CacheKeyBuilder that sorts the list and writes whether or not any nulls existed in the list. This way we can do one pass over the list of values to build the cacheKey, but even at 10k entries in the list, this might be a micro optimization.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9800: Avoid sorting values in InDimFilter if possible

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9800:
URL: https://github.com/apache/druid/pull/9800#discussion_r420341968



##########
File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##########
@@ -132,31 +145,40 @@ public FilterTuning getFilterTuning()
   @Override
   public byte[] getCacheKey()
   {
-    boolean hasNull = false;
-    for (String value : values) {
-      if (value == null) {
-        hasNull = true;
-        break;
+    if (cacheKey == null) {
+      final List<String> sortedValues = new ArrayList<>(values);
+      sortedValues.sort(Comparator.nullsFirst(Ordering.natural()));
+      final Hasher hasher = Hashing.sha256().newHasher();

Review comment:
       Sorry about the false alarm. I just realized it's failing because we're comparing 2 byte arrays, but they are in fact equivalent
   
   ```
     @Test
     public void testCacheKey()
     {
       final InDimFilter dimFilter1 = new InDimFilter("dim", ImmutableList.of("v1", "v2", "v3"), null);
       final InDimFilter dimFilter2 = new InDimFilter("dim", ImmutableList.of("v3", "v2", "v1"), null);
       byte[] cacheKey1 = dimFilter1.getCacheKey();
       byte[] cacheKey2 = dimFilter2.getCacheKey();
       Assert.assertEquals(Arrays.toString(cacheKey1), Arrays.toString(cacheKey2));
     }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9800: Avoid sorting values in InDimFilter if possible

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9800:
URL: https://github.com/apache/druid/pull/9800#discussion_r420344393



##########
File path: processing/src/main/java/org/apache/druid/query/filter/AndDimFilter.java
##########
@@ -72,8 +73,20 @@ public AndDimFilter(DimFilter... fields)
   @Override
   public DimFilter optimize()
   {
-    List<DimFilter> elements = DimFilters.optimize(fields);
-    return elements.size() == 1 ? elements.get(0) : new AndDimFilter(elements);
+    List<DimFilter> elements = DimFilters.optimize(fields)
+                                         .stream()
+                                         .filter(filter -> !(filter instanceof TrueDimFilter))
+                                         .collect(Collectors.toList());
+    if (elements.isEmpty()) {
+      // All elements were TrueDimFilter after optimization
+      return TrueDimFilter.instance();
+    } else if (elements.size() == 1) {
+      return elements.get(0);

Review comment:
       The use of this and `OrDimFilter` are prevalent in unit tests so I believe they are being tested somewhere (I didn't change the logic of the lines you mentioned). But I can add some more tests.

##########
File path: processing/src/main/java/org/apache/druid/query/filter/FalseDimFilter.java
##########
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.filter;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.google.common.collect.ImmutableRangeSet;
+import com.google.common.collect.RangeSet;
+import org.apache.druid.query.cache.CacheKeyBuilder;
+import org.apache.druid.segment.filter.FalseFilter;
+
+import javax.annotation.Nullable;
+import java.util.Collections;
+import java.util.Set;
+
+public class FalseDimFilter implements DimFilter
+{
+  private static final FalseDimFilter INSTANCE = new FalseDimFilter();
+
+  @JsonCreator
+  public static FalseDimFilter instance()
+  {
+    return INSTANCE;
+  }
+
+  private FalseDimFilter()
+  {
+  }
+
+  @Override
+  public DimFilter optimize()
+  {
+    return this;
+  }
+
+  @Override
+  public Filter toFilter()
+  {
+    return FalseFilter.instance();
+  }
+
+  @Nullable
+  @Override
+  public RangeSet<String> getDimensionRangeSet(String dimension)
+  {
+    return ImmutableRangeSet.of();
+  }
+
+  @Override
+  public Set<String> getRequiredColumns()
+  {
+    return Collections.emptySet();
+  }
+
+  @Override
+  public byte[] getCacheKey()
+  {
+    return new CacheKeyBuilder(DimFilterUtils.FALSE_CACHE_ID).build();

Review comment:
       Good point. Will add it.

##########
File path: processing/src/main/java/org/apache/druid/query/filter/FalseDimFilter.java
##########
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.filter;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.google.common.collect.ImmutableRangeSet;
+import com.google.common.collect.RangeSet;
+import org.apache.druid.query.cache.CacheKeyBuilder;
+import org.apache.druid.segment.filter.FalseFilter;
+
+import javax.annotation.Nullable;
+import java.util.Collections;
+import java.util.Set;
+
+public class FalseDimFilter implements DimFilter
+{
+  private static final FalseDimFilter INSTANCE = new FalseDimFilter();
+
+  @JsonCreator
+  public static FalseDimFilter instance()
+  {
+    return INSTANCE;
+  }
+
+  private FalseDimFilter()
+  {
+  }
+
+  @Override
+  public DimFilter optimize()
+  {
+    return this;
+  }
+
+  @Override
+  public Filter toFilter()
+  {
+    return FalseFilter.instance();
+  }
+
+  @Nullable
+  @Override
+  public RangeSet<String> getDimensionRangeSet(String dimension)
+  {
+    return ImmutableRangeSet.of();
+  }
+
+  @Override
+  public Set<String> getRequiredColumns()
+  {
+    return Collections.emptySet();
+  }
+
+  @Override
+  public byte[] getCacheKey()
+  {
+    return new CacheKeyBuilder(DimFilterUtils.FALSE_CACHE_ID).build();
+  }
+
+  @Override
+  public int hashCode()
+  {
+    return DimFilterUtils.FALSE_CACHE_ID;
+  }
+
+  @Override
+  public boolean equals(Object o)
+  {
+    return o == this;
+  }

Review comment:
       👍 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9800: Avoid sorting values in InDimFilter if possible

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9800:
URL: https://github.com/apache/druid/pull/9800#discussion_r421006298



##########
File path: processing/src/main/java/org/apache/druid/query/filter/vector/FalseVectorMatcher.java
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.filter.vector;
+
+import org.apache.druid.segment.vector.VectorSizeInspector;
+
+public class FalseVectorMatcher implements VectorValueMatcher
+{
+  private final VectorSizeInspector vectorSizeInspector;
+
+  public FalseVectorMatcher(VectorSizeInspector vectorSizeInspector)
+  {
+    this.vectorSizeInspector = vectorSizeInspector;
+  }
+
+  @Override
+  public ReadableVectorMatch match(ReadableVectorMatch mask)
+  {
+    return VectorMatch.allFalse();
+  }
+
+  @Override
+  public int getMaxVectorSize()
+  {
+    return vectorSizeInspector.getMaxVectorSize();
+  }
+
+  @Override
+  public int getCurrentVectorSize()
+  {
+    return vectorSizeInspector.getCurrentVectorSize();
+  }

Review comment:
       The max vector size is the maximum number of values in a column you can read per iteration. The current vector size can be different depending on the remaining number of values to read.
   
   The max and current vector sizes are irrelevant to the size of backing array in `VectorMatcher`, although the array usually has the same size with the max vector size. The allFalse vectorMatch has an empty array because it always returns false no matter what value you evaluate in a vector.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9800: Avoid sorting values in InDimFilter if possible

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9800:
URL: https://github.com/apache/druid/pull/9800#discussion_r419604835



##########
File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##########
@@ -84,9 +90,11 @@ public InDimFilter(
     Preconditions.checkNotNull(dimension, "dimension can not be null");
     Preconditions.checkArgument(values != null, "values can not be null");
 
-    this.values = new TreeSet<>(Comparators.naturalNullsFirst());
-    for (String value : values) {
-      this.values.add(NullHandling.emptyToNullIfNeeded(value));
+    // The values set can be huge. Try to avoid copying the set if possible.
+    if (values instanceof Set && values.stream().noneMatch(NullHandling::needsEmptyToNull)) {

Review comment:
       Good point. Added a check for it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9800: Avoid sorting values in InDimFilter if possible

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9800:
URL: https://github.com/apache/druid/pull/9800#discussion_r419613487



##########
File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##########
@@ -132,31 +141,43 @@ public FilterTuning getFilterTuning()
   @Override
   public byte[] getCacheKey()
   {
-    boolean hasNull = false;
-    for (String value : values) {
-      if (value == null) {
-        hasNull = true;
-        break;
-      }
+    if (cacheKey == null) {
+      final boolean hasNull = values.stream().anyMatch(Objects::isNull);

Review comment:
       Hmm, this value doesn't seem necessary. Removed and added a unit test for it




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9800: Avoid sorting values in InDimFilter if possible

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9800:
URL: https://github.com/apache/druid/pull/9800#discussion_r421011554



##########
File path: processing/src/main/java/org/apache/druid/query/filter/NotDimFilter.java
##########
@@ -60,10 +60,17 @@ public DimFilter getField()
     return ByteBuffer.allocate(1 + subKey.length).put(DimFilterUtils.NOT_CACHE_ID).put(subKey).array();
   }
 
+  @SuppressWarnings("ObjectEquality")

Review comment:
       This suppression is only correct as long as TrueDimFilter and FalseDimFilter are singletons. If in the future that assumption changes, this will break, and nothing will notify us of this incorrect behavior if we don't catch it in code review. That's usually why I try to limit any suppressions, because it takes a lot of reasoning power to justify why the suppression was made.
   
   Anyways, I'm ok with this as currently written.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #9800: Avoid sorting values in InDimFilter if possible

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #9800:
URL: https://github.com/apache/druid/pull/9800#discussion_r420397046



##########
File path: processing/src/main/java/org/apache/druid/query/filter/vector/FalseVectorMatcher.java
##########
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.filter.vector;
+
+import org.apache.druid.segment.vector.VectorSizeInspector;
+
+public class FalseVectorMatcher implements VectorValueMatcher
+{
+  private final VectorSizeInspector vectorSizeInspector;
+
+  public FalseVectorMatcher(VectorSizeInspector vectorSizeInspector)
+  {
+    this.vectorSizeInspector = vectorSizeInspector;
+  }
+
+  @Override
+  public ReadableVectorMatch match(ReadableVectorMatch mask)
+  {
+    return VectorMatch.allFalse();
+  }
+
+  @Override
+  public int getMaxVectorSize()
+  {
+    return vectorSizeInspector.getMaxVectorSize();
+  }
+
+  @Override
+  public int getCurrentVectorSize()
+  {
+    return vectorSizeInspector.getCurrentVectorSize();
+  }

Review comment:
       Should the max size / current size of these be `1` since `VectorMatch.allFalse()` is an empty array?
   
   Not really sure how the vector sizes are used

##########
File path: processing/src/main/java/org/apache/druid/query/filter/NotDimFilter.java
##########
@@ -60,10 +60,17 @@ public DimFilter getField()
     return ByteBuffer.allocate(1 + subKey.length).put(DimFilterUtils.NOT_CACHE_ID).put(subKey).array();
   }
 
+  @SuppressWarnings("ObjectEquality")

Review comment:
       Why not use the recommended resolution
   
   `FalseDimFilter.instance().equals(optimized)`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9800: Avoid sorting values in InDimFilter if possible

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9800:
URL: https://github.com/apache/druid/pull/9800#discussion_r420345950



##########
File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##########
@@ -132,31 +145,40 @@ public FilterTuning getFilterTuning()
   @Override
   public byte[] getCacheKey()
   {
-    boolean hasNull = false;
-    for (String value : values) {
-      if (value == null) {
-        hasNull = true;
-        break;
+    if (cacheKey == null) {
+      final List<String> sortedValues = new ArrayList<>(values);
+      sortedValues.sort(Comparator.nullsFirst(Ordering.natural()));
+      final Hasher hasher = Hashing.sha256().newHasher();

Review comment:
       Oh sorry, didn't see your last comment. 👍 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #9800: Avoid sorting values in InDimFilter if possible

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #9800:
URL: https://github.com/apache/druid/pull/9800#discussion_r420345225



##########
File path: processing/src/main/java/org/apache/druid/query/filter/InDimFilter.java
##########
@@ -73,9 +77,14 @@
   private final Supplier<DruidFloatPredicate> floatPredicateSupplier;
   private final Supplier<DruidDoublePredicate> doublePredicateSupplier;
 
+  @JsonIgnore
+  private byte[] cacheKey;
+
   @JsonCreator
   public InDimFilter(
       @JsonProperty("dimension") String dimension,
+      // This 'values' collection instance can be reused if possible to avoid copying a big collection.
+      // Callers should _not_ modify the collection after it is passed to this constructor.
       @JsonProperty("values") Collection<String> values,

Review comment:
       To be honest, I was lazy and haven't checked how many callers call this constructor before. It seems small enough, I guess I can.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org