You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/01/24 09:19:24 UTC

[GitHub] [druid] clintropolis opened a new pull request #9251: fix issue with long column predicate matcher based filters and nulls

clintropolis opened a new pull request #9251: fix issue with long column predicate matcher based filters and nulls
URL: https://github.com/apache/druid/pull/9251
 
 
   ### Description
   This PR fixes an issue with predicate filters, such as a bound filter, on long columns with numeric nulls, where it will incorrectly compare `0` instead of calling `predicate.applyNull()`. The correct code is actually in place already in `LongValueMatcherColumnSelectorStrategy`, but this was getting pre-empted by `Filters.makeValueMatcher` which had it's own predicate defined specifically handling long columns, along with a code comment:
   
   ```
   // This should be folded into the ValueMatcherColumnSelectorStrategy once that can handle LONG typed columns.
   ```
   Unfortunately, it just hadn't been done yet.
   
   Prior to this PR, the added `testLongPredicateFilterNulls` in `CalciteQueryTest` would fail with an error of the form:
   
   ```
   java.lang.AssertionError
   	at org.apache.druid.segment.data.ColumnarLongs$1HistoricalLongColumnSelectorWithNulls.getLong(ColumnarLongs.java:124)
   	at org.apache.druid.segment.filter.Filters$4.matches(Filters.java:467)
   	at org.apache.druid.segment.FilteredOffset.increment(FilteredOffset.java:75)
   	at org.apache.druid.segment.QueryableIndexCursorSequenceBuilder$QueryableIndexCursor.advance(QueryableIndexCursorSequenceBuilder.java:417)
   	at org.apache.druid.query.timeseries.TimeseriesQueryEngine.lambda$processNonVectorized$2(TimeseriesQueryEngine.java:288)
   ```
   
   The other added tests pass before this PR, just added for additional coverage for funsies.
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
   - [x] added unit tests or modified existing tests to cover new code paths.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on issue #9251: fix some issues with filters on numeric columns with nulls

Posted by GitBox <gi...@apache.org>.
clintropolis commented on issue #9251: fix some issues with filters on numeric columns with nulls
URL: https://github.com/apache/druid/pull/9251#issuecomment-579009125
 
 
   >Another comment: did you consider removing the default impl of applyNull from the DruidXPredicate interfaces? It seems like it being missing is just asking for null handling bugs.
   
   Yes, and I agree, though since I was planning to look at expression filters and the others as a follow-up I was considering making this change later. Should I just do it all now?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #9251: fix issue with long column predicate matcher based filters and nulls

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #9251: fix issue with long column predicate matcher based filters and nulls
URL: https://github.com/apache/druid/pull/9251#discussion_r370892848
 
 

 ##########
 File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
 ##########
 @@ -3198,6 +3198,69 @@ public void testNullLongTopN() throws Exception
     );
   }
 
+  @Test
+  public void testLongPredicateFilterNulls() throws Exception
+  {
+    testQuery(
+        "SELECT COUNT(*)\n"
+        + "FROM druid.numfoo\n"
+        + "WHERE l1 > 3",
+        ImmutableList.of(
+            Druids.newTimeseriesQueryBuilder()
+                  .dataSource(CalciteTests.DATASOURCE3)
+                  .intervals(querySegmentSpec(Filtration.eternity()))
+                  .granularity(Granularities.ALL)
+                  .filters(bound("l1", "3", null, true, false, null, StringComparators.NUMERIC))
 
 Review comment:
   How about moving these three tests to `BoundFilterTest`? I suggest this for two reasons:
   
   1) This test isn't adding much to coverage of the SQL layer.
   2) BoundFilterTest extends from BaseFilterTest which tests a whole big matrix of configurations, so coverage of the lower level stuff will be better vs. using CalciteQueryTest.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on issue #9251: fix some issues with filters on numeric columns with nulls

Posted by GitBox <gi...@apache.org>.
gianm commented on issue #9251: fix some issues with filters on numeric columns with nulls
URL: https://github.com/apache/druid/pull/9251#issuecomment-579012436
 
 
   > Yes, and I agree, though since I was planning to look at expression filters and the others as a follow-up I was considering making this change later. Should I just do it all now?
   
   I think doing those as a follow-up is OK.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #9251: fix some issues with filters on numeric columns with nulls

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #9251: fix some issues with filters on numeric columns with nulls
URL: https://github.com/apache/druid/pull/9251#discussion_r371536844
 
 

 ##########
 File path: processing/src/test/java/org/apache/druid/segment/filter/BaseFilterTest.java
 ##########
 @@ -24,8 +24,20 @@
 import com.google.common.collect.ImmutableList;
 import com.google.common.collect.ImmutableMap;
 import com.google.common.collect.Iterables;
+import junitparams.converters.Nullable;
 
 Review comment:
   Did you mean `javax.annotation.Nullable`?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9251: fix some issues with filters on numeric columns with nulls

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9251: fix some issues with filters on numeric columns with nulls
URL: https://github.com/apache/druid/pull/9251#discussion_r371543207
 
 

 ##########
 File path: processing/src/test/java/org/apache/druid/segment/filter/BaseFilterTest.java
 ##########
 @@ -24,8 +24,20 @@
 import com.google.common.collect.ImmutableList;
 import com.google.common.collect.ImmutableMap;
 import com.google.common.collect.Iterables;
+import junitparams.converters.Nullable;
 
 Review comment:
   Heh, oops, gg intellij

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #9251: fix some issues with filters on numeric columns with nulls

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #9251: fix some issues with filters on numeric columns with nulls
URL: https://github.com/apache/druid/pull/9251#discussion_r371536665
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/vector/VectorSelectorUtils.java
 ##########
 @@ -71,7 +72,8 @@
         final int row = currentOffsets[i];
         nullIterator.advanceIfNeeded(row);
         if (!nullIterator.hasNext()) {
-          break;
 
 Review comment:
   Same comment.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9251: fix issue with long column predicate matcher based filters and nulls

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9251: fix issue with long column predicate matcher based filters and nulls
URL: https://github.com/apache/druid/pull/9251#discussion_r371168263
 
 

 ##########
 File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
 ##########
 @@ -3198,6 +3198,69 @@ public void testNullLongTopN() throws Exception
     );
   }
 
+  @Test
+  public void testLongPredicateFilterNulls() throws Exception
+  {
+    testQuery(
+        "SELECT COUNT(*)\n"
+        + "FROM druid.numfoo\n"
+        + "WHERE l1 > 3",
+        ImmutableList.of(
+            Druids.newTimeseriesQueryBuilder()
+                  .dataSource(CalciteTests.DATASOURCE3)
+                  .intervals(querySegmentSpec(Filtration.eternity()))
+                  .granularity(Granularities.ALL)
+                  .filters(bound("l1", "3", null, true, false, null, StringComparators.NUMERIC))
 
 Review comment:
   I did some refactoring around `BaseFilterTest` to add in numeric columns with null values, and bumped into some additional issues. First one in the `JavascriptDimFilter`, whose numeric predicates did not implement `applyNull`, meaning you couldn't specifically filter for the null value. The 'in' filter I believe also has this problem, assuming that something like `.. x IN (1, 2, NULL)` is valid, but is more involved to fix so I didn't do it in this PR (assuming it needs fixed?).
   
   I also ran into some issues with vectorized value and predicate matchers on numeric null columns. The initial issue is that the matchers were not checking the null bitmap, however once that was in place I bumped into another issue that the null vector could be incorrectly polluted with the wrong null information in the case where the null bitmap ran out of values before the end of the column (likely) because it was breaking from the loop instead of writing false values until the end vector offset. 
   
   I added tests for all of these cases.
   
   I did not dig into the expression filter for fear of what I might run into, but will investigate this as a follow-up to this PR, along with any remaining filters which I did not address.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm merged pull request #9251: fix some issues with filters on numeric columns with nulls

Posted by GitBox <gi...@apache.org>.
gianm merged pull request #9251: fix some issues with filters on numeric columns with nulls
URL: https://github.com/apache/druid/pull/9251
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9251: fix some issues with filters on numeric columns with nulls

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9251: fix some issues with filters on numeric columns with nulls
URL: https://github.com/apache/druid/pull/9251#discussion_r371543191
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/vector/VectorSelectorUtils.java
 ##########
 @@ -57,7 +57,8 @@
         final int row = i + startOffset;
         nullIterator.advanceIfNeeded(row);
         if (!nullIterator.hasNext()) {
-          break;
 
 Review comment:
   :+1: will change

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on a change in pull request #9251: fix issue with long column predicate matcher based filters and nulls

Posted by GitBox <gi...@apache.org>.
clintropolis commented on a change in pull request #9251: fix issue with long column predicate matcher based filters and nulls
URL: https://github.com/apache/druid/pull/9251#discussion_r370893433
 
 

 ##########
 File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
 ##########
 @@ -3198,6 +3198,69 @@ public void testNullLongTopN() throws Exception
     );
   }
 
+  @Test
+  public void testLongPredicateFilterNulls() throws Exception
+  {
+    testQuery(
+        "SELECT COUNT(*)\n"
+        + "FROM druid.numfoo\n"
+        + "WHERE l1 > 3",
+        ImmutableList.of(
+            Druids.newTimeseriesQueryBuilder()
+                  .dataSource(CalciteTests.DATASOURCE3)
+                  .intervals(querySegmentSpec(Filtration.eternity()))
+                  .granularity(Granularities.ALL)
+                  .filters(bound("l1", "3", null, true, false, null, StringComparators.NUMERIC))
 
 Review comment:
   Yeah, I'm working on adding some tests here to get the extra coverage

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #9251: fix some issues with filters on numeric columns with nulls

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #9251: fix some issues with filters on numeric columns with nulls
URL: https://github.com/apache/druid/pull/9251#discussion_r371536631
 
 

 ##########
 File path: processing/src/main/java/org/apache/druid/segment/vector/VectorSelectorUtils.java
 ##########
 @@ -57,7 +57,8 @@
         final int row = i + startOffset;
         nullIterator.advanceIfNeeded(row);
         if (!nullIterator.hasNext()) {
-          break;
 
 Review comment:
   How about using `Arrays.fill` to fill the rest with falses, and then break? Should be somewhat more efficient.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on a change in pull request #9251: fix issue with long column predicate matcher based filters and nulls

Posted by GitBox <gi...@apache.org>.
gianm commented on a change in pull request #9251: fix issue with long column predicate matcher based filters and nulls
URL: https://github.com/apache/druid/pull/9251#discussion_r370894982
 
 

 ##########
 File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
 ##########
 @@ -3198,6 +3198,69 @@ public void testNullLongTopN() throws Exception
     );
   }
 
+  @Test
+  public void testLongPredicateFilterNulls() throws Exception
+  {
+    testQuery(
+        "SELECT COUNT(*)\n"
+        + "FROM druid.numfoo\n"
+        + "WHERE l1 > 3",
+        ImmutableList.of(
+            Druids.newTimeseriesQueryBuilder()
+                  .dataSource(CalciteTests.DATASOURCE3)
+                  .intervals(querySegmentSpec(Filtration.eternity()))
+                  .granularity(Granularities.ALL)
+                  .filters(bound("l1", "3", null, true, false, null, StringComparators.NUMERIC))
 
 Review comment:
   Rad.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org