You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/04/22 10:57:52 UTC

[GitHub] [druid] rohangarg opened a new pull request, #12472: Convert simple min/max SQL queries on __time to timeBoundary queries

rohangarg opened a new pull request, #12472:
URL: https://github.com/apache/druid/pull/12472

   @suneet-s pointed that simple queries like `select max(__time) from ds` run as a timeseries query, which is unnecessary. These queries can be converted to timeBoundary queries which take advantage of the sorted nature of time in a segment are optimized for such processing. This change converts some of the simple SQL queries to timeBoundary queries.
   
   This PR has:
   - [x] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [x] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] rohangarg commented on a diff in pull request #12472: Convert simple min/max SQL queries on __time to timeBoundary queries

Posted by GitBox <gi...@apache.org>.
rohangarg commented on code in PR #12472:
URL: https://github.com/apache/druid/pull/12472#discussion_r857096832


##########
processing/src/main/java/org/apache/druid/query/timeboundary/TimeBoundaryQueryQueryToolChest.java:
##########
@@ -224,4 +226,40 @@ public Result<TimeBoundaryResultValue> apply(Object input)
       }
     };
   }
+
+  @Override
+  public RowSignature resultArraySignature(TimeBoundaryQuery query)
+  {
+    if (query.isMinTime() || query.isMaxTime()) {
+      RowSignature.Builder builder = RowSignature.builder();
+      String outputName = query.isMinTime() ?
+                          query.getContextValue(TimeBoundaryQuery.MIN_TIME_ARRAY_OUTPUT_NAME, TimeBoundaryQuery.MIN_TIME) :
+                          query.getContextValue(TimeBoundaryQuery.MAX_TIME_ARRAY_OUTPUT_NAME, TimeBoundaryQuery.MAX_TIME);
+      return builder.add(outputName, ColumnType.LONG).build();
+    }
+    return super.resultArraySignature(query);

Review Comment:
   updated the PR desc with the min-max query limitation



##########
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java:
##########
@@ -822,6 +832,69 @@ private Query computeQuery(final QueryFeatureInspector queryFeatureInspector)
     throw new CannotBuildQueryException("Cannot convert query parts into an actual query");
   }
 
+  /**
+   * Return this query as a TimeBoundary query, or null if this query is not compatible with Timeseries.
+   *
+   * @return query

Review Comment:
   done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s merged pull request #12472: Convert simple min/max SQL queries on __time to timeBoundary queries

Posted by GitBox <gi...@apache.org>.
suneet-s merged PR #12472:
URL: https://github.com/apache/druid/pull/12472


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a diff in pull request #12472: Convert simple min/max SQL queries on __time to timeBoundary queries

Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12472:
URL: https://github.com/apache/druid/pull/12472#discussion_r856349296


##########
processing/src/main/java/org/apache/druid/query/timeboundary/TimeBoundaryQueryQueryToolChest.java:
##########
@@ -224,4 +226,40 @@ public Result<TimeBoundaryResultValue> apply(Object input)
       }
     };
   }
+
+  @Override
+  public RowSignature resultArraySignature(TimeBoundaryQuery query)
+  {
+    if (query.isMinTime() || query.isMaxTime()) {
+      RowSignature.Builder builder = RowSignature.builder();
+      String outputName = query.isMinTime() ?
+                          query.getContextValue(TimeBoundaryQuery.MIN_TIME_ARRAY_OUTPUT_NAME, TimeBoundaryQuery.MIN_TIME) :
+                          query.getContextValue(TimeBoundaryQuery.MAX_TIME_ARRAY_OUTPUT_NAME, TimeBoundaryQuery.MAX_TIME);
+      return builder.add(outputName, ColumnType.LONG).build();
+    }
+    return super.resultArraySignature(query);

Review Comment:
   Should we throw a more meaningful UOE here instead of relying on the method in the super class?
   ```suggestion
       throw new UOE("TimeBoundaryQuery with unsupported bounds [%s]", query.getBound());
   ```
   
   similar comment for the method below



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] rohangarg commented on a diff in pull request #12472: Convert simple min/max SQL queries on __time to timeBoundary queries

Posted by GitBox <gi...@apache.org>.
rohangarg commented on code in PR #12472:
URL: https://github.com/apache/druid/pull/12472#discussion_r857096949


##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteTimeBoundaryQueryTest.java:
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.sql.calcite;
+
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.java.util.common.DateTimes;
+import org.apache.druid.java.util.common.Intervals;
+import org.apache.druid.query.Druids;
+import org.apache.druid.query.aggregation.LongMaxAggregatorFactory;
+import org.apache.druid.query.aggregation.LongMinAggregatorFactory;
+import org.apache.druid.query.spec.MultipleIntervalSegmentSpec;
+import org.apache.druid.query.timeboundary.TimeBoundaryQuery;
+import org.apache.druid.sql.calcite.filtration.Filtration;
+import org.junit.Test;
+
+import java.util.HashMap;
+
+public class CalciteTimeBoundaryQueryTest extends BaseCalciteQueryTest
+{
+  // __time for foo is [2000-01-01, 2000-01-02, 2000-01-03, 2001-01-01, 2001-01-02, 2001-01-03]
+  @Test
+  public void testMaxTimeQuery() throws Exception
+  {
+    HashMap<String, Object> context = new HashMap<>(QUERY_CONTEXT_DEFAULT);
+    context.put(TimeBoundaryQuery.MAX_TIME_ARRAY_OUTPUT_NAME, "a0");
+    testQuery(
+        "SELECT MAX(__time) AS maxTime FROM foo",
+        ImmutableList.of(
+            Druids.newTimeBoundaryQueryBuilder()
+                  .dataSource("foo")
+                  .bound(TimeBoundaryQuery.MAX_TIME)
+                  .context(context)
+                  .build()
+        ),
+        ImmutableList.of(new Object[]{DateTimes.of("2001-01-03").getMillis()})
+    );
+  }
+
+  @Test
+  public void testMinTimeQuery() throws Exception
+  {
+    HashMap<String, Object> context = new HashMap<>(QUERY_CONTEXT_DEFAULT);
+    context.put(TimeBoundaryQuery.MIN_TIME_ARRAY_OUTPUT_NAME, "a0");
+    testQuery(
+        "SELECT MIN(__time) AS minTime FROM foo",
+        ImmutableList.of(
+            Druids.newTimeBoundaryQueryBuilder()
+                  .dataSource("foo")
+                  .bound(TimeBoundaryQuery.MIN_TIME)
+                  .context(context)
+                  .build()
+        ),
+        ImmutableList.of(new Object[]{DateTimes.of("2000-01-01").getMillis()})
+    );
+  }
+
+  @Test
+  public void testMinTimeQueryWithFilters() throws Exception
+  {
+    HashMap<String, Object> context = new HashMap<>(QUERY_CONTEXT_DEFAULT);
+    context.put(TimeBoundaryQuery.MIN_TIME_ARRAY_OUTPUT_NAME, "a0");
+    testQuery(
+        "SELECT MIN(__time) AS minTime FROM foo where __time >= '2001-01-01' and __time < '2003-01-01'",
+        ImmutableList.of(
+            Druids.newTimeBoundaryQueryBuilder()
+                  .dataSource("foo")
+                  .intervals(
+                      new MultipleIntervalSegmentSpec(
+                          ImmutableList.of(Intervals.of("2001-01-01T00:00:00.000Z/2003-01-01T00:00:00.000Z"))
+                      )
+                  )
+                  .bound(TimeBoundaryQuery.MIN_TIME)
+                  .context(context)
+                  .build()
+        ),
+        ImmutableList.of(new Object[]{DateTimes.of("2001-01-01").getMillis()})
+    );
+  }
+
+  // this test is to maintain that if both min(__time) and max(__time) are present,
+  // we currently don't convert that query to timeBoundary

Review Comment:
   updated the comment, created the issue and linked it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] rohangarg commented on a diff in pull request #12472: Convert simple min/max SQL queries on __time to timeBoundary queries

Posted by GitBox <gi...@apache.org>.
rohangarg commented on code in PR #12472:
URL: https://github.com/apache/druid/pull/12472#discussion_r857097631


##########
processing/src/test/java/org/apache/druid/query/timeboundary/TimeBoundaryQueryRunnerTest.java:
##########
@@ -216,6 +247,22 @@ public void testTimeBoundary()
     Assert.assertEquals(DateTimes.of("2011-04-15T00:00:00.000Z"), maxTime);
   }
 
+  @Test(expected = UOE.class)
+  @SuppressWarnings("unchecked")
+  public void testTimeBoundaryArrayResults()
+  {
+    TimeBoundaryQuery timeBoundaryQuery = Druids.newTimeBoundaryQueryBuilder()
+                                                   .dataSource("testing")
+                                                   .bound(null)
+                                                   .build();
+    ResponseContext context = ConcurrentResponseContext.createEmpty();
+    context.initializeMissingSegments();
+    new TimeBoundaryQueryQueryToolChest().resultsAsArrays(
+        timeBoundaryQuery,
+        runner.run(QueryPlus.wrap(timeBoundaryQuery), context)
+    ).toList();
+  }
+

Review Comment:
   Yes, I had planned to add these tests in `TimeBoundaryQueryQueryToolChestTest` but it didn't contain any setup for runners or test data. `TimeBoundaryQueryRunnerTest` contained them so added tests in it.
   One way to move these to toolchest tests could be to copy some of the runner and data creation code in toolchest tests too. Or create `TimeBoundaryQueryRunnerTest` objects in toolchest tests to use the setup code. wdyt? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a diff in pull request #12472: Convert simple min/max SQL queries on __time to timeBoundary queries

Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12472:
URL: https://github.com/apache/druid/pull/12472#discussion_r856359477


##########
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidQuery.java:
##########
@@ -822,6 +832,69 @@ private Query computeQuery(final QueryFeatureInspector queryFeatureInspector)
     throw new CannotBuildQueryException("Cannot convert query parts into an actual query");
   }
 
+  /**
+   * Return this query as a TimeBoundary query, or null if this query is not compatible with Timeseries.
+   *
+   * @return query

Review Comment:
   ```suggestion
      * @return a TimeBoundaryQuery if possible. null if it is not possible to construct one.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] rohangarg commented on a diff in pull request #12472: Convert simple min/max SQL queries on __time to timeBoundary queries

Posted by GitBox <gi...@apache.org>.
rohangarg commented on code in PR #12472:
URL: https://github.com/apache/druid/pull/12472#discussion_r857106769


##########
processing/src/main/java/org/apache/druid/query/timeboundary/TimeBoundaryQueryRunnerFactory.java:
##########
@@ -155,7 +156,8 @@ public Iterator<Result<TimeBoundaryResultValue>> make()
               final DateTime minTime;
               final DateTime maxTime;
 
-              if (legacyQuery.getFilter() != null) {
+              Interval queryInterval = legacyQuery.getQuerySegmentSpec().getIntervals().get(0);

Review Comment:
   I've improved the code for checking the containment of the adapter interval. Does it look ok now? Now, it always expects exactly 1 interval as other engines do. But still empty intervals pass in timeBoundary because the broker code doesn't generate query runners incase empty intervals are given.
   For null intervals (which seems like a valid case for timeBoundary), the query object itself creates an eternity interval.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] rohangarg commented on a diff in pull request #12472: Convert simple min/max SQL queries on __time to timeBoundary queries

Posted by GitBox <gi...@apache.org>.
rohangarg commented on code in PR #12472:
URL: https://github.com/apache/druid/pull/12472#discussion_r857379194


##########
processing/src/main/java/org/apache/druid/query/timeboundary/TimeBoundaryQueryRunnerFactory.java:
##########
@@ -155,7 +157,7 @@ public Iterator<Result<TimeBoundaryResultValue>> make()
               final DateTime minTime;
               final DateTime maxTime;
 
-              if (legacyQuery.getFilter() != null) {
+              if (legacyQuery.getFilter() != null || !queryIntervalContainsAdapterInterval()) {

Review Comment:
   There could still be some overlap between the intervals which could mean that minTime and maxTime could exist. For those cases, we'd need `getTimeBoundary` to look through the cursor and find the appropriate rows.
   If there is now row with non-null time in that range, then yes as you mentioned we'd be returning null minTime and maxTime



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on pull request #12472: Convert simple min/max SQL queries on __time to timeBoundary queries

Posted by GitBox <gi...@apache.org>.
suneet-s commented on PR #12472:
URL: https://github.com/apache/druid/pull/12472#issuecomment-1108712702

   Here's a set of tests I ran on top of this PR after ingesting the sample wikipedia data with hour granularity.
   
   Thanks for this enhancement @rohangarg !
   
   ```
   --Time buckets
   -- SELECT
   --   TIME_FLOOR(__time, 'PT1H') AS "Time",
   --   COUNT(*) AS "Count"
   -- FROM "wiki-hour"
   -- GROUP BY 1
   -- ORDER BY 1 ASC
   
   -- Time filters (TESTS - aggregation, filter, results) 
   -- MAX  t > 2016-06-27 22:00:00                          | <no data>
   -- MAX  t < 2016-06-27 00:00:00                          | <no data>
   -- MAX  2016-06-27 02:12:00 < t < 2016-06-27 02:30:00    | 2016-06-27T02:29:59.744Z
   -- MAX  2016-06-27 20:30:00 < t < 2016-06-27 21:10:00    | 2016-06-27T21:09:59.379Z
   -- MAX  2016-06-27 21:30:00 < t < 2016-06-28 00:10:00    | 2016-06-27T21:31:02.498Z
   -- MAX                                                   | 2016-06-27T21:31:02.498Z
   
   SELECT MAX(__time) from "wiki-hour" WHERE __time >  TIMESTAMP '
   ```
   
   My last concern with this is that the results are now different from what they used to be if the query interval doesn't match. Returning no results sounds like a better response than returning -ve and +ve infinity when the query interval doesn't match. But I will mark this PR as `Incompatible` and `Release Notes` so the release manager can decide if they disagree with this assessment. Query users will need to handle both -ve/+ve infinity as well as null as expected results from this query which isn't a great experience, but is a step in the right direction.
   
   <img width="1418" alt="Screen Shot 2022-04-25 at 8 14 29 AM" src="https://user-images.githubusercontent.com/44787917/165119122-8a182da5-a618-43a2-9c8b-610c8f8ca380.png">
   2016-06-27 22:00:00'


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a diff in pull request #12472: Convert simple min/max SQL queries on __time to timeBoundary queries

Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12472:
URL: https://github.com/apache/druid/pull/12472#discussion_r857242874


##########
processing/src/main/java/org/apache/druid/query/timeboundary/TimeBoundaryQueryRunnerFactory.java:
##########
@@ -155,7 +157,7 @@ public Iterator<Result<TimeBoundaryResultValue>> make()
               final DateTime minTime;
               final DateTime maxTime;
 
-              if (legacyQuery.getFilter() != null) {
+              if (legacyQuery.getFilter() != null || !queryIntervalContainsAdapterInterval()) {

Review Comment:
   If the query interval does not contain the adapter interval - should we just set minTime and maxTime to null? 
   
   I think that's what is happening here when we call `getTimeBoundary` on line 161 except it looks like we iterate through the cursor before finding no matches. It probably isn't too big of a deal since the TImeBoundaryQuery doesn't operate on every segment (at least if my understanding is correct).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a diff in pull request #12472: Convert simple min/max SQL queries on __time to timeBoundary queries

Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12472:
URL: https://github.com/apache/druid/pull/12472#discussion_r857246712


##########
processing/src/test/java/org/apache/druid/query/timeboundary/TimeBoundaryQueryRunnerTest.java:
##########
@@ -216,6 +247,22 @@ public void testTimeBoundary()
     Assert.assertEquals(DateTimes.of("2011-04-15T00:00:00.000Z"), maxTime);
   }
 
+  @Test(expected = UOE.class)
+  @SuppressWarnings("unchecked")
+  public void testTimeBoundaryArrayResults()
+  {
+    TimeBoundaryQuery timeBoundaryQuery = Druids.newTimeBoundaryQueryBuilder()
+                                                   .dataSource("testing")
+                                                   .bound(null)
+                                                   .build();
+    ResponseContext context = ConcurrentResponseContext.createEmpty();
+    context.initializeMissingSegments();
+    new TimeBoundaryQueryQueryToolChest().resultsAsArrays(
+        timeBoundaryQuery,
+        runner.run(QueryPlus.wrap(timeBoundaryQuery), context)
+    ).toList();
+  }
+

Review Comment:
   I haven't thought much about this specific example, but here are my 2 cents
   
   > Or create TimeBoundaryQueryRunnerTest objects in toolchest tests to use the setup code
   
   This seems a little bit like an anti-pattern,  so I'd stay away from it.
   
   > One way to move these to toolchest tests could be to copy some of the runner and data creation code in toolchest tests too.
   
   Usually, the pattern I've seen in instances like this is to move this common setup logic into some shared class, and then use that class in both test classes. So similar to this suggestion, but not copying the code over directly.
   
   In this case the refactoring seems tedious, and I don't think needs to be done as part of this change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a diff in pull request #12472: Convert simple min/max SQL queries on __time to timeBoundary queries

Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12472:
URL: https://github.com/apache/druid/pull/12472#discussion_r856352014


##########
processing/src/main/java/org/apache/druid/query/timeboundary/TimeBoundaryQueryQueryToolChest.java:
##########
@@ -224,4 +226,40 @@ public Result<TimeBoundaryResultValue> apply(Object input)
       }
     };
   }
+
+  @Override
+  public RowSignature resultArraySignature(TimeBoundaryQuery query)
+  {
+    if (query.isMinTime() || query.isMaxTime()) {
+      RowSignature.Builder builder = RowSignature.builder();
+      String outputName = query.isMinTime() ?
+                          query.getContextValue(TimeBoundaryQuery.MIN_TIME_ARRAY_OUTPUT_NAME, TimeBoundaryQuery.MIN_TIME) :
+                          query.getContextValue(TimeBoundaryQuery.MAX_TIME_ARRAY_OUTPUT_NAME, TimeBoundaryQuery.MAX_TIME);
+      return builder.add(outputName, ColumnType.LONG).build();
+    }
+    return super.resultArraySignature(query);

Review Comment:
   Actually, reading the implementation of `TimeBoundaryQuery#mergeResults`, it looks like `bound` being set to something other than min or max is supported and it is handled by returning both min and max for the result. I think we should do something similar here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a diff in pull request #12472: Convert simple min/max SQL queries on __time to timeBoundary queries

Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12472:
URL: https://github.com/apache/druid/pull/12472#discussion_r856371135


##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteTimeBoundaryQueryTest.java:
##########
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.sql.calcite;
+
+import com.google.common.collect.ImmutableList;
+import org.apache.druid.java.util.common.DateTimes;
+import org.apache.druid.java.util.common.Intervals;
+import org.apache.druid.query.Druids;
+import org.apache.druid.query.aggregation.LongMaxAggregatorFactory;
+import org.apache.druid.query.aggregation.LongMinAggregatorFactory;
+import org.apache.druid.query.spec.MultipleIntervalSegmentSpec;
+import org.apache.druid.query.timeboundary.TimeBoundaryQuery;
+import org.apache.druid.sql.calcite.filtration.Filtration;
+import org.junit.Test;
+
+import java.util.HashMap;
+
+public class CalciteTimeBoundaryQueryTest extends BaseCalciteQueryTest
+{
+  // __time for foo is [2000-01-01, 2000-01-02, 2000-01-03, 2001-01-01, 2001-01-02, 2001-01-03]
+  @Test
+  public void testMaxTimeQuery() throws Exception
+  {
+    HashMap<String, Object> context = new HashMap<>(QUERY_CONTEXT_DEFAULT);
+    context.put(TimeBoundaryQuery.MAX_TIME_ARRAY_OUTPUT_NAME, "a0");
+    testQuery(
+        "SELECT MAX(__time) AS maxTime FROM foo",
+        ImmutableList.of(
+            Druids.newTimeBoundaryQueryBuilder()
+                  .dataSource("foo")
+                  .bound(TimeBoundaryQuery.MAX_TIME)
+                  .context(context)
+                  .build()
+        ),
+        ImmutableList.of(new Object[]{DateTimes.of("2001-01-03").getMillis()})
+    );
+  }
+
+  @Test
+  public void testMinTimeQuery() throws Exception
+  {
+    HashMap<String, Object> context = new HashMap<>(QUERY_CONTEXT_DEFAULT);
+    context.put(TimeBoundaryQuery.MIN_TIME_ARRAY_OUTPUT_NAME, "a0");
+    testQuery(
+        "SELECT MIN(__time) AS minTime FROM foo",
+        ImmutableList.of(
+            Druids.newTimeBoundaryQueryBuilder()
+                  .dataSource("foo")
+                  .bound(TimeBoundaryQuery.MIN_TIME)
+                  .context(context)
+                  .build()
+        ),
+        ImmutableList.of(new Object[]{DateTimes.of("2000-01-01").getMillis()})
+    );
+  }
+
+  @Test
+  public void testMinTimeQueryWithFilters() throws Exception
+  {
+    HashMap<String, Object> context = new HashMap<>(QUERY_CONTEXT_DEFAULT);
+    context.put(TimeBoundaryQuery.MIN_TIME_ARRAY_OUTPUT_NAME, "a0");
+    testQuery(
+        "SELECT MIN(__time) AS minTime FROM foo where __time >= '2001-01-01' and __time < '2003-01-01'",
+        ImmutableList.of(
+            Druids.newTimeBoundaryQueryBuilder()
+                  .dataSource("foo")
+                  .intervals(
+                      new MultipleIntervalSegmentSpec(
+                          ImmutableList.of(Intervals.of("2001-01-01T00:00:00.000Z/2003-01-01T00:00:00.000Z"))
+                      )
+                  )
+                  .bound(TimeBoundaryQuery.MIN_TIME)
+                  .context(context)
+                  .build()
+        ),
+        ImmutableList.of(new Object[]{DateTimes.of("2001-01-01").getMillis()})
+    );
+  }
+
+  // this test is to maintain that if both min(__time) and max(__time) are present,
+  // we currently don't convert that query to timeBoundary

Review Comment:
   ```suggestion
     // Currently, if both min(__time) and max(__time) are present, we don't convert it
     // to a timeBoundary query. This should be supported in a future change
   ```
   
   (bonus points if you can create a github issue and link it here) :) 



##########
processing/src/test/java/org/apache/druid/query/timeboundary/TimeBoundaryQueryRunnerTest.java:
##########
@@ -216,6 +247,22 @@ public void testTimeBoundary()
     Assert.assertEquals(DateTimes.of("2011-04-15T00:00:00.000Z"), maxTime);
   }
 
+  @Test(expected = UOE.class)
+  @SuppressWarnings("unchecked")
+  public void testTimeBoundaryArrayResults()
+  {
+    TimeBoundaryQuery timeBoundaryQuery = Druids.newTimeBoundaryQueryBuilder()
+                                                   .dataSource("testing")
+                                                   .bound(null)
+                                                   .build();
+    ResponseContext context = ConcurrentResponseContext.createEmpty();
+    context.initializeMissingSegments();
+    new TimeBoundaryQueryQueryToolChest().resultsAsArrays(
+        timeBoundaryQuery,
+        runner.run(QueryPlus.wrap(timeBoundaryQuery), context)
+    ).toList();
+  }
+

Review Comment:
   Seems like these tests belong in `TimeBoundaryQueryQueryToolChestTest` instead of this class?
   * testTimeBoundaryArrayResults
   * testTimeBoundaryMaxArraysResults
   * testTimeBoundaryMinArraysResults



##########
processing/src/main/java/org/apache/druid/query/timeboundary/TimeBoundaryQueryQueryToolChest.java:
##########
@@ -224,4 +226,40 @@ public Result<TimeBoundaryResultValue> apply(Object input)
       }
     };
   }
+
+  @Override
+  public RowSignature resultArraySignature(TimeBoundaryQuery query)
+  {
+    if (query.isMinTime() || query.isMaxTime()) {
+      RowSignature.Builder builder = RowSignature.builder();
+      String outputName = query.isMinTime() ?
+                          query.getContextValue(TimeBoundaryQuery.MIN_TIME_ARRAY_OUTPUT_NAME, TimeBoundaryQuery.MIN_TIME) :
+                          query.getContextValue(TimeBoundaryQuery.MAX_TIME_ARRAY_OUTPUT_NAME, TimeBoundaryQuery.MAX_TIME);
+      return builder.add(outputName, ColumnType.LONG).build();
+    }
+    return super.resultArraySignature(query);

Review Comment:
   I see, this is still a substantial improvement for use cases that just need either min or max time. Let's call this out as a temporary limitation in the PR description, in case someone stumbles across it and decides to try and fix it.



##########
processing/src/test/java/org/apache/druid/query/timeboundary/TimeBoundaryQueryQueryToolChestTest.java:
##########
@@ -289,6 +292,43 @@ public void testFilteredFilterSegments()
     Assert.assertEquals(7, segments.size());
   }
 
+  @Test(expected = UOE.class)
+  public void testResultArraySignature()
+  {
+    TimeBoundaryQuery timeBoundaryQuery = Druids.newTimeBoundaryQueryBuilder()
+                                                .dataSource("testing")
+                                                .build();
+    new TimeBoundaryQueryQueryToolChest().resultArraySignature(timeBoundaryQuery);
+  }
+
+  @Test
+  public void testResultArraySignatureWithMinTime()
+  {
+    TimeBoundaryQuery timeBoundaryQuery = Druids.newTimeBoundaryQueryBuilder()
+                                                .dataSource("testing")
+                                                .bound(TimeBoundaryQuery.MIN_TIME)
+                                                .context(ImmutableMap.of(TimeBoundaryQuery.MIN_TIME_ARRAY_OUTPUT_NAME, "foo"))
+                                                .build();
+    RowSignature rowSignature = new TimeBoundaryQueryQueryToolChest().resultArraySignature(timeBoundaryQuery);
+    RowSignature.Builder expectedRowSignatureBuilder = RowSignature.builder();
+    expectedRowSignatureBuilder.add("foo", ColumnType.LONG);
+    Assert.assertEquals(expectedRowSignatureBuilder.build(), rowSignature);
+  }
+
+  @Test
+  public void testResultArraySignatureWithMaxTime()
+  {
+    TimeBoundaryQuery timeBoundaryQuery = Druids.newTimeBoundaryQueryBuilder()
+                                                .dataSource("testing")
+                                                .bound(TimeBoundaryQuery.MAX_TIME)
+                                                .context(ImmutableMap.of(TimeBoundaryQuery.MAX_TIME_ARRAY_OUTPUT_NAME, "foo"))
+                                                .build();
+    RowSignature rowSignature = new TimeBoundaryQueryQueryToolChest().resultArraySignature(timeBoundaryQuery);
+    RowSignature.Builder expectedRowSignatureBuilder = RowSignature.builder();
+    expectedRowSignatureBuilder.add("foo", ColumnType.LONG);
+    Assert.assertEquals(expectedRowSignatureBuilder.build(), rowSignature);
+  }
+

Review Comment:
   looks like we're missing some unit tests for `resultsAsArrays`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] rohangarg commented on a diff in pull request #12472: Convert simple min/max SQL queries on __time to timeBoundary queries

Posted by GitBox <gi...@apache.org>.
rohangarg commented on code in PR #12472:
URL: https://github.com/apache/druid/pull/12472#discussion_r856362843


##########
processing/src/main/java/org/apache/druid/query/timeboundary/TimeBoundaryQueryQueryToolChest.java:
##########
@@ -224,4 +226,40 @@ public Result<TimeBoundaryResultValue> apply(Object input)
       }
     };
   }
+
+  @Override
+  public RowSignature resultArraySignature(TimeBoundaryQuery query)
+  {
+    if (query.isMinTime() || query.isMaxTime()) {
+      RowSignature.Builder builder = RowSignature.builder();
+      String outputName = query.isMinTime() ?
+                          query.getContextValue(TimeBoundaryQuery.MIN_TIME_ARRAY_OUTPUT_NAME, TimeBoundaryQuery.MIN_TIME) :
+                          query.getContextValue(TimeBoundaryQuery.MAX_TIME_ARRAY_OUTPUT_NAME, TimeBoundaryQuery.MAX_TIME);
+      return builder.add(outputName, ColumnType.LONG).build();
+    }
+    return super.resultArraySignature(query);

Review Comment:
   Yes, I'll update the exception to something more meaningful.
   
   Regarding, the handling of min-max part. Yes, the native query returns both min-max if no bounds given. But in the first implementation, it was becoming a bit dirty to map the min-max results of a SQL query so decided to throw it for a start. Will take another stab to see if we can accommodate `SELECT min(__time), max(__time) from ds` and `SELECT max(__time), min(__time) from ds`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a diff in pull request #12472: Convert simple min/max SQL queries on __time to timeBoundary queries

Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12472:
URL: https://github.com/apache/druid/pull/12472#discussion_r856352014


##########
processing/src/main/java/org/apache/druid/query/timeboundary/TimeBoundaryQueryQueryToolChest.java:
##########
@@ -224,4 +226,40 @@ public Result<TimeBoundaryResultValue> apply(Object input)
       }
     };
   }
+
+  @Override
+  public RowSignature resultArraySignature(TimeBoundaryQuery query)
+  {
+    if (query.isMinTime() || query.isMaxTime()) {
+      RowSignature.Builder builder = RowSignature.builder();
+      String outputName = query.isMinTime() ?
+                          query.getContextValue(TimeBoundaryQuery.MIN_TIME_ARRAY_OUTPUT_NAME, TimeBoundaryQuery.MIN_TIME) :
+                          query.getContextValue(TimeBoundaryQuery.MAX_TIME_ARRAY_OUTPUT_NAME, TimeBoundaryQuery.MAX_TIME);
+      return builder.add(outputName, ColumnType.LONG).build();
+    }
+    return super.resultArraySignature(query);

Review Comment:
   Actually, reading the implementation of `TimeBoundaryQuery#mergeResults`, it looks like `bound` being set to something other than min or max is supported and it is handled by returning both min and max for the result.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a diff in pull request #12472: Convert simple min/max SQL queries on __time to timeBoundary queries

Posted by GitBox <gi...@apache.org>.
suneet-s commented on code in PR #12472:
URL: https://github.com/apache/druid/pull/12472#discussion_r856357386


##########
processing/src/main/java/org/apache/druid/query/timeboundary/TimeBoundaryQueryRunnerFactory.java:
##########
@@ -155,7 +156,8 @@ public Iterator<Result<TimeBoundaryResultValue>> make()
               final DateTime minTime;
               final DateTime maxTime;
 
-              if (legacyQuery.getFilter() != null) {
+              Interval queryInterval = legacyQuery.getQuerySegmentSpec().getIntervals().get(0);

Review Comment:
   Are we guaranteed that `getIntervals()` will always return a list with at least one element?
   
   There are no javadocs on `QuerySegmentSpec#getIntervals()` so I'm not sure if a future implementation may choose to return an empty intervals.
   
   NOTE to self: Why do we need this change?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org