Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/11/11 10:29:58 UTC

[GitHub] [druid] kfaraz commented on a change in pull request #11908: Improve the output of SQL explain message

kfaraz commented on a change in pull request #11908:
URL: https://github.com/apache/druid/pull/11908#discussion_r747380495



##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java
##########
@@ -384,6 +399,72 @@ private PlannerResult planExplanation(
     return new PlannerResult(resultsSupplier, getExplainStructType(rel.getCluster().getTypeFactory()));
   }
 
+  /**
+   * Construct a {@link PlannerResult} for an 'explain' query from a {@link RelNode}
+   */
+  private PlannerResult planDruidExplanation(
+      final DruidRel<?> rel
+  )
+  {
+    String resources = null;
+    String explanation = null;
+    try {
+      explanation = dumpDruidPlan(rel);
+      resources = jsonMapper.writeValueAsString(plannerContext.getResources());
+    }
+    catch (JsonProcessingException jpe) {
+      // this should never happen; we create the Resources here, not a user
+      log.error(jpe, "Encountered exception while serializing Resources for explain output");
+    }
+    final Supplier<Sequence<Object[]>> resultsSupplier = Suppliers.ofInstance(
+        Sequences.simple(ImmutableList.of(new Object[]{explanation, resources})));
+    return new PlannerResult(resultsSupplier, getExplainStructType(rel.getCluster().getTypeFactory()));
+  }
+
+  /**
+   * This method doesn't utilize Calcite's internal {@link RelOptUtil#dumpPlan} since that tends to be verbose
+   * and not indicative of the native Druid queries which will get executed.
+   * This method assumes that the Planner has converted the RelNodes to DruidRels, and thereby we can implicitly cast them.
+   *
+   * @param rel Instance of the root {@link DruidRel} which is formed by running the planner transformations on it
+   * @return A string representing an array of native queries that correspond to the given SQL query, in JSON format
+   * @throws JsonProcessingException
+   */
+  public static String dumpDruidPlan(DruidRel<?> rel) throws JsonProcessingException
+  {
+    List<Map<String, Object>> nativeQueries;
+
+    // Only if rel is an instance of DruidUnionRel do we run multiple native queries corresponding to a single SQL query

Review comment:
       What about sub-selects?
   e.g. `SELECT * FROM abc WHERE col IN (SELECT col1 FROM xyz)`
   Does this also run as a single native query?
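   For reference, the second test added below (`testExplainUnionOfMultipleNativeQueries`) exercises exactly this shape: the IN sub-select is folded into a join datasource, so it should reach this method as a single `DruidRel` rather than a `DruidUnionRel`. A minimal sketch of the branching as it reads from the diff context (hedged: the non-union branch is assumed, since the review anchor cuts the hunk off before it, and the helper name is illustrative):
   ```java
   // Assumes the imports already present in DruidPlanner.java.
   static List<Map<String, Object>> nativeQueriesFor(DruidRel<?> rel)
   {
     final List<Map<String, Object>> nativeQueries = new ArrayList<>();
     if (rel instanceof DruidUnionRel) {
       // One entry per UNION ALL input.
       for (RelNode childRel : rel.getInputs()) {
         final DruidQuery childQuery = ((DruidRel<?>) childRel).toDruidQuery(false);
         nativeQueries.add(ImmutableMap.of("query", childQuery.getQuery()));
       }
     } else {
       // Everything else, including sub-selects planned as join datasources,
       // arrives here as one DruidRel and yields exactly one native query.
       nativeQueries.add(ImmutableMap.of("query", rel.toDruidQuery(false).getQuery()));
     }
     return nativeQueries;
   }
   ```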

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java
##########
@@ -384,6 +399,72 @@ private PlannerResult planExplanation(
     return new PlannerResult(resultsSupplier, getExplainStructType(rel.getCluster().getTypeFactory()));
   }
 
+  /**
+   * Construct a {@link PlannerResult} for an 'explain' query from a {@link RelNode}
+   */
+  private PlannerResult planDruidExplanation(
+      final DruidRel<?> rel
+  )
+  {
+    String resources = null;
+    String explanation = null;
+    try {
+      explanation = dumpDruidPlan(rel);
+      resources = jsonMapper.writeValueAsString(plannerContext.getResources());
+    }
+    catch (JsonProcessingException jpe) {
+      // this should never happen; we create the Resources here, not a user
+      log.error(jpe, "Encountered exception while serializing Resources for explain output");
+    }
+    final Supplier<Sequence<Object[]>> resultsSupplier = Suppliers.ofInstance(
+        Sequences.simple(ImmutableList.of(new Object[]{explanation, resources})));
+    return new PlannerResult(resultsSupplier, getExplainStructType(rel.getCluster().getTypeFactory()));
+  }
+
+  /**
+   * This method doesn't utilize Calcite's internal {@link RelOptUtil#dumpPlan} since that tends to be verbose
+   * and not indicative of the native Druid queries which will get executed.
+   * This method assumes that the Planner has converted the RelNodes to DruidRels, and thereby we can implicitly cast them.
+   *
+   * @param rel Instance of the root {@link DruidRel} which is formed by running the planner transformations on it
+   * @return A string representing an array of native queries that correspond to the given SQL query, in JSON format
+   * @throws JsonProcessingException
+   */
+  public static String dumpDruidPlan(DruidRel<?> rel) throws JsonProcessingException

Review comment:
       I think it would be better if this method returned the List of native queries instead of a serialized String.
   Since you are already serializing the resources in `planDruidExplanation`, it makes sense to move the serialization of the native queries to that method as well.
   
   Also rename this method accordingly.
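   A minimal sketch of the suggested shape, assuming the fields visible in the diff (`jsonMapper`, `plannerContext`, `log`); `flattenOutermostRel` is an illustrative name for the renamed helper, which would return `List<Map<String, Object>>` and drop its `throws JsonProcessingException` clause since it no longer touches JSON:
   ```java
   private PlannerResult planDruidExplanation(final DruidRel<?> rel)
   {
     String explanation = null;
     String resources = null;
     try {
       // Both serializations now live in one place, using the same mapper.
       explanation = jsonMapper.writeValueAsString(flattenOutermostRel(rel));
       resources = jsonMapper.writeValueAsString(plannerContext.getResources());
     }
     catch (JsonProcessingException jpe) {
       // This should never happen; both structures are built internally.
       log.error(jpe, "Encountered exception while serializing explain output");
     }
     final Supplier<Sequence<Object[]>> resultsSupplier = Suppliers.ofInstance(
         Sequences.simple(ImmutableList.of(new Object[]{explanation, resources})));
     return new PlannerResult(resultsSupplier, getExplainStructType(rel.getCluster().getTypeFactory()));
   }
   ```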

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java
##########
@@ -384,6 +399,72 @@ private PlannerResult planExplanation(
     return new PlannerResult(resultsSupplier, getExplainStructType(rel.getCluster().getTypeFactory()));
   }
 
+  /**
+   * Construct a {@link PlannerResult} for an 'explain' query from a {@link RelNode}
+   */
+  private PlannerResult planDruidExplanation(
+      final DruidRel<?> rel
+  )
+  {
+    String resources = null;
+    String explanation = null;
+    try {
+      explanation = dumpDruidPlan(rel);
+      resources = jsonMapper.writeValueAsString(plannerContext.getResources());
+    }
+    catch (JsonProcessingException jpe) {
+      // this should never happen; we create the Resources here, not a user
+      log.error(jpe, "Encountered exception while serializing Resources for explain output");
+    }
+    final Supplier<Sequence<Object[]>> resultsSupplier = Suppliers.ofInstance(
+        Sequences.simple(ImmutableList.of(new Object[]{explanation, resources})));
+    return new PlannerResult(resultsSupplier, getExplainStructType(rel.getCluster().getTypeFactory()));
+  }
+
+  /**
+   * This method doesn't utilize Calcite's internal {@link RelOptUtil#dumpPlan} since that tends to be verbose
+   * and not indicative of the native Druid queries which will get executed.
+   * This method assumes that the Planner has converted the RelNodes to DruidRels, and thereby we can implicitly cast them.
+   *
+   * @param rel Instance of the root {@link DruidRel} which is formed by running the planner transformations on it
+   * @return A string representing an array of native queries that correspond to the given SQL query, in JSON format
+   * @throws JsonProcessingException
+   */
+  public static String dumpDruidPlan(DruidRel<?> rel) throws JsonProcessingException

Review comment:
       I see that the only difference is that you are using a different JSON mapper, `rel.getQueryMaker().getJsonMapper()`, to serialize the native queries.
   Can we not use the same `jsonMapper` that is already being used in `planDruidExplanation`?
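   For comparison, the two serialization paths side by side; a sketch assuming a `nativeQueries` list as built in the method above (both accessors are taken from this discussion, not re-verified against the full file):
   ```java
   // (Inside the existing try/catch, since writeValueAsString throws JsonProcessingException.)
   // Mapper used for the native queries in the current patch:
   final String viaQueryMaker =
       rel.getQueryMaker().getJsonMapper().writeValueAsString(nativeQueries);

   // Mapper already used for the resources in planDruidExplanation:
   final String viaPlannerMapper = jsonMapper.writeValueAsString(nativeQueries);
   ```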

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java
##########
@@ -384,6 +399,72 @@ private PlannerResult planExplanation(
     return new PlannerResult(resultsSupplier, getExplainStructType(rel.getCluster().getTypeFactory()));
   }
 
+  /**
+   * Construct a {@link PlannerResult} for an 'explain' query from a {@link RelNode}
+   */
+  private PlannerResult planDruidExplanation(
+      final DruidRel<?> rel
+  )
+  {
+    String resources = null;
+    String explanation = null;
+    try {
+      explanation = dumpDruidPlan(rel);
+      resources = jsonMapper.writeValueAsString(plannerContext.getResources());
+    }
+    catch (JsonProcessingException jpe) {
+      // this should never happen; we create the Resources here, not a user
+      log.error(jpe, "Encountered exception while serializing Resources for explain output");
+    }
+    final Supplier<Sequence<Object[]>> resultsSupplier = Suppliers.ofInstance(
+        Sequences.simple(ImmutableList.of(new Object[]{explanation, resources})));
+    return new PlannerResult(resultsSupplier, getExplainStructType(rel.getCluster().getTypeFactory()));
+  }
+
+  /**
+   * This method doesn't utilize Calcite's internal {@link RelOptUtil#dumpPlan} since that tends to be verbose
+   * and not indicative of the native Druid queries which will get executed.
+   * This method assumes that the Planner has converted the RelNodes to DruidRels, and thereby we can implicitly cast them.
+   *
+   * @param rel Instance of the root {@link DruidRel} which is formed by running the planner transformations on it
+   * @return A string representing an array of native queries that correspond to the given SQL query, in JSON format
+   * @throws JsonProcessingException
+   */
+  public static String dumpDruidPlan(DruidRel<?> rel) throws JsonProcessingException
+  {
+    List<Map<String, Object>> nativeQueries;
+
+    // Only if rel is an instance of DruidUnionRel do we run multiple native queries corresponding to a single SQL query.
+    // Also, DruidUnionRel can only be a top-level node, so we don't need to check for this condition in the subsequent
+    // child nodes.
+    if (rel instanceof DruidUnionRel) {
+      nativeQueries = new ArrayList<>();
+      for (RelNode childRel : rel.getInputs()) {
+        DruidRel<?> druidChildRel = (DruidRel<?>) childRel;
+        if (childRel instanceof DruidUnionRel) { // Failsafe block, shouldn't be encountered
+          log.error("DruidUnionRel can only be the outermost Rel. This error shouldn't be encountered");
+          // TODO: Throw an error here
+        }
+        DruidQuery query1 = druidChildRel.toDruidQuery(false); // Finalize aggregations

Review comment:
       Nit: Rename this variable to `childQuery`.

##########
File path: sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidPlanner.java
##########
@@ -384,6 +399,72 @@ private PlannerResult planExplanation(
     return new PlannerResult(resultsSupplier, getExplainStructType(rel.getCluster().getTypeFactory()));
   }
 
+  /**
+   * Construct a {@link PlannerResult} for an 'explain' query from a {@link RelNode}
+   */
+  private PlannerResult planDruidExplanation(
+      final DruidRel<?> rel
+  )
+  {
+    String resources = null;
+    String explanation = null;
+    try {
+      explanation = dumpDruidPlan(rel);
+      resources = jsonMapper.writeValueAsString(plannerContext.getResources());
+    }
+    catch (JsonProcessingException jpe) {
+      // this should never happen; we create the Resources here, not a user
+      log.error(jpe, "Encountered exception while serializing Resources for explain output");
+    }
+    final Supplier<Sequence<Object[]>> resultsSupplier = Suppliers.ofInstance(
+        Sequences.simple(ImmutableList.of(new Object[]{explanation, resources})));
+    return new PlannerResult(resultsSupplier, getExplainStructType(rel.getCluster().getTypeFactory()));
+  }
+
+  /**
+   * This method doesn't utilize Calcite's internal {@link RelOptUtil#dumpPlan} since that tends to be verbose
+   * and not indicative of the native Druid queries which will get executed.
+   * This method assumes that the Planner has converted the RelNodes to DruidRels, and thereby we can implicitly cast them.
+   *
+   * @param rel Instance of the root {@link DruidRel} which is formed by running the planner transformations on it
+   * @return A string representing an array of native queries that correspond to the given SQL query, in JSON format
+   * @throws JsonProcessingException
+   */
+  public static String dumpDruidPlan(DruidRel<?> rel) throws JsonProcessingException
+  {
+    List<Map<String, Object>> nativeQueries;
+
+    // Only if rel is an instance of DruidUnionRel do we run multiple native queries corresponding to a single SQL query.
+    // Also, DruidUnionRel can only be a top-level node, so we don't need to check for this condition in the subsequent
+    // child nodes.
+    if (rel instanceof DruidUnionRel) {
+      nativeQueries = new ArrayList<>();
+      for (RelNode childRel : rel.getInputs()) {
+        DruidRel<?> druidChildRel = (DruidRel<?>) childRel;
+        if (childRel instanceof DruidUnionRel) { // Failsafe block, shouldn't be encountered
+          log.error("DruidUnionRel can only be the outermost Rel. This error shouldn't be encountered");
+          // TODO: Throw an error here
+        }
+        DruidQuery query1 = druidChildRel.toDruidQuery(false); // Finalize aggregations
+        nativeQueries.add(ImmutableMap.of(
+            "query",
+            query1.getQuery(),
+            "signature",

Review comment:
       I am not sure why we need the `signature` field. Is it also present in the existing EXPLAIN PLAN output?

##########
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
##########
@@ -8955,6 +8955,28 @@ public void testMaxSubqueryRows() throws Exception
     );
   }
 
+  @Test
+  public void testExplainUnionAllOfQueries() throws Exception
+  {
+    skipVectorize(); // Required; otherwise the EXPLAIN results change
+    String query = "EXPLAIN PLAN FOR (SELECT dim1 FROM druid.foo UNION ALL SELECT dim1 from druid.foo2)";
+    String resources = "[{\"name\":\"foo2\",\"type\":\"DATASOURCE\"},{\"name\":\"foo\",\"type\":\"DATASOURCE\"}]";
+    String explanation = "[{\"query\":{\"dataSource\":{\"type\":\"union\",\"dataSources\":[{\"type\":\"table\",\"name\":\"foo\"},{\"type\":\"table\",\"name\":\"foo2\"}]},\"intervals\":{\"type\":\"intervals\",\"intervals\":[\"-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z\"]},\"virtualColumns\":[],\"resultFormat\":\"compactedList\",\"batchSize\":20480,\"order\":\"none\",\"filter\":null,\"columns\":[\"dim1\"],\"legacy\":false,\"context\":{\"defaultTimeout\":300000,\"maxScatterGatherBytes\":9223372036854775807,\"sqlCurrentTimestamp\":\"2000-01-01T00:00:00Z\",\"sqlQueryId\":\"dummy\",\"vectorize\":\"false\",\"vectorizeVirtualColumns\":\"false\"},\"descending\":false,\"granularity\":{\"type\":\"all\"}},\"signature\":\"{dim1:STRING}\"}]";
+    testQuery(query, ImmutableList.of(), ImmutableList.of(new Object[]{explanation, resources}));
+  }
+
+  @Test
+  public void testExplainUnionOfMultipleNativeQueries() throws Exception
+  {
+    skipVectorize();
+    String query = "EXPLAIN PLAN FOR ((SELECT dim1 FROM druid.foo) UNION ALL (SELECT dim1 FROM druid.foo4 where dim1 in (SELECT dim1 FROM druid.foo)))";
+    String explanation = "["
+                         + "{\"query\":{\"dataSource\":{\"type\":\"table\",\"name\":\"foo\"},\"intervals\":{\"type\":\"intervals\",\"intervals\":[\"-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z\"]},\"virtualColumns\":[],\"resultFormat\":\"compactedList\",\"batchSize\":20480,\"order\":\"none\",\"filter\":null,\"columns\":[\"dim1\"],\"legacy\":false,\"context\":{\"defaultTimeout\":300000,\"maxScatterGatherBytes\":9223372036854775807,\"sqlCurrentTimestamp\":\"2000-01-01T00:00:00Z\",\"sqlQueryId\":\"dummy\",\"vectorize\":\"false\",\"vectorizeVirtualColumns\":\"false\"},\"descending\":false,\"granularity\":{\"type\":\"all\"}},\"signature\":\"{dim1:STRING}\"}"
+                         + ",{\"query\":{\"dataSource\":{\"type\":\"join\",\"left\":{\"type\":\"table\",\"name\":\"foo4\"},\"right\":{\"type\":\"query\",\"query\":{\"queryType\":\"groupBy\",\"dataSource\":{\"type\":\"table\",\"name\":\"foo\"},\"intervals\":{\"type\":\"intervals\",\"intervals\":[\"-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z\"]},\"virtualColumns\":[],\"filter\":null,\"granularity\":{\"type\":\"all\"},\"dimensions\":[{\"type\":\"default\",\"dimension\":\"dim1\",\"outputName\":\"d0\",\"outputType\":\"STRING\"}],\"aggregations\":[],\"postAggregations\":[],\"having\":null,\"limitSpec\":{\"type\":\"NoopLimitSpec\"},\"context\":{\"defaultTimeout\":300000,\"maxScatterGatherBytes\":9223372036854775807,\"sqlCurrentTimestamp\":\"2000-01-01T00:00:00Z\",\"sqlQueryId\":\"dummy\",\"vectorize\":\"false\",\"vectorizeVirtualColumns\":\"false\"},\"descending\":false}},\"rightPrefix\":\"j0.\",\"condition\":\"(\\\"dim1\\\" == \\\"j0.d0\\\")\",\"joinType\":\"INNER\",\"leftFilter\":null},\"intervals\":{\"type\":\"intervals\",\"intervals\":[\"-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z\"]},\"virtualColumns\":[],\"resultFormat\":\"compactedList\",\"batchSize\":20480,\"order\":\"none\",\"filter\":null,\"columns\":[\"dim1\"],\"legacy\":false,\"context\":{\"defaultTimeout\":300000,\"maxScatterGatherBytes\":9223372036854775807,\"sqlCurrentTimestamp\":\"2000-01-01T00:00:00Z\",\"sqlQueryId\":\"dummy\",\"vectorize\":\"false\",\"vectorizeVirtualColumns\":\"false\"},\"descending\":false,\"granularity\":{\"type\":\"all\"}},\"signature\":\"{dim1:STRING}\"}]";
+    String resources = "[{\"name\":\"foo4\",\"type\":\"DATASOURCE\"},{\"name\":\"foo\",\"type\":\"DATASOURCE\"}]";
+    testQuery(query, ImmutableList.of(), ImmutableList.of(new Object[]{explanation, resources}));
+  }
+
   @Test
   public void testExplainExactCountDistinctOfSemiJoinResult() throws Exception
   {

Review comment:
       As the other tests have not changed, I assume this means that the output of EXPLAIN PLAN for most use cases will remain the same?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


