You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "paul-rogers (via GitHub)" <gi...@apache.org> on 2023/02/08 21:23:48 UTC

[GitHub] [druid] paul-rogers commented on a diff in pull request #13764: Fixing a data correctness issue in unnest when first row of an MVD is null

paul-rogers commented on code in PR #13764:
URL: https://github.com/apache/druid/pull/13764#discussion_r1100676736


##########
sql/src/test/java/org/apache/druid/sql/calcite/util/CalciteTests.java:
##########
@@ -97,6 +97,7 @@
   public static final String DATASOURCE3 = "numfoo";
   public static final String DATASOURCE4 = "foo4";
   public static final String DATASOURCE5 = "lotsocolumns";
+  public static final String DATASOURCE6 = "unnestnumfoo";

Review Comment:
   Better name? `nested` perhaps? Also, would be cool to add a comment with the schema: I find it hard to suss that out from the code.



##########
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidCorrelateUnnestRel.java:
##########
@@ -200,11 +200,36 @@ public DruidQuery toDruidQuery(boolean finalizeAggregations)
     // This is necessary to handle the virtual columns on the unnestProject
     // Also create the unnest datasource to be used by the partial query
     PartialDruidQuery partialDruidQuery = unnestProjectNeeded ? partialQuery.withUnnest(unnestProject) : partialQuery;
+    String outputColName = unnestDatasourceRel.getUnnestProject().getRowType().getFieldNames().get(0);
+
+    // In case of nested queries for e.g.
+    // with t AS (select * from druid.numfoo, unnest(MV_TO_ARRAY(dim3)) as unnested (d3))
+    // select d2,d3 from t, UNNEST(MV_TO_ARRAY(dim2)) as foo(d2)
+    // which plans to
+    // 186:LogicalCorrelate
+    //  178:LogicalProject
+    //    176:LogicalCorrelate
+    //      13:LogicalTableScan(subset=[rel#168:Subset#0.NONE.[]], table=[[druid, numfoo]])
+    //      172:Uncollect(subset=[rel#173:Subset#3.NONE.[]])
+    //        170:LogicalProject(subset=[rel#171:Subset#2.NONE.[]], EXPR$0=[MV_TO_ARRAY($cor0.dim3)])
+    //          14:LogicalValues(subset=[rel#169:Subset#1.NONE.[0]], tuples=[[{ 0 }]])
+    //  182:Uncollect(subset=[rel#183:Subset#8.NONE.[]])
+    //    180:LogicalProject(subset=[rel#181:Subset#7.NONE.[]], EXPR$0=[MV_TO_ARRAY($cor1.dim2)])
+    //      14:LogicalValues(subset=[rel#169:Subset#1.NONE.[0]], tuples=[[{ 0 }]])
+    //
+    // the column name cannot be EXPR$0 for both inner and outer. The inner one which gets executed first gets the name
+    // EXPR$0 and as we move up the tree we add a 0 at the end to make the top level EXPR$00.
+    // Ideally these names should be replaced by the alias names specified in the query. Any future developer if
+    // able to find these alias names should replace EXPR$0 by dim3 and EXPR$00 by dim2, i.e use the correct name from Calcite

Review Comment:
   Thanks much for the detailed explanation!



##########
sql/src/test/java/org/apache/druid/sql/calcite/util/TestDataBuilder.java:
##########
@@ -351,6 +351,100 @@ public Optional<Joinable> build(
   public static final List<InputRow> ROWS1 =
       RAW_ROWS1.stream().map(TestDataBuilder::createRow).collect(Collectors.toList());
 
+  public static final List<ImmutableMap<String, Object>> RAW_ROWS_FOR_UNNEST = ImmutableList.of(

Review Comment:
   Does this have all the interesting corner cases? Empty arrays or objects? Null values? Fields that appear in one nested object but not another (in both orders: (a,b), (a), (a,c))? And so on. To help future readers, might be handy to add a comment above each `.put(` call that sets up one of these cases.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org