You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "clintropolis (via GitHub)" <gi...@apache.org> on 2023/02/11 00:42:39 UTC

[GitHub] [druid] clintropolis commented on a diff in pull request #13764: Fixing a data correctness issue in unnest when first row of an MVD is null

clintropolis commented on code in PR #13764:
URL: https://github.com/apache/druid/pull/13764#discussion_r1103444244


##########
sql/src/test/java/org/apache/druid/sql/calcite/util/TestDataBuilder.java:
##########
@@ -351,6 +351,100 @@ public Optional<Joinable> build(
   public static final List<InputRow> ROWS1 =
       RAW_ROWS1.stream().map(TestDataBuilder::createRow).collect(Collectors.toList());
 
+  public static final List<ImmutableMap<String, Object>> RAW_ROWS_FOR_UNNEST = ImmutableList.of(
+      ImmutableMap.<String, Object>builder()
+                  .put("t", "2000-01-01")
+                  .put("m1", "1.0")
+                  .put("m2", "1.0")
+                  .put("d1", 1.0)
+                  .put("f1", 1.0f)
+                  .put("l1", 7L)
+                  .put("dim1", "")
+                  .put("dim3", ImmutableList.of("a", ImmutableList.of("b", "c")))

Review Comment:
   the string dimension indexer can't really handle nested arrays like this, i think you'll end up with something like "a" and then the 'toString' of ["b","c"], or maybe something even weirder...
   
   I think you should stick to having either flat lists or single layer strings for these tests



##########
sql/src/main/java/org/apache/druid/sql/calcite/rel/DruidCorrelateUnnestRel.java:
##########
@@ -200,11 +200,36 @@ public DruidQuery toDruidQuery(boolean finalizeAggregations)
     // This is necessary to handle the virtual columns on the unnestProject
     // Also create the unnest datasource to be used by the partial query
     PartialDruidQuery partialDruidQuery = unnestProjectNeeded ? partialQuery.withUnnest(unnestProject) : partialQuery;
+    String outputColName = unnestDatasourceRel.getUnnestProject().getRowType().getFieldNames().get(0);
+
+    // In case of nested queries for e.g.
+    // with t AS (select * from druid.numfoo, unnest(MV_TO_ARRAY(dim3)) as unnested (d3))
+    // select d2,d3 from t, UNNEST(MV_TO_ARRAY(dim2)) as foo(d2)
+    // which plans to
+    // 186:LogicalCorrelate
+    //  178:LogicalProject
+    //    176:LogicalCorrelate
+    //      13:LogicalTableScan(subset=[rel#168:Subset#0.NONE.[]], table=[[druid, numfoo]])
+    //      172:Uncollect(subset=[rel#173:Subset#3.NONE.[]])
+    //        170:LogicalProject(subset=[rel#171:Subset#2.NONE.[]], EXPR$0=[MV_TO_ARRAY($cor0.dim3)])
+    //          14:LogicalValues(subset=[rel#169:Subset#1.NONE.[0]], tuples=[[{ 0 }]])
+    //  182:Uncollect(subset=[rel#183:Subset#8.NONE.[]])
+    //    180:LogicalProject(subset=[rel#181:Subset#7.NONE.[]], EXPR$0=[MV_TO_ARRAY($cor1.dim2)])
+    //      14:LogicalValues(subset=[rel#169:Subset#1.NONE.[0]], tuples=[[{ 0 }]])
+    //
+    // the column name cannot be EXPR$0 for both inner and outer. The inner one which gets executed first gets the name
+    // EXPR$0 and as we move up the tree we add a 0 at the end to make the top level EXPR$00.
+    // Ideally these names should be replaced by the alias names specified in the query. Any future developer if
+    // able to find these alias names should replace EXPR$0 by dim3 and EXPR$00 by dim2, i.e use the correct name from Calcite
+
+    if (druidQueryRel instanceof DruidCorrelateUnnestRel) {
+      outputColName = outputColName + "0";

Review Comment:
   im skeptical that this is always correct, is it really cool?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org