You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by "clintropolis (via GitHub)" <gi...@apache.org> on 2023/03/16 13:44:33 UTC

[GitHub] [druid] clintropolis opened a new pull request, #13943: fix join and unnest planning to ensure that duplicate join prefixes a…

clintropolis opened a new pull request, #13943:
URL: https://github.com/apache/druid/pull/13943

   ### Description
   This PR fixes a join query planning regression caused by #13902. After that change, this query will actually plan "correctly" but then later explode due to duplicate join prefix. The planner was selecting a join prefix only looking at the `RowSignature` of the left and right side datasources, but if those queries had any nested join prefixes, the query would later fail because the native prefix checker flattens the join to check prefixes (so losing the context of what is used directly in outside layers).
   
   The solution Ive added is to just dig through the datasources to extract any existing prefixes and add that to the checked set when selecting new prefixes, to ensure the planner never picks duplicate prefixes even if it can't see them at the top level.
   
   <hr>
   
   <!-- Check the items by putting "x" in the brackets for the done things. Not all of these items apply to every PR. Remove the items which are not done or not relevant to the PR. None of the items from the checklist below are strictly necessary, but it would be very helpful if you at least self-review the PR. -->
   
   This PR has:
   
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] github-code-scanning[bot] commented on a diff in pull request #13943: fix join and unnest planning to ensure that duplicate join prefixes are not used

Posted by "github-code-scanning[bot] (via GitHub)" <gi...@apache.org>.
github-code-scanning[bot] commented on code in PR #13943:
URL: https://github.com/apache/druid/pull/13943#discussion_r1138760777


##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteJoinQueryTest.java:
##########
@@ -5084,4 +5087,181 @@
         null
     );
   }
+
+  @Test
+  @Parameters(source = QueryContextForJoinProvider.class)
+  public void testRegressionFilteredAggregatorsSubqueryJoins(Map<String, Object> queryContext)
+  {
+    cannotVectorize();
+    testQuery(
+        "select\n" +
+        "count(*) filter (where trim(both from dim1) in (select dim2 from foo)),\n" +
+        "min(m1) filter (where 'A' not in (select m2 from foo))\n" +
+        "from foo as t0\n" +
+        "where __time in (select __time from foo)",
+        queryContext,
+        useDefault ?
+        ImmutableList.of(
+            Druids.newTimeseriesQueryBuilder()
+                  .dataSource(
+                    join(
+                        join(
+                            join(
+                                new TableDataSource(CalciteTests.DATASOURCE1),
+                                new QueryDataSource(
+                                    GroupByQuery.builder()
+                                                .setDataSource(CalciteTests.DATASOURCE1)
+                                                .setInterval(querySegmentSpec(Filtration.eternity()))
+                                                .setDimensions(
+                                                    new DefaultDimensionSpec("__time", "d0", ColumnType.LONG)
+                                                )
+                                                .setGranularity(Granularities.ALL)
+                                                .setLimitSpec(NoopLimitSpec.instance())
+                                                .build()
+                                ),
+                                "j0.",
+                                equalsCondition(makeColumnExpression("__time"), makeColumnExpression("j0.d0")),

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [CalciteTestBase.makeColumnExpression](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4397)



##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteJoinQueryTest.java:
##########
@@ -5084,4 +5087,181 @@
         null
     );
   }
+
+  @Test
+  @Parameters(source = QueryContextForJoinProvider.class)
+  public void testRegressionFilteredAggregatorsSubqueryJoins(Map<String, Object> queryContext)
+  {
+    cannotVectorize();
+    testQuery(
+        "select\n" +
+        "count(*) filter (where trim(both from dim1) in (select dim2 from foo)),\n" +
+        "min(m1) filter (where 'A' not in (select m2 from foo))\n" +
+        "from foo as t0\n" +
+        "where __time in (select __time from foo)",
+        queryContext,
+        useDefault ?
+        ImmutableList.of(
+            Druids.newTimeseriesQueryBuilder()
+                  .dataSource(
+                    join(
+                        join(
+                            join(
+                                new TableDataSource(CalciteTests.DATASOURCE1),
+                                new QueryDataSource(
+                                    GroupByQuery.builder()
+                                                .setDataSource(CalciteTests.DATASOURCE1)
+                                                .setInterval(querySegmentSpec(Filtration.eternity()))
+                                                .setDimensions(
+                                                    new DefaultDimensionSpec("__time", "d0", ColumnType.LONG)
+                                                )
+                                                .setGranularity(Granularities.ALL)
+                                                .setLimitSpec(NoopLimitSpec.instance())
+                                                .build()
+                                ),
+                                "j0.",
+                                equalsCondition(makeColumnExpression("__time"), makeColumnExpression("j0.d0")),

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [CalciteTestBase.makeColumnExpression](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4396)



##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteJoinQueryTest.java:
##########
@@ -5084,4 +5087,181 @@
         null
     );
   }
+
+  @Test
+  @Parameters(source = QueryContextForJoinProvider.class)
+  public void testRegressionFilteredAggregatorsSubqueryJoins(Map<String, Object> queryContext)
+  {
+    cannotVectorize();
+    testQuery(
+        "select\n" +
+        "count(*) filter (where trim(both from dim1) in (select dim2 from foo)),\n" +
+        "min(m1) filter (where 'A' not in (select m2 from foo))\n" +
+        "from foo as t0\n" +
+        "where __time in (select __time from foo)",
+        queryContext,
+        useDefault ?
+        ImmutableList.of(
+            Druids.newTimeseriesQueryBuilder()
+                  .dataSource(
+                    join(
+                        join(
+                            join(
+                                new TableDataSource(CalciteTests.DATASOURCE1),
+                                new QueryDataSource(
+                                    GroupByQuery.builder()
+                                                .setDataSource(CalciteTests.DATASOURCE1)
+                                                .setInterval(querySegmentSpec(Filtration.eternity()))
+                                                .setDimensions(
+                                                    new DefaultDimensionSpec("__time", "d0", ColumnType.LONG)
+                                                )
+                                                .setGranularity(Granularities.ALL)
+                                                .setLimitSpec(NoopLimitSpec.instance())
+                                                .build()
+                                ),
+                                "j0.",
+                                equalsCondition(makeColumnExpression("__time"), makeColumnExpression("j0.d0")),
+                                JoinType.INNER
+                            ),
+                            new QueryDataSource(
+                                GroupByQuery.builder()
+                                            .setDataSource(CalciteTests.DATASOURCE1)
+                                            .setInterval(querySegmentSpec(Filtration.eternity()))
+                                            .setVirtualColumns(expressionVirtualColumn("v0", "1", ColumnType.LONG))
+                                            .setDimensions(
+                                                new DefaultDimensionSpec("dim2", "d0", ColumnType.STRING),
+                                                new DefaultDimensionSpec("v0", "d1", ColumnType.LONG)
+                                            )
+                                            .setGranularity(Granularities.ALL)
+                                            .setLimitSpec(NoopLimitSpec.instance())
+                                            .build()
+                            ),
+                            "_j0.",
+                            "(trim(\"dim1\",' ') == \"_j0.d0\")",
+                            JoinType.LEFT
+                        ),
+                        new QueryDataSource(
+                            GroupByQuery.builder()
+                                        .setDataSource(CalciteTests.DATASOURCE1)
+                                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                                        .setVirtualColumns(expressionVirtualColumn("v0", "1", ColumnType.LONG))
+                                        .setDimFilter(selector("m2", "A", null))
+                                        .setDimensions(
+                                            new DefaultDimensionSpec("v0", "d0", ColumnType.LONG)
+                                        )
+                                        .setGranularity(Granularities.ALL)
+                                        .setLimitSpec(NoopLimitSpec.instance())
+                                        .build()
+                        ),
+                        "__j0.",
+                        "1",
+                        JoinType.LEFT
+                    )
+                )
+                .intervals(querySegmentSpec(Filtration.eternity()))
+                .aggregators(
+                    new FilteredAggregatorFactory(
+                        new CountAggregatorFactory("a0"),
+                        and(
+                            not(selector("_j0.d1", null, null)),
+                            not(selector("dim1", null, null))
+                        ),
+                        "a0"
+                    ),
+                    new FilteredAggregatorFactory(
+                        new FloatMinAggregatorFactory("a1", "m1"),
+                        selector("__j0.d0", null, null),
+                        "a1"
+                    )
+                )
+                .context(queryContext)
+                .build()
+        ) :
+        ImmutableList.of(
+            Druids.newTimeseriesQueryBuilder()
+                  .dataSource(
+                      join(
+                          join(
+                              join(
+                                  new TableDataSource(CalciteTests.DATASOURCE1),
+                                  new QueryDataSource(
+                                      GroupByQuery.builder()
+                                                  .setDataSource(CalciteTests.DATASOURCE1)
+                                                  .setInterval(querySegmentSpec(Filtration.eternity()))
+                                                  .setDimensions(
+                                                      new DefaultDimensionSpec("__time", "d0", ColumnType.LONG)
+                                                  )
+                                                  .setGranularity(Granularities.ALL)
+                                                  .setLimitSpec(NoopLimitSpec.instance())
+                                                  .build()
+                                  ),
+                                  "j0.",
+                                  equalsCondition(makeColumnExpression("__time"), makeColumnExpression("j0.d0")),

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [CalciteTestBase.makeColumnExpression](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4399)



##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteJoinQueryTest.java:
##########
@@ -5084,4 +5087,181 @@
         null
     );
   }
+
+  @Test
+  @Parameters(source = QueryContextForJoinProvider.class)
+  public void testRegressionFilteredAggregatorsSubqueryJoins(Map<String, Object> queryContext)
+  {
+    cannotVectorize();
+    testQuery(
+        "select\n" +
+        "count(*) filter (where trim(both from dim1) in (select dim2 from foo)),\n" +
+        "min(m1) filter (where 'A' not in (select m2 from foo))\n" +
+        "from foo as t0\n" +
+        "where __time in (select __time from foo)",
+        queryContext,
+        useDefault ?
+        ImmutableList.of(
+            Druids.newTimeseriesQueryBuilder()
+                  .dataSource(
+                    join(
+                        join(
+                            join(
+                                new TableDataSource(CalciteTests.DATASOURCE1),
+                                new QueryDataSource(
+                                    GroupByQuery.builder()
+                                                .setDataSource(CalciteTests.DATASOURCE1)
+                                                .setInterval(querySegmentSpec(Filtration.eternity()))
+                                                .setDimensions(
+                                                    new DefaultDimensionSpec("__time", "d0", ColumnType.LONG)
+                                                )
+                                                .setGranularity(Granularities.ALL)
+                                                .setLimitSpec(NoopLimitSpec.instance())
+                                                .build()
+                                ),
+                                "j0.",
+                                equalsCondition(makeColumnExpression("__time"), makeColumnExpression("j0.d0")),
+                                JoinType.INNER
+                            ),
+                            new QueryDataSource(
+                                GroupByQuery.builder()
+                                            .setDataSource(CalciteTests.DATASOURCE1)
+                                            .setInterval(querySegmentSpec(Filtration.eternity()))
+                                            .setVirtualColumns(expressionVirtualColumn("v0", "1", ColumnType.LONG))
+                                            .setDimensions(
+                                                new DefaultDimensionSpec("dim2", "d0", ColumnType.STRING),
+                                                new DefaultDimensionSpec("v0", "d1", ColumnType.LONG)
+                                            )
+                                            .setGranularity(Granularities.ALL)
+                                            .setLimitSpec(NoopLimitSpec.instance())
+                                            .build()
+                            ),
+                            "_j0.",
+                            "(trim(\"dim1\",' ') == \"_j0.d0\")",
+                            JoinType.LEFT
+                        ),
+                        new QueryDataSource(
+                            GroupByQuery.builder()
+                                        .setDataSource(CalciteTests.DATASOURCE1)
+                                        .setInterval(querySegmentSpec(Filtration.eternity()))
+                                        .setVirtualColumns(expressionVirtualColumn("v0", "1", ColumnType.LONG))
+                                        .setDimFilter(selector("m2", "A", null))
+                                        .setDimensions(
+                                            new DefaultDimensionSpec("v0", "d0", ColumnType.LONG)
+                                        )
+                                        .setGranularity(Granularities.ALL)
+                                        .setLimitSpec(NoopLimitSpec.instance())
+                                        .build()
+                        ),
+                        "__j0.",
+                        "1",
+                        JoinType.LEFT
+                    )
+                )
+                .intervals(querySegmentSpec(Filtration.eternity()))
+                .aggregators(
+                    new FilteredAggregatorFactory(
+                        new CountAggregatorFactory("a0"),
+                        and(
+                            not(selector("_j0.d1", null, null)),
+                            not(selector("dim1", null, null))
+                        ),
+                        "a0"
+                    ),
+                    new FilteredAggregatorFactory(
+                        new FloatMinAggregatorFactory("a1", "m1"),
+                        selector("__j0.d0", null, null),
+                        "a1"
+                    )
+                )
+                .context(queryContext)
+                .build()
+        ) :
+        ImmutableList.of(
+            Druids.newTimeseriesQueryBuilder()
+                  .dataSource(
+                      join(
+                          join(
+                              join(
+                                  new TableDataSource(CalciteTests.DATASOURCE1),
+                                  new QueryDataSource(
+                                      GroupByQuery.builder()
+                                                  .setDataSource(CalciteTests.DATASOURCE1)
+                                                  .setInterval(querySegmentSpec(Filtration.eternity()))
+                                                  .setDimensions(
+                                                      new DefaultDimensionSpec("__time", "d0", ColumnType.LONG)
+                                                  )
+                                                  .setGranularity(Granularities.ALL)
+                                                  .setLimitSpec(NoopLimitSpec.instance())
+                                                  .build()
+                                  ),
+                                  "j0.",
+                                  equalsCondition(makeColumnExpression("__time"), makeColumnExpression("j0.d0")),

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [CalciteTestBase.makeColumnExpression](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4398)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] github-code-scanning[bot] commented on a diff in pull request #13943: fix join and unnest planning to ensure that duplicate join prefixes are not used

Posted by "github-code-scanning[bot] (via GitHub)" <gi...@apache.org>.
github-code-scanning[bot] commented on code in PR #13943:
URL: https://github.com/apache/druid/pull/13943#discussion_r1139712250


##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteJoinQueryTest.java:
##########
@@ -4766,8 +4769,8 @@
                                 .context(queryContext)
                                 .build()
                         ),
-                        "j0.",
-                        equalsCondition(makeColumnExpression("v0"), makeColumnExpression("j0.v0")),
+                        "_j0.",
+                        equalsCondition(makeColumnExpression("v0"), makeColumnExpression("_j0.v0")),

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [CalciteTestBase.makeColumnExpression](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4402)



##########
sql/src/test/java/org/apache/druid/sql/calcite/CalciteJoinQueryTest.java:
##########
@@ -4766,8 +4769,8 @@
                                 .context(queryContext)
                                 .build()
                         ),
-                        "j0.",
-                        equalsCondition(makeColumnExpression("v0"), makeColumnExpression("j0.v0")),
+                        "_j0.",
+                        equalsCondition(makeColumnExpression("v0"), makeColumnExpression("_j0.v0")),

Review Comment:
   ## Deprecated method or constructor invocation
   
   Invoking [CalciteTestBase.makeColumnExpression](1) should be avoided because it has been deprecated.
   
   [Show more details](https://github.com/apache/druid/security/code-scanning/4401)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis merged pull request #13943: fix join and unnest planning to ensure that duplicate join prefixes are not used

Posted by "clintropolis (via GitHub)" <gi...@apache.org>.
clintropolis merged PR #13943:
URL: https://github.com/apache/druid/pull/13943


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org