You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/08/14 15:57:02 UTC

[GitHub] [druid] abhishekrb19 opened a new issue #10282: Druid SQL: A sub-query with expressions returns no data potentially due to an incorrect Query plan.

abhishekrb19 opened a new issue #10282:
URL: https://github.com/apache/druid/issues/10282


   A SQL sub-query with an expression specified in the inner query's HAVING sub-clause returns no data. Easy to reproduce, please see below for more details.
   
   ### Affected Version
   
   0.18.1
   ### Description
   
   A minimal example to reproduce the issue easily -- let's consider a single dimension "a" in a data source "ds", the problem can be reproduced with the following query:
   
   **Query-1**: Sub-query:
   ```
   SELECT COUNT(*)
   FROM (
       SELECT a
       FROM ds
       GROUP BY a
       HAVING SUM(a) + 1 > 0
    )
   ```
   returns no data.
   
   **Query-2**: The inner sub-query in itself returns results as expected:
   ```
       SELECT a
       FROM ds
       GROUP BY a
       HAVING SUM(a) + 1 > 0
   ```
   
   returns the following results:
   ```
   a2
   a3
   a6
   a7
   ```
   The expression specified in the `HAVING` clause in the sub-query appears to be the offending one. Note that without the expression, the query returns the correct results with 4 rows.
   
   Further, looking into the query plan for Query-1, we have:
   ```
   DruidOuterQueryRel(query=[{"queryType":"timeseries","dataSource":{"type":"table","name":"__subquery__"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"descending":false,"virtualColumns":[{"type":"expression","name":"v0","expression":"(\"$f1\" + 1)","outputType":"DOUBLE"}],"filter":{"type":"bound","dimension":"v0","lower":"0","upper":null,"lowerStrict":true,"upperStrict":false,"extractionFn":null,"ordering":{"type":"numeric"}},"granularity":{"type":"all"},"aggregations":[{"type":"count","name":"a0"}],"postAggregations":[],"limit":2147483647,"context":{"populateCache":false,"skipEmptyBuckets":false,"sqlOuterLimit":100,"sqlQueryId":"ae0500b4-4dab-40c3-8f2d-6d17a2e3e82b","useCache":false}}], signature=[{a0:LONG}]) DruidQueryRel(query=[{"queryType":"groupBy","dataSource":{"type":"table","name":"ds"},"intervals":{"type":"intervals","intervals":["-146136543-09-08T08:23:32.096Z/146140482-04-24T15:36:27.903Z"]},"virtualColumns
 ":[],"filter":null,"granularity":{"type":"all"},"dimensions":[{"type":"default","dimension":"a","outputName":"d0","outputType":"STRING"}],"aggregations":[{"type":"doubleSum","name":"a0","fieldName":null,"expression":"CAST(\"a\", 'DOUBLE')"}],"postAggregations":[],"having":null,"limitSpec":{"type":"NoopLimitSpec"},"context":{"populateCache":false,"skipEmptyBuckets":false,"sqlOuterLimit":100,"sqlQueryId":"ae0500b4-4dab-40c3-8f2d-6d17a2e3e82b","useCache":false},"descending":false}], signature=[{d0:STRING, a0:DOUBLE}])
   ```
   
   In the query plan above, the _outer_ query (`DruidOuterQueryRel`) has the virtual column with the expression specified which looks incorrect, while the `having` in the _inner_ query (`DruidQueryRel`) is `null`.  I'd think the expression must be in the _inner_ query, with the `having` clause set.
   
   /cc: @gianm @clintropolis @jihoonson. Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekrb19 commented on issue #10282: Druid SQL: A sub-query with expressions returns no data potentially due to an incorrect Query plan.

Posted by GitBox <gi...@apache.org>.
abhishekrb19 commented on issue #10282:
URL: https://github.com/apache/druid/issues/10282#issuecomment-678400182


   @gianm, I updated the details with the native JSON queries. Note that I edited the original query a bit for clarity purposes that will hopefully help narrow down the issue to the usage of expressions. It is quite easy to reproduce this issue. Please let me know if you need more details. Thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm edited a comment on issue #10282: Druid SQL: A sub-query with expressions returns no data potentially due to an incorrect Query plan.

Posted by GitBox <gi...@apache.org>.
gianm edited a comment on issue #10282:
URL: https://github.com/apache/druid/issues/10282#issuecomment-677862016


   Hmm, in general it should be OK to move a HAVING from an inner query to the WHERE of an outer query. That's what seems to be happening here. It doesn't look like a smoking gun to me. But it's possible it didn't get moved properly.
   
   Unfortunately query plans for multi stage queries can be tough to read. Personally I find it easier to enable request logging on the Broker (https://support.imply.io/hc/en-us/articles/360011745614-Enable-Query-or-Request-logging-for-Druid) and then grab the native query out of the request log.
   
   Could you do that and post it here? It'd be nice to see what the native query is.
   
   (We're hoping to change the output of explain plan to be more readable in the future.)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] siemib commented on issue #10282: Druid SQL: A sub-query with expressions returns no data potentially due to an incorrect Query plan.

Posted by GitBox <gi...@apache.org>.
siemib commented on issue #10282:
URL: https://github.com/apache/druid/issues/10282#issuecomment-679543778


   We are also seeing this issue.
   
   There was a similar issue reported in the past
   
   https://github.com/apache/druid/issues/10044
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] gianm commented on issue #10282: Druid SQL: A sub-query with expressions returns no data potentially due to an incorrect Query plan.

Posted by GitBox <gi...@apache.org>.
gianm commented on issue #10282:
URL: https://github.com/apache/druid/issues/10282#issuecomment-677862016


   Hmm, in general it should be OK to move a HAVING from an inner query to the WHERE of an outer query. That's what seems to be happening here. It doesn't look like a smoking gun to me. But it's possible it didn't get moved properly.
   
   Unfortunately query plans for multi stage queries can be tough to read. Personally I find it easier to enable request logging on the Broker (https://support.imply.io/hc/en-us/articles/360011745614-Enable-Query-or-Request-logging-for-Druid) and then grab the native query out of the request log.
   
   Could you do that and post it here? It'd be nice to see what the native query is.
   
   (I'm hoping to change the output of explain plan to be more readable in the future.)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org