You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/06/04 10:09:47 UTC

[GitHub] [druid] michdom44 opened a new issue #9985: SQL select count of "select..group by" subquery takes 30 s to execute

michdom44 opened a new issue #9985:
URL: https://github.com/apache/druid/issues/9985


   ### Affected Version
   
   Druid 0.12.1
   
   ### Description
   
   [rec_use_agg_imsi_supervisor.txt](https://github.com/apache/druid/files/4729207/rec_use_agg_imsi_supervisor.txt)
   [WINPRJ_current_config_2020-06-03_10.40.53.txt](https://github.com/apache/druid/files/4729209/WINPRJ_current_config_2020-06-03_10.40.53.txt)
   
   We have a data source ingested via Kafka (it’s not real-time ingestion but only ingestion of about 1 million rows once a day) which contains approximately 23.3 million rows (see supervisor spec in attachment). Our client wishes to extract information via Druid SQL with the following query:
   {
     "query": "select visitedoperator, visitedcountry, visitedmccmnc, count (distinct imsi) as count_unique_imsi from (select imsi, visitedoperator, visitedcountry, visitedmccmnc, count (flg_isalive) as nb_DayAlive from rec_use_agg_imsi where homemccmnc = 20801 and __ = 20801 and -01 'and __time <=' 2019-12-31 'group by imsi, visitedoperator, visitedcountry, visitedmccmnc having nb_DayAlive> = 1) group by visitedoperator, visitedcountry, visitedmccmnc "
   }
   This query works perfectly and returns 20 rows but its execution time is around 30 seconds:
   
   # time curl --negotiate -u: -XPOST -H'Content-Type: application / json 'http: // $ {DRUID_BROKER_HOST}: $ {DRUID_BROKER_PORT} / druid / v2 / sql / -d @ longReq.json | jq
     % Total% Received% Xferd Average Speed Time Time Time Current
                                    Dload Upload Total Spent Left Speed
   100 548 100 106 100 442 7857 32762 -: -: - -: -: - -: -: - 34000
   561 442 0 2041 100 442 73 16 0:00:27 0:00:27 -: -: - 352
   
   For information, the embedded “select..group by” returns approximately 3.1 million rows.
   
   The Druid architecture consists of 2 masters (coordinator and overlord processes on each node, 8 vcores and 16 Gb RAM), 2 data (historical and middlemanager processes on each node, 32 vcores and 128 Gb RAM) and 2 query (broker process on each node, 8 vcores and 16 Gb RAM). Cluster is Kerberized.
   
   I would like to know if it is possible to improve the performance of this type of query and how. I have tried to scale up the Druid architecture by doubling the number of broker and historical / middlemanager nodes, but I still get the same execution time. I also followed the tuning recommendations of the Druid documentation about heap and direct memory sizing but without success.
   
   Please find in attachment our Druid cluster configuration.
   
   Thanks for help!
   
   Regards,
   Michel.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org

[GitHub] [druid] michdom44 commented on issue #9985: SQL select count of "select..group by" subquery takes 30 s to execute

Posted by GitBox <gi...@apache.org>.

michdom44 commented on issue #9985:
URL: https://github.com/apache/druid/issues/9985#issuecomment-643156199


   Hi,
   
   any hint about this issue?
   
   Thanks,
   Michel.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org