You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@superset.apache.org by GitBox <gi...@apache.org> on 2021/06/30 09:26:41 UTC

[GitHub] [superset] ecpbi opened a new issue #15470: Non necessary self join in Area chart query could bring performance issues

ecpbi opened a new issue #15470:
URL: https://github.com/apache/superset/issues/15470


   Hello,
   
   I implemented a relational schema on a 10.5.11 MariaDB database to store the data of my data warehouse and I installed the version 0.999.0dev of Superset (this is what tells me the "About" item of the "Settings" menu).
   
   I just tried the area chart and I noticed the following strange behavior of the query generator. It creates a non needed self join. Here is what I did and what was generated.
   First, I created avirtual SQL dataset from a query that works perfectly well on my DB.
   
   _SELECT
   	va.application_host
   	, va.application_pid
   	, va.application_port
   	, usr.appuserlogin
   	, apps.EventDate
   	, apps.EventTime
   	, vs.service_host
   	, vs.service_pid
   	, vs.service_port
   	, ori.origine_host
   	, ori.origine_pid
   	, ori.origine_port
   	, oc.operation_categorie
   	, oc.operation
   	, oc.WHATSUCCESS
   	, oc.objet_categorie
   	, apps.objet
   	, apps.PERFDATA
   FROM
   	application_supervision AS apps NATURAL
   JOIN appuser AS usr NATURAL
   JOIN v_application AS va NATURAL
   JOIN v_service AS vs NATURAL
   JOIN v_origine AS ori NATURAL
   JOIN objet_categorie AS oc_
   
   Then I used this dataset to create a chart drawing an area chart.
   ![image](https://user-images.githubusercontent.com/84923750/123934062-3787d200-d993-11eb-916f-32a92ae2d2a0.png)
   
   Finaly, I ran the query to get the chart on my screen, that works perfectly, maybe because my tables count a very low number of rows.
   After the rendering of the chart, I look at the query that is generated by the tool and I get the following.
   
   _SELECT DATE(`EventDate`) AS __timestamp,
          application_host AS application_host,
          application_port AS application_port,
          SUM(PERFDATA) AS `nb transactions`
   FROM
     (SELECT va.application_host ,
             va.application_pid ,
             va.application_port ,
             usr.appuserlogin ,
             apps.EventDate ,
             apps.EventTime ,
             vs.service_host ,
             vs.service_pid ,
             vs.service_port ,
             ori.origine_host ,
             ori.origine_pid ,
             ori.origine_port ,
             oc.operation_categorie ,
             oc.operation ,
             oc.WHATSUCCESS ,
             oc.objet_categorie ,
             apps.objet ,
             apps.PERFDATA
      FROM application_supervision AS apps
      NATURAL
   JOIN appuser AS usr
      NATURAL
   JOIN v_application AS va
      NATURAL
   JOIN v_service AS vs
      NATURAL
   JOIN v_origine AS ori
      NATURAL
   JOIN objet_categorie AS oc) AS virtual_table
   INNER JOIN
     (SELECT application_host AS application_host__,
             application_port AS application_port__,
             SUM(PERFDATA) AS mme_inner__
      FROM
        (SELECT va.application_host ,
                va.application_pid ,
                va.application_port ,
                usr.appuserlogin ,
                apps.EventDate ,
                apps.EventTime ,
                vs.service_host ,
                vs.service_pid ,
                vs.service_port ,
                ori.origine_host ,
                ori.origine_pid ,
                ori.origine_port ,
                oc.operation_categorie ,
                oc.operation ,
                oc.WHATSUCCESS ,
                oc.objet_categorie ,
                apps.objet ,
                apps.PERFDATA
         FROM application_supervision AS apps
         NATURAL
   JOIN appuser AS usr
         NATURAL
   JOIN v_application AS va
         NATURAL
   JOIN v_service AS vs
         NATURAL
   JOIN v_origine AS ori
         NATURAL
   JOIN objet_categorie AS oc) AS virtual_table
      WHERE ((objet_categorie ='Transactions'
              AND application_host like 'fr1btp260%'))
      GROUP BY application_host,
               application_port
      ORDER BY mme_inner__ ASC
      LIMIT 100) AS anon_1 ON application_host = application_host__
   AND application_port = application_port__
   WHERE ((objet_categorie ='Transactions'
           AND application_host like 'fr1btp260%'))
   GROUP BY application_host,
            application_port,
            DATE(`EventDate`)
   LIMIT 10000;
   
   I understand tue definition of the virtual_table derived table, as only a few columns of the dataset are used, but what is the utility of the anon_1 derived table ? This derived table is just useless as I get exactly the same result by simply dropping it from the query. Not only is this self join not usefull, but it has a strong probability to bring performance issuses when the data grow up.
   Maybe this has a purpose, but, from my point of view, this is only a bug.
   
   Regards,


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] AlmiS commented on issue #15470: Non necessary self join in Area chart query could bring performance issues

Posted by GitBox <gi...@apache.org>.
AlmiS commented on issue #15470:
URL: https://github.com/apache/superset/issues/15470#issuecomment-909573523


   This seems to be the result of setting the "series limit" option. If you hover the little info symbol on this option, you will see the description which  explains this behaviour. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org


[GitHub] [superset] AlmiS commented on issue #15470: Non necessary self join in Area chart query could bring performance issues

Posted by GitBox <gi...@apache.org>.
AlmiS commented on issue #15470:
URL: https://github.com/apache/superset/issues/15470#issuecomment-909573523


   This seems to be the result of setting the "series limit" option. If you hover the little info symbol on this option, you will see the description which  explains this behaviour. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: notifications-unsubscribe@superset.apache.org
For additional commands, e-mail: notifications-help@superset.apache.org