You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/04/02 03:30:49 UTC

[GitHub] [incubator-doris] yangzhg opened a new issue #3245: optimize cross join performance when where clause is or predicate and has common equal predicate exprs

yangzhg opened a new issue #3245: optimize cross join performance when where clause is or predicate and has common equal predicate exprs 
URL: https://github.com/apache/incubator-doris/issues/3245
 
 
   queies like below cannot finish in a acceptable time, `store_sales` has 2800w rows, `customer_address` has 5w rows, for now doris will create only one cross join node to execute this sql, 
   the time of eavl the where clause is about 200-300 ns, the total count of eval will be  2800w * 5w, this is extremely large, and this will cost 2800w * 5w * 250 ns = 4 billion seconds;
   ```
   select avg(ss_quantity)
          ,avg(ss_ext_sales_price)
          ,avg(ss_ext_wholesale_cost)
          ,sum(ss_ext_wholesale_cost)
    from store_sales, customer_address 
    where  ((ss_addr_sk = ca_address_sk
     and ca_country = 'United States'
     and ca_state in ('CO', 'IL', 'MN')
     and ss_net_profit between 100 and 200  
        ) or
        (ss_addr_sk = ca_address_sk
     and ca_country = 'United States'
     and ca_state in ('OH', 'MT', 'NM')
     and ss_net_profit between 150 and 300  
        ) or
        (ss_addr_sk = ca_address_sk
     and ca_country = 'United States'
     and ca_state in ('TX', 'MO', 'MI')
     and ss_net_profit between 50 and 250  
        ))
   ```
   
   but this  sql can be rewrite to 
   ```
   select avg(ss_quantity)
          ,avg(ss_ext_sales_price)
          ,avg(ss_ext_wholesale_cost)
          ,sum(ss_ext_wholesale_cost)
    from store_sales, customer_address 
    where ss_addr_sk = ca_address_sk
     and ca_country = 'United States' and (((ca_state in ('CO', 'IL', 'MN')
     and ss_net_profit between 100 and 200  
        ) or
        (ca_state in ('OH', 'MT', 'NM')
     and ss_net_profit between 150 and 300  
        ) or
        (ca_state in ('TX', 'MO', 'MI')
     and ss_net_profit between 50 and 250  
        ))
    )
   ```
   there for  we can do a hash join first and then use 
   ```
   (((ca_state in ('CO', 'IL', 'MN')
     and ss_net_profit between 100 and 200  
        ) or
        (ca_state in ('OH', 'MT', 'NM')
     and ss_net_profit between 150 and 300  
        ) or
        (ca_state in ('TX', 'MO', 'MI')
     and ss_net_profit between 50 and 250  
        ))
    )
   ```
   to filter the value,
   in tpcds 10g dataset,  the rewrited sql only cost about 1 seconds
   ![image](https://user-images.githubusercontent.com/9098473/78207796-443e2180-74d5-11ea-8d01-a6598bfe59f9.png)
   
   so we should implements this optimize

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] yangzhg commented on issue #3245: optimize cross join performance when where clause is or predicate and has common equal predicate exprs

Posted by GitBox <gi...@apache.org>.
yangzhg commented on issue #3245: optimize cross join performance when where clause is or predicate and has common equal predicate exprs 
URL: https://github.com/apache/incubator-doris/issues/3245#issuecomment-610128272
 
 
   > Can I assign this issue to you?
   
   OK 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org


[GitHub] [incubator-doris] imay commented on issue #3245: optimize cross join performance when where clause is or predicate and has common equal predicate exprs

Posted by GitBox <gi...@apache.org>.
imay commented on issue #3245: optimize cross join performance when where clause is or predicate and has common equal predicate exprs 
URL: https://github.com/apache/incubator-doris/issues/3245#issuecomment-608197306
 
 
   @yangzhg 
   Can I assign this issue to you?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org