You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2020/04/02 03:30:49 UTC
[GitHub] [incubator-doris] yangzhg opened a new issue #3245: optimize cross
join performance when where clause is or predicate and has common equal
predicate exprs
yangzhg opened a new issue #3245: optimize cross join performance when where clause is or predicate and has common equal predicate exprs
URL: https://github.com/apache/incubator-doris/issues/3245
queies like below cannot finish in a acceptable time, `store_sales` has 2800w rows, `customer_address` has 5w rows, for now doris will create only one cross join node to execute this sql,
the time of eavl the where clause is about 200-300 ns, the total count of eval will be 2800w * 5w, this is extremely large, and this will cost 2800w * 5w * 250 ns = 4 billion seconds;
```
select avg(ss_quantity)
,avg(ss_ext_sales_price)
,avg(ss_ext_wholesale_cost)
,sum(ss_ext_wholesale_cost)
from store_sales, customer_address
where ((ss_addr_sk = ca_address_sk
and ca_country = 'United States'
and ca_state in ('CO', 'IL', 'MN')
and ss_net_profit between 100 and 200
) or
(ss_addr_sk = ca_address_sk
and ca_country = 'United States'
and ca_state in ('OH', 'MT', 'NM')
and ss_net_profit between 150 and 300
) or
(ss_addr_sk = ca_address_sk
and ca_country = 'United States'
and ca_state in ('TX', 'MO', 'MI')
and ss_net_profit between 50 and 250
))
```
but this sql can be rewrite to
```
select avg(ss_quantity)
,avg(ss_ext_sales_price)
,avg(ss_ext_wholesale_cost)
,sum(ss_ext_wholesale_cost)
from store_sales, customer_address
where ss_addr_sk = ca_address_sk
and ca_country = 'United States' and (((ca_state in ('CO', 'IL', 'MN')
and ss_net_profit between 100 and 200
) or
(ca_state in ('OH', 'MT', 'NM')
and ss_net_profit between 150 and 300
) or
(ca_state in ('TX', 'MO', 'MI')
and ss_net_profit between 50 and 250
))
)
```
there for we can do a hash join first and then use
```
(((ca_state in ('CO', 'IL', 'MN')
and ss_net_profit between 100 and 200
) or
(ca_state in ('OH', 'MT', 'NM')
and ss_net_profit between 150 and 300
) or
(ca_state in ('TX', 'MO', 'MI')
and ss_net_profit between 50 and 250
))
)
```
to filter the value,
in tpcds 10g dataset, the rewrited sql only cost about 1 seconds
![image](https://user-images.githubusercontent.com/9098473/78207796-443e2180-74d5-11ea-8d01-a6598bfe59f9.png)
so we should implements this optimize
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org
[GitHub] [incubator-doris] yangzhg commented on issue #3245: optimize cross
join performance when where clause is or predicate and has common equal
predicate exprs
Posted by GitBox <gi...@apache.org>.
yangzhg commented on issue #3245: optimize cross join performance when where clause is or predicate and has common equal predicate exprs
URL: https://github.com/apache/incubator-doris/issues/3245#issuecomment-610128272
> Can I assign this issue to you?
OK
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org
[GitHub] [incubator-doris] imay commented on issue #3245: optimize cross
join performance when where clause is or predicate and has common equal
predicate exprs
Posted by GitBox <gi...@apache.org>.
imay commented on issue #3245: optimize cross join performance when where clause is or predicate and has common equal predicate exprs
URL: https://github.com/apache/incubator-doris/issues/3245#issuecomment-608197306
@yangzhg
Can I assign this issue to you?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org
For additional commands, e-mail: commits-help@doris.apache.org