Posted to issues@tajo.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2015/08/12 05:24:45 UTC

[jira] [Commented] (TAJO-1561) Query which contains join condition in "OR" clause does not finish.

    [ https://issues.apache.org/jira/browse/TAJO-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14692846#comment-14692846 ] 

ASF GitHub Bot commented on TAJO-1561:
--------------------------------------

GitHub user jihoonson opened a pull request:

    https://github.com/apache/tajo/pull/685

    TAJO-1561: Query which contains join condition in "OR" clause does not finish.

    I added a new query rewriting rule. I've also tested the TPC-DS query reported in the JIRA issue.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jihoonson/tajo-2 TAJO-1561

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/685.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #685
    
----
commit 64814c0cfd5f13a3ba6802ff6a715fb1e4ee8781
Author: Jihoon Son <ji...@apache.org>
Date:   2015-08-11T09:21:49Z

    TAJO-1561

commit 3828616dcad249fee5c03badaffdbe1522452c34
Author: Jihoon Son <ji...@apache.org>
Date:   2015-08-12T03:23:05Z

    TAJO-1561

----


> Query which contains join condition in "OR" clause does not finish.
> -------------------------------------------------------------------
>
>                 Key: TAJO-1561
>                 URL: https://issues.apache.org/jira/browse/TAJO-1561
>             Project: Tajo
>          Issue Type: Bug
>            Reporter: Hyoungjun Kim
>            Assignee: Jihoon Son
>
> {code:sql}
> select sum (ss_quantity)
>  from store_sales, store, customer_demographics, customer_address, date_dim
>  where s_store_sk = ss_store_sk
>  and  ss_sold_date_sk = d_date_sk and d_year = 1998
>  and  
>  (
>   (
>    cd_demo_sk = ss_cdemo_sk
>    and 
>    cd_marital_status = 'M'
>    and 
>    cd_education_status = '4 yr Degree'
>    and 
>    ss_sales_price between 100.00 and 150.00  
>    )
>  or
>   (
>   cd_demo_sk = ss_cdemo_sk
>    and 
>    cd_marital_status = 'M'
>    and 
>    cd_education_status = '4 yr Degree'
>    and 
>    ss_sales_price between 50.00 and 100.00   
>   )
>  or 
>  (
>   cd_demo_sk = ss_cdemo_sk
>   and 
>    cd_marital_status = 'M'
>    and 
>    cd_education_status = '4 yr Degree'
>    and 
>    ss_sales_price between 150.00 and 200.00  
>  )
>  )
>  and
>  (
>   (
>   ss_addr_sk = ca_address_sk
>   and
>   ca_country = 'United States'
>   and
>   ca_state in ('KY', 'GA', 'NM')
>   and ss_net_profit between 0 and 2000  
>   )
>  or
>   (ss_addr_sk = ca_address_sk
>   and
>   ca_country = 'United States'
>   and
>   ca_state in ('MT', 'OR', 'IN')
>   and ss_net_profit between 150 and 3000 
>   )
>  or
>   (ss_addr_sk = ca_address_sk
>   and
>   ca_country = 'United States'
>   and
>   ca_state in ('WI', 'MO', 'WV')
>   and ss_net_profit between 50 and 25000 
>   )
>  )
> {code}
> The query above is TPC-DS query 48. Its join condition is repeated in every branch of the OR clause, as follows:
> {noformat}
>    cd_demo_sk = ss_cdemo_sk
>    and 
>    cd_marital_status = 'M'
>    and 
>    cd_education_status = '4 yr Degree'
> {noformat}
> The Tajo planner builds the logical plan for this query with a CROSS JOIN because it cannot find the join condition, so the query does not finish. The query can be rewritten as follows.
> {code:sql}
> select sum (ss_quantity)
>  from store_sales, store, customer_demographics, customer_address, date_dim
>  where s_store_sk = ss_store_sk
>  and  ss_sold_date_sk = d_date_sk and d_year = 1998
>  and
> (cd_demo_sk = ss_cdemo_sk
> and
> cd_marital_status = 'M'
> and
> cd_education_status = '4 yr Degree'
> and (
>   (ss_sales_price between 50.00 and 100.00) or
>   (ss_sales_price between 100.00 and 150.00) or
>   (ss_sales_price between 150.00 and 200.00)
> ))
> and
> (
> ss_addr_sk = ca_address_sk
>   and
>   ca_country = 'United States'
>   and (
>    (ca_state in ('KY', 'GA', 'NM') and ss_net_profit between 0 and 2000)
>    or
>    (ca_state in ('MT', 'OR', 'IN') and ss_net_profit between 150 and 3000)
>    or
>    (ca_state in ('WI', 'MO', 'WV') and ss_net_profit between 50 and 25000)
>   )
> )
> {code}
> Other engines have the same problem. See the following issues.
> - https://issues.cloudera.org/browse/IMPALA-1707 
> - https://issues.apache.org/jira/browse/HIVE-7914
> This issue is related to TPC-DS queries 13, 48, and 85.
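
The manual rewrite in the issue description amounts to factoring out the predicates shared by every OR branch: (A AND B) OR (A AND C) becomes A AND (B OR C). Once the join condition is a top-level conjunct, a planner can pick it up. The following is a minimal Python sketch of that idea only. It assumes predicates are modeled as plain strings and each OR branch as a set of ANDed predicates. The helper name factor_common_conjuncts is hypothetical, and this is not the rewriting rule implemented in the pull request.

```python
from typing import FrozenSet, List, Tuple

# Assumption for illustration: one OR branch is a set of predicates that are
# ANDed together, and each predicate is just its SQL text.
Disjunct = FrozenSet[str]


def factor_common_conjuncts(
    disjuncts: List[Disjunct],
) -> Tuple[FrozenSet[str], List[Disjunct]]:
    """Rewrite (A and B) or (A and C) as A and (B or C).

    Returns the predicates shared by every branch and the residual branches.
    An empty residual list means the remaining OR is trivially true.
    """
    if not disjuncts:
        return frozenset(), []
    # Predicates that appear in every branch can be pulled out of the OR.
    common = frozenset.intersection(*disjuncts)
    residuals = [d - common for d in disjuncts]
    # A or (A and B) is just A: one empty residual makes the whole OR true.
    if any(len(r) == 0 for r in residuals):
        residuals = []
    return common, residuals


# First OR group of the query reported in this issue.
shared = [
    "cd_demo_sk = ss_cdemo_sk",
    "cd_marital_status = 'M'",
    "cd_education_status = '4 yr Degree'",
]
branches = [
    frozenset(shared + ["ss_sales_price between 100.00 and 150.00"]),
    frozenset(shared + ["ss_sales_price between 50.00 and 100.00"]),
    frozenset(shared + ["ss_sales_price between 150.00 and 200.00"]),
]

common, residuals = factor_common_conjuncts(branches)

# The join condition is now a top-level conjunct instead of being buried
# inside each OR branch.
print("AND-ed at top level:", sorted(common))
print("remaining OR branches:", [sorted(r) for r in residuals])
```

In this sketch the three shared predicates, including cd_demo_sk = ss_cdemo_sk, end up in common, and only the three ss_sales_price ranges remain inside the OR, which matches the shape of the rewritten query in the description.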



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)