You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Ning Zhang (JIRA)" <ji...@apache.org> on 2009/08/24 19:29:59 UTC

[jira] Created: (HIVE-784) Support subqueries in Hive

Support subqueries in Hive
--------------------------

                 Key: HIVE-784
                 URL: https://issues.apache.org/jira/browse/HIVE-784
             Project: Hadoop Hive
          Issue Type: New Feature
          Components: Query Processor
            Reporter: Ning Zhang


Hive currently only support views in the FROM-clause, some internal uses suggest that Hive should support subqueries such as connected by IN/EXISTS in the WHERE-clause. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (HIVE-784) Support subqueries in Hive

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang reassigned HIVE-784:
-------------------------------

    Assignee: Ning Zhang

> Support subqueries in Hive
> --------------------------
>
>                 Key: HIVE-784
>                 URL: https://issues.apache.org/jira/browse/HIVE-784
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>
> Hive currently only support views in the FROM-clause, some Facebook use cases suggest that Hive should support subqueries such as those connected by IN/EXISTS in the WHERE-clause. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-784) Support subqueries in Hive

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-784:
----------------------------

    Description: Hive currently only support views in the FROM-clause, some Facebook use cases suggest that Hive should support subqueries such as those connected by IN/EXISTS in the WHERE-clause.   (was: Hive currently only support views in the FROM-clause, some internal uses suggest that Hive should support subqueries such as connected by IN/EXISTS in the WHERE-clause. )

> Support subqueries in Hive
> --------------------------
>
>                 Key: HIVE-784
>                 URL: https://issues.apache.org/jira/browse/HIVE-784
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Ning Zhang
>
> Hive currently only support views in the FROM-clause, some Facebook use cases suggest that Hive should support subqueries such as those connected by IN/EXISTS in the WHERE-clause. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-784) Support subqueries in Hive

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762453#action_12762453 ] 

Ning Zhang commented on HIVE-784:
---------------------------------

Created a new task HIVE-870 for semi-join. Hive users should be able to specify semi-join in their queries just as INNER/OUTER joins. 

> Support subqueries in Hive
> --------------------------
>
>                 Key: HIVE-784
>                 URL: https://issues.apache.org/jira/browse/HIVE-784
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>
> Hive currently only support views in the FROM-clause, some Facebook use cases suggest that Hive should support subqueries such as those connected by IN/EXISTS in the WHERE-clause. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-784) Support subqueries in Hive

Posted by "Ning Zhang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762346#action_12762346 ] 

Ning Zhang commented on HIVE-784:
---------------------------------

Subqueries can be correlated or uncorrelated to its outer queries. If it is uncorrelated, the subqueries can be treated as a constant (set of rows in the case of IN, or boolean in the case of EXISTS). It is the easiest case where subqueries can be evaluated once and results don't change. It is the correlated subquries that are more common and much more complex. 

In general, correlated subqueries can be treated as a function, where the correlated variables can be treated as the input parameters. For example, the following nested subquery

select * 
from A
where exists (
    select null 
    from B
    where B.id = A.id 
        and B.date > '2009-10-01') 

can be treated as a function where the passing parameter is the correlated variable A.id. A native plan is to evaluate this function for every row in A. This is equivalent to a nested-loop join (semi) between A and B. A better evaluation plan is to evaluate the subquery every time the input parameter changes value. A better generalization is to rewrite the whole query to unnest the subquery into a semi-join between A and B. Then there are more join algorithms to choose from. A whole body of database research papers are dedicated to rewrite rules on unnesting subqueries to joins. 

There are also cases that the nested subqueries cannot be unnested into joins, particularly for those subqueries involving aggretations. 

As the first step, we will working on cases where subquries are uncorrelated or they can be unnested into semi-joins.  

> Support subqueries in Hive
> --------------------------
>
>                 Key: HIVE-784
>                 URL: https://issues.apache.org/jira/browse/HIVE-784
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>
> Hive currently only support views in the FROM-clause, some Facebook use cases suggest that Hive should support subqueries such as those connected by IN/EXISTS in the WHERE-clause. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.