You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Ning Zhang (JIRA)" <ji...@apache.org> on 2009/10/05 22:27:31 UTC

[jira] Commented: (HIVE-784) Support subqueries in Hive

    [ https://issues.apache.org/jira/browse/HIVE-784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12762346#action_12762346 ] 

Ning Zhang commented on HIVE-784:
---------------------------------

Subqueries can be correlated or uncorrelated to its outer queries. If it is uncorrelated, the subqueries can be treated as a constant (set of rows in the case of IN, or boolean in the case of EXISTS). It is the easiest case where subqueries can be evaluated once and results don't change. It is the correlated subquries that are more common and much more complex. 

In general, correlated subqueries can be treated as a function, where the correlated variables can be treated as the input parameters. For example, the following nested subquery

select * 
from A
where exists (
    select null 
    from B
    where B.id = A.id 
        and B.date > '2009-10-01') 

can be treated as a function where the passing parameter is the correlated variable A.id. A native plan is to evaluate this function for every row in A. This is equivalent to a nested-loop join (semi) between A and B. A better evaluation plan is to evaluate the subquery every time the input parameter changes value. A better generalization is to rewrite the whole query to unnest the subquery into a semi-join between A and B. Then there are more join algorithms to choose from. A whole body of database research papers are dedicated to rewrite rules on unnesting subqueries to joins. 

There are also cases that the nested subqueries cannot be unnested into joins, particularly for those subqueries involving aggretations. 

As the first step, we will working on cases where subquries are uncorrelated or they can be unnested into semi-joins.  

> Support subqueries in Hive
> --------------------------
>
>                 Key: HIVE-784
>                 URL: https://issues.apache.org/jira/browse/HIVE-784
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Ning Zhang
>            Assignee: Ning Zhang
>
> Hive currently only support views in the FROM-clause, some Facebook use cases suggest that Hive should support subqueries such as those connected by IN/EXISTS in the WHERE-clause. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.