You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org> on 2013/11/21 07:23:38 UTC

[jira] [Commented] (HIVE-4956) Allow multiple tables in from clause if all them have the same schema, but can be partitioned differently

    [ https://issues.apache.org/jira/browse/HIVE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828528#comment-13828528 ] 

Amareshwari Sriramadasu commented on HIVE-4956:
-----------------------------------------------

I agree with the concerns above that this is deviating from SQL. But it gives lot of performance improvement in distributed systems. How about change the separator to '+' instead of ',', as part of Hive QL? 

The query will look like the following :
{noformat}
select t.x, t.y, .... from T1+T2 t where t.p1='x' OR t.p1='y' ... [groupby-clause] [having-clause] [orderby-clause]
{noformat}

If the proposal is fine, I can upload the patch.

> Allow multiple tables in from clause if all them have the same schema, but can be partitioned differently
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-4956
>                 URL: https://issues.apache.org/jira/browse/HIVE-4956
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>
> We have a usecase where the table storage partitioning changes over time.
> For ex:
>  we can have a table T1 which is partitioned by p1. But overtime, we want to partition the table on p1 and p2 as well. The new table can be T2. So, if we have to query table on partition p1, it will be a union query across two table T1 and T2. Especially with aggregations like avg, it becomes costly union query because we cannot make use of mapside aggregations and other optimizations.
> The proposal is to support queries of the following format :
> select t.x, t.y, .... from T1,T2 t where t.p1='x' OR t.p1='y' ... [groupby-clause] [having-clause] [orderby-clause] and so on.
> Here we allow from clause as a comma separated list of tables with an alias and alias will be used in the full query, and partition pruning will happen on the actual tables to pick up the right paths. This will work because the difference is only on picking up the input paths and whole operator tree does not change. If this sounds a good usecase, I can put up the changes required to support the same.



--
This message was sent by Atlassian JIRA
(v6.1#6144)