Posted to dev@hive.apache.org by "Shengsheng Huang (JIRA)" <ji...@apache.org> on 2012/06/26 06:15:42 UTC

[jira] [Created] (HIVE-3199) support select distinct *

Shengsheng Huang created HIVE-3199:
--------------------------------------

             Summary: support select distinct *
                 Key: HIVE-3199
                 URL: https://issues.apache.org/jira/browse/HIVE-3199
             Project: Hive
          Issue Type: New Feature
          Components: Query Processor
    Affects Versions: 0.9.0
            Reporter: Shengsheng Huang


Running the query "select distinct * from t" reports an error.
This is a common feature and should be supported.

I investigated the issue. In the current implementation, "select distinct a,b,c from t" is translated to "select a,b,c from t group by a,b,c", so "select distinct * from t" is translated literally to "select * from t group by *". However, * is not handled properly when group-by expressions are processed.
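
To make the failure mode concrete, here is a minimal string-level sketch of the rewrite described above. It is an illustration only: Hive performs this translation on its AST, not on query strings, and the function name distinct_to_group_by is hypothetical.

```python
import re


def distinct_to_group_by(query: str) -> str:
    """Rewrite 'select distinct <cols> from <table>' as a GROUP BY query.

    Hypothetical helper for illustration; it only understands a single
    table name in the FROM clause and is not Hive's implementation.
    """
    m = re.fullmatch(
        r"\s*select\s+distinct\s+(.+?)\s+from\s+(\w+)\s*",
        query,
        flags=re.IGNORECASE,
    )
    if m is None:
        raise ValueError("unsupported query shape")
    cols, table = m.group(1), m.group(2)
    # The select list is copied verbatim into the GROUP BY clause.
    return f"select {cols} from {table} group by {cols}"


print(distinct_to_group_by("select distinct a,b,c from t"))
# select a,b,c from t group by a,b,c
print(distinct_to_group_by("select distinct * from t"))
# select * from t group by *
```

The second result shows the problem: copying the select list verbatim puts the wildcard into the GROUP BY clause, which is where processing fails.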

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (HIVE-3199) support select distinct *

Posted by "Shengsheng Huang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shengsheng Huang updated HIVE-3199:
-----------------------------------

    Attachment: HIVE-3199.for0.9.0.patch

Uploaded a patch that enables basic "select distinct *". Patterns such as "select distinct * from t", "select distinct t.* from t", and "select distinct * from a join b" are all supported. The patch does not support subqueries: "select distinct *" cannot appear inside a subquery, and the "from" clause cannot be a subquery.
                


        

[jira] [Commented] (HIVE-3199) support select distinct *

Posted by "Shengsheng Huang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-3199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403761#comment-13403761 ] 

Shengsheng Huang commented on HIVE-3199:
----------------------------------------

This patch handles "select distinct *" at compile time. It adds another AST transformation stage that replaces * with named columns, because GroupBy does not accept wildcards. Another option is to handle the wildcard at runtime. Either way, I think adding an extra AST transformation stage for potential optimizations or for enabling features makes sense. We could support NATURAL JOIN in a similar way. @Namit @JQ What do you think?
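
As a rough sketch of the compile-time idea (not the code in the attached patch), the wildcard can be expanded into named columns before the GROUP BY rewrite is applied. The function names, the list-based query representation, and the schema dictionary below are all assumptions made for illustration.

```python
from typing import Dict, List


def expand_star(select_items: List[str], from_tables: List[str],
                schema: Dict[str, List[str]]) -> List[str]:
    """Replace '*' and 't.*' with qualified column names (hypothetical helper)."""
    expanded: List[str] = []
    for item in select_items:
        if item == "*":
            # A bare star covers every table in the FROM clause, in order.
            for table in from_tables:
                expanded.extend(f"{table}.{col}" for col in schema[table])
        elif item.endswith(".*"):
            # A qualified star covers only the named table.
            table = item[:-2]
            expanded.extend(f"{table}.{col}" for col in schema[table])
        else:
            expanded.append(item)
    return expanded


def rewrite_distinct(select_items: List[str], from_clause: str,
                     from_tables: List[str],
                     schema: Dict[str, List[str]]) -> str:
    """Expand wildcards first, then apply the distinct-to-group-by rewrite."""
    cols = ", ".join(expand_star(select_items, from_tables, schema))
    return f"select {cols} from {from_clause} group by {cols}"


# Assumed example schema: two tables, a(id, x) and b(id, y).
schema = {"a": ["id", "x"], "b": ["id", "y"]}
print(rewrite_distinct(["*"], "a join b", ["a", "b"], schema))
# select a.id, a.x, b.id, b.y from a join b group by a.id, a.x, b.id, b.y
```

Because the expansion runs before the rewrite, the GROUP BY clause only ever sees named columns, which is the property the extra transformation stage is meant to guarantee.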
                
