Posted to dev@hive.apache.org by "Micah Gutman (JIRA)" <ji...@apache.org> on 2013/08/14 19:50:48 UTC

[jira] [Commented] (HIVE-5083) Group by ignored when group by column is a partition column

    [ https://issues.apache.org/jira/browse/HIVE-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739951#comment-13739951 ] 

Micah Gutman commented on HIVE-5083:
------------------------------------

I finally found the bug by running "show extended <table> <partition spec>", which showed that all partitions were pointing to a single file. My selects only looked like they were working; they were actually reading the same data over and over.

Specifically, I created my partitions with a single "alter table" command that contained multiple partition specs. Interestingly, the wiki help page says:

Note that it is proper syntax to have multiple partition_spec in a single ALTER TABLE, but if you do this in version 0.7, your partitioning scheme will fail. That is, every query specifying a partition will always use only the first partition.

I am using 0.11, not 0.7, so apparently 0.11 (and perhaps every version after 0.7) still has this problem.
                
> Group by ignored when group by column is a partition column
> -----------------------------------------------------------
>
>                 Key: HIVE-5083
>                 URL: https://issues.apache.org/jira/browse/HIVE-5083
>             Project: Hive
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 0.11.0
>         Environment: linux
>            Reporter: Micah Gutman
>
> I have an external table X partitioned on a column date (a string in YYYYMMDD format):
> select X.date, count(*) from X group by X.date
> Rather than getting a count breakdown by date, I get a single row with the count for the entire table. The "date" value returned in that row appears to be the last partition in the table.
> Note that results appear as expected if I group by an arbitrary "real" (non-partition) column from my table:
> select X.foo, count(*) from X group by X.foo 
> correctly gives me a single row per value of X.foo.
> Also, my query works fine when I use the date column in the "where" clause, so partitioning does seem to be working:
> select X.date, count(*) from X where X.date = "20130101"
> correctly gives me a single row with the count for the date 20130101.
